The Need to Change the Paradigm of Control System Cyber Security – Article 2/3: Lack of Control System Cyber Incident Information Sharing
Author: Joe Weiss, Managing Partner, Applied Control Solutions, LLC
OT cyber security depends on the ability to expeditiously identify cyber incidents/attacks. Yet, that is not happening for technological and other reasons. This paper identifies the issues associated with the lack of identifying and sharing information about control system cyber incidents.
As mentioned in Article 1/3, the definition of a cyber incident in NIST FIPS PUB 200 is: “An occurrence that actually or potentially jeopardizes the confidentiality, integrity, or availability (CIA) of an information system or the information the system processes, stores, or transmits or that constitutes a violation or imminent threat of violation of security policies, security procedures, or acceptable use policies.” This definition is relevant to the control system community with one critical modification: the definition needs to add the letter S (Safety). It is also important to note that the term “malicious” is not mentioned in the NIST definition. Effectively, this is Mission Assurance, which means cyber vulnerabilities are important if they can impact the mission.
Additional reasons for not using the term "malicious" are the lack of adequate control system cyber forensics and the lack of sufficient control system cyber security technologies. In many cases, the only difference between an incident being malicious versus unintentional is the motivation of the individual involved. As of November 1, 2020, I have been able to identify more than 1,250 actual control system cyber incidents (actually, more than 12,000,000 when you count the individual cases involved in the diesel cheat scandals – https://www.controlglobal.com/blogs/unfettered/diesel-cheat-scandal-affects-almost-12-million-vehicles-an-industrial-strength-cyber-event/ ).
The incidents are international and cross most physical operations including electric power, water, oil/gas, chemicals, manufacturing, pipelines, medical devices, transportation, etc. There have been more than 1,500 deaths and more than $70 billion (US) in direct damages. Many cases have come not from network problems or attacks but from compromises of, or problems with, control system devices. The real safety and reliability impacts come from manipulating physics, not data. Yet, there is almost no control system cyber forensics below the Internet Protocol (IP) level and almost no training for control system/safety engineers on relevant cyber security issues (e.g., don't go to questionable sites).
As the Level 0,1 devices identified in Article 1 of this series have no cyber security, authentication, or cyber logging, it is difficult to identify cyberattacks against them. Consequently, sophisticated attackers can make cyberattacks appear to be equipment malfunctions that go unrecognized as cyber issues for months to years. Stuxnet was an example. This begs the question – will there be a "Cyber Pearl Harbor"? The answer is in two parts – probably yes, but we may not know it was cyber-related.
DHS ICS-CERT and various cyber vulnerability disclosure sites have not provided indications of Level 0,1 device cyber vulnerabilities. The Common Vulnerability Scoring System (CVSS) assesses the severity of computer system security vulnerabilities.
However, CVSS only applies to networks and not to control system field devices. Is it any wonder these devices can be a prime choice for sophisticated attackers? Targeting control system devices essentially bypasses the OT cyber security Maginot Line. Until about 2003, control system cyber incidents were either unintentional, such as the Olympic Pipeline rupture1, or insider attacks, such as the Maroochy Shire wastewater hack2. In 2003, as the level of connectivity increased and control systems were moving to Windows-based HMIs, Microsoft-based attacks such as Slammer, Blaster, and Conficker affected control systems. That is, control systems became the unintended "recipients" of Windows-based Denial-of-Service (DOS) attacks.
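To illustrate how network-centric CVSS is, the following minimal Python sketch implements the v3.1 base-score formula (Unchanged scope) from the FIRST.org specification. Note that every input describes network/host exploitability and data impact (Confidentiality, Integrity, Availability) – there is no metric dimension at all for physical process or field-device consequences.

```python
import math

# CVSS v3.1 base-score constants from the FIRST.org specification
# (Scope: Unchanged). No input describes physical/process impact.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                         # Attack Complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}              # Privileges Required
UI = {"N": 0.85, "R": 0.62}                         # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}              # C/I/A impact values

def roundup(value: float) -> float:
    """CVSS v3.1 'round up to one decimal' helper, per the specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000
    return (math.floor(scaled / 10000) + 1) / 10

def base_score(av, ac, pr, ui, c, i, a):
    """Base score for an Unchanged-scope vulnerability."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

# A remotely exploitable, no-privileges-required flaw with full C/I/A impact:
print(base_score("N", "L", "N", "N", "H", "H", "H"))  # 9.8 (Critical)
```

A vulnerability that manipulates a Level 0,1 sensor or actuator would have to be shoehorned into these same network-oriented inputs, which is exactly the gap described above.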
Until 2007, control system cyber "attacks" and demonstrations were almost all DOS events that did not cause equipment damage. As a result, many in industry simply didn't take control system cyber security seriously (which unfortunately is still the case). In response, the Idaho National Laboratory (INL) conducted the Aurora vulnerability demonstration in March 2007 to show that a cyber event could cause kinetic damage just like sticks of dynamite. Aurora was a physics-based attack that caused physical damage3. Aurora was a game-changer: it demonstrated that cyber attacks were no longer just DOS attacks as in IT systems, but could be an existential threat to a modern country by damaging long-lead critical equipment that can take months to years to replace4. At the time, cyber attacks like Stuxnet were still perceived to be so sophisticated they had to be performed by a nation-state.
In a physical-cyber world, threats are not just about protecting the network; consideration must also be given to protecting the operational systems/process. Arguably, the first targeted cyber attack against control systems occurred in Maroochy Shire, Australia. In 2000, a disgruntled SCADA contractor maliciously hacked the sewage discharge valves, leading to a series of major sewage spills. The valves used in sewage treatment systems are similar to I&C equipment used in other industrial facilities.
I believe there are many people in industry willing to share information about ICS cyber incidents. However, in too many cases, policy decisions will not allow them to do so. Too often, legal departments are afraid that disclosure will somehow make them a target, that it will be reflected in a lower stock price, or hold other irrational fears. These fears are irrational because, done properly, disclosure should not cause these problems but should help prevent them. In the 2011 timeframe, I had two engineers attend the ICS Cyber Security Conference to discuss actual ICS cyber incidents on a non-attributed basis5. Because they thought it was so important to share this information with their peers, they came even though their organization would not support their travel expenses.
The engineers shared their information in a non-attributable manner (they did not identify their organizations). The conference was in the Washington, DC area, but none of the major industry organizations attended, nor did NIST – yet these organizations seem to be very vocal about the need to share information. In February 2013, NERC issued a lessons-learned report on four incidents. All were clearly cyber incidents, but NERC did not identify any of them as being cyber-related.
The irrational fear of cyber incident disclosure not only prevents the affected organization from sharing information – oftentimes even within its own organization – but also prevents external organizations from hearing it. It also means the security guidance being disseminated and the table-top exercises being required do not reflect what is actually occurring, which can lead to organizations taking the wrong actions during an actual cyber incident.
Information sharing has been a continuing problem, particularly with ICSs. The issues are not just a reticence to share, but a lack of connecting the dots. Years ago, I was talking to a US Department of Defense (DOD) cyber security subject matter expert who had just returned from an international trip. While overseas, he met some government representatives from that country. During one of the discussions, the US DOD representative brought up ICS cyber security.
At the mention of ICS, one of the international government representatives mentioned they had suffered malicious cyberattacks against their electric grid in the 2006-2008 timeframe. What made that disclosure so interesting to me was that representatives from that country attended the 2007 and 2008 ICS Cyber Security Conferences. I was surprised this country was attending as I had not contacted them and had no idea why the Conference was important to them. What an interesting way to connect the dots.
On March 12, 2015, DHS's ICS-CERT issued the ICS-CERT Monitor (https://ics-cert.us-cert.gov/sites/default/files/Monitors/ICS-CERT_Monitor_Sep2014-Feb2015.pdf). The report identified 245 total incidents in 2014, broken out by sector and by access vector. It stated that the majority of incidents had an "unknown" access vector, which implies a lack of appropriate monitoring.
The report's access vector categorization showed that 62% of the incidents involved traditional IT attack vectors, which can affect control systems but do not address control-system-unique vectors such as unauthorized control system logic changes, unauthorized breaker control, etc.
I had the following questions about the ICS-CERT report data:
- Are there international cases?
- As there were few end-users monitoring their control system networks at the time, how many of the Network Scanning/Probing incidents came from monitoring the control system networks?
- How many of the control system incidents were from field control systems (controllers, sensors, actuators, analyzers, etc.)?
- Of the 38% unknown access incidents, how many accessed the field control systems?
- How many of these incidents are from control systems directly connected to the Internet with no cyber security protection?
Arguably the most important question is how many of the control system incidents actually affected facility reliability and/or safety. The report stated that the majority of the incidents were from an "unknown" access vector but the organization was confirmed to be compromised. What does "compromised" mean? Did the compromise affect the reliability and/or safety of the facility? The same issues continue to date. The CS2AI-KPMG Control System Cyber Security Report 2020 was released in November 20206. The survey report is similar to most ICS/OT surveys I have seen in that it is almost exclusively focused on OT networks without substantial input from control system or safety engineers. The control system Purdue Reference Model Level 0,1 devices weren't included in the report even though they can be remotely accessed. The section on control system cyber incidents only addressed network incidents. The section on impacts of control system incidents didn't even have a category for physical impacts.
In another case, according to Claroty, more than 70% of the ICS vulnerabilities disclosed in the first half of 2020 can be exploited remotely, highlighting the importance of protecting internet-facing ICS devices and remote access connections. Claroty's assessment covered 365 ICS vulnerabilities published by the National Vulnerability Database (NVD). During the first half of 2020, 53 vendors received 139 ICS advisories issued by the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT)7. ICS vulnerabilities published by the NVD increased by 10.3% from 331 in the first half of 2019, while ICS-CERT advisories increased by 32.4% from 105. Yet none of these addressed control system devices.
Control System Cyber Incidents
I have been collecting actual control system cyber incidents since 2000. There are so many incidents with significant implications that I had to select a small sample. Unfortunately, control system cyber incidents continue to be ignored in the focus on OT networks. I encourage you to check out www.controlglobal.com/unfettered.
The criterion used for identifying incidents as control system cyber incidents is the NIST definition – electronic communications between systems, or between systems and people, that affect Confidentiality, Integrity, or Availability.
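That screening criterion can be expressed as a short sketch. The field names here are my own illustrative assumptions, not a standard schema; Safety is included alongside C-I-A per the modification argued for earlier in this series, and note that malice is not part of the test:

```python
# Hypothetical sketch of the incident-screening criterion: an event
# qualifies as a control system cyber incident if electronic
# communications were involved and Confidentiality, Integrity,
# Availability -- or, for control systems, Safety -- was actually or
# potentially affected. Malicious intent is deliberately NOT a field.
from dataclasses import dataclass

@dataclass
class Event:
    involves_electronic_comms: bool
    confidentiality_affected: bool = False
    integrity_affected: bool = False
    availability_affected: bool = False
    safety_affected: bool = False

def is_cyber_incident(e: Event) -> bool:
    return e.involves_electronic_comms and (
        e.confidentiality_affected or e.integrity_affected
        or e.availability_affected or e.safety_affected)

# A plant trip caused by a misbehaving controller still qualifies,
# whether or not anyone intended harm:
trip = Event(involves_electronic_comms=True, availability_affected=True)
print(is_cyber_incident(trip))  # True
```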
There are minimal control system cyber forensics and logging for control system field devices and minimal training for Operational personnel to identify control system cyber incidents. Consequently, there are few publicly identified control system cyber incidents. There are common threads to many of the ICS cyber incidents beyond the traditional IT breakdowns given in the ICS-CERT8 report. In the 2008-2010 timeframe, I was under contract to MITRE supporting NIST to extend NIST SP 800-53 for control systems. As part of that effort, we took three real public cases to demonstrate how the extended NIST SP 800-53 would be useful to non-federal government organizations: the Maroochy Shire wastewater SCADA attack, the Olympic Pipeline rupture, and the Browns Ferry 3 nuclear plant broadcast storm. All of these cases are in my book9.
The Olympic Pipeline case is very similar to the 2010 Pacific Gas & Electric (PG&E) San Bruno natural gas pipeline rupture in many ways. Both involved SCADA maloperation, both killed people, and both led to the bankruptcies of their companies. From a control system (cyber) perspective, the following table demonstrates the commonalities:
Consequently, there is a need to connect the dots and provide guidance to industry. As these two cases weren’t viewed as malicious cyberattacks, they have been largely ignored by the cyber security community.
From a statistical perspective, 1,250 incidents over 20 years may not constitute a statistically significant sample size, so it may not be possible to identify statistically significant trends or frequencies. What can be said is that control system cyber incidents continue to occur in industries globally.
The impacts from these incidents range from trivial to significant: environmental damage, equipment damage, equipment/facility downtime, widespread electric outages, and deaths. It is not always evident which incidents are malicious and which are unintentional. However, it is the impacts that are important.
There is a need to use the knowledge from previous control system cyber incidents when developing cyber forensics and monitoring technologies, cyber security technologies, and training, and to adjust requirements such as the NERC CIPs, US NRC Regulatory Guide 5.71/NEI 08-09, and CFATS to address what has actually been happening.
Dragos has stated that the Crashoverride/Industroyer malware used in the 2016 cyber attack marked the first time engineering expertise was used in developing cyberattack methodologies, because of its use of SCADA-specific protocols. In reality, it was Aurora and Stuxnet in the 2007-08 timeframe that marked the sea change in control system cyber security threats.
In Aurora and Stuxnet, the attacks were designed to physically damage equipment based on engineering "weaknesses". The cyber methodologies were chosen based on what was needed to accomplish the engineering goals. Consequently, the cyber tools necessary could range from trivial to zero-days depending on the functional need. This is different from the IT/OT approach of assessing network vulnerabilities. However, this engineering approach is still not well appreciated, and that gap is dangerous.
In August 2008, the Siemens International User Group meeting included an INL presentation on hacking Siemens PLCs. It identified the PCS7 cyber vulnerabilities that were later exploited by Stuxnet, but the attendees didn't recognize the implications of the presentation (the presentation was originally on the Internet but was removed). In the 2009-10 timeframe, Stuxnet was damaging centrifuges in Iran. For the first year, the damage was thought to be from centrifuge malfunctions, not a cyberattack. The damage occurring in the banks of centrifuges was audibly obvious, but cyber threats were not viewed as the cause until the July 2010 timeframe.
After Stuxnet became public, control system cyber security changed dramatically: much of the attacker community, which had previously paid little attention to control systems (they were focused on stealing money and fame), pivoted to control systems. Metasploit modules (hacking tools) were developed for the major control system platforms and made available over the Internet. It no longer took the sophistication of a nation-state to attack control systems.
In the 2012-14 timeframe, control system supply chains were attacked for use as "back doors" into end-users' control systems. Vendors compromised by the supply chain attacks included Telvent, Siemens, GE, and others. It has not been publicly documented how deep into the supply chain the hacks occurred.
The June 2017 Triton attack on the Triconex safety systems (Triconex is Schneider Electric equipment with Windows operating systems) in a Saudi Arabian plant was a game changer in a number of ways. It was an attack against safety systems, meaning the intent was to blow up the plant and kill people, not cause a DOS. The plant tripped because of malware, yet it was detected not as a cyberattack but as a malfunction. The Triton attack also demonstrated that sophisticated cyberattacks could be carried out against any control or safety system supplier (Stuxnet was against Siemens control system equipment with Windows operating systems).
The Triton attack demonstrated that neither control system cyber security nor process safety policies, procedures, or standards were adequate. Additionally, it raised the question as to how many “successful” control system cyberattacks are occurring if cyberattacks can be misidentified as malfunctions. Facility operators are trained to trust their sensors/displays. As both Stuxnet and Triton compromised Windows-based networks and associated operator displays, the need for out-of-band sensor monitoring systems becomes critical to have an independent, uncompromised view of the process.
The Triton attack was only against the safety system and needed a separate attack against the plant Distributed Control System (DCS) to be effective. This certainly raises the question as to whether integrated control and safety systems can be demonstrated to be cyber secure and safe when it only takes one attack to cause a catastrophic safety situation. The Triton attack also demonstrates the culture gap between IT/OT and Engineering/Safety is wide and growing.
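The out-of-band monitoring concept described above can be sketched in a few lines. This is a purely hypothetical illustration, not any vendor's product: the function name, tolerance, and data values are my own assumptions. The point is the architecture – the second measurement path never touches the potentially compromised Windows/IP path.

```python
# Hypothetical sketch: cross-check HMI-reported process values against an
# independent, out-of-band measurement of the same signal (e.g., the raw
# 4-20 mA loop current sampled before it is digitized into packets).
# Names, tolerance, and data below are illustrative assumptions.

def flag_divergence(oob_readings, hmi_readings, tolerance=0.05):
    """Return indices where the HMI value diverges from the out-of-band
    measurement by more than `tolerance` (fractional), suggesting either
    instrument failure or a compromised display/network path."""
    flags = []
    for i, (oob, hmi) in enumerate(zip(oob_readings, hmi_readings)):
        if oob == 0:
            diverged = abs(hmi) > tolerance
        else:
            diverged = abs(hmi - oob) / abs(oob) > tolerance
        if diverged:
            flags.append(i)
    return flags

# The compromised HMI keeps reporting normal pressure while the
# independent measurement says conditions are deteriorating:
oob = [101.2, 101.5, 130.8, 152.4]   # independent measurement (psi)
hmi = [101.2, 101.5, 101.4, 101.3]   # values shown to the operator
print(flag_divergence(oob, hmi))     # [2, 3]
```

In a Stuxnet- or Triton-style attack, the HMI column is exactly what the attacker controls; the out-of-band column is what they cannot reach without also compromising the independent sensing path.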
Inability to detect cyber attacks
Network monitoring and threat detection were not sufficient to detect the 2017 Triconex cyberattack in Saudi Arabia. Luck and some mistakes kept the petrochemical plant from a dangerous explosion. The mistakes included the attackers inadvertently tripping the plant twice – in June and then again in August. (A plant is said to "trip" when it unintentionally ceases production.)
The focus of the analyses of the Triconex cyberattack, including those by US National Laboratories, was on the malware found in the safety systems during the August 2017 outage. However, the plant initially tripped in June 2017 – two months before the August 2017 outage when the malware was discovered.
The June plant trip was caused by an Emergency Shutdown (ESD) controller. The plant DCS did not reflect unsafe conditions (the DCS doesn't monitor cyber threats). The vendor (Schneider) was called in to investigate and removed the affected ESD controller for analysis.
The ESD controller logs and diagnostics (physical not cyber) were checked and no anomalous conditions were found. There were safety alarms indicating the ESD controllers were in the “Program” mode. That was not a safety issue so the alarms were essentially ignored by the operators. As a controller in the “Program” mode is not a communication issue, it would not have been identified in the security/communication logs either. Additionally, mechanical testing found the controller to be fully functional. Consequently, the engineers considered it to be an unintentional malfunction and operations were restored.
There was no mention of any possible cyber security involvement in the June incident (cyber monitoring did not detect anomalous conditions). However, the controller tripped because of the Triconex system malware (even though the attackers didn’t want the plant to trip). The malware was missed even though the plant tripped!
Apparently, there were many red flags about the June 2017 incident. Specifically, not identifying the June trip as possibly being cyber-related was a missed opportunity that gave the attackers two additional months of unimpeded time to tune the attack tools. If the plant hadn’t tripped in August, it is possible the cyber compromise of the safety systems would not have been identified until it was too late. This has significant ramifications for the cyber security regulations that assume that cyberattacks will be expeditiously detected.
The culture gap between the networking organizations (whether IT or OT) and plant engineering is common and is being reinforced by the continued discussions of IT/OT convergence. That is because the plant engineers and vendor staff who analyzed the controller and responded to the HMI alarms are NOT OT but Engineering/Operations – and there is a BIG difference!
Consider the similarities with the Triconex cyberattack and Stuxnet. Both Stuxnet and the Triconex attacks compromised the Windows HMIs and engineering workstations.
For months, the centrifuges were being mechanically damaged with no apparent indication of anything but mechanical design problems. That is, the culture gap between the engineers and the cyber security organizations enabled the damage to continue for months until Stuxnet was “discovered”.
In Ralph Langner's treatise, "To Kill a Centrifuge"10, Ralph asked whether Stuxnet could be used as a blueprint for copycat attacks. I think you are seeing that blueprint followed in the Triconex attack. The Triconex attack demonstrated that hacking control/safety systems and controllers was not a Siemens-unique problem but an attack mechanism against control/safety systems regardless of vendor. Both Stuxnet and Triconex demonstrated the need for an out-of-band monitoring solution that would not be compromised by a compromise of Windows and the IP networks.
There were a number of very significant implications from the June 2017 plant trip:
- Relying on Windows for safety-critical applications is questionable at best.
- The ability to identify a cyber attack is critical to the cyber security regulations in the NERC CIPs and NEI 08-09/Regulatory Guide 5.71 for nuclear plants. Both assume that cyber attacks can be detected, which turned out to be a wrong assumption. This issue isn't confined to nuclear plants, either.
Considering that Triconex safety systems are used for safety applications in nuclear plants and for burner management in fossil power plants, how can you meet nuclear and fossil plant security and safety requirements if you can't recognize a cyber attack? The same question can be asked of Safety Instrumented System (SIS) standards such as ISA84/IEC 61511 in the process industries. Sophisticated cyber attacks against control and safety systems can affect any control and safety system vendor and potentially go unidentified. The dependence on OT network monitoring to detect malware proved to be inadequate.
As sophisticated malware may be able to circumvent malware detection capabilities, real-time sensor health monitoring can be used as a system-integrity check to understand whether upset conditions are a malfunction or a possible cyber attack. A safety system can be, and has been, compromised to cause damage and death. To be successful, an attack might well suppress the alarms that would indicate an approaching unsafe condition; those alarms are part of the compromised HMI. An out-of-band sensor monitoring program (monitoring the raw electrical signals BEFORE they become Ethernet packets) would provide confirmation that dangerous conditions are approaching so the operator can take manual action if the safety system hasn't already done so. An out-of-band monitoring system could also provide confirmation that the dangerous conditions have been mitigated.
- The lack of cyber security training of control system/plant engineers contributed to the failure to identify upset conditions as possibly being cyber-related. In 2015, I supported the International Atomic Energy Agency (IAEA) on scenario-based training for engineers to be able to recognize non-IP network-related upset conditions as possibly being cyber-related.
- Unfortunately, the lack of coordination/cooperation between Engineering/Operations and cyber security/networking is alive and well. While at the Cyber War Games at the Naval War College in 2017, I met the senior director of physical security from a major utility. He assured me cyber security was not an issue because he met with the senior director of cyber security every day. However, when I asked how often he talked to the VP of Power Production or the VP of Power Delivery, his response was "why?" There was simply no thought that a trip of a power plant, or the lights going out because of relays in the substation, could be cyber-related. The culture gap between security/networking and Engineering must be addressed.
- Alarm management currently tends to reside in separate organizations, and often even in separate buildings – Security Operations Centers (SOCs) and plant control rooms, for example. There is a need to coordinate SOC and network/security logs with equipment monitoring. There are alarms that are important to both Operations and security, yet they are not shared – as illustrated by the alarms from the ESD controller being in the "Program" mode. Alarm management becomes a bigger issue as process sensors become smarter and more configurable. Shared security and operations alarms can be of value to both security and Engineering; unshared alarms can keep one side from understanding the true conditions.
- Sophisticated cyber attacks can be misidentified as malfunctions. This brings up the need for sensor health monitoring at the physics level, which provides a view of process conditions independent of the potentially compromised IP networks. The current focus on IT/OT convergence rather than reaching out to Engineering will continue to lead to "blind spots" when it comes to detecting sophisticated cyber attacks such as Stuxnet, Triconex, and hardware backdoors installed in facility equipment.
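The alarm-coordination point in the bullets above can be sketched as a simple timestamp correlation between SOC/security alarms and plant operations alarms. The messages, timestamps, and 30-minute window below are illustrative assumptions, but the idea is that a "Program mode" key-switch alarm seen next to a coincident plant trip looks very different than either alarm seen alone:

```python
# Illustrative sketch (names, times, and window are assumptions):
# correlate security/SOC alarms with plant operations alarms by
# timestamp, so an ESD controller left in "Program" mode is seen
# alongside a coincident process upset rather than dismissed by
# either organization in isolation.
from datetime import datetime, timedelta

def correlate(security_alarms, ops_alarms, window_minutes=30):
    """Pair each security alarm with any operations alarm occurring
    within +/- window_minutes of it. Alarms are (timestamp, message)."""
    window = timedelta(minutes=window_minutes)
    pairs = []
    for s_time, s_msg in security_alarms:
        for o_time, o_msg in ops_alarms:
            if abs(s_time - o_time) <= window:
                pairs.append((s_msg, o_msg))
    return pairs

security = [(datetime(2017, 6, 3, 2, 10),
             "ESD controller key switch in Program mode")]
ops = [(datetime(2017, 6, 3, 2, 25), "Plant trip: ESD initiated"),
       (datetime(2017, 6, 3, 9, 0), "Routine maintenance alarm")]

# Only the trip falls inside the window, so the Program-mode alarm is
# surfaced together with the upset it coincided with:
print(correlate(security, ops))
```

In practice this would run against the SOC's SIEM and the plant historian rather than in-memory lists, but the correlation logic is the substance of the coordination being argued for.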
What should be done
- Recognize that control system and operation issues are critical and need to be addressed.
- Revise safety, security, and alarm management standards. This is long term and requires major coordination between industries, equipment manufacturers, and others.
- Develop process sensor health monitoring at the sensor level.
- Develop cross collaboration between networking and engineering. This can be done on a shorter term but requires education of senior management.
1 Abrams, Marshall and Weiss, Joe, "Peril in the Pipeline", InTech, June 2008, https://www.isa.org/standards-and-publications/isa-publications/intech-magazine/2008/june/system-integration-peril-in-the-pipeline
2 Abrams, Marshall and Weiss, Joe, “Malicious Control System Cyber Security Attack Case Study: Maroochy Water Services, Australia”, MITRE Technical papers, August 2008
3 "Staged Cyber Attack Reveals Vulnerability in Power Grid", CNN, September 2007
4 Swearingen, Michael, Brunasso, Weiss, Joe, and Huber, Dennis, "What You Need to Know (and Don't) About the Aurora Vulnerability", Power, September 2013
5 https://www.controlglobal.com/blogs/unfettered/the-fallacy-of-not-sharing-ics-incident-information, 10/15/13.
6 CS2AI-KPMG Control System Cyber Security Report 2020, https://4cc59207-5dd9-460d-ace978e4f78ebca4.filesusr.com/ugd/6d64a8_b86aeb43fb364fef8ee2a14bffbd57a1.pdf
7 “Most ICS vulnerabilities disclosed this year can be exploited remotely”, Industry News, August 20, 2020, https://www.helpnetsecurity.com/2020/08/20/ics-vulnerabilities-exploited-remotely/
8 https://www.controlglobal.com/blogs/unfettered/actual-domestic-and-international-ics-cyber-incidents-from-common-causes, 2/8/15
9 Weiss, Joseph, Protecting Industrial Control Systems from Electronic Threats, Momentum Press, May 2010, ISBN: 978-1-60650-197-9
10 Langner, Ralph, "To Kill a Centrifuge", https://www.langner.com/wp-content/uploads/2017/03/to-kill-a-centrifuge.pdf