IT security, a well-established domain, includes various solutions to protect against unauthorized access to organizational assets such as computers, networks, and data.
Over the last two decades, advances from the IT world have steadily been brought into Operational Technology (OT), that is, the systems used in industrial manufacturing and critical infrastructure.
The so-called “IT-OT convergence” aims to make the rather slow, legacy OT systems more efficient by narrowing their gap with the fast-paced IT world. However, alongside these benefits, the convergence abruptly exposes OT to a vast and advanced cyber-threat landscape.
The OT space
Starting from the beginning, let us clarify what OT is and why these advances were needed in this space. OT denotes the technologies used to operate and orchestrate industrial processes; it comprises sensors as well as field devices such as pumps, valves or transmitters.
OT systems were first introduced back in 1960 and were designed to be robust and to operate continuously for long periods of time. They were built on proprietary, legacy hardware with limited resources (e.g. CPU, memory), and their functionality was programmed with equally proprietary software.
Figure 1 – IT and OT system layers
Unlike IT systems, such systems are rarely updated, as updates cause downtime across the entire production. The presence of legacy technologies makes troubleshooting and maintenance extremely challenging for OT operators. Manufacturers therefore started to introduce computers and IT networks to improve the visibility of their systems and to enable remote maintenance. The security risks that came along with IT technologies, though, were not considered at all. This was mainly due to the proprietary nature of the internal network, which was assumed to prevent unauthorized entities from hijacking the communication, as they could not interpret the underlying commands. Let us now evaluate the validity of this assumption by taking a closer look at the OT threats.
Along with all its benefits, IT/OT convergence made OT devices reachable from the IT network through lateral movement. This attracted malicious entities, who realized the huge impact they could cause as well as the financial profit and publicity they could gain. For example, consider a command that switches off utility substations: the outcome could be a large-scale blackout across an entire city.
Up until 2010, security for OT systems received hardly any attention. The appearance of Stuxnet changed this entirely. Stuxnet was the first incident of its kind on OT systems: it destroyed 1000 nuclear centrifuges at the Natanz facility (Iran), resulting in a 30% decrease in enrichment efficiency. The lessons learned from the attack show how it differs from traditional IT threats. First, it is still believed that Stuxnet was built as a cyber-weapon by the U.S. and Israeli governments to derail the Iranian program for developing nuclear weapons. Second, forensic analysis found traces of it in Natanz dating back to 2007, indicating a long period spent understanding the environment and carefully planning such an advanced attack.
Forensics showed that the attackers were able not only to understand the commands that Siemens PLC controllers were transmitting to control the motor frequency of the centrifuges, but also to modify the controller logic while the system was continuously monitored. And since Stuxnet remained undetected until it had achieved its purpose, the insecure OT space has become an attractive attack territory for cyber-criminals.
Figure 2 – OT cyber-attacks following Stuxnet (2010-present)
OT cyber-threat detection
To defend against such cyber-threats and detect them in real time, analysts have suggested several security measures. The most common among them is the so-called “air-gap” that isolates IT and OT systems. In practice, though, Stuxnet revealed that even with an “air-gap” in place, the cyber-threat surface is not eliminated. One reason for this is that software patches and updates must still be delivered to OT devices, such as Programmable Logic Controllers (PLCs) or workstations, to keep them functioning correctly.
Cyber-threat detection in OT starts with visibility. Many utilities have a limited view of the architecture and services running on their OT systems. To address this issue, a new market emerged for OT Intrusion Detection Systems (IDS). The market started with passive monitoring, which listens to network data and tries to reconstruct the network architecture from it. When passive monitoring is combined with profiling techniques that learn normal network behavior, abnormal behavior can be detected without interrupting the industrial process.
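As a rough illustration of passive monitoring combined with profiling, the Python sketch below learns a baseline of observed conversations and then flags any conversation it has never seen. It is a minimal sketch under stated assumptions, not how any particular OT IDS works: the Flow record, the PassiveBaseline class and the host and protocol names are all hypothetical, and a real product would parse live traffic and profile far more than source, destination and protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One passively observed conversation (hypothetical record format)."""
    src: str
    dst: str
    protocol: str


class PassiveBaseline:
    """Learns which flows are normal, then flags flows never seen before."""

    def __init__(self):
        self.known_flows = set()
        self.learning = True

    def observe(self, flow):
        """Return True if the flow is anomalous, False otherwise."""
        if self.learning:
            # Learning phase: everything observed is treated as normal.
            self.known_flows.add(flow)
            return False
        # Detection phase: only flag, never block or inject traffic.
        return flow not in self.known_flows

    def stop_learning(self):
        self.learning = False

    def assets(self):
        """Hosts reconstructed purely from the learned traffic."""
        sources = {f.src for f in self.known_flows}
        destinations = {f.dst for f in self.known_flows}
        return sources | destinations


# Example with made-up hosts: learn two flows, then check two more.
baseline = PassiveBaseline()
baseline.observe(Flow("hmi-1", "plc-1", "modbus"))
baseline.observe(Flow("eng-ws", "plc-1", "s7comm"))
baseline.stop_learning()

known = baseline.observe(Flow("hmi-1", "plc-1", "modbus"))
unknown = baseline.observe(Flow("office-pc", "plc-1", "modbus"))
print("known flow flagged:", known)
print("unknown flow flagged:", unknown)
```

Because the sketch only reads what it is given, it cannot disturb the industrial process. Its quality, however, depends entirely on how representative the learning phase was.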
Then, OT device vendors also started to include complementary security features, such as encryption through TLS, while trying to maintain a low product cost. Previous experience of advanced attacks, though, has shown that encryption can still be bypassed unless it is sufficiently strong, in which case it is also quite performance-intensive for the rather resource-constrained OT devices. Its adoption also exposed a major weakness of passive OT monitoring: it requires a considerable amount of unencrypted data and enough time to learn the system behavior. And even then, false positives among the detected incidents are not entirely avoided.
Another detection paradigm relies on active probing instead of remaining passive. Active techniques try to build a network profile quickly by querying OT devices about their characteristics (e.g. vendor, firmware, MAC address) through dedicated messages. In practice, deployments end up as a mixture of both techniques, as active probing alone is not fully accepted due to its overhead on device performance.
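One way to picture such a mixture is an inventory that is filled passively and only falls back to an active query when attributes are still missing, and never more often than a configured interval per device. The Python sketch below illustrates that idea only. The HybridInventory class, the probe callable, the REQUIRED attribute list and the one-hour default interval are assumptions made for this example, not a real product API or protocol.

```python
REQUIRED = ("vendor", "firmware", "mac")  # attributes named in the text


class HybridInventory:
    """Passive-first inventory with rate-limited active probing."""

    def __init__(self, probe, min_interval_s=3600.0):
        # probe: hypothetical callable, address -> dict of attributes
        self.probe = probe
        self.min_interval_s = min_interval_s
        self.devices = {}     # address -> known attributes
        self.last_probe = {}  # address -> time of last active query

    def seen_passively(self, address, **attrs):
        """Record attributes learned from observed traffic."""
        self.devices.setdefault(address, {}).update(attrs)

    def enrich(self, address, now):
        """Probe only if data is missing and the device was not
        probed recently. Returns True if a probe was sent."""
        attrs = self.devices.setdefault(address, {})
        missing = [key for key in REQUIRED if key not in attrs]
        last = self.last_probe.get(address)
        recently = last is not None and now - last < self.min_interval_s
        if not missing or recently:
            return False
        self.last_probe[address] = now
        reply = self.probe(address)
        for key in missing:
            if key in reply:
                attrs[key] = reply[key]
        return True


def example_probe(address):
    """Stand-in for a real device query; returns made-up values."""
    return {"vendor": "ExampleVendor", "firmware": "1.0"}


inventory = HybridInventory(example_probe, min_interval_s=60.0)
inventory.seen_passively("10.0.0.5", mac="00:11:22:33:44:55")
print(inventory.enrich("10.0.0.5", now=0.0))   # attributes were missing
print(inventory.enrich("10.0.0.5", now=10.0))  # nothing left to ask
print(inventory.devices["10.0.0.5"])
```

The interval is the knob that trades profiling speed against load on the device, which is the concern raised above about active probing.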
OT incident response
Even when all the necessary security and detection techniques are in place, the main challenge lies in incident response and orchestration. OT threat analysis should focus not only on suspicious activities (e.g. unauthorized communication, abnormal behavior), but also on the semantics of the industrial processes running on critical devices. Taking the Stuxnet case again, although nothing unusual appeared on the Siemens PLC or at the network level, the motor frequency was still outside operational limits. The challenge becomes greater if we consider that an analyst must also investigate whether the suspicious activities stem from an operator error or from a real cyber-threat.
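The point about process semantics can be made concrete with a small check that compares process readings against their operating ranges, independently of any network alert. The Python sketch below is illustrative only: the tag name, the OperatingRange type, the classify labels and the numeric limits are placeholders invented for this example and are not actual centrifuge parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingRange:
    low: float
    high: float


# Placeholder limits for illustration only, not real plant values.
LIMITS = {"motor_frequency_hz": OperatingRange(low=900.0, high=1100.0)}


def check_process(readings, limits):
    """Return (tag, value) pairs whose value is outside its range.
    Tags without a configured range are ignored."""
    violations = []
    for tag, value in readings.items():
        allowed = limits.get(tag)
        if allowed is not None and not (allowed.low <= value <= allowed.high):
            violations.append((tag, value))
    return violations


def classify(network_alerts, violations):
    """Combine network-level and process-level findings."""
    if network_alerts and violations:
        return "network and process anomaly"
    if violations:
        return "process anomaly only"
    if network_alerts:
        return "network anomaly only"
    return "normal"


# A quiet network with an out-of-range process value.
violations = check_process({"motor_frequency_hz": 1400.0}, LIMITS)
print(classify([], violations))
```

A "process anomaly only" result is the kind of case the text describes: nothing stands out at the network level, yet the process is outside its limits. Whether it stems from an operator error or from an attack still has to be decided by an analyst.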
A systematic approach to mitigating the growing number of advanced OT cyber-threats should start with a solid understanding of the system that needs to be protected. Then, security experts, OT experts and OT device vendors should join forces to identify vulnerabilities at the device, system or network level. Finally, complementary security services should be in place to allow faster incident response, while ensuring that they introduce minimal overhead on OT device performance.
Together, these measures lead to a Managed Detection & Response (MDR) approach for achieving high security levels in OT environments. OT MDR should start from the detection of alerts and incident indicators in the network traffic and orchestrate the entire process up to incident response and recovery, in order to maintain uninterrupted system operation. However, the unmanageable costs in personnel and technologies that this requires have led many critical infrastructure owners to outsource their cyber-security strategy to Managed Security Service Providers (MSSPs) instead of developing it internally. OT MSSPs rely on their domain knowledge to assess the customer’s environment and provide 24/7 MDR through security technology integrations, automation-assisted actions and customer interaction based on pre-agreed engagement rules. The synergy between MSSPs and critical infrastructure owners is a significant step towards the aforementioned approach to mitigating advanced OT cyber-threats.
Figure 3 – Managed Detection and Response for OT environments