Impact of OT attacks: death, environmental disasters and collapsing supply-chains

February 21, 2022February 21, 2022 Håkon Olsen1 Comment

Securing operational technologies (OT) is different from securing enterprise IT systems. Not because the technologies themselves are so different – but the consequences are. OT systems are used to control all sorts of systems we rely on for modern society to function; oil tankers, high-speed trains, nuclear power plants. What is the worst thing that could happen, if hackers take control of the information and communication technology based systems used to operate and safeguard such systems? Obviously, the consequences could be significantly worse than a data leak showing who the customers of an adult dating site are. Death is generally worse than embarrassment.

No electricity could be a consequence of an OT attack.

When people think about cybersecurity, they typically think about confidentiality. IT security professionals will take a more complete view of data security by considering not only confidentiality, but also integrity and availability. For most enterprise IT systems, the consequences of hacking are financial, and sometimes also legal. Think about data breaches involving personal data – we regularly see stories about companies and also government agencies being fined for lack of privacy protections. This kind of thinking is often brought into industrial domains; people doing risk assessments describe consequences in terms such as “unauthorized access to data” or “data could be changed be an unauthorized individual”.

The real consequences we worry about are physical. Can a targeted attack cause a major accident at an industrial plant, leaking poisonous chemicals into the surroundings or starting a huge fire? Can damage to manufacturing equipment disrupt important supply-chains, thereby causing shortages of critical goods such as fuels or food? That is the kind of consequences we should worry about, and these are the scenarios we need to use when prioritizing risks.

Let’s look at three steps we can take to make cyber risks in the physical world more tangible.

Step 1 – connect the dots in your inventory

Two important tools for cyber defenders of all types are “network topologies” and “asset inventory”. If you do not have that type of visibility in place, you can’t defend your systems. You need to know what you have to defend it. A network topology is typically a drawing showing you what your network consists of, like network segments, servers, laptops, switches, and also OT equipment like PLC’s (programmable logic curcuits), pressure transmitters and HMI’s (human-machine interfaces – typically the software used to interact with the sensors and controllers in an industrial plant). Here’s a simple example:

An example of a simplified network topology

A drawing like this would be instantly recognizable to anyone working with IT or OT systems. In addition to this, you would typically want to have an inventory describing all your hardware systems, as well as all the software running on your hardware. In an environment where things change often, this should be generated dynamically. Often, in OT systems, these will exist as static files such as Excel files, manually compiled by engineers during system design. It is highly likely to be out of date after some time due to lack of updates when changes happen.

Performing a risk assessment based on these two common descriptions is a common exercise. The problem is, that it is very hard to connect this information to the physical consequences we want to safeguard against. We need to know what the “equipment under control” is, and what it is used for. For example, the above network may be used to operate a batch chemical reactor running an exothermic reaction. That is, a reaction that produces heat. Such reactions need cooling, if not the system could overheat, and potentially explode as well if it produces gaseous products. We can’t see that information from the IT-type documentation alone; we need to connect this information to the physical world.

Let’s say the system above is controlling a reactor that has a heat-producing reaction. This reactor needs cooling, which is provided by supplying cooling water to a jacket outside the actual reactor vessel. A controller opens and closes a valve based on a temperature measurement in order to maintain a safe temperature. This controller is the “Temperature Control PLC” in the drawing above. Knowing this, makes the physical risk visible.

Without knowing what our OT system controls, we would be led to think about the CIA triad, not really considering that the real consequences could be a severe explosion that could kill nearby workers, destroy assets, release dangerous chemical to the environment, and even cause damage to neighboring properties. Unfortunately, lack of inventory control, especially connecting industrial IT systems to the physical assets they control, is a very common problem (disclaimer – this is an article from DNV, where I am employed) across many industries.

An example of a physical system: a continuously stirred-tank reactor (CSTR) for producing a chemical in a batch-type process.

Step 1 – connect the dots: For every server, switch, transmitter, PLC and so on in your network, you need to know what jobs these items are a part of performing. Only that way, you can understand the potential consequences of a cyberattack against the OT system.

Step 2 – make friends with physical domain experts

If you work in OT security, you need to master a lot of complexity. You are perhaps an expert in industrial protocols, ladder logic programming, or building adversarial threat models? Understanding the security domain is itself a challenge, and expecting security experts to also be experts in all the physical domains they touch, is unrealistic. You can’t expect OT security experts to know the details of all the technologies described as “equipment under control” in ISO standards. Should your SOC analyst be a trained chemical engineer as well as an OT security expert? Or should she know the details of steel strength decreases with a temperature increase due to being engulfed in a jet fire? Of course not – nobody can be an expert at everything.

This is why risk assessments have to be collaborative; you need to make sure you get the input from relevant disciplines when considering risk scenarios. Going back to the chemical reactor discussed above, a social engineering incident scenario could be as follows.

John, who works as a plant engineer, receives a phishing e-mail that he falls for. Believing the attachment to be a legitimate instruction of re-calibration of the temperature sensor used in the reactor cooling control system, he executes the recipe from the the attachment in the e-mail. This tells him to download a Python file from Dropbox folder, and execute it on the SCADA server. By doing so, he calibrates the temperature sensor to report 10 degrees lower temperature than what it really measures. It also installs a backdoor on the SCADA server, allowing hackers to take full control of it over the Internet.

The consequences of this could potentially overpressurizing the reactor, causing a deadly explosion. The lack of cooling on the reactor would make a chemical engineering react, and help understand the potential physical consequences. Make friends with domain experts.

Another important aspect of domain expertise, is knowing the safety barriers. The example above was lacking several safety features that would be mandatory in most locations, such as having a passive pressure-relief system that works without the involvement of any digital technologies. In many locations it is also mandatory to have a process shutdown systems, a control system with separate sensors, PLC’s and networks to intervene and stop the potential accident from happening by using actuators also put in place only for safety use, in order to avoid common cause failures between normal production systems and safety critical systems. Lack of awareness of such systems can sometimes make OT security experts exaggerate the probability of the most severe consequences.

Step 2 – Make friends with domain experts. By involving the right domain expertise, you can get a realistic picture of the physical consequences of a scenario.

Step 3 – Respond in context

If you find yourself having to defend industrial systems against attacks, you need an incident response plan. This is no different from an enterprise IT environment; you also need an incident response plan that takes the operational context into account here. A key difference, though, is that for physical plants your response plan may actually involve taking physical action, such as manually opening and closing valves. Obviously, this needs to be planned – and exercised.

If welding will be a necessary part of handling your incident, coordinating with the industrial operations side better be part of your incident response playbooks.

Even attacks that do not affect OT systems directly, may lead to operational changes in the industrial environment. Hydro, for example, was hit with a ransomware attack in 2019, crippling its enterprise IT systems. This forced the company to turn to manual operations of its aluminum production plants. This bears lessons for us all, we need to think about how to minimize impact not just after an attack, but also during the response phase, which may be quite extensive.

Scenario-based playbooks can be of great help in planning as well as execution of response. When creating the playbook we should

describe the scenario in sufficient detail to estimate affected systems
ask what it will take to return to operations if affected systems will have to be taken out of service

The latter question would be very difficult to answer for an OT security expert. Again, you need your domain expertise. In terms of the cyber incident response plan, this would lead to information on who to contact during response, who has the authority to make decision about when to move to next steps, and so on. For example, if you need to switch to manual operations in order to continue with recovery of control system ICT equipment in a safe way, this has to be part of your playbook.

Step 3 – Plan and exercise incident response playbooks together with industrial operations. If valves need to be turned, or new pipe bypasses welded on as part of your response activities, this should be part of your playbook.

OT security is about saving lives, the environment and avoiding asset damage

In the discussion above it was not much mention of the CIA triad (confidentiality, integrity and availability), although seen from the OT system point of view, that is still the level we operate at. We still need to ensure only authorized personnel has access to our systems, we need to ensure we protect data during transit and in storage, and we need to know that a packet storm isn’t going to take our industrial network down. The point we want to make is that we need to better articulate the consequences of security breaches in the OT system.

Step 1 – know what you have. It is often not enough to know what IT components you have in your system. You need to know what they are controlling too. This is important for understanding the risk related to a compromise of the asset, but also for planning how to respond to an attack.

Step 2 – make friends with domain experts. They can help you understand if a compromised asset could lead to a catastrophic scenario, and what it would take for an attacker to make that happen. Domain experts can also help you understand independent safety barriers that are part of the design, so you don’t exaggerate the probability of the worst-case scenarios.

Step 3 – plan your response with the industrial context in mind. Use the insight of domain experts (that you know are friends with) to make practical playbooks – that may include physical actions that need to be taken on the factory floor by welders or process operators.

Catching bad guys in your system logs

January 30, 2022 Håkon OlsenLeave a comment

When attackers target our systems, they leave traces. The first place to look is really the logs. Hopefully the most important logs are being collected and sent to a SIEM (security incident and event management) system, but in any case, we need to know how to search logs to find traces of malicious activity. Let’s consider three very common attack scenarios:

• Brute-force attack on exposed remote access port (SSH or RDP)
• Establishing persistence through a cron job or a scheduled task
• Adding accounts or credentials to maintain persistence

Attackers leave footprints from their actions. The primary tool for figuring out what happened on a system, is log analysis.

Brute force

Brute-force attack: an attacker may try to gain access by guessing a password. This will be visible in logs through a number of failed logon attempts, often from the same ip address. If your system is exposed to the Internet, this is constantly ongoing. The attackers are not human operators but botnets scanning the entire Internet, hoping to gain access. An effective way of avoiding this is to reduce the attack surface and not expose RDP or SSH directly on the internet.

For Windows, failed logon attempts will generate event log entries with Event ID 4625. What you should be looking for is a number of failed attempts (ID 4625), followed by a successful attempt from the same ip address. Successful logins have Event ID 4624. You will need administrator privileges to read the Windows logs. You can use the Event Viewer application on Windows to do this, but if you want to create a more automated detection, you can use a PowerShell script to check the logs. You still need that administrator access though.

The Powershell command Get-WinEvent can be used to read Event logs. You can see how to use the command here. https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.2

You can also use Get-EventLog if you are on PowerShell 5, but that commandlet is not longer present in Powershell 7.

For attacks on SSH on Linux, you will find entries in the authpriv file. But the easiest way to spot malicious logon attempts is to use the command “lastb” that will show you the last failed logon attempts. This command requires sudo privileges. If you correlate a series of failed attempts reported by “lastb” with a successful attempt found in “authpriv” from the same ip address, you probably have a breach.

*lastb*: The last 10 failed login attempts on a cloud hosted VM exposing SSH on port 22 to the Internet.

Persistence

Let’s move on to persistence through scheduled tasks or cron jobs

The Event ID you are looking for on Windows is 4698. This means a scheduled task was created. There are many reasons to create scheduled tasks; it can be related to software updates, cleanup operations, synchronization tasks and many other things. It is also a popular way to establish persistence for an attacker. If you have managed to drop a script or a binary file on a target machine, and set a scheduled task to execute this on a fixed interval, for example every 5 minutes, you have an easy way to make malware reach out to a command and control server on the Internet.

There are two types of scheduled tasks to worry about here; one is running under the user account. This task will only run when the user is logged on to the computer. If the attacker can establish a scheduled task to run with privileges, the task will run without having a user being logged on – but the computer must of course be in a running state. Because of this, it is a good idea to check the user account that created the scheduled task.

For further details on threat hunting using scheduled task events, see the official documentation from Microsoft: https://docs.microsoft.com/en-us/windows/security/threat-protection/auditing/event-4698. There is also a good article from socinvestigation worth taking a look at: https://www.socinvestigation.com/threat-hunting-using-windows-scheduled-task/.

Cron jobs are logged to different files depending on the system you are on. Most systems will log cron job execution to /var/log/syslog, whereas some, such as CoreOS and Amazon Linux, will log to /var/log/cron. For a systemd based Linux distro, you can also use “journalctl -u cron” to view the cron job logs. Look for jobs executing commands or binaries you don’t know what is. Then verify what those are.

You do not get exit codes in the default cron logs, only what happens before the command in the cron job executes. Exit logs are by default logged to the mailbox of the job’s owner but this can be configured to log to a file instead. Usually seeing the standard cron logs is sufficient to discover abuse of this feature to gain persistence or run C2 communications.

Adding accounts

Finally, we should check if an attacker has added an account, a common way to establish extra persistence channels.

For Windows, the relevant Event ID is 4720. This is generated every time a user account is created, whether centrally on a domain controller, or locally on a workstation. If you do not expect user accounts to be created on the system, every Event ID like this should be investigated. The Microsoft documentation has a long list of signals to monitor for regarding this event: https://docs.microsoft.com/en-us/windows/security/threat-protection/auditing/event-4720.

On Linux, the command “adduser” can be used to add a new user. Creating a new user will create an entry in the /var/log/auth.log file. Here’s an example form adding a user called “exampleuser” on Ubuntu (running on a host called “attacker”).

Jan 29 20:14:27 attacker sudo: cyberhakon : TTY=pts/0 ; PWD=/home/cyberhakon ; USER=root ; COMMAND=/usr/sbin/useradd exampleuser
Jan 29 20:14:27 attacker useradd[6211]: new group: name=exampleuser, GID=1002
Jan 29 20:14:27 attacker useradd[6211]: new user: name=exampleuser, UID=1001, GID=1002, home=/home/exampleuser, shell=/bin/sh

Changing the password for the newly created user is also visible in the log.

an 29 20:18:20 attacker sudo: cyberhakon : TTY=pts/0 ; PWD=/var/log ; USER=root ; COMMAND=/usr/bin/passwd exampleuser
Jan 29 20:18:27 attacker passwd[6227]: pam_unix(passwd:chauthtok): password changed for exampleuser

Summary: we can detect a lot of common attacker behavior just by looking at the default system logs. Learning how to look for such signals is very useful for incident response and investigations. Even better is to be prepared and forward logs to a SIEM, and create alerts based on behavior that is expected from attackers, but not from regular system use. Then you can stop the attackers before much damage is done.

How to discover cyberattacks in time to avoid disaster

January 3, 2022January 3, 2022 Håkon OlsenLeave a comment

Without IT systems, modern life does not exist. Whether we are turning on our dishwasher, ordering food, working, watching a movie or even just turning the lights on, we are using computers and networked services. The connection of everything in networks brings a lot of benefits – but can also make the infrastructure of modern societies very fragile.

Bus stop with networked information boards — Cyber attacks can make society stop. That’s why we need to stop the attackers early in the kill-chain!

When society moved online, so did crime. We hear a lot about nation state attacks, industrial espionage and cyber warfare, but for most of us, the biggest threat is from criminals trying to make money through theft, extortion and fraud. The difference from earlier times is that we are not only exposed to “neighborhood crime”; through the Internet we are exposed to criminal activities executed by criminals anywhere on the planet.

We see in media reports every week that organizations are attacked. They cause serious disruptions, and the cost can be very high. Over Christmas we had several attacks happening in Norway that had real consequences for many people:

Nortura, a cooperative owned by Norwegian farmers that processes meat and eggs, was hit by a cyber attack that made them take many systems offline. Farmers could not deliver animals for slaughtering, and distribution of meat products to stores was halted. This is still the situation on 2 January 2022, and the attack was made public on 21 December 2021.
Amedia, a media group publishing mostly local newspapers, was hit with ransomware. The next 3 days following the attack on the 28th of December, most of Amedia’s print publications did not come out. They managed to publish some print publications through collaboration with other media houses.
Nordland fylkeskommune (a “fylkeskomune” is a regional administrative level between municapalities and the central government in Norway) and was hit with an attack on 23 December 2021. Several computer systems used by the educational sector have been taken offline, according to media reports.

None of the reports mentioned above have been linked to the log4shell chaos just before Christmas. The Amedia incident has been linked to the Printer Nightmare vulnerability that caused panic earlier in 2021.

What is clear is that there is a serious and very real threat to companies from cyber attacks – affecting not only the companies directly, but entire supply chains. The three attacks mentioned above all happened within two weeks. They caused disruption of people’s work, education and media consumption. When society is this dependent on IT systems, attacks like these have real consequences, not only financially, but to quality of life.

This blog post is about how we can improve our chances of detecting attacks before data is exfiltrated, before data is encrypted, before supply chains collapse and consumers get angry. We are not discussing what actions to take when you detect the attack, but we will assume you already have an incident response plan in place.

How an attack works and what you can detect

Let’s first look at a common model for cyber attacks that fit quite well with how ransomware operators execute; the cyber kill-chain. An attacker has to go through certain phases to succeed with an attack. The kill-chain model uses 7 phases. A real attack will jump a bit back and forth throughout the kill-chain but for discussing what happens during the attack, it is a useful mental model. In the below table, the phases marked in yellow will often produce detectable artefacts that can trigger incident response activities. The blue phases of weaponization (usually not detectable) and command & control, actions on objectives (too late for early response) are not discussed further.

Defensive thinking using a multi-phase attack model should take into account that the earlier you can detect an attack, the more likely you are to avoid the most negative consequences. At the same time, the earlier phases give you detections with more uncertainty, so you do risk initiating response activities that can hurt performance based on false positives.

Detecting reconnaissance

Before the attack happens, the attacker will identify the victim. For some attacks this is completely automated but serious ransomware attacks are typically human directed and will involve some human decision making. In any case, the attacker will need to know some things about your company to attack it.

What domain names and URL’s exist?
What technologies are used, preferably with version numbers so that we can identify potential vulnerabilities to exploit?
Who are the people working there, what access levels do they have and what are their email addresses? This can be used for social engineering attacks, e.g. delivering malicious payloads over email.

These activities are often difficult to detect. Port scans and vulnerability scans of internet exposed infrastructure are likely to drown in a high number of automated scans. Most of these are happening all the time and not worth spending time reacting to. However, if you see unusual payloads attempted, more thorough scans, or very targeted scans on specific systems, this can be a signal that an attacker is footprinting a target.

Other information useful to assess the risk of an imminent attack would be news about attacks on other companies in the same business sector, or in the same value chain. Also attacks on companies using similar technical set-ups (software solutions, infrastructure providers, etc) could be an early warning signal. In this case, it would be useful to prepare a list of “indicators of compromise” based on those news reports to use for threat hunting, or simply creating new alerts in monitoring systems.

Another common indicator that footprinting is taking place, is an increase in the number of social engineering attempts to elicit information. This can be over phone, email, privacy requests, responses to job ads, or even in person. If such requests seem unusual it would be useful for security if people would report it so that trends or changes in frequency can be detected.

Detecting delivery of malicious payloads

If you can detect that a malicious payload is being delivered you are in a good position to stop the attack early enough to avoid large scale damage. The obvious detections would include spam filters, antivirus alerts and user reports. The most common initial intrusion attack vectors include the following:

Phishing emails (by far the most common)
Brute-force attacks on exposed systems with weak authentication systems
Exploitation of vulnerable systems exposed to the Internet

Phishing is by far the most common attack vector. If you can reliably detect and stop phishing attacks, you are reducing your risk level significantly. Phishing attacks usually take one of the following forms:

Phishing for credentials. This can be username/password based single-factor authentication, but also systems protected by MFA can be attacked with easy to use phishing kits.
Delivery of attachments with malicious macros, typically Microsoft Office.
Other types of attachments with executable code, that the user is instructed to execute in some way

Phishing attacks create a number of artefacts. The first is the spam filter and endpoint protection; the attacker may need several attempts to get the payload past first-line of defences. If there is an increase in phishing detections from automated systems, be prepared for other attacks to slip past the filters.

The next reliable detection is a well-trained workforce. If people are trained to report social engineering attempts and there is an easy way to do this, user reports can be a very good indicator of attacks.

For brute-force attacks, the priority should be to avoid exposing vulnerable systems. In addition, creating alerts on brute-force type attacks on all exposed interfaces would be a good idea. It is then especially important to create alerts to successful attempts.

For web applications, both application logs and WEF logs can be useful. The main problem is again to recognize the attacks you need to worry about in all the noise from automated scans.

Detecting exploitation of vulnerabilities

Attackers are very interested in vulnerabilities that give them access to “arbitrary code execution”, as well as “escalation of privileges”. Such vulnerabilities will allow the attacker to install malware and other tools on the computer targeted, and take full control over it. The primary security control to avoid this, is good asset management and patch management. Most attacks are exploiting vulnerabilities where a patch exists. This works because many organizations are quite late at patching systems.

A primary defense against exploitation of known vulnerabilities is endpoint protection systems, including anti-virus. Payloads that exploit the vulnerabilities are recognized by security companies and signatures are pushed to antivirus systems. Such detections should thus be treated as serious threat detections. It is all well and good that Server A was running antivirus that stopped the attack, but what if virus definitions were out of date on Server B?

Another important detection source here would be the system’s audit logs. Did the exploitation create any unusual processes? Did it create files? Did it change permissions on a folder? Log events like this should be forwarded to a tamper proof location using a suitable log forwarding solution.

To detect exploitation based on log collection, you will be most successful if you are focusing on known vulnerabilities where you can establish patterns that will be recognizable. For example, if you know you have vulnerable systems that cannot be patched for some reason, establishing specific detections for exploitation can be very very valuable. For tips on log forwarding for intrusion detection in Windows, Microsoft has issued specific policy recommendations here.

Detecting installation (persistence)

Detecting installation for persistence is usually quite easy using native audit logs. Whenever a new service is created, a scheduled task is created, or a cron job, an audit log entry should be made. Make sure this is configured across all assets. Other automated execution patterns should also be audited, for example autorun keys in the Windows registry, or programs set to start automatically when a user logs in.

Another important aspect is to check for new user accounts on the system. Account creation should thus also be logged.

On web servers, installation of web shells should be detected. Web shells can be hard to detect. Because of this, it is a good idea to monitor file integrity of the files on the server, so that a new or changed file would be detected. This can be done using endpoint protection systems.

How can we use these logs to stop attackers?

Detecting attackers will not help you unless you take action. You need to have an incident response plan with relevant playbooks for handling detections. This post is already long enough, but just like the attack, you can look at the incident response plan as a multi-phase activity. The plan should cover:

Preparations (such as setting up logs and making responsibilities clear)
Detection (how to detect)
Analysis and triage (how to classify detections into incidents, and to trigger a formal response, securing forensic evidence)
Containment (stop the spread, limit the blast radius)
Eradication (remove the infection)
Recovery (recover the systems, test the recovery)
Lessons learned (how to improve response capability for the next attack, who should we share it with)

You want to know how to deal with the data you have collected to decide whether this is a problem or not, and what to do next. Depending on the access you have on endpoints and whether production disturbances during incident response are acceptable, you can automate part of the response.

The most important takeaway is:

Set up enough sensors to detect attacks early. Plan what to do when attacks are detected, and document it. Perform training exercises on simulated attacks, even if it is only tabletop exercises. This will put you in a much better position to avoid disaster when the bad guys attack!

safecontrols

Risk management, reliability and security

Tag: dfir