One of the vulnerabilities that are really easy to exploit is when people leave super-sensitive information in source code – and you get your hands on this source code. In early prototyping a lot of people will hardcode passwords and certificate keys in their code, and remove it later when moving to production code. Sometimes it is not even removed from production. But even in the case where you do remove it, this sensitive information can linger in your version history. What if your app is an open source app where you are sharing the code on github? You probably don’t want to share your passwords…
Getting this sensitive info out of your repository is not as easy as deleting the file from the repo and adding it to the .gitignore file – because this does not touch your version history. What you need to do is this:
Merge any remote changes into your local repo, to make sure you don’t remove the work of your team if they have commited after your own last merge/commit
Remove the file history for your sensitive files from your local repo using the filter-branch command:
Although the command above looks somewhat scary it is not that hard to dig out – you can find in the the Github doc files. When that’s done, there’s only a few more things to do:
Add the files in question to your .gitignore file
Force write to the repo (git push origin –force –all)
Tell all your collaborator to clone the repo as a fresh start to avoid them merging in the sensitive files again
Also, if you have actually pushed sensitive info to a remote repository, particularly if it is an open source publicly available one, make sure you change all passwords and certificates that were included previously – this info should be considered compromised.
There are two big trends in the power utilities business today – with opposing signs:
Addition of micro-producers and microgrids, making consumers less bound to the large grid operators
Increasing integration of power grids over large distances, allowing mega-powerplants to serve enormous areas
Both trends will have impact on grid resilience; the microgrids are usually connected to regional grids in order to sell surplus power, and the mega plants obviously require large grid investments as well. When we seek to understand the effect on resilience we need to examine two types of events:
Large-scale random event threatening the regularity of the power transmission capability
Large-scale attack by SCADA hackers that knock out production and transmission capacities over extended areas
We will not perform a structured risk assessment here but we will rather look at some possible effects of these trends when it comes to power regularity and (national?) security.
Recent events that are interesting to know about
Mega-plants and increasing grid integration
Power plants are in the wind, literally speaking. The push for renewables to come to the market is giving concrete large-scale investments. Currently we are seeing several interesting projects moving ahead:
Fosen Vind: Europe’s largest onshore wind park under construction in central Norway with a total capacity of 1000 MW: https://en.wikipedia.org/wiki/Fosen_Vind. This wind park requires investments in the core cable for the grid, and coincides with increased capacity in transfer lines between Scandinavia and continental Europe (Germany and the Baltics)
In addition to this, we see that NERC, the American organization responsible for the reliability of the power grids in the United States, Canada and parts of Mexico are working to include Mexico as a full member. This will very likely lead to increased integration of the power transmission capacities across the U.S.-Mexico border, at least at the organizational and grid management levels.
Random faults and large-scale network effects
What happens to the transmission capacity when random faults occur? This depends on the redundancy built into the network, and the capacities of the remaining lines when one or more paths fail. As more of the energy mix moves towards renewables we are going to be even more dependent on a reliable transmission grid; renewable energy is hard to store, and the cost of high-capacity storage will add to the energy price, making renewable sources less competitive compared with fossil fuels.
If we start relying on mega plants, this is also going to make us depend more on a reliable grid. The network effects would have to be investigated using methods like Monte Carlo simulations (RAM analysis) but what we should expect is:
Mega plants will require redundancy in intercontinental grid connections to avoid blackouts if one route is down
Areas without access to base load energy supply would be more vulnerable than those that can supply their own energy locally
Prices will fluctuate over larger areas when energy production is centralized
Micro-grids and micro-production should alleviate some of the increased vulnerability for small consumers (like private households) but are unlikely to be an effective buffer for industrial consumers
Coordinated cyber warfare campaigns
Recent international events have brought cyber warfare to the forefront of politics. Recently it was suggested at the RSA conference that deterrence through information sharing and openness does not work, and we are not able to deny the intrusion of state sponsored hackers, so we need to respond in force to such attacks, including armed military response in the physical world.
Recent cyberattacks in this domain have been reported from conflict zones. The reports receiving the most attention in media are those coming out of the Ukraine, where the authorities have accused Russia to be responsible for a series of cyber-attacks, including the one causing a major blackout in parts of Ukraine in December 2015. For a nice summary of the Ukrainian situation, see this post on the cybersecurity blog from SANS.
Increasing cooperation across national borders can increase or resilience but at the same time it will make effects of attacks spread to larger regions. Depending on the security architecture of the network as a whole, attackers could be able of compromising entire continents, potentially damaging the defense capabilities of those countries severely as population morale is hit by the loss of critical infrastructure.
What should we do now?
There are many positive outcomes of increased integration and very large renewable energy producers – but we should not disregard risks, including the political ones. When building such plants and the grids necessary to serve customers we need to ensure sufficient redundancy exists to cope with partial fallouts in a reasonable manner. We should also build our grids such that we have a robust security architecture, with auditable rules to ensure security management is on par across borders. This is the strength of NERC. Cyber resilience considerations should be made also for other parts of the world. Perhaps it is time to lay the groundwork for international conventions on grid reliability and security before we end up connecting all our continents to the same electrical network.
Sometimes we talk to people who are responsible for operating distributed control systems. These are sometimes linked up to remote access solutions for a variety of reasons. Still, the same people do often not understand that vulnerabilities are still found for mature systems, and they often fail to take the typically simple actions needed to safeguard their systems.
For example, a new vulnerability was recently discovered for the Siemens Simatic CP 343-1 family. Siemens has published a description of the vulnerability, together with a firmware update to fix the problem: see Siemens.com for details.
So, are there any CP 343’s facing the internet? A quick trip to Shodan shows that, yes, indeed, there are lots of them. Everywhere, more or less.
Now, if you did have a look at the Siemens site, you see that the patch was available from release date of the vulnerability, 27 November 2015. What then, is the average update time for patches in a control system environment? There are no patch Tuesdays. In practice, such systems are patched somewhere from monthly to never, with a bias towards never. That means that the bad guys have lots of opportunities for exploiting your systems before a patch is deployed.
This simple example reinforces that we should stick to the basics:
Know the threat landscape and your barriers
Use architectures that protect your vulnerable systems
Do not use remote access where is not needed
Reward good security behaviors and sanction bad attitudes with employees
Create a risk mitigation plan based on the threat landscape and stick to it practice too
IEC 61511 is undergoing revision and one of the more welcome changes is inclusion of cyber security clauses. According to a presentation held by functional safety expert Dr. Angela Summers at the Mary Kay Instrument Symposium in January 2015, the following clauses are now included in the new draft – the standard is planned issued in 2016:
Clause 8.2.4: Description of identified [security] threats for determination of requirements for additional risk reduction. There shall also be a description of measures taken to reduce or remove the hazards.
Clause 11.2.12: The SIS design shall provide the necessary resilience against the identified security risks
What does this mean for asset owners? It obviously makes it a requirement to perform a cyber security risk assessment for the safety instrumented systems (SIS). Such information asset risk assessments should, of course, be performed in any case for automation and safety systems. This, however, makes it necessary to keep security under control to obtain compliance with IEC 61511 – something that is often overlooked today, as described in this previous post. Further, when performing a security study, it is important that also human factors and organizational factors are taken into account – a good technical perimeter defense does not help if the users are not up to the task and have sufficient awareness of the security problem.
In the respect of organizational context, the new Clause 11.2.12 is particularly interesting as it will require security awareness and organizational resilience planning to be integrated into the functional safety management planning. As noted by many others, we have seen a sharp rise in attacks on SCADA systems over the past few years – these security requirements will bring the reliability and security fields together and ensure better overall risk management for important industrial assets. These benefits, however, will only be achieved if practitioners take the full weight of the new requirements on board.
Reliability engineers have traditionally focused more on hardware than software. There are many reasons for this; one reason is that traditionally safety systems have been based on analog electronics, and although digitial controls and PLC’s have been introduced throughout the 1990’s, the actual software involved was in the beginning very simple. Today the situation has really changed, but the focus in reliability has not completely taken this onboard. One of the reasons may be that reliability experts like to calculate probabilities – which they are very good at doing for hardware failures. Hardware failures tend to be random and can be modeled quite well using probabilistic tools. So – what about software? The failure mechanisms are very different – as failures in hardware are related to more or less stochastic effects stemming from load cycling, material defects and ageing, software defects or completely deterministic (we disregard stochastic algorithms here – they are banned from use in safety critical control system anyway).
Software defects exist for two reasons: design errors (flaws) and implementation errors (bugs). These errors may occur at the requirement stage or during actual coding, but irrespective of the time they occur, they are always static. They do not suddenly occur – they are latent errors hidden within the code – that will active each and every time the software state where the error is relevant is visited.
Such errors are very difficult to include in a probabilistic model. That is why reliability standards prescribe a completely different medicine; a process oriented framework that gives requirements to management, choice of methods and tools, as well as testing and documentation. These quality directed workflows and requirements are put in place such that we should have some confidence in the software not being a significant source of unsafe failures of the critical control system.
Hence – process verification and auditing take the place of probability calculations when we look at the software. In order to achieve the desired level of trust it is very important that these practices are not neglected in the functional safety work. Deterministic errors may be just as catastrophic as random ones – and therefore they must be managed with just as much rigor and care. The current trend is that more and more functionality is moved from hardware to software – which means that software errors are becoming increasingly important to manage correctly if we are not going to degrade both performance and trust of the safety instrumented systems we rely on to protect our lives, assets and the environment.
Safetey critical control systems are developed with respect to reliability requirements, often following a reliability standard such as IEC 61508 or CENELEC EN 50128. These standards put requirements on development practices and activities with regard to creating software that works the way it is intended based on the expected input, and where availability and integrity is of paramount importance. However, these standards do not address information security. Some of the practices required from reliability standards do help in removing bugs and design flaws – which to a large extent also removes security vulnerabilites – but they do not explicitly express such conceerns. Reliability engineering is about building trust into the intended functionality of the system. Security is about lack of unintended functionality.
Consider a typical safety critical system installed in an industrial process, such as an overpressure protection system. Such a system may consist of a pressure transmitter, a logic unit (ie a computer) and some final elements. This simple system meausres the pressure and transmits it to the computer, typically over a hardwired analog connection. The computer then decides if the system is within a safe operating region, or above a set point for stopping operation. If we are in the unsafe region, the computer tells the final element to trip the process, for example by flipping an electrical circuit breaker or closing a valve. Reliability standards that include software development requirements focus on how development is going to work in order to ensure that whenever the sensor transmits pressure above the threshold, the computer will tell the process to stop. Further the computer is connected over a network to an engineering station which is used for such things as updating the algorithm in the control system, changing the threshold limits, etc.
What if someone wants to put the system out of order, without anyone noticing? The software’s access control would be a crucial barrier against anyone tampering with the functionality. Reliability standards do not talk about how to actually avoid weak authentication schemes, although they talk about access management in general. You may very well be compliant with the reliability standard – yet have very weak protection against compromising the access control. For example, the coder may very well use a “getuser()” call in C in the authentication part of the software – without violating the reliability standard requirements. This is a very unsecure way of getting user credentials from the computer and should generally be avoided. If such a practice is used, a hacker with access to the network could with relaitve ease get admin access to the system and change for example set points, or worse, recalibrate the pressure sensor to report wrong readings – something that was actually done in the Stuxnet case.
In other words – as long as someone can be interested in harming your operation – your safety system needs security built-in, and that is not coming for free through reliability engineering. And there is always someone out to get you – for sports, for money or just because they do not like you. Managing security is an important part of managing your business risk – so do not neglect this issue while worrying only about reliability of intended functionality.