Does safety engineering require security engineering?

Safety critical control systems are developed with respect to reliability requirements, often following a reliability standard such as IEC 61508 or CENELEC EN 50128. These standards place requirements on development practices and activities aimed at creating software that works the way it is intended based on the expected input, and where availability and integrity are of paramount importance. However, these standards do not address information security. Some of the practices required by reliability standards do help remove bugs and design flaws – which to a large extent also removes security vulnerabilities – but they do not explicitly express such concerns. Reliability engineering is about building trust in the intended functionality of the system. Security is about the absence of unintended functionality.

Consider a typical safety critical system installed in an industrial process, such as an overpressure protection system. Such a system may consist of a pressure transmitter, a logic unit (i.e. a computer) and some final elements. This simple system measures the pressure and transmits it to the computer, typically over a hardwired analog connection. The computer then decides whether the system is within a safe operating region or above a set point for stopping operation. If the process is in the unsafe region, the computer tells the final element to trip the process, for example by flipping an electrical circuit breaker or closing a valve. Reliability standards that include software development requirements focus on how development must be organized to ensure that whenever the sensor transmits a pressure above the threshold, the computer will tell the process to stop. Furthermore, the computer is connected over a network to an engineering station, which is used for tasks such as updating the algorithm in the control system and changing the threshold limits.

What if someone wants to put the system out of order without anyone noticing? The software’s access control would be a crucial barrier against anyone tampering with the functionality. Reliability standards do not describe how to avoid weak authentication schemes, although they talk about access management in general. You may very well be compliant with the reliability standard – yet have very weak protection against compromise of the access control. For example, the coder may very well use a “getlogin()” call in C in the authentication part of the software – without violating the reliability standard requirements. This is a very insecure way of establishing user identity and should generally be avoided. If such a practice is used, a hacker with access to the network could with relative ease gain admin access to the system and change, for example, set points – or worse, recalibrate the pressure sensor to report wrong readings, something that was actually done in the Stuxnet case.

In other words – as long as someone may be interested in harming your operation – your safety system needs security built in, and that does not come for free through reliability engineering. And there is always someone out to get you – for sport, for money, or simply because they do not like you. Managing security is an important part of managing your business risk – so do not neglect this issue while worrying only about the reliability of intended functionality.

Planning lifecycle activities for safety instrumented systems

Modern industrial safety instrumented systems are often required to be designed in accordance with IEC 61508 or IEC 61511. These functional safety standards take a lifecycle view of the safety instrumented system. Most people associate them with SIL – safety integrity levels – an important concept in these standards. Many newcomers to functional safety focus only on quantitative measures of reliability and do not engage with the lifecycle process. This leads to poorer designs than necessary, and compliance with these standards is not possible without taking the whole lifecycle into account.

A good way to look at a safety instrumented system is to define the phases of its lifecycle, and then assign activities for managing the system throughout these phases. Based on IEC 61511 we can define these phases as:

  • Design
  • Construction
  • Commissioning
  • Operation and maintenance
  • Decommissioning

In other words – we need to manage the safety instrumented system from conception to grave, in line with asset management thinking in general. For each of these phases there will typically be various activities related to the safety instrumented system that we need to focus on. For example, in the design phase we need to focus on identifying the necessary risk reduction, performing risk analysis and determining the necessary SIL for each of the safety instrumented functions making up the system. A key document emerging from this phase is the Safety Requirement Specification (SRS). Typically, in the same phase one would start to map out vendors and put out requests for offers on equipment. A guideline for vendors on what type of documentation they should provide is also good to prepare in this early phase. The Norwegian oil and gas association has produced a very useful guideline (Guideline No. 070) for the application of functional safety in the oil industry; it contains a very good description of what type of documentation needs to be collected, and is a good starting point.

Also part of design, and typically lasting into the construction phase as well, we find activities such as compliance assessment (checking whether the requirements in the SRS are actually fulfilled, based on documentation from equipment vendors and system integrators). In addition, at this point it is necessary to complete a Functional Safety Assessment (FSA) – a third-party review in the form of an audit that checks that the work has been done the way the standards require.

Part of the plan should cover how to commission the safety instrumented system. When are the different functions tested? What type of verification do we perform on the programming of actions based on inputs? Who is responsible for this? All of this should be planned out from the start.

Further, when the system is taken into operation, the complete asset (including the SIS) is delivered to the company that is going to operate it. The owner is then responsible for maintaining the system, for proof testing, and for ensuring that all barrier elements necessary for the system to work are in place. These types of activities should be planned as well.

Finally, the end of life of the asset should be managed. How to do that should be part of the plan – taking the system out of service, as a whole or only in parts, should still be done while maintaining the right level of safety for people, the environment and other assets that may be harmed if an accident should occur.

In addition, there are a number of aspects that should be included in a plan for managing functional safety and that span all these lifecycle phases. These include competence management for the people working with the SIS in the different lifecycle phases, how to deal with changes to the system or the environment it operates in, who is responsible for what, and how to communicate across company interfaces – this list is not exhaustive. Consult the standards for the details.

If all organizations involved in functional safety design planned their activities well, fewer changes would occur towards the end of large engineering projects, and better quality would be obtained at a lower cost. This is a low-hanging fruit that we all should grab.