Safety-critical software must be developed in accordance with certain practices, using tools fit for the required level of reliability, and by an organization with the competence and maturity to develop such software. These software components are often part of barrier management systems; bugs may disable critical functionality, which in turn can lead to accidents with physical consequences such as death, injury, release of pollutants to the environment and severe material damage.
A key question is therefore whether such software should be allowed to run on consumer-grade operating systems that do not conform to any reliability-oriented development practice. The primary argument from some vendors for allowing this is “proven in use”: they have run the software on said operating system for so many operating hours without incident that they consider the system safe, as borne out by experience.
It immediately seems reasonable to put more trust in a system that has been field tested over time and has shown the expected performance than in a system that has not been tested in the field. Most operating systems are shipped with known bugs in addition to unknown ones, and a list of bugs is maintained for patching. The bugs are prioritized by criticality, and patches are developed accordingly. For Linux systems this patching strategy may be somewhat less organized, as development is more distributed and less managed, even at the kernel level. The problem is akin to the classical software security problem: if software with design flaws and bugs is released, such flaws surface only when a vulnerability is discovered by outsiders, or when an incident occurs that exposes the flaw. The bug or flaw is always inherent in the code, and typically stems from a lack of good practices during design and development. In theory, damage resulting from such bugs and flaws is then limited by patching the system; in the meantime, perimeter defences are thought to counteract the risk of a vulnerability being exploited (an argument that may not even hold in the security setting). For bugs affecting the safety of the underlying system, this thinking is flawed, because even a single accident may have unacceptable consequences – including loss of human life.
In reliability engineering it is disputed whether a “workflow oriented” or a “reliability growth oriented” view of software development and reliability is the more fruitful. Both have their merits. The “ship and patch” thinking inherent in proven-in-use arguments for software indicates a stronger belief in reliability growth concepts. These are models that try to link the number of faults discovered per unit of operating time to the duration of the discovery period; most of them are formulated as some form of Poisson process. It should be acknowledged that this is a probabilistic model of deterministic errors inherent in the software: the stochastic element is whether the software states that realize the errors are actually visited during execution.
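As a concrete illustration, the sketch below uses the Goel-Okumoto model, one of the standard non-homogeneous Poisson process reliability growth models: the expected number of failures observed after t operating hours is m(t) = a(1 − e^(−bt)), where a is the total fault content and b the per-fault detection rate. The parameter values are made up purely for illustration.

```python
import math

def expected_failures(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t)):
    expected cumulative number of failures observed after t operating hours."""
    return a * (1.0 - math.exp(-b * t))

def residual_faults(t, a, b):
    """Expected number of faults still latent in the software after t hours."""
    return a - expected_failures(t, a, b)

# Purely illustrative parameters: a = total fault content,
# b = per-fault detection rate per operating hour.
a, b = 100.0, 1e-4
for t in (1_000, 10_000, 100_000):
    print(f"after {t:>7} h: observed ~ {expected_failures(t, a, b):5.1f}, "
          f"latent ~ {residual_faults(t, a, b):5.1f}")
```

The appeal of such a model to a proven-in-use argument is clear: a long, incident-free operating history drives the estimated latent fault count down. The catch is that the estimate only covers the operating profile that was actually exercised.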
Coming back to operating systems, the complexity of such systems has grown very rapidly over the last decades. Looking specifically at the Linux kernel, its development has been tracked over time. The first kernel (released in 1991) had about 10,000 lines of code. During the development of kernel version 2.6.29, almost 10,000 lines of code were added per day. If a reliability growth concept is going to keep up with such rapid growth in complexity, 10,000 lines of code must be analyzed daily and shown to be completely bug-free – and proving that would require testing every software state those 10,000 lines can produce.
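To get a feel for why testing every software state is infeasible, consider a rough, purely illustrative sketch: even if we assume only one independent branch condition per 50 lines of code (an assumed figure, not a measured one), the number of distinct behaviours to cover explodes.

```python
# Illustrative state-space growth: if new code introduces one independent
# branch (boolean condition) per ~50 lines, 10,000 new lines add ~200
# branches, multiplying the reachable path count by 2**200.
BRANCHES_PER_LINE = 1 / 50     # assumed, purely illustrative
new_lines = 10_000
new_branches = int(new_lines * BRANCHES_PER_LINE)
print(f"{new_branches} new branches -> path multiplier ~ 2**{new_branches} "
      f"~ 10**{new_branches * 0.301:.0f}")
```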
Some research exists comparing the effect of coding practices. Microsoft stated in 1992 that they found about 10-20 errors per 1000 lines of code prior to testing, and that about 0.5 errors per 1000 lines of code remained in shipped products.
Compliant development gives no guarantee of flaw- and bug-free software. The same goes for development following good security practices – vulnerabilities may still exist. These practices have, however, been developed to minimize the number of design flaws and bugs reaching the shipped product. Structured programming techniques have been shown to produce code with fewer than 0.1 defects per 1000 lines of code – essentially by following a workflow oriented quality regime in tandem with testing. If we assume 0.5 errors per 1000 lines of code in the roughly 15 million lines of the Linux kernel (and the kernel is not the entire OS), we arrive at an estimated 7500 undiscovered bugs in the shipped version of Linux kernel 3.2.
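The arithmetic behind these estimates can be made explicit. The sketch below simply combines the defect densities quoted above with the approximate kernel size and the growth rate mentioned earlier; all figures are rough.

```python
# Back-of-the-envelope estimates from the defect densities quoted above.
KERNEL_LOC = 15_000_000          # approximate size of Linux kernel 3.2
DAILY_GROWTH_LOC = 10_000        # lines added per day during 2.6.29 development

shipped_density = 0.5 / 1000     # defects per line, typical shipped commercial code
structured_density = 0.1 / 1000  # defects per line, structured programming regime

print("Latent defects in kernel 3.2 at 0.5/KLOC:",
      round(KERNEL_LOC * shipped_density))        # ~7500
print("Latent defects in kernel 3.2 at 0.1/KLOC:",
      round(KERNEL_LOC * structured_density))     # ~1500
print("New latent defects per day at 0.5/KLOC:",
      DAILY_GROWTH_LOC * shipped_density)         # ~5 per day
```

Under the commercial-grade density assumption, the latent fault pool is thus replenished by roughly five new defects per day of development, which illustrates why a reliability growth argument struggles to keep pace with this kind of development.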
An international rating for the security of operating systems exists: the Common Criteria EAL rating. Commercial-grade systems have a rating of EAL 4, whereas secure RTOSs tend to be EAL 5 (semiformally designed and tested).
The summary seems to be that consumer-grade OSs for life-critical automation systems are not the best of ideas – which is why we don’t see too many of them.