CSCI 441: Software Engineering
Safety Engineering
Reference: Ian Sommerville, Software Engineering, 10th ed., Chapter 12
Note: The information here is an overview of the course notes provided by Dr. Stan Kurkovsky for Software Engineering
The big picture
Safety is a property of a system that reflects the system's ability to operate, normally or abnormally, without danger of causing human injury or death and without damage to the system's environment. It is important to consider software safety as most devices whose failure is critical now incorporate software-based control systems.
Safety and reliability are related but distinct. Reliability is concerned with conformance to a given specification and delivery of service. Safety is concerned with ensuring that the system cannot cause damage, irrespective of whether or not it conforms to its specification. System reliability is essential for safety, but it is not enough.
Reliable systems can be unsafe:
Safety-critical systems
In safety-critical systems it is essential that system operation is always safe, i.e., the system should never cause damage to people or to the system's environment. Examples: control and monitoring systems in aircraft, process control systems in chemical manufacture, and automobile control systems such as braking and engine management systems.
Two levels of safety criticality:
Safety terminology
| Term | Definition |
|------|------------|
| Accident (mishap) | An unplanned event or sequence of events which results in human death or injury, or in damage to property or the environment. An overdose of insulin is an example of an accident. |
| Hazard | A condition with the potential for causing or contributing to an accident. |
| Damage | A measure of the loss resulting from a mishap. Damage can range from many people being killed as a result of an accident to minor injury or property damage. |
| Hazard severity | An assessment of the worst possible damage that could result from a particular hazard. Hazard severity can range from catastrophic, where many people are killed, to minor, where only minor damage results. |
| Hazard probability | The probability of the events occurring which create a hazard. Probability values tend to be arbitrary but range from 'probable' (e.g. a 1/100 chance of the hazard occurring) to 'implausible' (no conceivable situations are likely in which the hazard could occur). |
| Risk | A measure of the probability that the system will cause an accident. The risk is assessed by considering the hazard probability, the hazard severity, and the probability that the hazard will lead to an accident. |
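One way to make these definitions concrete is a simple risk matrix that maps a hazard's severity and probability classes to a risk level used for triage. The sketch below is a minimal illustration in Python; the intermediate categories ('critical', 'marginal', 'occasional', 'remote') and the matrix entries are assumptions for illustration, not values from the course notes.

```python
# A minimal sketch of combining hazard severity and hazard probability into a
# risk class. The matrix entries are illustrative assumptions.

SEVERITY = ["catastrophic", "critical", "marginal", "minor"]       # worst-case damage
PROBABILITY = ["probable", "occasional", "remote", "implausible"]  # chance of the hazard arising

# Risk matrix: rows = severity, columns = probability (illustrative values).
RISK_MATRIX = [
    ["intolerable", "intolerable", "high",   "medium"],  # catastrophic
    ["intolerable", "high",        "medium", "low"],     # critical
    ["high",        "medium",      "low",    "low"],     # marginal
    ["medium",      "low",         "low",    "low"],     # minor
]

def assess_risk(severity: str, probability: str) -> str:
    """Look up the risk class for a hazard given its severity and probability."""
    return RISK_MATRIX[SEVERITY.index(severity)][PROBABILITY.index(probability)]

# Example: an insulin overdose hazard judged 'catastrophic' but 'remote'.
print(assess_risk("catastrophic", "remote"))  # -> "high"
```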
Safety achievement strategies:
Accidents in complex systems rarely have a single cause, as these systems are designed to be resilient to a single point of failure. Almost all accidents result from combinations of malfunctions rather than single failures. Anticipating all such combinations, especially in software-controlled systems, is probably impossible, so achieving complete safety is impossible and accidents are therefore inevitable.
Safety requirements
The goal of safety requirements engineering is to identify protection requirements that ensure that system failures do not cause injury, death, or environmental damage. Safety requirements may be 'shall not' requirements, i.e., they define situations and events that should never occur. Functional safety requirements define checking and recovery features that should be included in a system, as well as features that protect against system failures and external attacks.
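As a concrete illustration, the sketch below shows how a 'shall not' requirement for the insulin overdose example might be realized as a checking and recovery feature. The limit value, function names, and logging stub are hypothetical assumptions for illustration, not part of the course notes.

```python
# A minimal sketch of a functional safety requirement implemented as a
# checking and recovery feature for the insulin overdose example.

MAX_SINGLE_DOSE = 4.0  # assumed upper bound on a single insulin dose (units)

def log_hazard(message: str) -> None:
    """Record a near-miss in the hazard log (stub for illustration)."""
    print("HAZARD LOG:", message)

def checked_dose(computed_dose: float) -> float:
    """Enforce the 'shall not' requirement: never deliver more than MAX_SINGLE_DOSE.

    Whatever the dose computation produced, the value actually delivered is
    checked here, clamped into the safe range, and any violation is logged.
    """
    if computed_dose < 0:
        log_hazard(f"negative dose {computed_dose:.1f} computed; delivering nothing")
        return 0.0
    if computed_dose > MAX_SINGLE_DOSE:
        log_hazard(f"dose {computed_dose:.1f} exceeds limit; clamped to {MAX_SINGLE_DOSE}")
        return MAX_SINGLE_DOSE
    return computed_dose

print(checked_dose(7.5))  # -> 4.0, with a hazard-log entry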
Hazard-driven analysis:
Safety engineering processes
Safety engineering processes are based on reliability engineering processes. Regulators may require evidence that safety engineering processes have been used in system development.
Agile methods are not usually used for safety-critical systems engineering. Extensive process and product documentation is needed for system regulation, which contradicts the focus in agile methods on the software itself. A detailed safety analysis of a complete system specification is important, which contradicts the interleaved development of a system specification and program. However, some agile techniques such as test-driven development may be used.
Process assurance involves defining a dependable process and ensuring that this process is followed during the system development. Process assurance focuses on:
Process assurance is important for safety-critical systems development: accidents are rare events, so testing may not find all problems, and safety requirements are sometimes 'shall not' requirements that cannot be demonstrated through testing. Safety assurance activities may therefore be included in the software process to record the analyses that have been carried out and the people responsible for them.
Safety-related process activities:
Formal methods can be used when a mathematical specification of the system is produced. They are the ultimate static verification technique that may be used at different stages in the development process. A formal specification may be developed and mathematically analyzed for consistency. This helps discover specification errors and omissions. Formal arguments that a program conforms to its mathematical specification may be developed. This is effective in discovering programming and design errors.
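As a small illustration of this idea, the sketch below uses the z3 SMT solver (an assumption; the notes do not prescribe any particular tool) to check that a dose-clamping rule conforms to a simple mathematical specification. The solver searches for a counterexample; finding none amounts to an argument that the model satisfies the specification for all integer inputs. The limit value and names are illustrative.

```python
# A minimal sketch of checking that a program model conforms to a simple
# mathematical specification, using the z3 SMT solver (assumed installed).
from z3 import And, If, Int, Not, Solver, unsat

MAX_DOSE = 25  # illustrative safety limit

requested = Int("requested")

# Model of the implementation: clamp any requested dose into [0, MAX_DOSE].
delivered = If(requested < 0, 0, If(requested > MAX_DOSE, MAX_DOSE, requested))

# Specification: the delivered dose always lies within the safe range.
specification = And(delivered >= 0, delivered <= MAX_DOSE)

# Search for a counterexample, i.e. an input for which the spec is violated.
solver = Solver()
solver.add(Not(specification))

if solver.check() == unsat:
    print("No counterexample: the model conforms to its specification.")
else:
    print("Specification violated for:", solver.model())
```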
Model checking involves creating an extended finite state model of a system and, using a specialized system (a model checker), checking that model for errors. The model checker explores all possible paths through the model and checks that a user-specified property is valid for each path. Model checking is particularly valuable for verifying concurrent systems, which are hard to test. Although model checking is computationally very expensive, it is now practical to use it in the verification of small to medium sized critical systems.
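The sketch below illustrates the idea of explicit-state model checking in plain Python rather than with a production model checker: it enumerates every reachable state of a toy two-process model and checks a user-specified safety property in each one. The model (two processes entering a critical section with no locking) and the property are illustrative assumptions; a real model checker such as SPIN would also report the path that leads to the violation.

```python
# A minimal sketch of explicit-state model checking: enumerate all reachable
# states of a small concurrent model and check a safety property in each one.
from collections import deque

STEPS = {"idle": "trying", "trying": "critical", "critical": "idle"}

def successors(state):
    """Interleave the two processes: either one may take its next step."""
    p1, p2 = state
    yield (STEPS[p1], p2)
    yield (p1, STEPS[p2])

def mutual_exclusion(state):
    """Safety property: the two processes are never both in 'critical'."""
    return state != ("critical", "critical")

def model_check(initial, prop):
    """Breadth-first exploration of all reachable states; return a bad state if found."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not prop(state):
            return state                  # property violated in this state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None                           # property holds in every reachable state

violation = model_check(("idle", "idle"), mutual_exclusion)
print("counterexample state:", violation)  # -> ('critical', 'critical')
```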
Static program analysis uses software tools to process source text: they parse the program text, try to discover potentially erroneous conditions, and bring these to the attention of the V & V team. Static analyzers are very effective as an aid to inspections, but they are a supplement to, not a replacement for, inspections.
Three levels of static analysis:
Static analysis is particularly valuable when a language such as C, which has weak typing, is used, because many errors then go undetected by the compiler. It is also particularly valuable for security checking: the static analyzer can discover areas of vulnerability such as buffer overflows or unchecked inputs. Static analysis is now routinely used in the development of many safety- and security-critical systems.
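The sketch below gives the flavor of such a tool in Python: it parses source text without executing it and flags calls that commonly indicate unchecked input being executed. The rule set is an illustrative assumption, far simpler than the checks a real static analyzer performs.

```python
# A minimal sketch of a static analysis pass: parse source text (without
# running it) and flag calls that may execute unchecked input.
import ast

SUSPECT_CALLS = {"eval", "exec"}  # functions that execute arbitrary input

def find_suspect_calls(source: str):
    """Return (line, name) pairs for calls to functions in SUSPECT_CALLS."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPECT_CALLS:
                findings.append((node.lineno, node.func.id))
    return findings

example = "user_input = input()\nresult = eval(user_input)\n"
for line, name in find_suspect_calls(example):
    print(f"line {line}: call to {name}() on possibly unchecked input")
```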