Introduction to Building and Maintaining Safe Systems
Mr. Smith has 30 years’ experience in military and commercial aviation
safety (including helicopter, multi-engine turboprop, and jet
aircraft), 7 years as an emergency manager, and 12 years as a senior
director in corporate safety management. His education includes a
Bachelor’s Degree in Engineering, a Master’s Degree in Aeronautical
Systems Engineering, and training in safety, leadership, and Total
Quality Management. He is an FAA-licensed Airline Transport Pilot
(Multi-Engine Land and Rotorcraft) and has completed safety training
courses at the University of Southern California, Arizona State
University, the Naval Postgraduate School in Monterey, and various
military institutions.
Examples from several industries will be discussed, and case studies will be developed to help attendees dissect systems and processes with a view toward identifying potential hazards. A safe system keeps the probabilities of injury and fatality below a specified, socially tolerable threshold. The learning objective is to prevent costly accidents, injuries, and severe losses of productivity by conducting a hazard analysis and establishing safety system functional requirements, a safety system architecture, and failure rate allocations. We will assume that a non-safety system cannot be credited with a probability of failure lower than 10⁻³ per hour.
We will look at an example involving a falling elevator and examine the residual probability of the cable breaking, P(a); the probability that overspeed detection fails, P(b); the probability that the brakes do not deploy even though an overspeed is detected, P(c); and the probability that the brakes do not work even though they deploy, P(d). Assuming the elevator carries people 4 hours per day × 6 days per week × 52 weeks per year (1,248 hours of exposure per year), the overall residual probability of system failure is approximately P(a) × [P(b) + P(c) + P(d)]: the elevator falls only if the cable breaks AND the protection chain fails, and the chain fails if detection OR deployment OR braking fails. Summing the OR inputs is a valid approximation when each probability is small. This will be explained through fault tree analysis using AND/OR logic. The allowable failure rate can then be calculated as a design requirement. We will also explain why non-recurring development costs are higher for safety systems (e.g., hazard analysis, requirements development, flow-down, traceability, validation, and verification), and why service and inspection must be performed by a trained and certified technician on a specific schedule, with record keeping.
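The elevator fault tree can be sketched in a few lines of Python. This is a minimal illustration under assumptions the outline does not state: P(a) is treated as a per-hour rate of cable breakage, P(b), P(c), and P(d) as per-demand probabilities that the protection fails, and every numeric value is a hypothetical placeholder. The OR gate is evaluated both exactly (for independent events) and with the rare-event sum used in the formula; the function names are illustrative only.

```python
# Hypothetical sketch of the falling-elevator fault tree:
#   TOP = a AND (b OR c OR d)
# Assumptions (not from the course): P(a) is a per-hour cable-break rate;
# P(b), P(c), P(d) are per-demand failure probabilities; values are placeholders.

HOURS_PER_YEAR = 4 * 6 * 52  # 4 h/day x 6 days/week x 52 weeks = 1,248 h


def or_gate_exact(*probs: float) -> float:
    """Probability that at least one of several independent events occurs."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def or_gate_rare_event(*probs: float) -> float:
    """Rare-event approximation: the sum, an upper bound on the exact value."""
    return sum(probs)


def falling_elevator_rate(p_a: float, p_b: float, p_c: float, p_d: float,
                          exact: bool = False) -> float:
    """Hazard rate per hour: the cable breaks AND the protection chain fails."""
    or_gate = or_gate_exact if exact else or_gate_rare_event
    return p_a * or_gate(p_b, p_c, p_d)


if __name__ == "__main__":
    p_a = 1e-5               # hypothetical cable-break rate per hour
    p_b = p_c = p_d = 1e-3   # hypothetical per-demand failure probabilities
    per_hour = falling_elevator_rate(p_a, p_b, p_c, p_d)
    print(f"{per_hour:.2e} per hour, {per_hour * HOURS_PER_YEAR:.2e} per year")
```

With these placeholder values the hazard rate is 3 × 10⁻⁸ per hour, or about 3.7 × 10⁻⁵ per year of exposure. The exact OR gate gives a slightly smaller value, which shows that the summed form is conservative.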
We will examine critical system processes and interface nodes to determine whether they are robust against breakdown or failure, or whether redundancy must be employed to mitigate it. For redundancy to be effective, failures must be independent (cascading failures can defeat redundancy). We must rigorously identify failure mechanisms and eliminate single-point failures. A secondary means of fulfilling the safety function may continue to deliver the original function (fail-op) or may not (fail-safe); either way, the hazard is mitigated.
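To show how much redundancy buys when failures really are independent, this sketch computes the failure probability of a k-out-of-n arrangement of identical channels. Assumptions not taken from the outline: each channel fails independently with the same probability p (no common-mode failures), "dual redundancy" is modeled as 1-out-of-2 (1oo2), and "triple redundancy" as 2-out-of-3 (2oo3) majority voting. The function name and the value of p are hypothetical.

```python
from math import comb


def koon_failure_probability(k: int, n: int, p: float) -> float:
    """Failure probability of a k-out-of-n (koon) redundant arrangement.

    The arrangement works if at least k of its n identical channels work.
    Assumes each channel fails independently with probability p.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    q = 1.0 - p  # probability that a single channel works
    # Sum over every outcome in which fewer than k channels work.
    return sum(comb(n, j) * q**j * p**(n - j) for j in range(k))


if __name__ == "__main__":
    p = 1e-3  # hypothetical single-channel failure probability
    print("single channel:", koon_failure_probability(1, 1, p))
    print("dual (1oo2):   ", koon_failure_probability(1, 2, p))
    print("triple (2oo3): ", koon_failure_probability(2, 3, p))
```

With p = 10⁻³, a 1oo2 pair fails with a probability of about 10⁻⁶ and a 2oo3 voter with about 3 × 10⁻⁶. Majority voting lets the system outvote a single erroneous channel, at the cost of a somewhat higher failure probability than 1oo2.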
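We will discuss common-mode failures, which invalidate the AND gates in a fault tree because the events feeding the gate are no longer independent. Such failures can be avoided through experienced engineering judgment or the use of dissimilar designs; an auto brake example will illustrate this.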
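The effect of common-mode failures on an AND gate can be quantified with the beta-factor model, a widely used simplification swapped in here for illustration (the outline does not name a specific model). A fraction β of each channel's failure probability is assumed to disable both channels of a dual-redundant (1oo2) pair at once, so that part bypasses the AND gate entirely. The function name and all numbers are hypothetical.

```python
def dual_channel_with_ccf(p: float, beta: float) -> float:
    """Failure probability of a 1oo2 pair under a simple beta-factor model.

    A fraction beta of each channel's failure probability is common-cause
    (it disables both channels at once); the rest is independent. The two
    contributions are summed (rare-event approximation).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    p_independent = (1.0 - beta) * p  # failures that hit only one channel
    p_common = beta * p               # failures that hit both channels
    return p_independent**2 + p_common


if __name__ == "__main__":
    p = 1e-3  # hypothetical single-channel failure probability
    for beta in (0.0, 0.01, 0.05, 0.10):  # hypothetical beta values
        print(f"beta={beta:.2f}: {dual_channel_with_ccf(p, beta):.2e}")
```

With p = 10⁻³ and β = 0.05, the pair's failure probability rises from 10⁻⁶ (fully independent) to roughly 5.1 × 10⁻⁵, dominated by the common-cause term βp. Reducing β through dissimilarity attacks that term directly, whereas adding more identical channels does not remove it.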
- Social expectations of fatality risk (tolerance is higher than actual rates)
- Let’s assume an OSHA worker fatality rate of 5 deaths per 100,000 employees per annum, i.e., 5 × 10⁻⁵ (0.00005) per annum. The maximum tolerable risk is much higher (5 × 10⁻³ per annum), so we will pick a design target of 1 × 10⁻⁴ per annum
- High and low demand safety systems
- Safety system development steps
- Failure modes and effects analysis
- Hazards and mitigations
- Fault trees
- Redundancy (detection and mitigation redundancy; “fail-op” and “fail-safe”; dual and triple redundancy)
- Maintenance and obsolescence
- Common-mode failures (lightning, fire, icing, erroneous design, bad components)
- Fault detection, isolation, and accommodation
- Active and passive monitors (the former triggers fault accommodation; the latter generates an annunciation)
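The risk figures in the outline can be chained into a failure rate allocation. This sketch assumes, beyond what the outline states, that the annual design target applies to elevator riders exposed 1,248 hours per year and that the 10⁻³ per hour non-safety failure rate is the rate of demands placed on the safety function. All constant and function names are hypothetical.

```python
# Hypothetical sketch of a failure rate allocation. Assumptions (not stated
# in the course outline): the exposure is the elevator duty cycle, and the
# 1e-3 per hour non-safety failure rate is the demand rate on the safety
# function. Names are illustrative only.

OSHA_RATE_PER_ANNUM = 5 / 100_000      # 5 deaths per 100,000 workers per year
MAX_TOLERABLE_PER_ANNUM = 5e-3         # upper bound of tolerable risk
DESIGN_TARGET_PER_ANNUM = 1e-4         # design target chosen in the outline
EXPOSURE_HOURS_PER_YEAR = 4 * 6 * 52   # 4 h/day x 6 days/week x 52 weeks
NON_SAFETY_RATE_PER_HOUR = 1e-3        # credited non-safety failure rate


def hourly_budget(target_per_annum: float, hours_per_year: float) -> float:
    """Spread an annual risk target evenly over the hours of exposure."""
    return target_per_annum / hours_per_year


def required_pfd(target_per_hour: float, demand_rate_per_hour: float) -> float:
    """Largest probability of failure on demand the safety function may have."""
    return target_per_hour / demand_rate_per_hour


if __name__ == "__main__":
    # The chosen target sits between the observed rate and the tolerable limit.
    print(OSHA_RATE_PER_ANNUM < DESIGN_TARGET_PER_ANNUM < MAX_TOLERABLE_PER_ANNUM)
    per_hour = hourly_budget(DESIGN_TARGET_PER_ANNUM, EXPOSURE_HOURS_PER_YEAR)
    pfd = required_pfd(per_hour, NON_SAFETY_RATE_PER_HOUR)
    print(f"hazard budget: {per_hour:.2e} per hour")
    print(f"required PFD: {pfd:.2e} (risk reduction about {1 / pfd:,.0f}x)")
```

Under these assumptions the hazard budget is about 8.0 × 10⁻⁸ per hour, so the safety function may fail on no more than about 8.0 × 10⁻⁵ of demands, a risk reduction of roughly 12,500 times.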
Course Level: Fundamental to Intermediate.
Who Should Attend
- System Designers
- Safety Managers
- Loss Prevention Specialists
Why Should You Attend
You will learn that hazards can be eliminated or reduced by:
- Substitution – using safer materials
- Simplification – minimizing parts, modes, interfaces; reducing unknowns, using computers to design dangerously complex systems
- Decoupling – reducing the interdependence of tightly coupled system components
- Reducing opportunities for human error
- Reducing hazardous materials or conditions by minimizing the likelihood of occurrence
- Using passive “fail-safe” safeguards or active safeguards requiring detection and recovery
- Enhancing controllability (incremental control vs. one step), providing feedback, providing fallback or intermediate steps
- Lowering time pressures
- Providing decision aids
- Making correct assumptions about system behavior
- Constructing barriers (lockout and lockin)
- Establishing redundancy
A safe system is defined as a set of procedures, resulting from careful study of a task, that informs how work must be carried out. Safe systems are developed by consideration of the people, substances, and equipment involved in performing a task, identifying all foreseeable hazards, assessing the risks, and eliminating those risks by providing a formal framework for workers to follow. Safe system design incorporates hazard identification, control, reduction, and ultimately elimination, in order of decreasing cost and increasing effectiveness.