Robust Techniques in Solving Complex Problems
Dev Raheja, MS, CSP, author of the books Design for Reliability, Preventing Medical Device Recalls, and Safer Hospital Care, has been an international risk management, reliability, durability, and system safety consultant for the government, commercial, and aerospace industries for over 30 years. His clients include the Army, Navy, Air Force, NASA, Siemens, Eaton, Boeing, Lockheed, Northrop Grumman, and General Motors. Prior to becoming a consultant in 1982, he worked at GE as Supervisor of Quality Assurance/Manager of Manufacturing Engineering, at Cooper Industries as Chief Engineer, and at Booz-Allen & Hamilton as Risk Management consultant for a variety of industries.
He teaches Design for Reliability courses at the University of Maryland
for degree programs in Mechanical Engineering and Reliability
Engineering. He is a Fellow of the American Society for Quality and
recipient of its Austin Bonis Award for Reliability Education
Advancement, and former chair of the Reliability Division. He is a
Senior Member of IEEE. He is a former National Malcolm Baldrige Quality
Award Examiner in the first batch of examiners. He served as Vice
President of the International System Safety Society, where he received
the Scientific Achievement Award and the Educator-of-the-Year Award. He
served on the Board of Directors for the Annual Reliability and
Maintainability Symposium for more than 10 years.
Robustness analysis captures three types of risks: the risk of the known, the risk of known unknowns, and the risk of unknown unknowns. The known risks are understood through past history and customer statements of needs. Unfortunately, customers are not aware of many potential requirements until the device is functionally broken. In this author's experience working with many organizations, they recognize the need for a requirement only after they “don’t get it”. This is particularly true for software specifications. Many companies attempt to make use of lessons learned, but most do not have formal and verifiable protocols for doing so. Some known risks can be identified through tools such as Failure Mode and Effects Analysis (FMEA), Fault Tree Analysis (FTA), and Event Tree Analysis. Some progress is being made in handling the known risks. The other two types of risk are significant, but little progress has been made on them.
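As one illustration of how a known risk can be quantified, the sketch below evaluates a small fault tree. It assumes the basic events are statistically independent, and the event names and probabilities are hypothetical values chosen for illustration, not data from any real device.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if ALL input events occur (independence assumed)."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if AT LEAST ONE input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic-event probabilities per demand (illustrative only).
p_sensor_fails = 0.01
p_backup_sensor_fails = 0.02
p_software_fault = 0.001

# Sensing is lost only if both the sensor and its backup fail.
p_sensing_lost = and_gate([p_sensor_fails, p_backup_sensor_fails])

# The top event occurs if sensing is lost OR the software faults.
p_top_event = or_gate([p_sensing_lost, p_software_fault])

print(f"P(sensing lost) = {p_sensing_lost:.6f}")
print(f"P(top event)    = {p_top_event:.6f}")
```

A calculation like this covers only the failure paths someone has already thought to draw in the tree, which is why it addresses the known risks and not the other two types.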
The “known-unknown” risks are unknown to the specification writers but are known to users of similar devices. The author, while working with the engineers on the Baltimore Mass Transit System, could not come up with more than 200 requirements for the specification. He then interviewed train drivers, technicians, and passengers in San Francisco’s BART system and compiled a list of over 1000 concerns. At least 500 of them were added to the Baltimore requirements.
The “unknown-unknowns” are special risks. They mostly apply to smart devices such as smart infusion pumps, MRIs, patient monitoring systems, and smart alarms that depend on trustworthy interoperability. The faults are usually unpredictable with the tools we have today because the systems are too complex. No longer are we dealing with one mechanical system that can perform and stand alone. The software in a pacemaker may require over 80,000 lines of code, a drug-infusion pump 170,000 lines, and an MRI (magnetic-resonance imaging) scanner more than 7 million lines. This growing reliance on software causes problems that are familiar to anyone who has ever used a computer: bugs, crashes, and vulnerability to digital attacks. The key point is that we are dealing with a system made up of several systems, called a system-of-systems. The software typically interacts with several systems, resulting in hundreds of possible interactions. The interactions are unbounded. We cannot know how the system-of-systems will behave by knowing only the behavior of individual systems. Tweaking one system without knowledge of inter-system behavior is doomed to failure. The unknown-unknown risks are the result of a lack of knowledge of the interactions and associated behavior of the system-of-systems. Altering the behavior of any part affects other parts and connecting systems.
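The growth in possible interactions can be made concrete with simple counting. The sketch below assumes, as a simplification, that any group of two or more connected systems may interact. It uses the device types named above as a hypothetical system-of-systems, and the function names are illustrative.

```python
from itertools import combinations
from math import comb

def pairwise_interactions(n_systems: int) -> int:
    """Number of distinct pairs of systems: n * (n - 1) / 2."""
    return comb(n_systems, 2)

def multiway_interactions(n_systems: int) -> int:
    """Number of distinct groups of two or more systems: 2^n - n - 1."""
    return 2 ** n_systems - n_systems - 1

# Hypothetical system-of-systems built from the device types mentioned in the text.
systems = ["infusion pump", "MRI scanner", "patient monitor", "smart alarm"]

# Enumerate every pair explicitly so each one can be reviewed.
pairs = list(combinations(systems, 2))
for a, b in pairs:
    print(f"{a} <-> {b}")

# Show how quickly the counts grow as systems are added.
for n in (4, 10, 30):
    print(n, pairwise_interactions(n), multiway_interactions(n))
```

With 30 connected systems there are already 435 pairs to examine, and the number of multi-way groups is far beyond what can be reviewed one at a time.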
- Nature of complex problems
- Complex Functions
- Active safety
- Safe defaults for sudden malfunctions
- Human interface requirements
- Logistics requirements (software changes, maintenance)
- Input interface requirements
- Output interface requirements
- Installation requirements
- Engineering change requirements
Course Level - All levels including senior management.
Who Should Attend
- Senior management
- Software managers and engineers
- Hardware engineers
- Systems engineers
- Quality assurance staff
- Safety staff
- Security staff
- Marketing managers
Why Should You Attend
Typically, about 60% of requirements are missing from a specification. This creates systems that are unmanageable in real use. Architecture and “principled engineering practices” therefore become highly flawed. This affects a wide range of systems and services, with potentially life-threatening consequences. In other words, the complexity goes uncontrolled. A device may, for example, accept unsigned or counterfeit software updates and ignore security.
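A device that accepts unsigned or counterfeit software updates is an example of a requirement that is easy to leave out of a specification. The sketch below shows one way such a requirement could be expressed as a check, using an HMAC-SHA256 tag from Python's standard library as a stand-in for a real code-signing scheme. The function names and the shared key are hypothetical, and a production device would more likely use asymmetric signatures with protected key storage.

```python
import hashlib
import hmac
from typing import Optional

def sign_update(image: bytes, key: bytes) -> bytes:
    """Manufacturer side: compute an authentication tag for an update image."""
    return hmac.new(key, image, hashlib.sha256).digest()

def accept_update(image: bytes, tag: Optional[bytes], key: bytes) -> bool:
    """Device side: accept the update only if a valid tag accompanies it."""
    if not tag:
        # Safe default: an update with no tag is rejected, not installed.
        return False
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)

# Hypothetical key and firmware images, for illustration only.
device_key = b"example-shared-secret"
genuine_image = b"firmware v2.1"
genuine_tag = sign_update(genuine_image, device_key)
counterfeit_image = b"firmware v2.1 (modified)"

print(accept_update(genuine_image, genuine_tag, device_key))
print(accept_update(counterfeit_image, genuine_tag, device_key))
print(accept_update(genuine_image, None, device_key))
```

The design choice worth noting is the default: when the tag is missing or does not match, the update is refused.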
Complexity control in most systems is a function of several systems working together to produce properties and behavior different from those of the individual components. The discipline of gathering such intelligence is often missing. This is one reason for the increase in device recalls. Most manufacturers have not applied the rigor of hardware risk analysis to software designs. The same methods apply to software, even though software and hardware differ. Specification Requirements Analysis, Preliminary Hazard Analysis (PHA), FMEA, FTA, and Hazard and Operability Study (HAZOP) are great tools for controlling complexity. Approximately 80% of the dollars that go into system development are spent on finding and fixing failures. This is very inefficient. A robust design requires the opposite: 80% of the dollars should be spent on preventing failures so that the chance of mishaps is dramatically reduced.
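For FMEA, one widely used convention ranks failure modes by a risk priority number (RPN), the product of severity, occurrence, and detection ratings, each commonly scored from 1 to 10. The sketch below assumes that convention and applies it to software-related failure modes. The failure modes and ratings are hypothetical and serve only to show the mechanics.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost always detected) to 10 (almost never detected)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

# Hypothetical failure modes and ratings, for illustration only.
failure_modes = [
    FailureMode("Pump delivers wrong dose after software update", 10, 3, 6),
    FailureMode("Alarm delayed by network congestion", 8, 4, 7),
    FailureMode("Display flickers during startup", 2, 5, 2),
]

# Highest RPN first, so prevention effort goes to the largest risks.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)
for fm in ranked:
    print(f"{fm.rpn:4d}  {fm.description}")
```

Many practitioners also review every high-severity item regardless of its RPN, since a low occurrence rating does not make a catastrophic outcome acceptable.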
The importance of robustness in system integration cannot be disputed. Product mishaps inevitably lead to losses in the form of recall costs from deaths, warranty claims, customer dissatisfaction, loss of sales, and, in extreme cases, loss of the entire business. Thus, designing to avoid mishaps of any kind plays a critical role in modern science and engineering and creates opportunities for a very high return on investment.