Safety in ML-Enabled Systems

Christian Kästner
23 min read · Oct 27, 2022


Together with parts of “Planning for Mistakes,” this post covers the lecture “Safety” of our Machine Learning in Production course. For other chapters, see the table of contents.

Safety is the prevention of a system failure or malfunction that results in (1) death or serious injury, (2) loss or severe damage to property, or (3) harm to the environment or society. Safety engineering is often associated with systems that have substantial potential for harm if things go wrong, such as nuclear power plants, airplanes, and autonomous vehicles, and developers not working on such traditional safety-critical systems often pay little attention to safety concerns. This is a mistake: Even when a software system is unlikely to kill somebody, it may cause harm at a smaller scale, such as (1) contributing to mental health problems with machine-curated news feeds in social media, (2) creating noise pollution through malfunctioning smart car alarms, or (3) causing stress with inappropriate recommendations on a video-sharing site. Safety is a relevant quality for most software systems, including almost every system that uses machine learning.

Safety is fundamentally a system property. Software by itself cannot be unsafe and neither can machine-learned models. However, safety issues can emerge when software interacts with the environment, either by direct actuation, such as a controller accelerating a vehicle, or by presenting results upon which humans take actions, such as medical software suggesting an unsafe radiation dose as a treatment to a physician. Therefore, safety engineering always needs to consider the entire system and how it interacts with the environment, and we cannot assure safety just by analyzing a machine-learned model.

Machine learning tends to complicate safety considerations, since machine-learned models may always make mistakes. As discussed in chapter Model quality, we usually do not even have a specification that would tell us what correctness means, and we should consider models fundamentally as unreliable components in a system. The challenge is hence how to build safe systems even if some components are unreliable.

Safety and Reliability

A system is safe if it prevents accidents, where an accident is an undesired or unplanned event that causes harm. Beyond physical injuries to people and disastrous damage to property, harms may include harm to mental health, financial loss, environmental pollution, and broader harms to society, such as fostering poverty and polarization. Systems can rarely guarantee absolute safety, where no accidents can ever happen, but safety engineering focuses on reducing the risk of accidents, primarily by avoiding hazardous conditions that can enable accidents and by reducing harm when accidents still occur. The goal of safety engineering is to demonstrate that the system overall provides acceptably low levels of risk. For example, we may accept an autonomous vehicle as safe enough, even when some accidents still happen occasionally, if we can demonstrate that accidents are very rare and occur much less frequently than accidents caused by human drivers.

Reliability refers to the absence of defects in a system or component, often quantified in terms of the mean time between failures. That is, reliability refers to whether the system or its components perform as specified and how often they make mistakes, with the idea that accidents can be avoided if there are no mistakes. In principle, techniques like formal verification can even guarantee (at substantial cost) that software behaves exactly as specified (see chapter Quality assurance basics). For hardware, given physical properties, such guarantees are a bit harder to establish, but we can typically make stochastic claims about reliability. However, reliability of components is usually not sufficient to achieve safety, since accidents often arise from unanticipated interactions of components even when each component works as specified, from incorrect assumptions, or from operating a system beyond its specified scenarios. Conversely, it is possible to build safe systems with unreliable components by introducing safety mechanisms in the system design.

Improving reliability of the walls of the gas storage tank is not enough to assure safety.

To illustrate how safety and reliability are separate properties, let us borrow the non-software example of a pressure tank storing flammable gas from Nancy G. Leveson: Making the walls of the tank thicker increases the reliability of the vessel in the sense of making it less likely to burst, but when a failure does occur, it happens at a much higher pressure, with greater potential for harm from a violent explosion. To achieve safety, it may be more productive to invest in safety mechanisms that help return the system to a fail-safe state, rather than to increase the reliability of the components: Adding a pressure valve that releases gas when the pressure gets too high avoids ever reaching a pressure so high that a rupture of the tank would cause significant harm, thus achieving safety even with thinner, less reliable tank walls.

When it comes to software and machine learning, focusing too much on reliability can similarly undermine safety. We may reduce the frequency with which a software component or model makes a mistake, but when a mistake eventually does happen, it can still cause harm. By focusing only on improving reliability, we may miss opportunities to improve the system with safety mechanisms that ensure that mistakes do not lead to accidents, or that at least reduce the harm caused by a mistake.

Improving Model Reliability

When it comes to safety and machine learning, most discussions quickly focus on the accuracy and robustness of models, both of which relate to reliability rather than safety. For example, in many projects in industry, in the government sector, and in academia, we have seen a large interest in discussing model robustness as a safety strategy (which is really about reliability, as we will discuss), often to the exclusion of any broader safety considerations. Since accuracy and robustness are so prominent in safety discussions in practice and academia, and since they can indeed be quite useful for reducing the frequency of mistakes, we provide a quick overview.

Model accuracy

All activities that improve the accuracy of a model help with reliability: the model makes fewer mistakes, reducing the frequency with which a wrong prediction can lead to an accident that causes harm.

Improving model accuracy is typically the core competency of a data scientist. This may involve collecting better data, data augmentation, better data cleaning and feature extraction, better learning algorithms, better hyperparameter selection, better accuracy evaluations, and so forth.

Similarly, all testing activities described in chapter Model quality can help to get a better understanding of the frequency of mistakes, of the supported target distribution where mistakes are rare and how far the model generalizes, and of whether mistakes are biased toward specific subpopulations or corner cases. For example, in autonomous vehicle development, a lot of effort is spent anticipating corner cases, such as sinkholes, unusual road configurations, or art made of traffic signs and traffic lights. Once anticipated, we can collect or augment corresponding training data and curate test data slices to evaluate reliability for corner cases that otherwise would show up only rarely in test sets and hence affect overall model accuracy only marginally.
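
As a minimal sketch of such a slice-based evaluation, assuming a hypothetical model with a predict function and test examples stored as dictionaries with metadata tags like “fog” or “tilted_sign” (these names are illustrative, not a real API):

```python
from collections import defaultdict

def accuracy_by_slice(model, test_examples):
    """Compute accuracy overall and separately for each tagged slice."""
    totals, correct = defaultdict(int), defaultdict(int)
    for example in test_examples:
        prediction = model.predict(example["image"])
        hit = int(prediction == example["label"])
        # every example counts toward the overall score and toward each of its slices
        for slice_name in ["overall"] + example.get("tags", []):
            totals[slice_name] += 1
            correct[slice_name] += hit
    return {name: correct[name] / totals[name] for name in totals}
```

A low score on a slice like “fog” points to a reliability gap that overall accuracy alone would hide.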

Model robustness

Machine-learned models often have brittle decision boundaries, where small modifications to the input can result in drastic changes of the predicted output. For example, a traffic sign detection model might detect a stop sign just fine, except when the image is slightly rotated, when lighting is poor, when it is foggy, or when the sensor is aging. This brittleness has received particular attention because it can be intentionally exploited in attacks, as we will discuss in more detail in the next chapter Security and Privacy, but it is also problematic in a safety context, when the model’s predictions are unreliable in the presence of minor disturbances.

Example of an intentional attack on a traffic sign detection system: When small stickers are attached to a stop sign in deliberately chosen locations, the model detects the sign as a speed limit sign instead. That is, the model’s predictions are not robust to small changes in the input. Image from Eykholt et al. “Robust physical-world attacks on deep learning visual classification.” In Proc. CVPR. 2018.

Robustness in machine learning has received a lot of attention in research, possibly because it can be expressed as a formal property, against which models can then be tested with various methods. In a nutshell, robustness can be expressed as the invariant ∀x’. d(x, x’) < ε ⇒ f(x) = f(x’) for a model f with input x, some distance function d, and a maximum distance ε: This invariant states that all inputs in the direct neighborhood of the given input of interest should yield the same prediction.

In a safety context, we usually care about robustness with regard to perturbations to the input that can occur randomly or due to anticipated root causes. In our traffic sign predictor, this might include a certain degree of random noise, but also predictable effects like lower contrast due to weather and light conditions, blurry images due to rain on the camera sensor, or rotation due to tilted signs or sensors. For each of these anticipated changes, custom distance functions can be defined to capture the intended neighborhood for which the predictions should be stable. The key challenge in analyzing robustness is identifying the right distance function to answer “Robust against what?”

A model with a decision boundary (black line) detecting the original stop sign and many other images in the neighborhood correctly as a stop sign, but not all images; for example, the image with several black and white stickers is not detected as a stop sign. As a whole, the prediction of the original image is not considered robust, because some neighboring inputs produce different predictions.

To use robustness to improve the reliability of a model, the key idea is to check at inference time in production whether the current prediction is robust, that is, whether all neighboring inputs also yield the same prediction. If the prediction is robust, we have more confidence in its correctness. If the prediction is not robust, say because the model predicts a different output for a slightly blurred version of the input, the system may not want to rely on the prediction. How exactly the system should handle a non-robust prediction is up to the system designer; for example, it may rely on a redundant fail-over component or involve a human.
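
As a minimal sketch of such an inference-time check, assuming a hypothetical model with a predict function and a domain-specific perturb function that samples neighbors of the input (a production system might instead use a robustness verification tool):

```python
def predict_if_robust(model, x, perturb, n_samples=20):
    """Return the prediction for x only if it is stable under sampled
    perturbations of the input; return None to signal that the caller
    should fall back to a redundant component or a human."""
    prediction = model.predict(x)
    for _ in range(n_samples):
        if model.predict(perturb(x)) != prediction:
            return None  # not robust within the sampled neighborhood
    return prediction
```

Sampling only approximates the neighborhood and provides no guarantees; the verification techniques discussed below can make stronger claims, at a much higher computational cost.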

Note that robustness is usually determined for a given input and neighborhood around that input, but not for the model as a whole. A model is never fully robust, since there are always inputs near the decision boundary. It is possible (and actually quite common in practice) to measure average robustness of a model across many inputs, or measure how hard it is to find non-robust inputs with a given search strategy, but it is not always obvious how to interpret such a measure when making a reliability or safety claim. Some assessment of model-wide robustness can be seen as a form of capability testing, as discussed in chapter Model quality, for example, identifying that a model generally is poor at handling certain kinds of perturbations, such as blurry or tilted images.

No nontrivial model is fully robust: In any model with a decision boundary, there are some non-robust inputs near the decision boundary with neighboring inputs on the other side of the decision boundary.

There is a large amount of research and tooling on model robustness, especially for deep neural networks, often using verification techniques (e.g., abstract interpretation or randomized smoothing) to make confident claims about whether an input is robust within a given neighborhood defined by a distance function. These techniques are usually computationally very expensive, often with costs equivalent to several thousand model inferences, making them challenging to deploy as practical tools in production code. Robustness can be a building block when building a safe system, for example, to detect when to switch to a redundant fail-over system, but it does not allow us to make safety claims about the system by itself. Robustness is fundamentally about reliability, not safety.

Models are unreliable components

As discussed at length in earlier chapters, it is best to always consider machine-learned models as unreliable components. Even when improving reliability by improving accuracy, supporting more corner cases, or strengthening robustness, mistakes will still happen eventually. If wrong model predictions within a system can cause serious harm, it seems unlikely that we will be able to increase the reliability of a machine-learned model to the point that mistakes are so rare that we do not have to worry about them.

While improving model reliability is a worthy endeavor, it must not distract us from considering safety of the entire system, beyond the model. The observation that we can build safe systems with unreliable components is good news for software products with ML components, because it means we have a chance to build safe systems even with unreliable ML components, as we will discuss in the remainder of this chapter.

Anticipating Hazards and Root Causes

Beyond just improving component reliability, safety engineering looks at the entire system. It typically proceeds in four steps: (1) Identify relevant hazards and safety requirements, (2) identify potential root causes of hazards, (3) develop a mitigation strategy for each hazard, and (4) provide evidence that the mitigations are properly implemented and effective.

The concept of a “hazard” is common in safety engineering to describe a system condition in which an accident can happen. The accident is the actual harmful event, whereas the hazard is a more general condition that can but does not always lead to an accident (necessary, but not sufficient). For example, “an autonomous vehicle going too fast while not recognizing pedestrians” is a hazardous condition that can lead to a fatal crash, but being in the hazardous condition does not cause harm in every case. Safety engineering is about preventing accidents by ensuring that systems do not enter a hazardous condition.

It can be difficult to clearly causally attribute why the system enters a hazardous condition or what exactly caused an accident. For example, “an autonomous vehicle going too fast while not recognizing pedestrians” could be caused by wrongly recognizing the speed limit sign, by wrong predictions of a pedestrian-detection model, by a bug in the speed controller, or by a hardware malfunction of the brakes, among other possible causes. Often there are multiple mistakes that enable the hazard and may be causally responsible for an accident. Basic events that cause or contribute to a hazard are typically called root causes.

When thinking about safety, a good starting point is usually identifying hazards and their root causes. There are many hazard analysis techniques to systematically identify these, including fault tree analysis, failure mode and effects analysis (FMEA), and hazard and operability study (HAZOP), which we already discussed in chapter Planning for mistakes.

Hazard analysis can generally proceed in two directions:

  • Forward analyses start with possible mistakes (root causes), typically at the component level, and analyze whether hazards can arise from those mistakes. For example, for every machine-learned model we should ask (a) in what ways a prediction may be wrong and (b) what hazards may arise from a wrong prediction. Similarly, we should inspect our assumptions about the environment and the reliability of our sensors and actuators (see chapter Gathering requirements) and what hazards can arise from wrong assumptions. FMEA and HAZOP, introduced in chapter Planning for mistakes, are both forward analyses.
  • Backward analyses start from hazards and trace backward what mistakes (root causes) may have caused the hazard. The hazards we analyze can be informed by actual accidents that have happened or by hypothetical ones. Fault tree analysis (introduced in chapter Planning for mistakes) is often used for this backward analysis, identifying the various conditions that can lead to the hazard.

Forward and backward analysis can be interleaved in different ways, for example, using FMEA to identify possible hazards from wrong model predictions (forward analysis) and then identifying whether those hazards can also be caused in different ways (backward analysis).
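
To make the forward direction concrete, here is a small, purely illustrative sketch of FMEA-style entries for an autonomous vehicle; the components, failure modes, hazards, and harms are hypothetical examples, not a complete analysis:

```python
# Illustrative FMEA-style entries (component, failure mode, resulting hazard, potential harm).
fmea_entries = [
    {"component": "pedestrian detection model",
     "failure_mode": "fails to detect a pedestrian",
     "hazard": "vehicle does not slow down near a pedestrian",
     "potential_harm": "collision, serious injury"},
    {"component": "traffic sign detection model",
     "failure_mode": "misreads a speed limit sign",
     "hazard": "vehicle drives faster than is legal and safe",
     "potential_harm": "high-speed collision"},
    {"component": "brake hardware",
     "failure_mode": "actuator malfunction",
     "hazard": "vehicle cannot slow down in time",
     "potential_harm": "collision"},
]
```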

It may be useful to explicitly explore different ways in which machine-learned models may be systematically wrong, in addition to preparing for a generic mistake from simply a “wrong prediction.” This allows us to later plan mitigation strategies for specific kinds of mistakes. For example, a model may be systematically bad at recognizing traffic signs in foggy weather or may have systematically lower accuracy for “do not pass” signs. Also, because it relies primarily on some form of pattern matching, machine learning tends to handle edge cases and unknown cases poorly, including many cases that humans recognize and handle easily. For example, a traffic sign detector in an autonomous vehicle may not expect to see traffic lights transported on a truck. While hazard analysis will likely not anticipate all edge cases, and certainly not unknown cases, it can help to prepare for the almost certain existence of some unanticipated cases.

In many domains, we do not need to start entirely from scratch. Existing accident reports for similar systems can provide accidents and hazards as starting points for backward analyses; existing error classifications of machine-learned models (e.g., typical mistakes that object detection systems make) can similarly guide forward analyses by providing a list of plausible root causes to analyze. In well-understood domains like aviation, vehicles, and medical systems, existing safety standards often include lists of common hazards (e.g., ISO 26262, ISO 21448, IEEE P700x, UL 4600), even though most of them were developed before the rise of machine learning.

Example excerpt of an FMEA analysis identifying possible failure modes for components in an autonomous vehicle and possible accidents.

Note that no hazard analysis technique can provide guarantees that all hazards and root causes are identified. They are intended as structured approaches to think systematically through possible failures, but they cannot replace human expertise, experience, and creativity.

Hazard prioritization. In risk management, risks are typically ranked by the likelihood of an accident multiplied by the severity of the harm it causes. Therefore, hazards that are more likely to occur and hazards that have more potential for harm are typically prioritized in the design process. It is usually not necessary to quantify likelihood and harm exactly. Most commonly, rough categorizations based on judgment and estimation, using categories like “likely,” “unlikely,” and “very unlikely” for likelihood and “mild,” “severe,” and “very severe” for harm, are sufficient to identify which hazards should be addressed most urgently and which expose only an acceptably low level of remaining risk.
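
As a minimal sketch of such a prioritization, with made-up hazards for a sidewalk delivery robot and ordinal categories mapped to numbers purely for ranking:

```python
LIKELIHOOD = {"likely": 3, "unlikely": 2, "very unlikely": 1}
SEVERITY = {"mild": 1, "severe": 2, "very severe": 3}

hazards = [  # hypothetical examples
    ("robot blocks a wheelchair ramp", "likely", "severe"),
    ("robot collides with a pedestrian at speed", "unlikely", "very severe"),
    ("robot takes a slightly longer route", "likely", "mild"),
]

# Rank hazards by likelihood x severity; higher scores deserve attention first.
ranked = sorted(hazards, key=lambda h: LIKELIHOOD[h[1]] * SEVERITY[h[2]], reverse=True)
for name, likelihood, severity in ranked:
    print(f"{LIKELIHOOD[likelihood] * SEVERITY[severity]:>2}  {name}")
```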

Hazard Mitigation

The system should be designed to be safe. Hazard analysis identifying hazards and root causes can help in the design process, because it directs attention to the specific places where mistakes can have harmful consequences and need to be mitigated. Some mitigations may eliminate a hazard entirely, but most aim to make it less likely by increasing the number of independent conditions that must occur for an accident to happen. As usual, it is easier to design a system to be safe than to patch safety problems detected during testing.

We already discussed various design strategies to mitigate mistakes in chapter Planning for mistakes. These techniques are essential building blocks when designing safe systems. Instead of repeating the previous discussion, we just summarize these techniques, each with an example of how they might be used to improve safety of a sidewalk delivery robot (see chapter Technical Debt) that is built with unreliable machine-learned components for obstacle detection and navigation:

  • Human-AI interaction design (human in the loop): Identify mechanisms for humans to oversee the system. For example, prompt a remote operator to take over control on a blocked sidewalk, rather than trying to automate navigation, to avoid the hazard of driving into a construction site.
  • Undoable actions: Design mechanisms to undo automated actions, including providing mechanisms to appeal decisions. For example, provide a web form for the public to complain about robots navigating through private alleys, to avoid the (continued) harms through noise and interruptions in private spaces.
  • Guardrails: Introduce safety controls outside the model to prevent unsafe actions. For example, set a maximum speed for the vehicle that is low enough to reduce harm, to avoid the hazard of crashing into obstacles at high speed.
  • Mistake detection and recovery: Install an independent system to detect mistakes and intervene. For example, detect mistakes in obstacle detection with a bump sensor and recover with alternative route planning or by involving a remote operator, to avoid the hazard of getting stuck or repeatedly bumping into an obstacle (see the sketch after this list).
  • Containment and isolation: Ensure mistakes of components are handled locally and do not spread to other components of the system. For example, ensure that a network outage in the robot’s mobile internet connection does not prevent it from detecting obstacles while it is moving, to remove this possible cause for the hazard of crashing into obstacles.
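
As a minimal sketch of how the guardrail and mistake-detection strategies above might look in the sidewalk robot’s control loop; the interfaces (perception, planner, bump_sensor, remote_ops) and the speed cap are hypothetical and only illustrate the structure:

```python
MAX_SPEED_MPS = 1.5  # guardrail outside the model: assumed hard cap on speed

def next_action(perception, planner, bump_sensor, remote_ops):
    """One control step combining a guardrail with mistake detection and recovery."""
    if bump_sensor.triggered():
        # Mistake detection: obstacle detection likely failed; recover by
        # replanning or escalating to a human remote operator.
        route = planner.replan_around_obstacle()
        if route is None:
            return remote_ops.request_takeover()
    action = planner.plan_step(perception.detected_obstacles())
    # Guardrail: never exceed the speed cap, regardless of what the model suggests.
    action.speed = min(action.speed, MAX_SPEED_MPS)
    return action
```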

We argue that to achieve safety of a system, hazard analysis and hazard mitigation should be the key focus during system development. Improving reliability through model accuracy and robustness are important building blocks, but safety fundamentally relies on understanding what happens when component mistakes still occur.

Demonstrating Safety

For any nontrivial system, it is essentially impossible to fully guarantee safety. Safety engineering focuses on avoiding hazards and thus reducing the chance of accidents and their harms, but some risk usually remains. Even if formal methods are used to formally prove some safety properties (which is complicated by the lack of specifications for ML models), wrong assumptions about the environment, incomplete requirements, or behavior not captured in the formal model can still leave the chance for accidents.

Instead of attempting safety guarantees, practitioners usually aim to demonstrate an acceptable level of safety. Typically, two different forms of evidence can be provided:

  • Evidence of safe behavior in the field: Extensive field trials can provide evidence that accidents are rare. For example, autonomous vehicles are extensively tested first in simulation, then on closed test tracks, and finally on public roads (with a human driver to take over if things go wrong) to demonstrate safe behavior; medical devices typically are tested in medical trials under controlled and monitored conditions to demonstrate safety. Field trials tend to be expensive and may need to be quite extensive to provide sufficient confidence in safety.
  • Evidence of responsible (safety) engineering process: Evidence of following a rigorous engineering process can provide confidence that the engineers have anticipated and mitigated many hazards. For example, the designers of a smart medical device can show that they performed hazard analysis and built hazard mitigation for all identified hazards into the product.

Typically, a combination of the two strategies is used to provide confidence in a system’s safety, without ever providing guarantees. A number of domain-specific standards for safety certification prescribe specific process steps and documentation for how to provide evidence of safety tests and safe engineering practices. For example, when approving (smart) medical devices in the US, the Food and Drug Administration usually requires evidence (a) of the reliability of the model in an offline evaluation, (b) of the safety and efficacy of the device in a clinical trial, and (c) of compliance with state-of-the-art engineering processes.

Documenting evidence with assurance (safety) cases. Assurance cases (or safety cases) are a common format for documenting arguments and evidence for safety of a system. An assurance case is essentially a structured argument that breaks a safety claim into arguments and then provides evidence for each argument.

Excerpt of an assurance case example: The main claim is broken down into sub-claims, each of which is connected to evidence.

An assurance case helps to decompose a main safety claim hierarchically into manageable claims for which it is feasible to collect assurance evidence. The evidence can come in many different forms, including (1) results of testing, inspection, and formal verification, (2) expert opinion, (3) evidence of design mechanisms that mitigate problems, and (4) evidence of process compliance and process quality. To evaluate an assurance case, we then need to ask whether the evidence is strong enough for each (leaf) claim, whether the subclaims combined are sufficient to support the parent claim, and whether any subclaims are missing.
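
A minimal sketch of such a hierarchical structure, with hypothetical claims and evidence for a sidewalk delivery robot:

```python
# Hypothetical assurance-case fragment: each sub-claim is backed by concrete evidence.
assurance_case = {
    "claim": "The sidewalk robot does not cause serious harm to pedestrians",
    "subclaims": [
        {"claim": "Mistakes in obstacle detection are detected and handled",
         "evidence": ["bump-sensor integration tests", "field-trial incident statistics"]},
        {"claim": "The robot's speed is always low enough to avoid serious harm",
         "evidence": ["code review of the speed guardrail", "speed-controller test results"]},
        {"claim": "A remote operator can take over when the robot is stuck",
         "evidence": ["connectivity monitoring logs", "operator response-time audit"]},
    ],
}
```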

For autonomous vehicles, Aurora has developed and released an extended assurance case structure of claims and subclaims and explained their rationale. While evidence is not made public with these documents, they provide extensive examples of the kind of reasoning used to argue for safe behavior of autonomous vehicles.

Assurance cases have many proponents and have seen some adoption, but they have also received criticism. An assurance case provides an explicit structure for safety arguments that third-party auditors can navigate, inspect, and refute. It provides flexibility but also explicit traceability between high-level claims, low-level claims, and corresponding evidence. The structure of a safety case may be reused across similar products, encouraging reuse of best practices. This kind of structure can also be extended beyond safety claims to security, reliability, and other properties. However, the reasoning is often informal and requires judgment. How much evidence is sufficient for a subclaim may require negotiation between developer and evaluator. Evaluators may be prone to confirmation bias and accept the provided evidence rather than critically questioning whether any hazards are missing from the analysis or whether any claims interact in unanticipated ways. If abused, an assurance case can be presented as a proof and used to signal a level of confidence in safety that is not justified.

Safety culture. Safety engineering, especially when needed to demonstrate compliance with some regulation (e.g., FDA approval for medical devices), faces the risk of turning into a checkbox compliance exercise. If safety is not taken seriously but seen only as a box to check, engineers go through the motions, minimally follow the prescribed procedures, and produce the required paperwork, but without critical engagement. For example, they may loosely follow a hazard analysis procedure such as FMEA and produce tables listing some potential hazards (possibly only the ones they have already mitigated), but without creativity or real effort, missing serious hazards. Such steps might be sufficient to shield them from liability when an accident happens, but they barely contribute to actual safety.

If safety is taken seriously in a product, it shows in a safety culture in the team. The team is committed to achieving safety as a key priority, visible in internal goal statements, in public mission statements, and, importantly, also in everyday practice. When competing demands rest on the project and tradeoffs need to be made, a team with a robust safety culture will avoid shortcuts that compromise safety and choose the safer path, even if it takes longer to develop or sacrifices functionality in the product.

Openness, trust, and avoiding blame are important for a healthy safety culture. When safety concerns surface, instead of blaming the individuals responsible, appreciate the identification of the problem and focus on how to improve the product, and possibly the process, to avoid similar problems in the future. This goes as far as seeing accidents not as failures but as learning opportunities that trigger in-depth investigations into how to improve. This creates an environment where engineers do not hide mistakes, where engineers feel empowered to think deeply about potential problems, and where everybody feels comfortable raising concerns and dissenting opinions.

As with all forms of team culture (see chapter Interdisciplinary Teams), establishing cultural norms takes effort and changing culture is a slow and difficult process that usually requires strong leadership.

The AI Alignment Problem

We believe that most practical safety concerns in ML-enabled systems are best addressed through traditional safety engineering, with careful hazard analysis and careful hazard mitigation through system design. In the popular press and also in parts of the research community, a lot of attention is placed on potential scenarios of robots finding loopholes and achieving their given tasks with unintended and possibly disastrous consequences, commonly known as the alignment problem. The extreme examples involve dystopian scenarios, for example, as in the Terminator movies, where a defense AI system decides to consider all of humanity a threat and launches a nuclear attack. Another common example is the paperclip maximizer, an AI tasked with producing paper clips but given enough autonomy that it prioritizes paper clip production over everything else, eventually consuming all resources on the planet.

The alignment problem is all about designing the right objective function for a task, so that the way the task is encoded for the AI aligns with how humans intend the AI to perform the task. In a nutshell, this is a requirements engineering problem of specifying the right requirements for the system. A problem occurs when the AI’s objective function does not align with the real requirements of the task, particularly when it does not account for negative side effects as costs. For example, a sidewalk robot with the objective of reaching its goal quickly might drive at dangerously high speeds and endanger cyclists in bike lanes, unless speed limits and non-interference with other traffic are embedded in the objective; technically, the dangerous behavior of the sidewalk robot meets the given specification, but it does not follow the designer’s intent.

If we anticipated the negative side effects, we could encode them as constraints or costs in the objective function. However, AI algorithms are often very good at finding loopholes we did not anticipate to achieve the stated goal, a behavior also known as reward hacking. Examples of such loopholes are especially common in game AIs and include (a) a Tetris AI that learns to pause the game indefinitely to avoid losing, (b) a racing game AI that learns that going in circles to hit a few targets repeatedly yields more points than competing in the race, and (c) AIs finding and exploiting bugs in games.
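
For the sidewalk robot from above, a sketch of an objective function that encodes anticipated side effects as costs might look as follows; the state fields and weights are hypothetical, and finding and balancing all relevant terms is exactly the hard part:

```python
def reward(state):
    """Hypothetical objective: progress toward the goal, with anticipated
    negative side effects encoded as penalty terms."""
    r = state.progress_toward_goal
    r -= 10.0 * state.time_in_bike_lane      # do not endanger cyclists
    r -= 5.0 * state.speed_over_limit        # respect speed limits
    r -= 2.0 * state.pedestrians_obstructed  # do not block other traffic
    return r
```

Any side effect that is not anticipated and penalized remains a potential loophole for the learner to exploit.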

Defining an objective function that takes all requirements into account can be very challenging, since it requires understanding and expressing requirements very precisely and anticipating all potential loopholes. Humans generally use common sense to reason about corner cases and tend not to enumerate all possible cases. There are many specific problems, including hard-to-observe goals, abstract rewards, and feedback loops, that make it difficult, and likely often impossible, to write objective functions that fully align with the real intention.

Many of these problems particularly occur with reinforcement learning, where an AI learns incrementally from interacting with the environment. Most examples and discussions occur in game AIs and (simulated) robots, where AI algorithms can freely and repeatedly explore actions. When exploration happens in the real world, there are additional questions of how to explore safely and scale human oversight. These problems will become more severe the more autonomy we give AI-enabled systems, especially if we were to make advances toward more artificial general intelligence. However, even without reinforcement learning and artificial general intelligence, it is worth asking questions about potential harms from misaligned objective functions as part of the hazard analysis for any ML-enabled systems.

While the alignment problem is popularly discussed and an interesting conceptual problem, we think it is not usually a serious safety concern for the kind of software products with ML components we discuss in this book. However, if we ever get to artificial general intelligence, the alignment problem might become much more important.

Summary

Safety is the prevention of harm at small and large scale from system failure or malfunction — not just serious injury and death, but also property damage, environmental pollution, stress, and societal harms. Even if machine-learned models were reliable, reliability alone is not sufficient to assure safety. Safety is a system property that requires an understanding of how components in the system interact with each other and with the environment; it cannot be assured at the software or ML model level alone.

Safety engineering focuses on designing systems in a way that they minimize risks of accidents. This typically relies heavily on design strategies that avoid and mitigate hazards that could lead to accidents. Hazard analysis techniques can help to anticipate many problems and to design corresponding mitigations. Model robustness is a heavily discussed research topic that can improve the reliability of the ML components and provide a building block when engineering safe systems. While safety guarantees are rare, it is common to demonstrate safety and document claims and evidence in structured form in assurance cases.

Whether robots will eventually pose existential threats because they solve the tasks we give them in ways we did not anticipate is at the heart of discussions of the alignment problem. While it may not be a concern for the kind of ML-enabled systems mostly covered in this book, it is worth keeping an eye on this discussion, especially when pushing further into reinforcement learning and increased autonomy.

Further Reading

As with all chapters, this text is released under a Creative Commons 4.0 BY-SA license.
