Security and Privacy in ML-Enabled Systems
This post covers content from the “security and privacy” lecture of our Machine Learning in Production course. For other chapters see the table of contents.
Malicious actors may try to interfere with any software system, and there is a long history of attempts, on one side, to secure software systems and, on the other side, to break those security measures. Among others, attackers may try to gain access to private information (confidentiality attack), may try to manipulate data or decisions by the system (integrity attack), or may simply take down the entire system (availability attack). With machine-learning components in software, we face additional security concerns, as we also have to worry about data (training data, inference data, telemetry) and models. For example, malicious actors may be able to manipulate inference data or trick a model into making a specific prediction, such as slightly manipulating offensive images to evade being deleted by a content moderation model.
Privacy is the ability to keep information hidden from others. Privacy relies on being deliberate about what data is gathered and how it is used, but also relies on the secure handling of information. Machine learning additionally introduces new privacy threats; for example, models may infer private information from large amounts of innocent-looking data, such as ML models in e-commerce companies predicting a customer’s pregnancy from purchase data without the customer disclosing that fact, and in some cases even before the customer knows about the pregnancy.
In this chapter, we provide a brief overview of common security and privacy concerns, new challenges introduced with machine-learning components, and common design strategies to improve security and privacy in software systems.
Use case: Content moderation
Whenever an organization allows users to post content on their website, some users may try to post content that is illegal or offensive to others. Different organizations have different, often hotly-contested policies about what content is allowed and different strategies to enforce those policies. Since manual moderation is expensive and difficult to scale, many large social media websites rely on automated moderation through models that identify and remove copyrighted materials, hate speech, and other objectionable content in various forms of media, including text, images, audio, and video. For this chapter, let us consider a social image-sharing site like Pinterest or Instagram that wants to filter calls for violence and depiction of violence within images.
Security Requirements
What it means to be secure may differ between projects. For a specific project, security requirements (also called security policies) define what is expected of the project. A responsible engineer will then design the system such that these security requirements are likely to be met, even in the presence of malicious users who intentionally try to undermine them. Most security requirements fall into a few common categories; the most common classification distinguishes confidentiality, integrity, and availability requirements (the CIA triad), and most systems have requirements in all three categories.
Confidentiality requirements. Confidentiality simply indicates that sensitive data can be accessed only by those authorized to do so, where what is considered sensitive and who is authorized for what access is highly project-specific. In our running example, we likely want to ensure that private posts on the social-media platform are readable only by those selected by the user. If private information is shared with the software, confidentiality is important to keep it private from others. Malicious users may try to gain access to information they should not have access to.
In a machine-learning setting, we may need to additionally consider who is supposed to have access to training data, models, and inference and telemetry data. For example, we might also want to keep the inner workings of the content-moderation model secret from users, some of whom might otherwise use that information to craft images calling for violence that are just beyond the decision boundary where the model detects it. We also have to worry about new ways to indirectly access data, for example, inferring information about (confidential) training data from model predictions or inferring information about people from other inference data. In the content-moderation example, we likely do not want users to be able to recover examples of forbidden content used for training the content-moderation model, where that data may even contain private information of a user previously targeted for harassment.
Integrity requirements. Whereas confidentiality is about controlling access to information, integrity is about restricting the creation and modification of information to those authorized to do so. In our running example, we might want to ensure that only the original users and authorized moderators (automated or human) can delete or modify posts. Malicious users may try to modify information in the system.
With machine learning, we again have to additionally worry about training data, models, and inference and telemetry data. For example, we may want to make sure that only developers in the right team are allowed to change the model used for content moderation in production. Again, we need to worry about indirect access: when users can influence training data or training labels through their behavior captured in telemetry data, say by reporting harmless images as violent with a button, they may be able to manipulate the model indirectly.
Availability requirements. Malicious users may try to take the entire system down or make it so slow or inaccurate that it becomes essentially useless, possibly stopping critical services on which users rely. For example, a malicious user may try to submit many images to overload the content moderation service such that other problematic content stays on the site longer or they might try to take down the entire site as revenge over an account suspension. Classic distributed denial of service (DDoS) attacks where malicious actors flood a system with requests from many hijacked machines are a typical example of attempts to undermine availability requirements.
With machine learning, attackers may target expensive and slow model inference services or they may try to undermine model accuracy to the point where the model becomes useless (e.g., by manipulating training data). For example, influencing the content-moderation model to almost never flag content, almost always flag content, or just randomly flag content would all be undermining the availability of the content-moderation system in practical terms.
Attacks and Defenses
Security discourse starts with the mindset that there are malicious actors (attackers) who want to interfere with our system. Attackers may have all kinds of motivations. For example, attackers may try to find and sell private customer information such as credit card numbers; they may attempt to blackmail users or companies based on private information found in the system, such as private photos suggesting an affair; they may plant illegal material in a user’s account to get them arrested; or they may attempt to lower overall service quality to drive users to a competitor. Attacks may be driven purely by monetary incentives, such as selling private information, ransomware, blackmail, and disrupting competitors, but attacks can also come as a form of activism, such as exposing internal documents from nefarious organizations or drawing attention to pollution or climate change.
At the same time, developers try to keep the system secure by implementing defense mechanisms that ensure that the security requirements are met, even when faced with an attacker. Typical defense mechanisms include limiting access to the system in general (e.g., closing ports, limiting access to internal addresses), authorization mechanisms to restrict which account has access to what (e.g., permissions, access control lists), authentication of users to ensure that users are who they say they are (e.g., passwords, two-factor authentication, biometrics), encryption to protect data from unauthorized reading when stored or in transit, and signing data to track which account created information.
Attackers can break the security requirements if the system’s defense mechanisms are insufficient or incorrectly implemented. These gaps in the defense are called vulnerabilities. Vulnerabilities can stem from bugs in the implementation that allow the attacker to influence the system, such as a buffer overflow enabling an attacker to execute custom code or a bug in a key generator reducing key entropy. Frequently though, vulnerabilities stem from design flaws where the defense mechanisms were not sufficient to begin with, such as not encrypting information sent over public networks, choosing a weak encryption algorithm, or setting a default admin password. There is a fundamental asymmetry here, in that developers need to defend against all possible attacks, whereas attackers often need to find only one weakness in the defense.
Notice that we have a world vs. machine problem (see chapter Gathering Requirements): We can only reason about the machine view (e.g., user accounts) but not about the real world (e.g., people) beyond what is mediated by sensors (e.g., inputs, network traffic, biometrics). A holistic security solution needs to consider the world beyond the machine, for example, how people could influence inputs of the system (e.g., faking a fingerprint), how information is transferred in the physical world (e.g., whether somebody can eavesdrop on an unencrypted TCP/IP connection), how information can flow within the real world (e.g., writing down a password, using a partner’s name as a password), and even whether somebody has physical access to the machine (e.g., disconnecting the hard drive and directly copying the data).
ML-Specific Attacks
While security is important and challenging without introducing machine-learning components into a software system, machine learning introduces new attack strategies that are worth considering. In the following, we discuss four commonly discussed attacks. These attacks tend to relate to access to training or inference data and tend to emerge from fitting models to data without having clear specifications. Some of these attacks may seem rather academic and difficult to exploit in the wild, but it is worth considering whether defenses are in order for a given system.
Evasion attacks (adversarial examples)
The most commonly discussed attacks on machine-learned models are evasion attacks, commonly known as adversarial examples. In a nutshell, in an evasion attack, the attacker crafts the input (inference data for the model) such that the model will produce a desired prediction at inference time. Typically, the input is crafted such that it looks innocent to a human observer but tricks the model into a “wrong” prediction — that is, human and model disagree on the right outcome. For example, an attacker knowing the content moderation model could create a tailored image that to humans clearly contains a call for violence, but that the model classifies as benign. Evasion attacks are often used to circumvent integrity requirements, for example, to post violent content that is not allowed. If the model itself is used for access control, such as facial recognition for logging into a phone, an adversarial attack can also break confidentiality requirements.
Adversarial examples work because machine-learned models usually do not learn exactly the intended decision boundary for a problem, if we could even specify that decision boundary in the first place. That is, there are inputs where models will make mistakes and adversarial attacks are specifically looking for such mistakes. In the simplest case, we just search for an input i for which model f produces the desired prediction o: f(i) = o, but more commonly, we search for a small modification δ to an existing input x, where the modification is small enough to be barely perceptible to humans, but sufficient to change the prediction of the model to the desired outcome o: f(x+δ)=o.
Academics have proposed many different search strategies to create adversarial examples. The search tends to start with a given input and then explores the neighborhood until it finds a nearby input with the desired outcome. Search is much more efficient if the internals of the model are known, because the search can follow the gradients of the model. If the attacker has no direct access to the model, but the model returns confidence scores and can be queried repeatedly, classic hill-climbing algorithms can be used to incrementally modify the input toward the desired outcome. Attacks are more difficult if queries to the model are limited (e.g., rate limit) and if the inference service provides predictions without (precise) confidence scores.
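To make the search concrete, here is a minimal sketch of a gradient-based evasion attack in the spirit of the fast gradient sign method, assuming a differentiable PyTorch classifier and full access to its internals; the single-step strategy and the epsilon bound are simplifying assumptions.

```python
# Minimal sketch of a gradient-based evasion attack (in the spirit of
# the fast gradient sign method), assuming a PyTorch classifier.
import torch
import torch.nn.functional as F

def craft_adversarial_example(model, x, target_class, epsilon=0.03):
    """Search for a small perturbation delta so that f(x + delta) = target."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_class)
    loss.backward()
    # Step *against* the gradient to make the desired (wrong) class more
    # likely, keeping the change barely perceptible (bounded by epsilon).
    x_adv = (x - epsilon * x.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```

Without access to model internals, an attacker would instead fall back to the black-box, query-based search described above.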
Evasion attacks and the search for adversarial examples are closely related to counterfactual examples discussed in chapter Interpretability and Explainability and to robustness discussed in chapter Safety: Counterfactual examples are essentially adversarial examples with the intention of showing the difference between the original input and the adversarial input as an explanation, for example, “if you had removed this part of the image, it would not have been considered as violent.” Robustness is a property that essentially ensures that no adversarial examples exist within a certain neighborhood of a given input, such as, “this image depicts violence and that is still the case for every possible change involving 5% of the pixels or fewer.”
At the time of this writing, it is not obvious what real-world impact adversarial examples have. On the one hand, ever since the early days of spam filters, spammers have tried to evade spam-filter models by misspelling words associated with spam and inserting words the model associates with non-spam messages (often in a trial-and-error fashion by human attackers rather than by analyzing actual model boundaries). Attackers have also tailored malicious network messages to evade intrusion detection systems. On the other hand, essentially all examples and alarming news stories of sophisticated adversarial attacks against specific models come from academics showing feasibility rather than real-world attacks by malicious actors — including makeup and glasses to evade facial biometrics models, stickers attached to physical traffic signs to cause misclassifications by traffic-sign classifiers or to steer a car into the opposing lane, and 3D-printed objects to fool object detection models.
Defenses. There are multiple strategies to make adversarial attacks more difficult.
- Improving decision boundary: Anything that improves the model’s decision boundary will reduce the opportunity for adversarial examples in the first place. This includes collecting better training data and evaluating the model for shortcut learning (see chapter Model Quality). As discussed throughout this book though, no model is ever perfect, so we are unlikely to ever prevent adversarial examples entirely.
- Adversarial training: Use adversarial examples to harden the model and improve its decision boundary. A common strategy is to search for adversarial examples, often starting from training or telemetry data, and to add the found adversarial examples with correct labels to the training data. This way, we incrementally refine the training data near the decision boundary (see the sketch after this list).
- Input sanitation: In some cases, it is possible to use domain knowledge (or information from past attacks) to identify parts of the input space that are irrelevant to the problem and that can be sanitized at training and inference time. For example, color-depth reduction and spatial smoothing of image data can remove artifacts that a model may overfit on and that an attacker can exploit when crafting adversarial examples. By reducing the information that reaches the model, the model may be more robust, but it also has fewer signals to make decisions, possibly resulting in lower accuracy.
- Limiting model access: Restricting access to the model, limiting the number of inference requests, and not giving (exact) confidence scores all make it more costly to search for adversarial attacks. While some attacks are still possible, instead of a highly efficient search on model gradients, attackers may have to rely on few samples to learn from and few attacks to try.
- Redundant models: Multiple models are less likely to learn the exact same decision boundaries susceptible to the same adversarial examples. It may become more expensive for attackers to trick multiple models at the same time and discrepancies between model predictions may alert us to unreliable predictions and possible adversarial attacks.
- Redundant information: In some scenarios, information can be encoded redundantly making it harder to attack the models for each encoding at the same time. For example, a checkout scanner can rely on both the barcode and the visual perception of an object when detecting an item (e.g., ensuring that the barcode on a bag of almonds was not replaced with one for lower-priced bananas). As a similar example, proposals have been made to embed infrared “smart codes” within traffic signs as a second form of encoding information.
- Robustness check: Robustness checks at inference time (see chapter Safety) can evaluate whether a received input is very close to the decision boundary and may hence be an attack. Robustness checks tend to be very expensive and require careful consideration of the relevant distance metric within which the attacks would occur.
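Continuing the earlier example, a hypothetical sketch of the adversarial-training defense from the list might look as follows, again assuming a PyTorch classifier; the perturbation step and loss weighting are simplifying assumptions, not a prescription.

```python
# Hypothetical sketch of one adversarial training step for a PyTorch
# classifier: craft perturbed variants of the batch, then train on both
# clean and perturbed inputs with the correct labels.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    # 1. Craft adversarial variants by taking a small step that
    #    *increases* the loss on the correct labels.
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + epsilon * x_pert.grad.sign()).clamp(0, 1).detach()

    # 2. Update the model on clean and adversarial examples together,
    #    refining the decision boundary near the attack points.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```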
All of these approaches can harden a model and make attacks more difficult, but given the lack of specifications in machine learning, no approach can entirely prevent wrong predictions that may be exploited in adversarial attacks. When considering security, engineers will need to make difficult tradeoff decisions between security and accuracy, between security and training cost, between security and inference cost, between security and benefits provided to users, and so forth.
Poisoning attacks
Poisoning attacks are indirect attacks on a system with a machine-learned model that try to change the model by manipulating training data. For example, attackers could attempt to influence the training data of the content-moderation model such that content with certain political messages is often filtered as violent content, even if it does not contain any violence. Untargeted poisoning attacks try to render the model inaccurate in production use, breaking availability requirements. In contrast, targeted poisoning attacks aim to manipulate the model to achieve a desired prediction for a specific targeted input, essentially creating a back door and breaking integrity requirements.
To anticipate possible poisoning attacks it is important to understand how attackers can directly or indirectly influence the training data. Given how many systems collect training data from production data and outsource or crowdsource data collection and data labeling, attackers have many approaches beyond directly breaking into our system to change data and labels in a database. If we rely on public datasets or datasets curated by third parties, could an attacker have influenced that dataset? If we crowdsource data collection or data labeling, can an attacker influence data or labels? If we incorporate telemetry data as new training data, could an attacker create specific telemetry data to influence the training data? In our content-moderation example, an attacker could contribute to public training datasets, an attacker could intentionally mislabel production data by reporting benign content as violent on the platform, and an attacker could intentionally upload certain violent images and then flag them with a different account.
There are many real-world examples of data poisoning, though most are not very sophisticated: In 2015, an anti-virus company collecting virus files on a web portal alleged that a competitor had uploaded benign files as viruses to degrade product quality, to the point of causing false positive alerts that annoyed and unsettled users. Review bombing is a phenomenon in which one person with many accounts or a group of people all poorly review a movie, video game, or product for perceived political statements, such as the review bombing of the 2022 Amazon Prime series The Rings of Power over its diverse cast — if not countered, review bombing affects ratings and recommendation systems. Microsoft’s failed 2016 chatbot Tay learned from user interactions and, in what Microsoft called a coordinated attack, some users successfully fed it data that led Tay to utter anti-semitic statements within 24 hours of its release.
Studies have shown that even small amounts of mislabeled data can substantially reduce the accuracy of a model, rendering it too unreliable for production use. Similarly, a few mislabeled points of training data can be enough to flip the prediction for a specific targeted input, thus essentially creating a back door in the model — for example, an attacker wanting to get a specific image taken down by the content moderation system (without the ability to influence that image) could create a few similar images, upload them, and flag them as violent, hoping that the next version of the model now misclassifies the target image. Moreover, large datasets may be difficult to review, so it may be relatively easy to hide a few poisonous data points. Similar to evasion attacks, having access to details of the training data and the pipeline enables more efficient and targeted attacks that create damage with very few new or mislabeled poisonous data points.
Defenses. The most common defenses against poisoning attacks focus on detecting and removing outliers in the training data and on detecting incorrect labels. However, defenses should consider the entire system: how data flows within the system and what data can be accessed or influenced by attackers. This is important both (a) when users can influence data directly, for example, by uploading or reporting content, and (b) when information is collected indirectly from user behavior, for example, when interpreting whether content is widely shared as a proxy for whether it is benign. Overall, there are many possible defense mechanisms, including:
- Improving robustness to outliers: There are many techniques to detect and remove outliers in data (including anomaly detection; see the sketch after this list), but it is important to balance this with recognizing drift when data changes consistently across many users. Data debugging techniques can help to investigate outliers, such as the influential instances discussed in chapter Interpretability and Explainability. Also, some machine-learning algorithms are designed to be particularly robust to outliers, such as ensemble learning with bagging.
- Review external datasets: Outside training data, such as public datasets or datasets created by third parties, may not be trustworthy. For example, developers may prefer datasets from reputable sources that clearly describe how the data was curated and labeled.
- Increase confidence in training data and labels: Increase confidence in training data and labels by either (a) reducing reliance on individual users or (b) considering the reputation of users. To avoid reliance on individual users, establish consensus between multiple users: when crowdsourcing, ask multiple people to label data and check agreement. For example, consider images as violent only when multiple users flag them, and possibly even review labels in a manual step before using them for training. A reputation system might be used to trust information from older and more active accounts more and to detect and ignore bots and accounts with unusual usage patterns.
- Hiding and securing internals: Keeping training data, model architecture, and overall ML pipeline confidential makes it more challenging for attackers to anticipate the specific impact of poisoned data. Of course, attackers should also not be able to modify training data directly, for example by hacking directly into the database storing the training data.
- Track provenance: By authenticating users for all telemetry and tracking data provenance (see chapter Provenance), developers can increase the barriers to injecting malicious telemetry. This also allows the system to track sources of all training data for anomaly detection or possible later removal. Note that detailed provenance tracking may be incompatible with anonymity and privacy goals in collaborative and federated learning settings.
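As a concrete illustration of the outlier-detection defense in the list above, the following sketch flags unusual training points for manual review before retraining; the isolation-forest choice, the pandas data layout, and the contamination rate are assumptions made for illustration.

```python
# Hypothetical sketch: flag suspicious training examples with an
# isolation forest before they enter the training set; assumes numeric
# features in a pandas DataFrame with a "label" column.
import pandas as pd
from sklearn.ensemble import IsolationForest

def split_out_outliers(training_data: pd.DataFrame, contamination=0.01):
    features = training_data.drop(columns=["label"])
    detector = IsolationForest(contamination=contamination, random_state=0)
    verdict = detector.fit_predict(features)   # -1 marks anomalies
    suspicious = training_data[verdict == -1]  # queue for manual review
    trusted = training_data[verdict == 1]
    return trusted, suspicious
```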
None of these defenses will entirely prevent poisoning attacks, but each defense increases the cost for attackers.
Model extraction attacks
Models are difficult to keep confidential. When allowing users to interact with the model through an API, attackers can extract a lot of information about the model simply by querying it repeatedly. With enough queries, the attacker can learn a surrogate model on the predicted results (see chapter Interpretability and Explainability) that may perform with similar accuracy. This stolen model may then be used in the attacker’s own products or as the basis for performing evasion or poisoning attacks more efficiently. In our content-moderation example, an attacker may learn a surrogate model to understand exactly what kind of content gets moderated and to which features the model is sensitive.
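To make the idea concrete, the following hypothetical sketch trains a surrogate model from repeated queries to a public prediction API; the endpoint URL, response format, and the choice of a decision tree as surrogate are all made up for illustration.

```python
# Hypothetical sketch of a model extraction attack: label attacker-chosen
# probe inputs via the public prediction API, then fit a surrogate model
# on the responses. Endpoint and response format are made up.
import requests
from sklearn.tree import DecisionTreeClassifier

def query_moderation_api(feature_vector):
    response = requests.post("https://example.com/api/moderate",
                             json={"features": feature_vector})
    return response.json()["label"]

def train_surrogate(probe_inputs):
    labels = [query_moderation_api(x) for x in probe_inputs]
    surrogate = DecisionTreeClassifier(max_depth=10)
    surrogate.fit(probe_inputs, labels)  # approximates the remote model
    return surrogate
```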
We are not aware of many publicly known real-world examples of model extraction attacks, but would not be surprised if they were common and often undetected among competitors. In 2011, Google accused Microsoft of stealing search results by training their own search-engine models on results produced by Google’s search. In a sting operation, Google set up fake results for some specific synthetic queries (e.g., “hiybbprqag”) and found that Microsoft’s search engine returned the same results for these queries a few weeks later.
Defenses. Model stealing can be made harder by restricting how the model can be queried. If a model is used only internally within a product, it is harder for attackers to query and observe. For example, the predictions of the content moderation model may only be shown to moderators after users have reported a posted image, rather than revealing the model’s prediction directly to the user uploading the image. If model predictions are heavily processed before showing results to users (see chapter Planning for Mistakes), attackers can only learn about the behavior of the overall system but may have a harder time identifying the specific behavior of the internal model.
In many cases though, model predictions are (and should be) visible to end users; for example, content moderation is typically automated whenever content is uploaded, and search engine results are intended to be shown to users. In some cases, the model inference service may even be available as a public API, possibly even providing confidence scores with predictions. When the model can be queried directly or indirectly, rate limiting, abuse detection (e.g., detecting large numbers of unusual queries), and charging money per query can each make it more difficult for attackers to perform very large numbers of queries. In some cases, it is also possible to add artificial noise to predictions that makes model stealing harder, though this may also affect the user experience through reduced accuracy.
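A minimal sketch of such system-level defenses might combine a per-account rate limit with rounded, slightly noisy confidence scores; the in-memory bookkeeping, limits, and noise level below are illustrative assumptions, not a production design.

```python
# Illustrative sketch: rate-limit inference requests per account and
# return only coarse, slightly noisy scores to slow down model
# extraction and black-box search. Limits and noise are assumptions.
import random
import time
from collections import defaultdict

REQUEST_LOG = defaultdict(list)
MAX_REQUESTS_PER_HOUR = 100

def serve_moderation_result(account_id, model_confidence):
    now = time.time()
    recent = [t for t in REQUEST_LOG[account_id] if now - t < 3600]
    if len(recent) >= MAX_REQUESTS_PER_HOUR:
        raise RuntimeError("rate limit exceeded")
    REQUEST_LOG[account_id] = recent + [now]
    # Coarsen the signal: binary decision plus a rounded, noisy score.
    noisy = min(1.0, max(0.0, model_confidence + random.uniform(-0.05, 0.05)))
    return {"flagged": noisy > 0.5, "confidence": round(noisy, 1)}
```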
Model inversion and membership inference attacks
When attackers have access to a model they can try to exfiltrate information from the training data with model inversion attacks and membership inference attacks, breaking confidentiality requirements. Since models are often trained on private data, attackers may be able to steal information, such as extracting another patient’s medical information from a medical diagnosis model. In our content moderation scenario, attackers could attempt to extract privately posted images from the content moderation model or determine whether a given image was previously flagged and used for training.
A membership inference attack basically asks whether a given input was part of the training data, for example, whether a given image was previously flagged and used for training in our content moderation scenario. A model inversion attack aims to reconstruct the training data associated with a specific prediction, for example, recover images used as training data for content moderation that were labeled as violent. Academics have demonstrated several such attacks, such as recovering medically sensitive information like schizophrenia diagnoses or suicide attempts from a model trained on medical discharge records given only partial information about a patient’s medical history and recovering (approximations of) photos used to identify a specific person in a face recognition model.
Model inversion and membership inference attacks rely on internal mechanisms of how machine-learning algorithms learn from data and possibly overfit training data. Machine-learned models can often essentially memorize and reproduce parts of the training data, as was very visible when GitHub’s Copilot reproduced large chunks of open-source code verbatim on certain prompts. Since a model is usually more confident in predictions closer to training data, the key idea behind these attacks is to search for inputs for which the model can return a prediction with high confidence. There are many different technical approaches to perform such attacks; as for evasion and poisoning attacks, they are usually more effective with access to model internals but also work when having only access to an API.
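As a minimal illustration of this idea, a naive membership-inference test simply thresholds the model’s confidence; real attacks are more sophisticated, and the threshold below is an arbitrary assumption.

```python
# Minimal sketch of a confidence-based membership inference test:
# unusually high confidence hints that an input may have been memorized
# during training. The threshold is an arbitrary illustration.
import numpy as np

def likely_in_training_set(predict_proba, x, threshold=0.99):
    confidence = float(np.max(predict_proba(x)))
    return confidence >= threshold
```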
At the time of writing, we are not aware of any model inversion attacks or membership inference attacks performed by malicious actors in the wild.
Defenses. Defenses against model inversion and membership inference attacks usually focus on reducing overfitting during model training, adding noise to confidence scores after inference, and novel machine-learning algorithms that make certain (narrow) privacy guarantees. Since these attacks rely on many model queries, system designers can again use strategies like rate limiting and abuse detection to increase the attacker’s cost. In addition, at the system level, designers may sometimes be able to ensure that the training data is sufficiently anonymized that even a successful membership inference does not leak meaningful confidential information.
The ML security arms race
Security in machine learning is a hot research topic these days with thousands of papers published yearly. There is a constant arms race of papers demonstrating attacks, followed by papers showing how to defend against the published attacks, followed by papers showing how to break those defenses, followed by papers with more robust defenses, and so forth. By the nature of machine learning, it is unlikely that we will ever be able to provide broad security guarantees at the model level.
Most current papers on security in machine learning are model centric, analyzing specific attacks on a model and defense strategies that manipulate the data, training process, or model in lab settings. Demonstrated attacks are often alarming, but also often brittle in that they are tied to a specific model and context. In contrast, more system-wide security considerations, such as deciding how to expose a model, what telemetry to use for training, and how to rate limit an API, are less represented, as are discussions around tradeoffs, costs, and risks. At the system level, it is a good idea to assume that a model is vulnerable to the various attacks discussed and consider how to design and secure the system around it.
Threat Modeling
Threat modeling is an effective approach to systematically analyze the design of a software system for security. Threat modeling takes a system-wide holistic perspective, rather than focusing just on individual components. Threat modeling is ideally used early in the design phase, but can also be used as part of a security audit of a finished product. It establishes system-level security requirements and suggests mitigations that are then implemented in individual components and infrastructure.
While there are different flavors of threat modeling, they usually proceed roughly through five stages: (1) Understanding attacker goals and capabilities, (2) understanding system structure, (3) analyzing the system structure for security threats, (4) assessing risk and designing defense mechanisms, and (5) implementation and testing of defense mechanisms.
Understanding attacker goals and capabilities
Understanding the motivation and capabilities of attackers can help to focus security activities. For example, in our content-moderation scenario, we have very different concerns about juveniles trying to bypass the content-moderation system in a trial-and-error fashion for personal bragging rights than concerns about nation-state hackers trying to undermine trust in democracy with resources, patience, and knowledge of sophisticated hacks over long periods. While it may be harder to defend against the latter, we may be more concerned about the former group if we consider those attacks to be much more likely. A list of common security requirements and attacks can guide brainstorming about possible attack motives, for example, asking why attackers might want to undermine confidentiality or availability.
Understanding system structure
To identify possible weak points and attack vectors, understanding the system structure is essential. Threat modeling involves describing the structure of the software system, how it exchanges and stores information, and who interacts with it — typically in the form of an architecture-level data-flow diagram. When it comes to machine-learning components in software systems, the diagram should include all components related to data storage and to training, serving, and monitoring of the models. Furthermore, it is particularly important to carefully track how training and inference data flow within the system and how it can be influenced directly or indirectly by various components or actors. Note that actors also include people indirectly interacting with the system by curating or contributing to public training data, by labeling some data, and by influencing telemetry data.
For example, in the content-moderation scenario, the model is trained regularly by an ML pipeline from a database with training data. That training data is seeded with manually labeled images and a public dataset of violent images. In addition, the dataset is automatically enhanced with telemetry: images that are reported by multiple users are added with a corresponding label, and images popularly shared without reports are added and labeled as benign. In addition, an internal moderation team has access to the training data through a labeling and moderation interface. While end users do not directly access the model or the model inference service, which are deployed to a cloud service, they can trigger a model prediction by uploading an image and then observing whether that image is flagged.
Analyzing the system structure for security threats
Once the system structure is established, an analyst systematically analyzes all components and connections for possible security threats. This is usually performed as a form of manual inspection by the development team or security specialists, usually guided by a checklist. The inspection process encourages the analyst to think like an attacker and checklists can help to cover different angles of attack. For example, the well-known STRIDE method developed at Microsoft asks reviewers to analyze every component and connection for security threats in six categories:
- Spoofing identity: Can attackers pretend to be somebody else and break authentication requirements, if any?
- Tampering with data: Can attackers modify data on disk, in a network connection, in memory, or elsewhere and break integrity requirements?
- Repudiation: Can attackers wrongly claim that they did or did not do something, breaking non-repudiation requirements?
- Information disclosure: Can attackers access information to which they are not authorized, breaking confidentiality requirements?
- Denial of service: Can attackers exhaust the resources needed to provide the service, breaking availability requirements?
- Elevation of privilege: Can attackers perform actions that they are not allowed to do, breaking authorization requirements?
Note that this analysis is applied to all components and edges of the data-flow diagram, which in systems with machine-learning components usually include various data storage components, the learning pipeline, the model inference service, some labeling infrastructure, and some telemetry, feedback, and monitoring mechanisms.
For example, in our content moderation scenario, inspecting the connection in the data-flow diagram representing the upload of images by users, we do not trust the users at all. Using the STRIDE criteria as a checklist, we identify the (possibly obvious and easy to defend against) threats of users uploading images under another user’s identity (spoofing), a malicious actor modifying the image during transfer in a man-in-the-middle attack (tampering), users claiming that they did not upload images after their accounts have been blocked for repeated violations (repudiation), users getting access to precise confidence scores of the moderation decision (information disclosure), individual users being able to overwhelm the system with large numbers of uploads of very large images (denial of service), and users remotely executing malicious code embedded in the image by exploiting a bug in the image file parser (elevation of privilege). There are many more threats even for this single connection, and the same process can be repeated for all other connections and components in the data-flow diagram. For many of these threats there are obvious defenses, such as authenticating users and logging uploads, but identifying a list of threats is useful to ensure that defenses are indeed implemented and tested.
Assessing risk and designing defense mechanisms
Engineers tend to immediately discuss defense strategies for each identified threat. For most threats, there are well-known standard defense mechanisms, such as authentication, access control checks, encryption, and sandboxing. For machine-learning-specific threats, such as evasion and poisoning attacks, new kinds of defenses may be considered, such as adversarial training. Once security threats are identified, analysts and system designers judge risks and discuss and prioritize defense mechanisms. The results of threat modeling then guide the implementation and testing of security defenses in the system.
Threats are typically prioritized by judging associated risks, where risk is a combination of the likelihood of an attack occurring and the criticality of the damage caused if the attack succeeds. While there are specific methods, most rely on asking developers or security experts to roughly estimate the likelihood and criticality of each threat on simple scales (e.g., low-medium-high or 1 to 10) and to then rank threats by the product of these scores. The concrete values of these scores do not matter as long as risks are judged relative to each other.
- The likelihood of attacks is typically judged considering the goals and capabilities of expected attackers (as analyzed in Step 1): For example, we might assign high likelihood scores to attacks that require little skill, such as adding noise to an image to obfuscate violent content to the moderation system. Similarly, we might assign a high likelihood score even to attacks that require substantial resources and skills when we have reason to believe that a well-resourced and highly motivated attacker exists, such as political terrorists trying to post misinformation at scale from accounts of well-known politicians.
- The criticality is typically judged by the anticipated damage and the number of affected users. We might ask how valuable the information is we are trying to keep confidential (e.g., accessing the internal violence score of a single image, leaking passwords and phone numbers of all users), how much damage integrity violations could cause (e.g., placing a backdoor in the content-moderation model, posting an image under somebody else’s name), and how severe a loss from reduced availability would be (e.g., delaying content moderation for 20 min, crashing the entire social media site).
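For illustration, this kind of risk-based prioritization can be as simple as multiplying rough scores and sorting; the threats and numbers below are invented for the content-moderation scenario.

```python
# Illustrative sketch of risk-based prioritization: rank threats by the
# product of rough likelihood and criticality estimates (1-10 scales).
# The threat list and scores are made-up examples.
threats = [
    {"name": "evade moderation model with perturbed images", "likelihood": 8, "criticality": 5},
    {"name": "poison training data via coordinated fake reports", "likelihood": 5, "criticality": 7},
    {"name": "overload inference service with bulk uploads", "likelihood": 3, "criticality": 6},
]
for t in threats:
    t["risk"] = t["likelihood"] * t["criticality"]
for t in sorted(threats, key=lambda t: t["risk"], reverse=True):
    print(f'{t["risk"]:3d}  {t["name"]}')
```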
Prioritization is important, because we may not want to add all possible defenses to every system. Beyond the costs, defense mechanisms usually increase the technical complexity of the system and may decrease usability. For example, in our content moderation scenario, we might allow anonymous posts, but more likely would require users to sign up for an account. Beyond that, we may plan to verify user identity, such as requiring some or all users to provide a phone number or even upload a picture of their passport or other ID. When users log in, we can require two-factor authentication and time out their sessions aggressively. Each of these defenses increases technical complexity, implementation and operating cost, and lowers convenience from a user’s perspective.
In the end, developers will need to make an engineering judgment trading off the costs and inconveniences with the degree to which the defenses reduce the security risks. In some cases, software engineers, product managers, and usability experts may explicitly push back against the suggestions of security experts. Designers will most likely make different decisions about defense mechanisms for a small photo-sharing site used internally in a company, than for a large social media site, and again entirely different decisions for banking software. Ideally, as in all requirements and design discussions, developers explicitly consider the tradeoffs upfront and document design decisions. These decisions then provide the requirements for subsequent development and testing.
Implementation and testing of defense mechanisms
Once the specific requirements for defense mechanisms are identified, developers can implement and test them in a system. Many defense strategies, like authentication, encryption, and sandboxing, are fairly standard. For ML-related defenses too, common methods and tools are emerging, including adversarial training and anomaly detection. Yet getting these defenses right often requires substantial expertise beyond the skills of the typical software engineer or data scientist. It is usually a good idea to rely on well-understood and well-tested standards and libraries, such as SSL and OAuth, rather than developing novel security concepts. Furthermore, it is often worth bringing security experts into the project to consult on the design, implementation, and testing.
Designing for Security
Designing for security usually starts by adopting a security mindset, assuming that all components may be compromised at one point or another, anticipating that users may not always behave as expected, and considering all inputs to the system as potentially malicious. Threat modeling is a powerful technique to guide people to think through a system with a security mindset. This kind of mindset does not come naturally to all developers and many may actually dislike the negativity associated with it, hence training, process integration, and particularly bringing in experts are usually good strategies.
The goal of designing for security is to minimize security risks. Perfect security is usually not feasible, but the system can be defended against many kinds of attacks.
Secure design principles
A first secure design principle is to minimize the attack surface, that is, to minimize the ways in which anybody can interact with the system. This includes closing ports and limiting accepted forms of input, but also not offering APIs or certain features in the first place. In a machine-learning context, consider whether it is necessary to make a model inference service publicly accessible and, if an API is offered, whether to return precise confidence scores. In our content-moderation example, we likely only accept images in well-known formats and do not publicly expose the moderation APIs. However, at the same time, we cannot close off functionality that is essential to the working of the system, such as uploading images, serving images to users, and collecting user reports on inappropriate images.
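For example, a minimal upload check might reject anything that is not on a small allow-list of image formats before it reaches any further processing or the model; the sketch below assumes the Pillow library, and the accepted formats are an illustrative choice.

```python
# Illustrative sketch of shrinking the attack surface at the upload
# endpoint: accept only an allow-list of image formats and reject
# everything else early. Assumes the Pillow library.
from PIL import Image

ALLOWED_FORMATS = {"JPEG", "PNG"}

def is_acceptable_upload(path: str) -> bool:
    try:
        with Image.open(path) as img:
            img.verify()  # basic structural integrity check
            return img.format in ALLOWED_FORMATS
    except Exception:
        return False  # reject anything Pillow cannot parse
```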
Next, anticipating that attackers will find a vulnerability in the system eventually to compromise a component, a common design goal is to minimize the impact of a compromised component on the rest of the system.
A core secure design principle to minimize the impact of a compromised component is the principle of least privilege, indicating that each component should be given the minimal privileges needed to fulfill its functionality. For example, the content-moderation subsystem needs access to incoming posts and some user metadata, but does not need and should not have access to the user’s phone number or payment data; the content-moderation subsystem needs to be able to add a moderation flag to posts, but does not need and should not have permissions to modify or outright delete posts. Using the principle of least privilege, each component is restricted in what it is allowed to access, typically implemented through authentication (e.g., public-private key authentication) and authorization mechanisms (e.g., access control lists, database permissions, firewall rules).
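The principle can be made concrete with explicit permission sets per component; the service names and permission strings in the following sketch are invented for the content-moderation example.

```python
# Illustrative sketch of least privilege: each service account is granted
# only the permissions it needs. Service names and permissions are made up.
PERMISSIONS = {
    "content-moderation-service": {"posts:read", "posts:flag"},
    "login-service": {"accounts:read", "sessions:write"},
}

def authorize(service: str, action: str) -> None:
    if action not in PERMISSIONS.get(service, set()):
        raise PermissionError(f"{service} is not allowed to perform {action}")

authorize("content-moderation-service", "posts:flag")       # permitted
# authorize("content-moderation-service", "posts:delete")   # would raise
```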
Another core design principle to minimize the impact of a compromised component is isolation (or compartmentalization), where components are deployed separately, minimize their interactions, and consider each other’s inputs as potentially malicious. For example, the content moderation system should not be deployed on the same server that handles login requests, such that an attacker who achieves remote code execution through the content moderation system cannot modify the login mechanism’s implementation on the shared disk to also steal passwords or log in as an administrator. Isolation is often achieved by installing different components on different machines or by using sandboxing strategies for components on the same machine (these days, typically installing each component in its own container), reducing interactions between components to well-defined API calls that validate their inputs and encrypt data in transit, ideally following the least-privilege design.
These days, a design that focuses on least privilege and isolation between all components in a system is popularly known as zero-trust architecture. It combines isolation with strong mutual authentication for all components, access control following least-privilege principles, and the general principle of never trusting any input, including inputs from other components in the same system.
Secure design with least privilege and isolation comes with costs though. The system becomes more complex, because access control now needs to be configured at a granular level and because components need to authenticate each other with extra complexity for key management. Sandboxing solutions often create runtime overhead, as do remote-procedure calls, where otherwise local calls or local file access on the same machine may have sufficed. Misconfigurations can lead to misbehavior and outages, when keys expire or components no longer have access to important inputs. Hence, in practice designers often balance simplicity with security, for example, deploying multiple components together in a “trust zone,” rather than buying into the full complexity of zero-trust architectures.
With the introduction of machine learning, all these design principles still apply. As discussed, the model inference service is typically modular and can be isolated in a straightforward fashion, but a machine-learning pipeline typically interacts with many other parts of the system and may require (read) access to many data sources within the system (see chapter Automating the ML Pipeline). In addition, special focus should be placed on the various forms of data storage, data collection, and data processing, such as who has access to training data, who can influence telemetry data, and whether access should be controlled at the level of tables or columns. When it comes to exploratory data science work and the vast amounts of unstructured data collected in data lakes, applying the least privilege principle can be tricky.
Detecting and monitoring
Anticipating that attackers may be able to break some parts of the system despite good design and strong defense mechanisms, we can invest in monitoring strategies that detect attacks as they occur and before they cause substantial damage — attacks sometimes take a long time while the attackers explore the system or while they exfiltrate large amounts of data with limited disk and network speed. For example, attackers may have succeeded in breaking into the content-moderation subsystem and can now execute arbitrary code within the container that is running the content moderation’s model inference service, but now they may try to break out of the container and access other parts of the system. If we detect unusual activity soon and alert developers or operators, we may be able to stop the attack before actual damage occurs.
Typical intrusion detection systems analyze activity on a system to detect suspicious or unusual behavior, such as running additional processes, accessing additional files, writing to files when they otherwise only read them, establishing network connections with new internal or external addresses, sending substantially more data than usual, or substantially changing the output distribution of a component. Intrusion detection systems typically collect various forms of runtime telemetry from the infrastructure, such as monitoring CPU load, network traffic, process execution, and file access on the various machines or containers. They then use more or less sophisticated analysis strategies to detect unusual activities, often using machine learning itself. For example, the 2017 Equifax data breach would have been detected very early by an existing intrusion detection system that monitored network traffic, if that system had not been inactive due to an expired certificate. An intrusion detection system would also likely have an easy time detecting extra processes or network connections in our previous example of an attacker compromising the content moderation’s model inference service.
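As a toy illustration of the underlying idea, the following sketch flags network traffic that deviates strongly from a rolling baseline; real intrusion detection systems combine many such signals, and the window size and threshold here are arbitrary assumptions.

```python
# Toy sketch of an anomaly-based intrusion-detection heuristic: alert
# when observed traffic deviates strongly from a rolling baseline.
# Window size and threshold are arbitrary assumptions.
from collections import deque
import statistics

class TrafficMonitor:
    def __init__(self, window=60, z_threshold=4.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, bytes_sent: float) -> bool:
        alert = False
        if len(self.history) >= 10:
            mean = statistics.mean(self.history)
            spread = statistics.pstdev(self.history) or 1.0
            alert = abs(bytes_sent - mean) / spread > self.z_threshold
        self.history.append(bytes_sent)
        return alert  # True suggests unusual activity worth investigating
```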
When a monitoring system identifies a potential attack, on-call developers or operators can step in and investigate (with the usual challenges of notification fatigue), but increasingly, incident responses are also automated to react more rapidly, for example, by having the deployment infrastructure automatically restart or shut down services or automatically reconfigure firewalls to isolate components.
In general, AI for Security is a large and active field with many commercial vendors of security solutions. Some detect simple anomalies in network traffic whereas others can detect subtle patterns of unusual behavior over a long period in what is known as an advanced persistent threat. Much recent research has investigated how to use advances in machine learning and artificial intelligence more broadly, including time series analysis and game theory, to detect attacks in all forms of telemetry data.
Secure coding and security analysis
While many security vulnerabilities stem from design flaws that may be best addressed with threat modeling, some vulnerabilities come from coding mistakes. Many common design and coding problems are well understood and collected in lists, such as the OWASP top 10 list of web application security risks or Mitre’s Common Weakness Enumerations (CWEs), including coding mistakes such as wrong use of cryptographic libraries, mishandling of memory allocation leading to buffer overflows, or lack of sanitation of user inputs enabling SQL injection or cross-site scripting attacks.
Many of these mistakes can be avoided with better education that sensitizes developers to security issues, with safer languages or libraries that systematically exclude certain problems (e.g., memory-safe languages, only approved crypto APIs), with code reviews and audits that look for problematic code, and with static analysis and automated testing tools that detect common vulnerabilities. For many decades, researchers and companies have developed tools to find security flaws in code, and recently many have used machine learning to find (potentially) problematic coding patterns. Automated security analysis tools that can be executed during continuous integration are now often marketed under the label DevSecOps.
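For instance, one of the most common coding mistakes, missing sanitation of user inputs enabling SQL injection, is avoided by using parameterized queries; in the sketch below, the table and column names are made up.

```python
# Illustrative sketch: avoid SQL injection with parameterized queries
# (standard-library sqlite3 shown); table and column names are made up.
import sqlite3

def find_posts_by_author(conn: sqlite3.Connection, username: str):
    # Vulnerable pattern (do not use): attacker-controlled input
    # concatenated directly into the query string, e.g.,
    #   conn.execute(f"SELECT * FROM posts WHERE author = '{username}'")
    # Safer: let the driver handle escaping via a placeholder.
    return conn.execute(
        "SELECT * FROM posts WHERE author = ?", (username,)
    ).fetchall()
```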
Process integration
Security practices are only effective if developers actually take them seriously and perform them. Like other quality assurance and responsible engineering practices, security practices need to be integrated into the software development process. This can include (1) using security checklists during requirements elicitation, (2) mandatory threat modeling as part of the system design, (3) code reviews for every code change, (4) automated static analysis and automated fuzz testing during continuous integration, (5) audits and manual penetration testing by experts before releases, and (6) establishing incident response plans, among many others. Microsoft’s Security Development Lifecycle provides a comprehensive discussion of common security practices throughout the entire software development process.
As usual, buy-in and a culture that takes security concerns seriously are needed, in which managers plan time for security work, security contributions are valued, developers call on and listen to security experts, and developers do not simply skip security steps when under time pressure. As with other quality assurance work, activities can be made mandatory in the development process by automating them, such as executing them on every commit during continuous integration, or by requiring them to pass certain process steps, such as infrastructure refusing to merge code with security warnings from static analysis tools or DevOps pipelines only deploying code after sign-off from a security expert.
Data Privacy
Privacy refers to the ability of an individual or group to control what information about them is shared or not shared and how shared information may be used. Privacy gives users a choice in deciding whether and how to express themselves. In the US, discussions going back to 1890 frame privacy as the “right to be let alone.” In software systems, privacy typically relates to users choosing what information to share with the software system and deciding how the system can use that information. Examples include users deciding whether to share their real name and phone number with the social image-sharing site, controlling that only friends may view posted pictures, and agreeing that the site may share those pictures with advertisers and use them to train the content moderation model. Many jurisdictions codify some degrees of privacy as a right, that is, users must retain certain choices regarding what information is shared and how it is used. In practice, software systems often request broad permissions from users by asking or requiring them to agree to privacy policies as a condition of using the system.
Privacy is related to security, but not the same. Privacy relates to whether and how information is shared, and security is needed to ensure that the information is only used as intended. For example, security defenses such as access control and data encryption help ensure that the information is not read and used by unauthorized actors, beyond the access allowed by privacy policies. While security is required to achieve privacy, it is not sufficient: a system can break privacy promises through its own actions without attackers breaking security defenses to reveal confidential information, for example, when the social image-sharing site sells not only images but also users’ phone numbers and location data to advertisers without the users’ consent.
Privacy threats from machine learning
Machine learning is powerful at predicting information from innocent-looking data that was shared for another purpose. For example, machine-learning algorithms can predict that somebody is likely pregnant from their shopping history, or may predict likely age, gender, race, and political leaning from a few search queries or posts on our social image-sharing site. Humans can make similar predictions with enough effort and attention, but outside the realm of detective stories and highly specialized, well-resourced analysts, it is rare for somebody to invest the effort to study correlations and manually comb through vast amounts of data integrated from multiple sources. In contrast, big data and machine learning allow these kinds of predictions cheaply, fully automatically, and at an unprecedented scale. Companies’ tendencies to aggregate massive amounts of data and to integrate data from various sources, originally shared, intentionally or not, for many different purposes, further increase the power to predict information that was intended to remain private. Users rarely have a good understanding of what can be learned indirectly from the data that they share.
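To make the threat concrete, the sketch below trains a simple off-the-shelf classifier on entirely synthetic purchase data to predict a sensitive attribute that was never disclosed. The feature names and the rule generating the hidden attribute are made up for illustration; the point is only that such inferences require nothing more than standard tooling.

```python
# Sketch: inferring an undisclosed sensitive attribute (here, pregnancy)
# from seemingly innocuous purchase features. Data is synthetic and the
# feature names are purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Columns: unscented-lotion purchases, vitamin purchases, diaper-ad clicks
X = rng.poisson(lam=[1.0, 1.5, 0.5], size=(n, 3)).astype(float)
# Hidden attribute correlates with the purchase pattern (synthetic rule)
y = (X[:, 0] + X[:, 1] + 2 * X[:, 2] + rng.normal(0, 1, n) > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"Accuracy predicting the undisclosed attribute: "
      f"{model.score(X_test, y_test):.2f}")
```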
In addition, as discussed above, machine learning can make it very challenging to keep data confidential, opening the system to new security threats: data is stored in additional places and processed by different processes, all of which may be attacked, and models themselves can memorize training data that model inversion attacks may then extract.
The value of data
In a world of big data and machine learning, data has value as it enables building new prediction capabilities that produce valuable features for companies and users, such as automating tedious content-moderation tasks or recommending interesting content (see also the business case for machine learning in chapter When to use Machine Learning). From an organization’s perspective, more data is usually better as it enables learning better models and more models.
As such, organizations generally have an incentive to collect as much data as possible and to downplay privacy concerns. For example, Facebook has long pushed a narrative that social norms are changing and privacy is becoming less important. The whole idea behind data lakes (see chapter Scaling the System) is to collect all data on the off chance that it may be useful someday. Access to data can be an essential competitive advantage, and many business models, such as targeted advertising and real-time traffic routing, are only possible with access to data. This is all amplified by the machine-learning flywheel (discussed in chapter Quality Assurance in Production), in which companies with more data can build products with better models, attracting more users, which allows them to collect yet more data and further improve their models.
Beyond benefiting individual corporations, society at large can arguably benefit from access to data as well. With access to healthcare data at scale, researchers have improved health monitoring, diagnostics, drug discovery, and much more. For example, during the early days of the COVID-19 pandemic, apps collecting location profiles helped with contact tracing and with understanding the spread of the disease. Conversely, law enforcement often complains about privacy controls when they restrict criminal investigations, such as in requests to unlock private data on phones.
Overall, data and machine learning can provide great utility to individuals, corporations, and society, but unrestrained collection and use of data can enable abuse, monopolistic behavior, and harm. There is constant tension between users who prefer to keep information private and organizations that want to benefit from the value of that information. In many cases, users face an uphill battle against organizations that freely collect large amounts of data.
Privacy policies, privacy controls, and consent
A privacy policy is a document that explains what information is gathered by a software system and how the collected information may be used and shared. In a way, it is the public-facing documentation of the privacy decisions in the system. Ideally, a privacy policy allows users to deliberate whether to use a service and to agree to the outlined data gathering, processing, and sharing rules. Beyond broad privacy policies, a system may also give users privacy controls, where they can make more fine-grained decisions about how their data is used, for example, deciding who may see shared pictures or whether to share the user’s content with advertisers to receive better-targeted advertisements.
In many jurisdictions, any service collecting personally identifiable information must post privacy policies, regulators may impose penalties for violations, and privacy policies may become part of legal contracts. In some regulated domains, such as health care and education, regulation may further restrict possible policies. In some jurisdictions, service providers must offer certain privacy controls, for example, allowing users to opt out of sharing their data with third parties.
The effectiveness of privacy policies as a mechanism for informed consent can be questioned, and the power dynamics involved usually favor the providers of the service. Privacy policies are often long, legalistic documents that few users read. Even if they were to read them, users usually have only a basic choice between fully agreeing to the policy as is or not using the service at all, in some cases after having already paid for the product, such as only being able to use a new smartphone after agreeing to its software’s privacy policies. If there are multiple similar competing services, users may decide to use those with the more favorable privacy policies (though studies show that they do not), but in many settings it may be hard to avoid services with near-monopoly status in the first place, including social media sites, online shopping sites, and news sites. In practice, many users become accustomed to simply checking the ubiquitous “I agree” checkbox, agreeing to whatever terms companies set.
Overall, there have been more recent attempts at regulating privacy than at regulating many other areas of responsible engineering, such as fairness and transparency. In addition, some recent privacy laws, such as the GDPR in the European Union, threaten substantial penalties for violations, which developers have started to take very seriously. However, beyond basic compliance with the minimum stipulations of the law, we again have to rely on responsible engineers to limit data gathering and sharing to what is necessary, to transparently communicate privacy policies, and to provide meaningful privacy controls with sensible defaults.
Designing for Privacy
A design for privacy typically starts with minimizing data gathering in the first place. Privacy-conscious engineers should be deliberate about what data is needed for a service and avoid gathering unnecessary private information.
If data needs to be collected for the functioning of the service or its underlying business model, the service should ideally be transparent, with clear privacy policies explaining what data is collected and why, giving users a clear choice about whether to use the service or specific functionality within it. Privacy controls that let users decide at a finer granularity what data may be gathered and shared, and how, for example, at the level of individual profile attributes or individual posts, give users more agency than blanket privacy policies.
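One way to operationalize such fine-grained controls is to attach per-item consent flags to stored content and filter on them before any secondary use. The sketch below uses hypothetical field names (allow_model_training, allow_ad_sharing); the actual granularity and defaults would follow the system’s privacy policy.

```python
# Sketch: per-post privacy controls checked before secondary data use.
# The Post structure and its field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    image_path: str
    allow_model_training: bool = False  # opt-in, conservative default
    allow_ad_sharing: bool = False

def select_training_data(posts: list[Post]) -> list[Post]:
    # Only posts whose owners opted into model training are used
    # to train the content-moderation model.
    return [p for p in posts if p.allow_model_training]

def select_advertiser_share(posts: list[Post]) -> list[Post]:
    # Sharing with advertisers is a separate, independent consent decision.
    return [p for p in posts if p.allow_ad_sharing]
```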
When data is stored and aggregated for training ML models, consider removing identifying information or sensitive attributes. However, data anonymization is notoriously difficult, since machine learning is good at inferring missing data. More recently, a substantial body of research has explored formal privacy guarantees, such as differential privacy, which have seen some adoption in practice. Federated learning is also often positioned as a privacy-preserving way to learn over private data: incremental learning is performed locally and only model updates, not the possibly sensitive training data, are shared with others.
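As a small illustration of the kind of guarantee differential privacy provides, the sketch below implements the textbook Laplace mechanism for a counting query: noise calibrated to the query’s sensitivity and a privacy budget epsilon is added before an aggregate is released. This is a sketch for intuition, not a substitute for a vetted differential-privacy library.

```python
# Sketch: the Laplace mechanism for a differentially private count.
# For a counting query, one person changes the result by at most 1
# (sensitivity = 1); noise is scaled to sensitivity / epsilon.
import numpy as np

def dp_count(values, predicate, epsilon: float, sensitivity: float = 1.0) -> float:
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.default_rng().laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: release a noisy count of posts flagged for violent content
flags = [True, False, True, True, False]
print(dp_count(flags, lambda flagged: flagged, epsilon=0.5))
```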
Systematically tracking provenance (as discussed in chapter Versioning, Provenance, and Reproducibility) is useful to identify how data flows within the system. For example, it can be used to check that private posts, which the privacy policy may allow for training the content-moderation model, are not also used in models or datasets shared with advertisers. Provenance becomes particularly important when giving users the opportunity to remove their data (as required by law in some jurisdictions), since downstream datasets and models then also need to be updated.
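A minimal form of such provenance is to record, for every training example, which user and source item it was derived from, so that a deletion request can be propagated to downstream datasets and trigger retraining. The sketch below is a simplified, in-memory version of that idea with hypothetical field names; production systems would persist this metadata along the data pipeline.

```python
# Sketch: per-record provenance so a user's deletion request can be
# propagated to downstream training datasets. Simplified and in-memory.
from dataclasses import dataclass, field

@dataclass
class TrainingRecord:
    features: dict
    user_id: str          # which user the record was derived from
    source_post_id: str   # which post it came from

@dataclass
class TrainingDataset:
    name: str
    records: list[TrainingRecord] = field(default_factory=list)

    def remove_user_data(self, user_id: str) -> int:
        # Drop all records derived from this user's data; the caller
        # would then schedule retraining of models built from this dataset.
        before = len(self.records)
        self.records = [r for r in self.records if r.user_id != user_id]
        return before - len(self.records)
```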
Basic security defenses, such as encryption, authentication, and provenance tracking, help ensure that private data is not accidentally leaked to attackers. It is also worth observing developments around model inversion attacks and deploying defenses if such attacks become a real concern.
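For example, encrypting stored user data takes only a few lines with a standard library; the sketch below uses the Fernet API of the widely used cryptography package, deliberately leaving out key management, which is the hard part in practice.

```python
# Sketch: encrypting user data at rest with the `cryptography` package.
# Key management (storage, rotation, access control) is omitted here,
# even though it is the hard part in practice.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load from a key-management service
cipher = Fernet(key)

token = cipher.encrypt(b"user phone number: 555-0100")
assert cipher.decrypt(token) == b"user phone number: 555-0100"
```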
Privacy is complicated, and privacy risks can be difficult to assess when they emerge from poorly understood data flows within a system, from aggregating data from different sources, from inferences made with machine-learned models, or from poor security defenses. It can be valuable to bring in privacy experts with technical and legal expertise who are familiar with the evolving discourse and the state of the art in the field to review policies and perform a system audit. As usual, consulting experts early helps avoid design mistakes in the first place rather than patching problems later.
Summary
Securing a software system against malicious actors is always difficult and machine learning introduces new challenges. In addition to new kinds of attacks, such as evasion attacks, poisoning attacks, and model inversion attacks, there are also many interdependent parts with data flowing through the system for training, inference, and telemetry. Traditional defenses, such as encryption and access control, remain important and threat modeling is still likely the best approach to understanding the security needs of a system, combined with secure design principles, monitoring, and secure coding in the implementation.
Given the value of data as input for machine learning, privacy can appear to organizations as an inconvenience that gives users the choice not to share data. Privacy policies describe the gathering and handling of private data and can, in theory, support informed consent, but in practice they often have only limited effect. Privacy regulation is evolving, though, and is curbing some data collection in some jurisdictions, giving users more control over their own data.
Responsible engineers will care about both security and privacy in their systems, establishing controls through careful design and quality assurance. Given the complexity of both fields, most teams should consider bringing security and privacy experts into the project, at least for some phases.
Further readings
- Classic non-ML books on software security at the code level and considering the entire development process, both including a discussion of threat modeling: 🕮 Howard, Michael, and David LeBlanc. Writing secure code. Pearson Education, 2003. and 🕮 Howard, Michael, and Steve Lipner. The security development lifecycle. Vol. 8. Redmond: Microsoft Press, 2006.
- Broad and early introduction to security and privacy concerns in machine learning with good running examples: 🗎 Huang, Ling, Anthony D. Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J. Doug Tygar. “Adversarial machine learning.” In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58. 2011.
- Systematic list of possible security problems and corresponding defenses in systems with ML components, considering the entire architecture of the system beyond the model, which can serve as a useful checklist during threat modeling: 🗎 McGraw, Gary, Harold Figueroa, Victor Shepardson, and Richie Bonett. “An Architectural Risk Analysis of Machine Learning Systems: Toward More Secure Machine Learning.” Technical report, Berryville Institute of Machine Learning, v1.0 (2020).
- Overview of security threats and defenses for machine learning at the model level: 🗎 Liu, Qiang, Pan Li, Wentao Zhao, Wei Cai, Shui Yu, and Victor CM Leung. “A survey on security threats and defensive techniques of machine learning: A data driven view.” IEEE access 6 (2018): 12103–12117.
- Many technical papers illustrating specific attacks against ML models: 🗎 Papernot, Nicolas, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. “Practical black-box attacks against machine learning.” In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506–519. 2017. 🗎 Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. “Model inversion attacks that exploit confidence information and basic countermeasures.” In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1322–1333. 2015. 🗎 Eykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. “Robust physical-world attacks on deep learning visual classification.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1625–1634. 2018. 🗎 Shafahi, Ali, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. “Poison frogs! targeted clean-label poisoning attacks on neural networks.” Advances in neural information processing systems 31 (2018). 🗎 Koh, Pang Wei, Jacob Steinhardt, and Percy Liang. “Stronger data poisoning attacks break data sanitization defenses.” Machine Learning 111, no. 1 (2022): 1–47.
As with all chapters, this text is released under the Creative Commons 4.0 BY-SA license.