When to use Machine Learning
This post covers some content from the “Goals and Success Measures” lectures of our Machine Learning in Production course. For other chapters see the table of contents.
Machine learning is now commonly used in software products of all kinds, but it is not appropriate in all cases. In some cases, adoption of machine learning seems driven more by interest in the technology or by marketing than by a real need to solve the specific problem. Yet machine learning brings substantial costs and risks to a system, which can be avoided if simpler options suffice. To make deliberate decisions about when to use machine learning and when not to, we will first explore for which kinds of problems machine learning is suitable, at both a technology and a business level.
As a running example, we will discuss personalized music recommendations, such as Spotify automatically curating personalized playlists for each subscriber.
Problems that Benefit from Machine Learning
With the hype around machine learning, it can seem like a tempting solution for many problems, but it is not always appropriate. Machine learning involves substantial complexity, cost, and risk, as we discuss throughout this book. Among other things, substantial expertise is needed to build and deploy models, and those models are fundamentally unreliable and may make mistakes; hence substantial effort is needed to evaluate suitability and mitigate mistakes.
In general, if it is possible to avoid using machine learning and use hand-coded algorithms instead, it is very often a good idea to do so. However, there are certain classes of problems for which machine learning seems worth the effort.
Intrinsically hard problems: For problems that we simply do not know how to solve programmatically, machine learning can discover patterns and strategies to solve them regardless. This is particularly common for tasks that mirror human perception, such as natural language understanding or identifying speech in audio and objects in images, but also predicting music preferences. These tasks are complex and we usually do not really understand how they work. Pre-machine-learning attempts have usually made very limited progress on such tasks. Machine learning may not work for all hard problems, but it can be worth a try, and many amazing machine-learning achievements have shown the potential.
Big problems: For some problems, hand-crafting a solution might be possible, but the solution would be so complex and large that maintaining it manually becomes infeasible. Resolving conflicts between multiple rules and handling exceptions can be especially tedious. For example, we might attempt to manually encode music recommendations, but there are so many artists and tracks that rule-based heuristics might become unmaintainable. Similarly, manually curating indexes of websites was tried in the early days of the internet, but it simply did not scale, opening the door for (ML-based) search instead. In such cases, it might be easier to automate a solution that learns rules than to write them manually.
Time-changing problems: For problems where the inputs and solutions change frequently, machine learning may be able to more easily keep up if suitable data is available. For example, in music recommendations, individual preferences and music trends change over time, as does what music is available in the first place as new music is released. Change is constant and even if we had hardcoded recommendation rules it would be tedious to update them regularly. In such cases, it may be easier to build a (complex) ML-enabled system that can automatically update itself.
Tolerating Mistakes and ML Risk
Machine-learned models are essentially unreliable functions that often work but sometimes make mistakes; it can be hard to even define what it would mean for them to be always correct (see chapter Model quality: Defining correctness and fit). Hence, it is important to consider whether occasional mistakes are acceptable for the problem for which machine learning is considered.
In some settings, mistakes may simply be acceptable. For example, music recommendations are not critical, and subscribers can likely tolerate poor recommendations and benefit from recommendations even if not all are equally good. However, even seemingly harmless predictions can cause harm through discrimination if they are systematically wrong, such as the music recommendation model not serving Hispanic subscribers or never recommending LGBTQ+ artists, as we will discuss in chapter Fairness.
In many other settings, mistakes could potentially cause harm, but those harms can be mitigated to reduce risk to an acceptable level. The worst consequences of a wrong cancer prognosis from radiology images can be mitigated if radiologists oversee the process, or if a biopsy is scheduled before surgery as a non-ML confirmation; the biopsy causes some harm, but less than acting on an unquestioned wrong prediction. Mitigating mistakes requires careful design and evaluation of the system, as we will discuss throughout multiple chapters, especially chapters Planning for Mistakes and Safety. In some cases, mitigations may be costly and undermine the benefits of the system; in others, we may simply decide that we cannot build the system safely and should not build it at all. In the end, system designers need to carefully weigh the benefits of the system against the costs of its mistakes and the overall risks it poses.
If we have a clear specification and correctness of a component is essential, machine learning is not a good match — we should just implement and evaluate the component using traditional non-ML techniques. For example, we would not want to use machine-learned components when tabulating information in accounting systems or when transmitting control signals in an airplane, where we know exactly what the correct behavior is and behaving correctly is very important.
Continuous Learning
We can only use machine learning if we have data to train (and evaluate) the model for the task. Getting data of sufficient quantity and quality can be a substantial bottleneck and cost driver in a project, especially since training models for hard problems often requires large amounts of data.
Machine learning can be particularly effective in settings where we have continuous access to data and can improve and update the model over time. As mentioned in the Introduction, many systems can benefit from observing users over time to collect more data to build better models (which then may attract more users, producing even more data, an effect known as the machine learning flywheel). For example, our music streaming service may monitor which recommendations subscribers play and which they skip, to improve its profiles of individual subscribers and its recommendations overall.
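As a minimal sketch of what such monitoring could look like, the following aggregates play and skip events into a per-track skip rate. The event format (subscriber, track, action) and the function name skip_rates are assumptions made for illustration; a real streaming service would likely log richer telemetry.

```python
from collections import defaultdict


def skip_rates(events):
    """Compute the fraction of recommendations that were skipped, per track.

    events: iterable of (subscriber_id, track_id, action) tuples, where
    action is "play" or "skip" (a hypothetical log format).
    """
    shown = defaultdict(int)    # how often each track was recommended
    skipped = defaultdict(int)  # how often that recommendation was skipped
    for _subscriber, track, action in events:
        shown[track] += 1
        if action == "skip":
            skipped[track] += 1
    return {track: skipped[track] / shown[track] for track in shown}


if __name__ == "__main__":
    log = [("u1", "t1", "play"), ("u2", "t1", "skip"), ("u1", "t2", "skip")]
    print(skip_rates(log))
```

Signals like these could serve as labels for retraining, although a skip is only a noisy proxy for whether a subscriber disliked a recommendation.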
For time-changing problems, continuous learning is essential to keep up. Here, we continuously need fresh training data to regularly retrain the model. Ideally, such data is collected automatically (see chapter Quality Assurance in Production). For example, our music streaming service might observe changing trends over time and adopt deliberate strategies to initially recommend new releases to some subscribers, learn which of them like the releases, and use that to improve recommendations for everybody else.
If we do not have access to sufficient training data, or lack a mechanism to observe data continuously in time-changing problems, it is likely not feasible to build a machine-learning-based solution.
Costs and Benefits
In the end, the decision whether to use machine learning for a problem comes down to a comparison of costs and benefits. The machine-learning components need to provide concrete benefits to the system that offset the (often substantial) costs of building and operating them. Costs can be substantial for the initial data acquisition, model building, and deployment. In addition, a machine-learning component can create substantial costs during operations, for example, through hardware and energy needs to serve the model and when humans need to oversee operations and intervene in case of model mistakes. On the other hand, benefits can also be substantial, especially when developing breakthrough capabilities that can dominate market segments or create new ones, such as TikTok, whose success is largely attributed to its recommendation models.
Both benefits and costs can be difficult to measure or even estimate well. On the benefits side, for example, it can be difficult to quantify how much the music recommendations contribute to attracting (or keeping) subscribers to the streaming service. On the cost side, it is notoriously difficult to estimate development and operating cost before building the system (usually estimates are way too optimistic); also quantifying potential harm and risk from wrong predictions or systematic bias is very challenging. In many cases, startup creators simply bet big and hope that their machine-learning innovations will bring huge future payoffs. We will return to measurement in the next chapter.
Generally, system designers should keep an open mind and explore whether machine learning is actually needed and cost-effective. It may be sufficient to use a simple heuristic (hand-coded rules that approximate the solution), which may be less accurate and hence provide fewer benefits, but at much lower cost. For example, we could simply recommend the most popular songs on the music streaming platform by artists the subscriber has listened to before. We might also consider a simple semi-manual solution involving a few humans working together with the system, for example, asking a few experts to manually curate 20 playlists and recommending the one that most overlaps with the subscriber’s recent listens. Even if the non-ML solution does not perform well enough to use in production, it provides a great baseline for evaluations (see chapter Model quality: Measuring prediction accuracy). Finally, we can consider the system without the feature, for example, music streaming without personalized recommendations, or even not building the system at all, for example, abandoning attempts at predictive policing when we estimate that it produces more harm to society than benefit (see also chapter Fairness).
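To illustrate how little code such non-ML baselines can require, here is a rough sketch of the two heuristics mentioned above: recommending popular tracks by already-known artists, and picking the curated playlist that overlaps most with recent listens. All function names and data structures (catalog, play_counts, playlists) are hypothetical and chosen only for this example.

```python
def recommend_popular_by_known_artists(history, catalog, play_counts, k=5):
    """Recommend up to k unheard tracks by artists the subscriber already played.

    history: list of track ids the subscriber has played
    catalog: dict mapping track id -> artist name
    play_counts: dict mapping track id -> platform-wide play count
    """
    known_artists = {catalog[t] for t in history if t in catalog}
    already_heard = set(history)
    candidates = [
        track for track, artist in catalog.items()
        if artist in known_artists and track not in already_heard
    ]
    # Most popular first; ties are broken by track id for determinism.
    candidates.sort(key=lambda t: (-play_counts.get(t, 0), t))
    return candidates[:k]


def pick_curated_playlist(recent_listens, playlists):
    """Return the name of the curated playlist that overlaps most with recent listens.

    playlists: dict mapping playlist name -> list of track ids
    Ties are resolved in favor of the alphabetically first playlist name.
    """
    recent = set(recent_listens)
    return max(sorted(playlists), key=lambda name: len(recent & set(playlists[name])))


if __name__ == "__main__":
    catalog = {"t1": "A", "t2": "A", "t3": "B", "t4": "A", "t5": "C"}
    play_counts = {"t1": 100, "t2": 50, "t3": 500, "t4": 70, "t5": 900}
    print(recommend_popular_by_known_artists(["t1"], catalog, play_counts))
    playlists = {"chill": ["t1", "t3"], "rock": ["t2", "t4", "t5"]}
    print(pick_curated_playlist(["t2", "t5", "t1"], playlists))
```

A baseline like this never recommends artists the subscriber has not heard before, which is exactly the kind of limitation that a later comparison against a learned model can quantify.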
The Business Case: Machine Learning as Predictions
The book Prediction Machines by Agrawal et al. provides a great discussion of machine learning from a business perspective. In a nutshell, it discusses machine learning primarily as a way to make predictions, which are used as an input for manual or automated decisions. This framing can be very useful to understand the role of machine learning in a business context.
At a high level, machine learning is primarily used to make predictions when there is no clear algorithm to compute a result. A machine-learned model could be predicting what and how much customers will buy, predicting how much traffic there will be on a road, predicting what text is spoken in an audio snippet, or, in our running example, predicting what music a subscriber likes. The term “prediction” implies a best-effort approach to reduce uncertainty that makes use of past data but is not necessarily correct, which fits the characteristics of machine learning well. With machine learning, we can often improve on predictions that human experts make, thus providing more accurate predictions at lower cost.
Predictions are critical inputs for decision making. Having more, faster, and more accurate predictions often helps to make better decisions. However, predictions alone are not sufficient: we still need to interpret the predictions to make decisions, and this requires judgment. Judgment is fundamentally about trading off the relative benefits and costs of decisions and their outcomes, that is, making decisions about risk. For example, whether a physician should start treatment after a prediction that a radiology image contains signs of cancer requires weighing the costs of false positives and false negatives against the benefits of correctly detecting cancer at an early stage. Judgment is often left to humans (“human in the loop”), but with enough data it is also possible to learn to predict human judgment, for example, by observing how doctors react to cancer predictions. In the case of music recommendations, the subscribers make decisions about what to listen to, though we could also envision a system that automatically decides what music to play and when to play it. Automating judgment is the step toward full automation, where the system itself acts on predictions to maximize some goal.
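The weighing of false positives against false negatives can be made concrete with a small expected-cost calculation. The sketch below assumes that the model outputs a calibrated probability and that the two kinds of mistakes can each be assigned a single numeric cost; both are strong simplifications, and the cost values in the usage example are invented purely for illustration.

```python
def decision_threshold(cost_false_positive, cost_false_negative):
    """Probability above which acting has lower expected cost than not acting.

    Acting on a negative case costs cost_false_positive; not acting on a
    positive case costs cost_false_negative. Correct decisions are assumed
    to cost nothing in this simplified model.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_act(probability, cost_false_positive, cost_false_negative):
    """Decide whether to act on a prediction by comparing expected costs."""
    expected_cost_of_acting = (1 - probability) * cost_false_positive
    expected_cost_of_not_acting = probability * cost_false_negative
    return expected_cost_of_acting < expected_cost_of_not_acting


if __name__ == "__main__":
    # Hypothetical costs: a missed case is judged nine times as costly
    # as an unnecessary follow-up.
    print(decision_threshold(1, 9))
    print(should_act(0.2, 1, 9))
    print(should_act(0.05, 1, 9))
```

The prediction (the probability) comes from the model, whereas the costs encode judgment: changing them moves the threshold without retraining anything.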
From a business perspective, then, machine learning vastly reduces the cost of predictions and often improves their accuracy, compared to prior approaches such as predictions made by human experts. Higher accuracy and lower cost may allow us to use cheaper and more numerous predictions for traditional tasks. For example, cab drivers in London invested 3 years to learn streets and predict the fastest route, but then navigation tools became nearly as good or better at marginal cost. Highly trained cab drivers now compete with drivers who rely entirely on navigation tools, as predictions for navigating are no longer a scarce commodity. Lower cost of predictions also allows us to use predictions at scale for new applications where it would previously have been cost-prohibitive to rely on humans, such as curating personalized music recommendations in our running example. Cheap and accurate predictions can entirely change business models and enable transformative novel business strategies, such as the book’s example of a shop proactively sending customers products that a model predicts they will like (relying on accurate predictions at scale, even at the cost of paying for return shipping of unwanted items from wrong predictions). Having access to more, cheaper, and more accurate predictions can be a distinct economic advantage.
Automation is desirable but not necessary to benefit from cheap predictions in a business context. Even when humans are still making decisions, they now benefit from more, more accurate, and faster predictions as inputs, for example, physicians having to wait less for insights from radiology images when making cancer treatment decisions.
When identifying opportunities for where machine learning can provide benefits in an organization, we need to identify what existing or new tasks use predictions or could benefit from predictions. For each opportunity, we can then analyze the nature of the predictions and how they contribute to the task and what benefit we could gain from cheaper, faster, or better predictions. We can then explore to what degree humans previously doing the tasks can be supported or replaced with partial or full automation of predictions and decisions. We would then focus attention where the return on investment is highest, considering costs and benefits as discussed above.
Summary
Avoid using machine learning when it is not really needed. It introduces complexity and failure modes that are best avoided if possible. However, when problems are hard, big, or time-changing, machine learning may provide a solution, as long as the solution can tolerate or mitigate risks from wrong predictions, data is available, and the benefits outweigh the costs. To identify where machine learning can provide business value, it can also be instructive to think of machine learning as a mechanism to provide cheaper predictions, which in turn can help to make better decisions and potentially even automate them.
Further Reading
- Book chapter discussing when to use machine learning: 🕮 Hulten, Geoff. “Building Intelligent Systems: A Guide to Machine Learning Engineering.” (2018), Chapter 2 (Knowing when to use Intelligent Systems)
- Excellent book discussing the business case of machine learning: 🕮 Ajay Agrawal, Joshua Gans, Avi Goldfarb. “Prediction Machines: The Simple Economics of Artificial Intelligence” Harvard Business Review Press, 2018
- Frequently shared blog post cautioning (among others) to adopt machine learning only when needed: 📰 Zinkevich, Martin. “Rules of machine learning: Best practices for ML engineering.” Google Blog (2017).