Thinking like a Software Architect

Feb 14, 2022

This chapter covers content from the “Software Architecture of ML-enabled Systems” lecture of our Machine Learning in Production course. For other chapters see the table of contents.

Investing in some planning before diving into implementation details helps to build systems that better meet requirements, especially when it comes to qualities such as scalability, usability, robustness, safety, and security. Software architecture is the part of the design process that focuses early on the decisions that matter most for achieving the key quality requirements of a system (e.g., scalability, flexibility). Architectural design decisions are fundamental decisions that are hard to change later without a full redesign of the system. Yet those decisions are often difficult to make, requiring a good understanding of requirements, of consequences of different designs, and of the tradeoffs involved. Experienced software architects think about system qualities and system designs and know what questions to ask early and what information to gather with prototypes or experiments.

Architectural design is not without cost though. In the software engineering community, there is a well-documented tension between architecture and agility. On the one hand, planning and architecture help to avoid costly mistakes and build the system with the right qualities that can be evolved later. On the other hand, in a race to a minimum viable product, too much architecture may slow down the project as it invests in qualities that are not relevant until the project is truly successful, at which point more resources for a serious redesign may be available.

Experienced software engineers and software architects can help navigate this space, focusing on the key qualities early while giving enough flexibility for change later. This skill is equally or even more important when introducing machine learning with many additional design decisions and new quality requirements into software systems.

Beyond early decision making, architectural planning has many benefits for the system and the team developing it:

Architecture aids in communication by identifying key abstractions and concerns. It helps to ask the right questions about achieving key quality requirements. It raises communication to a higher level of abstraction that helps to communicate between different stakeholders.
Architecture constrains the implementation. For example, it may set key structures within which the rest of the system is later designed, such as specifying a microservice architecture with built-in monitoring tooling for all remote procedure calls. It helps with prototyping by possibly providing a skeleton for the implementation early.
Architecture dictates the organizational structure by identifying key components of the system and assigning responsibilities for these components to different teams. This also helps with predicting cost, quality, and scheduling, with predictions broken down for individual components.

Quality Requirements Drive Architecture Design

The key insight in software architecture is to seriously consider a system’s quality requirements early on and design the system to support those qualities. In the design process, we often need to collect additional information to make informed decisions, and we will realize that we cannot achieve all desired requirements equally. Almost always, we will not be able to fully meet all requirements optimistically envisioned by the customer or product owner, but will need to make hard tradeoff decisions. However, by considering alternatives at the design stage, we can reason about tradeoffs and make deliberate choices rather than locking in tradeoffs through ad-hoc implementation decisions. Given how difficult some early decisions are to change (often requiring a full redesign), it is risky to make them without considering alternatives and the system as a whole.

*Architecture is the binding element that guides the implementation to meet the (quality) requirements of the system.*

When designing the system, we have hopefully already identified important quality requirements for the system; if not, design time is a good opportunity to reflect again on the relative importance of various qualities. Typical qualities include availability and scalability, development cost and time to release, modifiability, response time and throughput and operating cost, security and safety, and usability. With machine learning, common qualities of concern are accuracy, training and inference costs and latency, reproducibility and data provenance, fairness, and explainability (more details in the next chapter).

The key to architectural design is to make important decisions that foster the key quality goals. For example, if availability is important, developers should consider redundancy and mechanisms to detect and resolve outages early, even if it increases development and operating costs; if getting out to market quickly is more important, it is likely worth intentionally building on existing and simple infrastructure even if it may inhibit scalability later. As we will discuss, many desired qualities have implications for the kind of machine learning techniques that are feasible in a project.

Twitter Case Study

For an illustrative (non-ML) example of the role of software architecture, consider the complete redesign of Twitter in 2011–2012: Twitter was originally designed as a monolithic database-backed web application, written in Ruby on Rails by three friends. Once Twitter became popular, it became slow and hard to scale. Developers introduced caches throughout the application and bought many machines to keep up with load, but could not handle spikes in traffic. After the system was already built and fundamental design decisions were made (e.g., monolithic code base, single database, codebase in Ruby), it was hard to change the system to increase performance. Worse, changes that marginally improved performance often made the system harder to maintain and debug, and it became increasingly harder to fix bugs or implement new features. Since scaling was mostly achieved by buying more hardware, the company was paying a large amount of money on operating costs, which was not sustainable for the business.

After 2010, Twitter decided to step back, which led to a complete redesign and new implementation of the entire system (this is fairly rare and typically a last resort for most companies). For the redesign, they explicitly considered four primary quality goals: (1) improve latency and operating cost, (2) improve reliability by isolating failures, (3) improve maintainability with clearer boundaries between modules, and (4) improve modifiability to allow small teams to quickly release new features.

None of these goals could be achieved with the existing system design. Instead a completely new system structure was designed from scratch. Instead of a monolithic system (one process running all functionality), a microservice architecture was designed (distributed system where each functionality is isolated in separate processes that can be independently scaled); Ruby on Rails was replaced with Scala to improve performance; a completely new storage solution was designed that avoided a single bottleneck for writing tweets; reliability and scalability strategies such as automated failover, monitoring, and load balancing were built into the infrastructure used for all remote procedure calls. While this redesign was certainly expensive, it served all four quality goals much better than the previous system. The new system was more complex (inherent in distributed systems) and more costly to develop, but this was deemed a necessary tradeoff for achieving the primary four quality goals.

Notice how the key architecture decisions (microservices, monitoring, data storage strategy) affect the entire system, not just individual modules. All key decisions are driven by explicit quality goals and provide a scaffolding for the design of the rest of the application and its modules. Architectural decisions were deliberated carefully, considering tradeoffs between alternative designs, accepting certain drawbacks for achieving the primary quality goals. Also notice that the quality goals of the system have changed since its inception: when Twitter was first started, scalability was likely less important than releasing a prototype quickly and gaining venture funding and users, so the monolithic Ruby application may have been appropriate at the time, just not future-proof given how difficult architectural decisions are to change later.

Architectural Views

Software architecture tends to focus on the big important decisions, while abstracting less important details. To reason about different qualities, different abstractions are appropriate.

To illustrate different views of a system, consider the following three maps of Pittsburgh. They are all different abstractions that represent specific aspects of the same real-world city. They show neighborhoods, cycle paths, or tourist attractions, each useful for reasoning about the city of Pittsburgh in different ways for different goals. For example, when trying to find a good cycling path from Downtown to Carnegie Mellon University one of those maps is obviously much more useful than the others, but that map is fairly useless when trying to identify the names and boundaries of neighborhoods. Each map abstracts some details, e.g., streets or location names, and focuses on others that are relevant for a specific task.

*Neighborhoods of Pittsburgh by* *Andrew Somerville*

*Map of Downtown Pittsburgh (CC BY-SA 1.0,* *PerryPlanet*)

In line with the maps analogy, software architecture discussions focus on specific abstractions of a system. For example, they may focus on processes, how they exchange messages, and the timing involved, but ignore internals of the processes — this can be useful for reasoning about bottlenecks in system performance; others may focus on the structure of the system in terms of modules and plugin interfaces to focus on extensibility; yet others may focus on the deployment across different networks to reason about trust boundaries and security. In each case, we would typically collect information that is relevant to reason about a specific quality in that abstraction, for example, measure performance behavior, gather information about needed extensions, or collect information about network topology and existing security measures. Software architects often draw diagrams, often using informal purpose-specific notations, but it is not necessary to visualize abstractions graphically.

Below is an architectural diagram, which depicts the various different machine learned models in a self-driving car system and how they exchange information and contribute to decisions. This model abstracts all kinds of other details, but allows us to reason about which models build on outputs of other models — which might be useful for debugging or for reasoning about feedback loops. This abstraction does not have enough information to reason about real-time properties or capacities of various CPUs and GPUs on the car; for this other models could be designed and corresponding information could be collected.

Architecture of the Apollo self driving car system depicting the various ML models and how they exchange information. From Peng, Zi, Jinqiu Yang, Tse-Hsun Chen, and Lei Ma. “A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo.” In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1240–1250. 2020.

Decomposition, Interfaces, and Responsibility Assignment

A key decision that needs to be made early in the development process is how to decompose the system, such that different teams can work on different components of the project (see also the process and teamwork chapters). At this stage, early in the design and architecture process, ideally all major components and their interactions are identified, components and their interfaces are described, and responsibilities are assigned. Of course many specifics can be changed and renegotiated later, but a careful architectural design can provide a strong foundation that supports the desired quality requirements and guides the implementation.

A component’s interface plays a key role in collaboration between teams building a system as a negotiation and collaboration point. The interface is where teams (re-)negotiate how to divide work and assign responsibilities. Team members often seek information that may not be captured in interface descriptions, as interfaces are rarely fully specified. In an idealized development process, interfaces are defined early based on what is assumed to remain stable, because changes to interfaces later are expensive and require the involvement of multiple teams.

As discussed in the previous chapter, machine learning introduces a number of additional components and interfaces. For example, we may need to negotiate and document, as part of an interface, who is responsible for data quality — (a) the component providing the data or (b) the machine-learning pipeline consuming it. While the machine-learned model is relatively straightforward to integrate as an isolated model inference component with a clear interface, the machine-learning pipeline may interact with other components in more intricate ways as we will discuss.

Planning Evolution and Deployment

Even though architectural planning is sometimes derided for too much upfront investment that assumes stable requirements and cannot react to change with enough agility, proper architectural planning can actually prepare for changes and make it easier to evolve the system. Anticipating what parts of the system are more likely to change or what qualities will become more important in the future allows us to design the system such that those changes are easier when needed. For example, anticipating the need for future extensions in a web browser can encourage developers to design the system with extension points or even a plugin system; anticipating possibly running the web browser on low-memory embedded devices in the future may encourage developers to modularize the memory-hungry rendering unit so that it can be swapped out easily later. Anticipating and encapsulating change such that future changes can happen locally without affecting the rest of the system is the core idea behind information hiding. It isolates change to individual teams and reduces disruptive ripple effects across the entire system implementation.

In addition, designing the system such that it can be updated easily and frequently with confidence in the updates will lower barriers to change and increase confidence in deployments. Design strategies typically include automating testing, virtualizing software in containers, automating deployments, and monitoring systems. Much of this is discussed under the terms continuous delivery and DevOps.

With the introduction of machine learning in software systems, we also expect substantial future change in many systems. Most systems will want to anticipate the constant need to update models and machine-learning pipelines, for example, (1) to fix common mistakes the model makes, (2) to react to new ideas for feature engineering or hyperparameter selection, (3) to incorporate new machine-learning technologies, (4) to train with new data, or (5) to accommodate shifting quality requirements (e.g., latency, explainability, fairness). Therefore, it is prudent to consider early on how to design the system to continuously learn with new data and learn from feedback, to deploy updates with confidence, and to allow rapid experimentation and rapid reaction to changes in data or user behavior, all without disrupting system development and operation.

In a machine-learning context, there are many well-understood and emerging practices for designing for evolution and experimentation with frequent deployments that we will discuss, including deploying machine-learned models as independent services, building robust machine-learning pipelines, model and infrastructure testing, testing in production including canary releases and A/B testing, versioning of data and models, and system and model monitoring. In line with and building on DevOps work in traditional non-ML systems, much infrastructure for developing ML components with this flexibility and automation is now developed under the term MLOps.

Codifying Design Knowledge

Designers and architects typically accumulate tacit and codified knowledge based on their own experience and best practices shared in the community. They might know that there are three common designs for a specific problem, such as how to detect when a system malfunctions, and what the tradeoffs between those choices are. This way, when approaching a new system, they do not need to start from scratch, but can start with a design vocabulary and focus on the specific qualities relevant for the tradeoffs.

Common System Structures

At the highest level of organizing components in a system architecture, there are common system structures that will appear repeatedly throughout the next chapters. Understanding such common system structures is useful for considerations of how to fit machine learning components into a system. The most common system structures include:

Client-server architecture. Computation is split between multiple machines. A server provides functionality to multiple clients, typically over a network connection. This way, computational resources can be shared centrally for many users, whereas clients can remain relatively simple. The client invokes communication with the server.

Multi-tier architecture. Computation and data storage are organized into multiple layers or tiers (on the same or different machines): Clients make requests to servers, which make requests to other servers, and so forth. Higher tiers send requests to lower tiers, but not the other way around. The most common structure is a three-tier architecture with a presentation tier, a logic tier, and a data tier: the data tier manages data storage; the logic tier implements business logic, for example processing transactions that request or change data; the presentation tier provides the user interface through which clients interact with the system. This design separates concerns regarding the user interface from how requests are processed and from how data is stored. This structure is common for business and web applications and can be conceptually extended with components related to machine learning.

*Typical representation of a 3-tier architecture. Adapted from* Haruki Yokoyama.
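The layering rule described above can be sketched in a few lines. This is a minimal in-process illustration, not a production design: the class names (`DataTier`, `LogicTier`, `PresentationTier`) and the deposit example are hypothetical, and in a real system the tiers would typically be separate processes or machines.

```python
class DataTier:
    """Data tier: owns storage; knows nothing about business rules or UI."""

    def __init__(self):
        self._balances = {}

    def read_balance(self, account):
        return self._balances.get(account, 0)

    def write_balance(self, account, amount):
        self._balances[account] = amount


class LogicTier:
    """Logic tier: implements business rules; calls only the data tier."""

    def __init__(self, data):
        self._data = data

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        new_balance = self._data.read_balance(account) + amount
        self._data.write_balance(account, new_balance)
        return new_balance


class PresentationTier:
    """Presentation tier: formats results for users; calls only the logic tier."""

    def __init__(self, logic):
        self._logic = logic

    def handle_deposit(self, account, amount):
        try:
            balance = self._logic.deposit(account, amount)
        except ValueError as err:
            return f"Error: {err}"
        return f"New balance for {account}: {balance}"


# Requests flow downward only: presentation -> logic -> data.
ui = PresentationTier(LogicTier(DataTier()))
```

Because each tier depends only on the tier directly below it, the storage could be swapped for a real database without touching the presentation code.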

Service-oriented architecture (incl. microservices). A system is organized into multiple self-contained services (processes) that call each other through remote procedure calls. The services are not necessarily organized into layers and typically each service represents a cohesive piece of functionality and is responsible for its own data storage. This design allows independent deployment, versioning, and scaling of services and flexible routing of requests at the network level. Many modern scalable web-based systems use this design.

Sketched architecture of an Audible-like media purchasing and streaming service, composed of many (micro-)services each with a specific and narrow focus. Each service stores its own data. To increase throughput and reliability, multiple instances per service can be offered with partitioned or shared data storage. Adapted from a figure from *Christopher Meiklejohn*.

Event-based architecture. Individual components of a system listen to messages broadcast by other components, typically through some message bus. Since the component publishing a message does not need to know who consumes it, this architecture strongly decouples components in a system and makes it easy to add new components. Many robotics systems follow this design, with processing components subscribing to messages published from sensor readings.

Multiple processes producing and consuming messages on three topics. Most processes have multiple instances that process messages in parallel. Several processes consume messages on one topic and produce messages on a different topic.
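To make the decoupling in an event-based architecture concrete, here is a toy in-process message bus. It is only a sketch under simplifying assumptions: the `MessageBus` class, the topic names (`lidar`, `obstacle`), and the handler functions are all hypothetical, and a real system would typically rely on a dedicated message broker with parallel consumers.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process message bus; real systems would use a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, consumes the message.
        for handler in list(self._subscribers[topic]):
            handler(message)


bus = MessageBus()
alerts = []
log = []


def detect_obstacle(reading):
    # Consumes one topic and produces messages on a different topic.
    if reading["distance_m"] < 1.0:
        bus.publish("obstacle", {"distance_m": reading["distance_m"]})


bus.subscribe("lidar", detect_obstacle)
bus.subscribe("lidar", log.append)       # a second, independent consumer
bus.subscribe("obstacle", alerts.append)

bus.publish("lidar", {"distance_m": 4.2})
bus.publish("lidar", {"distance_m": 0.6})
```

Adding the logging consumer required no change to the code that publishes sensor readings, which is the property that makes this structure easy to extend.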

Data-flow architectures. The system is organized around data, often in a sequential pipeline, where data produced by one component is used as input by the next component. This design allows flexible changes of components and flexible composition of pipelines from different subsets of components. Unix shell commands can be composed through pipes to perform more complex tasks and machine-learning pipelines often follow this design of multiple transformations to a dataset arranged in a sequence or directed acyclic graph. Batch processing systems scale these kinds of designs to very large datasets.

*Simple sketch of a dataflow program illustrated with shell commands. Output of one command flows into the next.*
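The same idea of composing small steps into a pipeline can be sketched in Python. The helper `compose` and the three transformation steps below are hypothetical names chosen for illustration; they stand in for whatever transformations a real pipeline would apply.

```python
from functools import reduce


def compose(*steps):
    """Chain steps so that the output of one becomes the input of the next."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def drop_missing(rows):
    # Remove rows without any text.
    return [r for r in rows if r.get("text")]


def lowercase(rows):
    return [{**r, "text": r["text"].lower()} for r in rows]


def tokenize(rows):
    return [{**r, "tokens": r["text"].split()} for r in rows]


# Steps can be reordered, removed, or replaced without touching the others.
pipeline = compose(drop_missing, lowercase, tokenize)
result = pipeline([{"text": "Hello World"}, {"text": ""}, {"text": "Data Flow"}])
```

Each step only agrees with its neighbors on the shape of the data, so a different pipeline can be assembled from a different subset of the same steps.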

Monolithic system. The system is composed of a single unit where internals are interwoven rather than clearly separated. Internally there might be modules and libraries, but they are usually not intentionally arranged as services or in layers. Machine-learning components may be interwoven in such systems, often using libraries. System development is initially simple and local without the need for networked communication and the complexities of distributed systems. However, this design is often derided for being hard to maintain and scale once the system grows.

All of these common system structures will also reappear in the context of ML-enabled systems. We will see examples of deploying machine-learned models as services as well as deploying them as libraries in monolithic systems (chapter Deploying a Model), we will see event-based architectures in the context of processing large amounts of data with stream processing (chapter Scaling the System), and we will see data-flow architectures within pipelines to train models (chapter Automating the ML Pipeline).

Design Patterns

In software engineering, codified design knowledge below the level of entire system structures is best known in the form of design patterns. A design pattern names and describes a common solution to a known design problem and the known advantages and pitfalls. For example, the observer pattern is a common object-oriented design pattern that describes how objects can be notified when another object changes (e.g., when a button is clicked) without strongly coupling these objects to each other. Entire catalogs of design patterns are often published in books or online. The idea of design patterns originally emerged from the field of architecture (as in designing buildings, not software) and has been popularized in software engineering initially around object-oriented programming, but has since been applied to many other design challenges, such as parallel programming, distributed programming, big data systems, security, designing community organizations, software architecture, and recently also machine learning. At smaller scale design patterns describe solutions for design challenges among objects in a program; at the much larger scale of software architecture, patterns (often called architectural tactics or architectural patterns in this community) often discuss interactions among subsystems or the guiding principle of how the entire system is organized. Independent of the scale at which these patterns are discussed, the key idea of codifying design knowledge is the same. Below we illustrate four examples of patterns in different domains and at different levels of granularity.

Patterns typically follow a similar structure of name, problem, solution, alternatives, and tradeoffs. The name is important to enable abstraction and efficient communication. For example, instead of a conversation “maybe we should decouple these objects by introducing an interface with a single method and letting the other objects keep a list of instances of this interface, to then call the interface rather than the individual objects when something changes” we might just say “maybe we should use the observer pattern to decouple these objects.” If the people involved know the pattern, the term observer pattern compactly refers to lots of underlying knowledge, including the shape of the solution and its implications and tradeoffs. That is, thinking and communicating in terms of design patterns raises the design process to a much higher level of abstraction, building on accumulated design experience.

Regarding machine learning in software systems, we are still at an early stage of encoding design knowledge as design patterns. Although many academic articles, books, and blog posts try to suggest patterns related to machine learning in software systems, we are not yet at a stage where a stable catalog of patterns has emerged and has become more broadly adopted. As of this writing, the suggested patterns are all over the place — some focus on system organization broadly, some focus on very specific components, such as how to encode features during model training. Most patterns are not well defined, do not have broadly-agreed names, and are not well grounded. In the following section, we selectively mention some emerging patterns related to the concerns we discuss, but we do not try to comprehensively cover ML-related design patterns.

Example object-oriented design pattern: The Observer design pattern

Name: Observer (aka publish-subscribe)
Intent: Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
Motivation: [This would include an illustrative example of a user interface that needs to update multiple visual representations of data whenever input data changes, such as multiple diagrams in a spreadsheet.]
Solution: [This would include a description of the technical structure with an observer interface implemented by observers and an observable object managing a list of observers and calling them on state changes.]
Benefits, costs, tradeoffs: Decoupling of the observed object and observers; support of broadcast communication. Implementation overhead; observed objects are unaware of the consequences and costs of their broadcasts. [Typically this would be explained in more detail with examples.]
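The structure sketched in the Solution entry might look as follows in Python. This is one possible rendering rather than a canonical implementation; the class names `Observable`, `BarChart`, and `PieChart` and the method names `attach`, `detach`, and `update` are assumptions made for this illustration, echoing the spreadsheet example from the Motivation entry.

```python
class Observable:
    """Subject that keeps a list of observers and notifies them on changes."""

    def __init__(self, value=None):
        self._observers = []
        self._value = value

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def set_value(self, value):
        self._value = value
        # The subject only knows that observers offer update(), nothing else.
        for observer in list(self._observers):
            observer.update(value)


class BarChart:
    def __init__(self):
        self.rendered = []

    def update(self, value):
        self.rendered.append(f"bar:{value}")


class PieChart:
    def __init__(self):
        self.rendered = []

    def update(self, value):
        self.rendered.append(f"pie:{value}")


cell = Observable()
bar, pie = BarChart(), PieChart()
cell.attach(bar)
cell.attach(pie)
cell.set_value(3)   # both charts are notified
cell.detach(pie)
cell.set_value(5)   # only the bar chart is notified
```

Note that `Observable` never refers to a concrete chart class, which is the decoupling listed among the benefits.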

Example of an architectural tactic for availability: The Heartbeat tactic

Name: Heartbeat (aka dead-man timer)
Intent: Detect when a component is unavailable to trigger mitigations or repair
Motivation: Detect with low latency when a component or server becomes unavailable to automatically restart it or redirect traffic.
Solution: The observed component sends heartbeat messages to another component monitoring the system in regular predictable intervals. When the monitoring component does not receive the message it assumes the observed component is unavailable and initiates corrective actions.
Options: The heartbeat message can carry data to be processed. Standard data messages can stand in for heartbeat messages so that extra messages are only sent when no regular data messages are sent for a period.
Benefits, costs, tradeoffs: Component operation can be observed. Only unidirectional messaging needed. Observed component defines heartbeat frequency and thus detection latency and network overhead. Lower detection latency can be achieved at the cost of higher network traffic with more frequent messages; higher confidence in detection can be achieved at the cost of higher detection latency by waiting for multiple missing messages.
Alternatives: Ping/echo tactic where the monitoring component requests responses.
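The monitoring side of the heartbeat tactic can be sketched as follows. The class `HeartbeatMonitor`, its parameter `missed_before_alarm`, and the component names are hypothetical; timestamps are passed in explicitly instead of being read from a clock so that the example stays deterministic, and sending heartbeats over a network is left out entirely.

```python
class HeartbeatMonitor:
    """Monitoring side of the heartbeat tactic (timestamps passed in explicitly)."""

    def __init__(self, interval_s, missed_before_alarm=1):
        # Waiting for several missed heartbeats raises confidence in the
        # detection but also raises the detection latency.
        self.timeout_s = interval_s * missed_before_alarm
        self._last_seen = {}

    def record_heartbeat(self, component, now_s):
        self._last_seen[component] = now_s

    def unavailable_components(self, now_s):
        # A component counts as unavailable once its last heartbeat is
        # older than the timeout.
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now_s - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(interval_s=1.0, missed_before_alarm=2)
monitor.record_heartbeat("inference-service", 0.0)
monitor.record_heartbeat("feature-service", 0.0)
monitor.record_heartbeat("inference-service", 1.0)  # feature-service goes silent
```

With a one-second interval and two tolerated misses, the silent component is reported only after more than two seconds without a heartbeat, illustrating the tradeoff between detection latency and confidence.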

Example of a machine-learning design pattern for reproducibility: The Feature Store pattern

Name: Feature Store
Intent: Reuse features across projects by decoupling feature creation from model development and serving
Motivation: The same feature engineering code is needed during model training and model serving; inconsistencies are dangerous. In addition, some features may be expensive to compute but useful in multiple projects. Also, data scientists often need the same or similar features across multiple projects, but often lack a systematic mechanism for reuse.
Solution: Separate feature engineering code and reuse it both in the training pipeline and the model inference infrastructure. Catalog features with metadata to make them discoverable. Cache computed features used in multiple projects. Typically implemented in open-source infrastructure projects.
Benefits: Reusable features across projects; avoiding redundant feature computations; preventing training-serving skew; central versioning of features; separation of concerns.
Costs: Nontrivial infrastructure; extra engineering overhead in data science projects.
This concept is discussed in more depth in chapter Deploying a Model.
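A toy in-memory version can illustrate the core of the Feature Store pattern: one registered feature definition is shared by training and serving, cataloged with metadata, and cached. Everything here is a simplifying assumption for illustration, including the `FeatureStore` class, its methods, and the `avg_order_value` feature; the cache is keyed only by feature name and entity id and ignores data changes, which real feature stores handle with persistence and versioning.

```python
class FeatureStore:
    """Toy in-memory feature store; real ones add persistence and versioning."""

    def __init__(self):
        self._features = {}    # name -> (function, description)
        self._cache = {}       # (name, entity_id) -> computed value
        self.computations = 0  # counts actual feature computations

    def register(self, name, description):
        def decorator(fn):
            self._features[name] = (fn, description)
            return fn
        return decorator

    def catalog(self):
        # Metadata that makes features discoverable across projects.
        return {name: desc for name, (_, desc) in self._features.items()}

    def get(self, name, entity_id, raw):
        key = (name, entity_id)
        if key not in self._cache:
            fn, _ = self._features[name]
            self._cache[key] = fn(raw)
            self.computations += 1
        return self._cache[key]


store = FeatureStore()


@store.register("avg_order_value", "Mean order amount per customer")
def avg_order_value(orders):
    return sum(orders) / len(orders) if orders else 0.0


def training_row(customer_id, orders):
    return [store.get("avg_order_value", customer_id, orders)]


def serving_row(customer_id, orders):
    # Same feature code path as training, which avoids training-serving skew.
    return [store.get("avg_order_value", customer_id, orders)]
```

Because both `training_row` and `serving_row` go through the store, the feature is computed once and cannot silently diverge between the two code paths.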

Example of a machine-learning design pattern for large language models: The Retrieval-Augmented Generation (RAG) pattern

Name: Retrieval-Augmented Generation (RAG)
Intent: Enable a generative model to generate content more accurately or to generate answers about proprietary or recent information that was not used for model training.
Motivation: Provide a generative model performing tasks such as question answering with relevant context information or enhance a search with powerful summarization techniques of a large language model. Generative models are trained on large datasets but often do not have access to proprietary or recent information or may hallucinate answers when factual information is already available in documents.
Solution: Decompose the problem into two steps, search and generation. In the search step, relevant context information is located (e.g., using traditional search or a modern retrieval model backed by a vector database). The search results are then provided as part of the context in a prompt to the generative model.
Benefits, costs, tradeoffs: Enables generating answers about recent or proprietary information without retraining the model. The generative model’s answer is focused and grounded in the search result provided as context, reducing the risk of hallucinations. Requires nontrivial infrastructure and adds inference cost and latency for search and generation. Requires access to relevant data to search in. The model may leak proprietary information from the context.
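The two-step decomposition into search and generation can be sketched as below. This is a deliberately simplified illustration: `retrieve` ranks documents by plain word overlap as a stand-in for a real search engine or retrieval model, `echo_model` is a placeholder for a call to a generative model, and all function names, the prompt wording, and the sample documents are hypothetical.

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (stand-in for real search)."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    # The search results become part of the context given to the model.
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


def answer(query, documents, generate):
    """Search first, then generate with the search results as context."""
    return generate(build_prompt(query, retrieve(query, documents)))


documents = [
    "The cafeteria opens at 8am on weekdays",
    "Parking permits are renewed in August",
    "The library closes at 10pm",
]


def echo_model(prompt):
    # Stand-in for a call to a generative model; it just returns the prompt.
    return prompt


response = answer("when does the cafeteria open", documents, echo_model)
```

Passing the model in as the `generate` argument keeps the search step and the generation step independently replaceable, for example to swap the toy ranking for a vector database lookup.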

Summary

Going from requirements to implementation is hard and design and architecture planning can help to bridge the gap. The key is to think and plan before coding to focus on the qualities that matter and which may be very hard to fix later in a poorly designed system. Architectural design is deeply driven by quality attributes, such as scalability, changeability, and security. It focuses on the key abstractions, on gathering relevant information, and on deliberating about key design decisions to achieve the quality goals. In the process, decomposing the system and deciding how to divide the work is a key step. Planning for evolution as part of the early system design and making deployment of new versions easy can enable organizations to move much faster later and to experiment more easily.