Breaking business functions down into their smallest possible units with microservices seems like a wonderful idea. However, in business-critical scenarios, a single business process may need to coordinate half a dozen different distributed systems and services. If one step fails, the entire process may need to be rolled back and corrected, much like a failed database transaction.
By design, microservices have no built-in coordinating or controlling function: an individual service should know nothing about the services outside it. As a result, keeping processes flowing smoothly between systems is much easier said than done without some kind of supervisory control mechanism. Fortunately, we can provide one with an architectural design known as the saga pattern.
What is the saga design pattern?
Imagine an electronic data interchange (EDI) system interface that takes customer orders and passes them to a product ordering system. This ordering service then triggers a chain of service calls that alert downstream manufacturing and shipping systems to complete the transaction. The diagram below shows an example of this model, in which each service calls the next in turn. Once the entire transaction chain is complete, the shipping system sends a message back to the product ordering system to confirm completion. Because no central component directs this flow, and each service simply reacts to the one before it, this approach is known as service choreography.
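The choreography flow described above can be sketched in Python. This is a minimal, in-process illustration under stated assumptions: the `publish`/`subscribe` helpers and the event names are hypothetical stand-ins for a real message broker, and handlers run synchronously rather than over a network. Note that no service calls another directly; each one only reacts to an event and emits its own.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Hypothetical in-process stand-in for a message broker: event name -> handlers.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
event_log: list[str] = []


def subscribe(event: str, handler: Callable[[dict], None]) -> None:
    subscribers[event].append(handler)


def publish(event: str, order: dict) -> None:
    # Record the event, then hand it to every service listening for it.
    event_log.append(event)
    for handler in subscribers[event]:
        handler(order)


# Each service knows only the event it reacts to and the event it emits.
def ordering_service(order: dict) -> None:
    publish("order_placed", order)


def manufacturing_service(order: dict) -> None:
    publish("product_manufactured", order)


def shipping_service(order: dict) -> None:
    publish("order_shipped", order)


def confirm_with_ordering(order: dict) -> None:
    # Closing the loop: the shipping event tells the ordering side the chain is done.
    order["status"] = "complete"


subscribe("order_placed", manufacturing_service)
subscribe("product_manufactured", shipping_service)
subscribe("order_shipped", confirm_with_ordering)

order = {"id": 42, "status": "open"}
ordering_service(order)
print(event_log)        # ['order_placed', 'product_manufactured', 'order_shipped']
print(order["status"])  # complete
```

Because the order of steps lives only in the subscriptions, adding or reordering a step means changing several services, which is one reason choreography becomes hard to manage as chains grow.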
Typically, the application performs these actions one at a time. If the production order and the shipping order both succeed but the payment transaction fails, a team member or an automated process must alert the earlier systems so they can roll back their changes.
Unfortunately, things get more complex when large-scale business transactions span long periods of time. If one of these systems fails along the way, or the order is canceled, we need a mechanism that can perform a logical undo and reset each of those systems and transactions. For example, a single failed payment could force teams to reverse dozens of earlier transactions made by dozens of separate systems.
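One common way to perform this logical undo is to pair every step with a compensating action, such as a cancellation or a refund, and to run the compensations for the completed steps in reverse order when a later step fails. The sketch below assumes each step is a hypothetical in-process callable; in practice each would be a call to a separate service, and a compensation is a business-level undo rather than a true database rollback.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # forward operation performed by one service
    compensate: Callable[[], None]  # logical undo (cancel, refund) for that operation


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step never completed, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


# Hypothetical services from the order example; payment is made to fail.
events: list[str] = []


def charge_payment() -> None:
    raise RuntimeError("payment declined")


steps = [
    SagaStep("production", lambda: events.append("production ordered"),
             lambda: events.append("production cancelled")),
    SagaStep("shipping", lambda: events.append("shipping booked"),
             lambda: events.append("shipping cancelled")),
    SagaStep("payment", charge_payment,
             lambda: events.append("payment refunded")),
]

ok = run_saga(steps)
print(ok)      # False
print(events)  # ['production ordered', 'shipping booked', 'shipping cancelled', 'production cancelled']
```

Because the payment step itself never completed, its refund is not triggered; only shipping and production are reversed, newest first.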
Airline ticketing systems are a prime example of this problem. An unexpected event can cause a passenger to cancel a trip a few minutes before the plane takes off. That single cancellation forces the ticketing system to adjust seat availability, the baggage system to reroute luggage, and the payment system to issue a refund, and these are likely just a few of the steps involved.
As you can imagine, this system needs a way to unwind itself by reversing the effects of some of the earlier messages between web services. Unfortunately, these transactions are too complex and too long-running to simply call every service again and validate it. Instead, we need a controller that is more nuanced than a simple master program, one that can take charge of the whole process. This is where the saga pattern comes in, delivering service orchestration.
Saga implementation for web services
To illustrate the saga pattern, imagine that your team implements an enterprise service bus (ESB) that listens for particular transaction events and then passes messages to the systems that need to start work. Once the bus creates a message representing an event, it delivers that message to every service subscribed to that event. In this case, the controller is a web service triggered by the event, and it calls the next enterprise web service in the sequence.
Note that this gives us two types of services:
- Controllers, which receive events in the form of messages and then relay functional instructions to other services; and
- Business services, which perform the actual work the process requires and then report their completion to move the transaction forward.
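The bus, controller, and business services described above might look something like the following sketch. Everything here is an assumption for illustration: `ServiceBus` is a hypothetical in-memory stand-in for an ESB, and the event and service names are invented. The controller is the only component that knows the order of steps; the business services just do their work and publish a completion event.

```python
from __future__ import annotations

import itertools
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable

_ids = itertools.count(1)


@dataclass
class Message:
    event: str
    payload: dict
    message_id: int = field(default_factory=lambda: next(_ids))


class ServiceBus:
    """Hypothetical minimal in-memory stand-in for an enterprise service bus."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Message], None]]] = defaultdict(list)
        self._queue: deque[Message] = deque()

    def subscribe(self, event: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        # The bus wraps each event in a message and queues it for delivery.
        self._queue.append(Message(event, payload))

    def run(self) -> None:
        # Deliver queued messages to every service subscribed to that event.
        while self._queue:
            msg = self._queue.popleft()
            for handler in self._subscribers[msg.event]:
                handler(msg)


bus = ServiceBus()
calls: list[str] = []


# Business services: perform the work, then report completion via the bus.
def production_service(order: dict) -> None:
    calls.append(f"production:{order['id']}")
    bus.publish("production_done", order)


def shipping_service(order: dict) -> None:
    calls.append(f"shipping:{order['id']}")
    bus.publish("shipping_done", order)


# Controller: triggered by events, decides which service to call next.
def controller(msg: Message) -> None:
    if msg.event == "order_received":
        production_service(msg.payload)
    elif msg.event == "production_done":
        shipping_service(msg.payload)


for event in ("order_received", "production_done"):
    bus.subscribe(event, controller)

bus.publish("order_received", {"id": 7})
bus.run()
print(calls)  # ['production:7', 'shipping:7']
```

The final `shipping_done` message has no subscriber here; in a fuller sketch the controller would react to it by triggering payment or confirming the order.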
To implement these controller services, you can create an event handler in an event-driven application or introduce a finite state machine that models the sequential logic. This component takes each message, determines where it came from, analyzes its status, and then issues the next command. This can be accomplished with a simple switch statement, a set of nested IF statements, or even a single database lookup.
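As a sketch of the state-machine approach, the controller below replaces a switch statement with a lookup table keyed on (current state, incoming event), the in-memory equivalent of the "single database lookup" option. The states, event names, and commands are hypothetical, and only the first compensation step is shown; a complete saga would add further transitions, for example to cancel production after shipping is cancelled.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    NEW = auto()
    AWAITING_PRODUCTION = auto()
    AWAITING_SHIPPING = auto()
    AWAITING_PAYMENT = auto()
    COMPLETED = auto()
    COMPENSATING = auto()


# (current state, incoming event) -> (next state, next command to issue).
# Event and command names are hypothetical; this dict could equally be a database table.
TRANSITIONS: dict[tuple[State, str], tuple[State, str]] = {
    (State.NEW, "order_placed"): (State.AWAITING_PRODUCTION, "order_production"),
    (State.AWAITING_PRODUCTION, "production_done"): (State.AWAITING_SHIPPING, "book_shipping"),
    (State.AWAITING_SHIPPING, "shipping_done"): (State.AWAITING_PAYMENT, "charge_payment"),
    (State.AWAITING_PAYMENT, "payment_done"): (State.COMPLETED, "confirm_order"),
    (State.AWAITING_PAYMENT, "payment_failed"): (State.COMPENSATING, "cancel_shipping"),
}


class SagaController:
    """Tracks one saga instance and maps each incoming event to the next command."""

    def __init__(self) -> None:
        self.state = State.NEW

    def handle(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected event {event!r} in state {self.state.name}")
        self.state, command = TRANSITIONS[key]
        return command


ctrl = SagaController()
commands = [ctrl.handle(e) for e in ("order_placed", "production_done",
                                      "shipping_done", "payment_failed")]
print(commands)    # ['order_production', 'book_shipping', 'charge_payment', 'cancel_shipping']
print(ctrl.state)  # State.COMPENSATING
```

Rejecting events that have no entry for the current state makes out-of-order or stray messages visible instead of silently advancing the saga.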
Keep in mind that implementing this design can still be tricky if the application demands a high degree of reliability. For example, imagine that the controller service crashes after emitting an “order placed” event but before it can emit a “payment complete” event. When the service restarts, it should consult some form of transaction log, find the unprocessed transactions, resubmit the relevant event (or events), and then mark the job as complete. This still leaves a window in which a system shuts down after sending an event but before confirming it, so the event is resubmitted on restart and arrives twice. There are a number of architectural patterns that address this specific problem, but by far the simplest is to allow redundant messages and design the services to recognize and ignore any they have already processed.
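The recovery and duplicate-tolerance ideas can be sketched as follows. This assumes a hypothetical `Controller` that writes each event to a transaction log as pending before sending it, and a hypothetical `PaymentService` that remembers which message IDs it has already processed. The in-memory dict and set stand in for durable storage, and the crash is simulated by returning before the log is updated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Hypothetical consumer that tolerates redundant messages."""
    processed_ids: set[str] = field(default_factory=set)
    charges: list[str] = field(default_factory=list)

    def handle(self, message_id: str, order_id: str) -> None:
        if message_id in self.processed_ids:
            return  # duplicate delivery: already handled, so ignore it
        self.charges.append(order_id)
        self.processed_ids.add(message_id)


@dataclass
class Controller:
    service: PaymentService
    # Transaction log: message_id -> (order_id, status). A real system would persist this.
    log: dict[str, tuple[str, str]] = field(default_factory=dict)

    def send(self, message_id: str, order_id: str, crash_before_ack: bool = False) -> None:
        self.log[message_id] = (order_id, "pending")  # record intent before sending
        self.service.handle(message_id, order_id)
        if crash_before_ack:
            return  # simulate crashing after sending but before updating the log
        self.log[message_id] = (order_id, "complete")

    def recover(self) -> None:
        # On restart, resubmit every event still marked pending, then mark it complete.
        for message_id, (order_id, status) in list(self.log.items()):
            if status == "pending":
                self.service.handle(message_id, order_id)
                self.log[message_id] = (order_id, "complete")


payments = PaymentService()
controller = Controller(payments)
controller.send("msg-1", "order-42", crash_before_ack=True)  # crash after sending
controller.recover()                                         # restart resubmits msg-1
print(payments.charges)  # ['order-42']
print(controller.log)    # {'msg-1': ('order-42', 'complete')}
```

The order is charged only once even though the payment message was delivered twice. In a real service, recording the message ID and applying its side effect should happen in the same local transaction; otherwise a crash between the two reopens the same gap.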
Should you implement the saga design pattern?
The main purpose of the saga design pattern is to support long-running, multi-system business processes and to add the ability to intelligently undo work when a step fails. However, it also adds more code, which brings new layers of complexity, debugging challenges, and extra bandwidth and processing requirements.
Put simply, an orchestration-focused saga will usually be overkill for simple transactions within a single application. Unless your organization is genuinely struggling to manage large business process chains, the complexity of the code involved in a saga can cause more problems than it solves. But if long-running transactions keep you up at night, especially when it comes to handling failures, the saga pattern may be the answer you’ve been looking for.