In a recent InfoQ article, Sergii Kram, lead software architect at iDeals, shared a business case in which he used an event-driven architecture with a Mediator topology, along with several interesting implementation details, such as elastic scalability, reliability, and durable workflows. All systems were built using Kubernetes, KEDA, AWS, and .NET technologies.

In the iDeals product, users upload files to share and review online as part of due diligence processes. Behind the scenes, however, things are much more complicated. Each file must be processed: converting it to a base format, optimizing it for viewing in browsers, generating previews, detecting the language and recognizing text in images, collecting metadata, and other operations. The files include documents, pictures, technical drawings, archives (.zip), and videos.

On some days the product can receive hundreds of thousands of uploaded files; on others there is no activity at all. Still, users generally want to start collaborating on a file as soon as possible after uploading it, so the team needed an architecture that would scale elastically while remaining cost-effective.

Kram and his team chose an event-driven architecture with the Mediator topology pattern. A special service called the "Event Mediator", internally named the "Orchestrator", receives the initial message to process a file and executes a file-processing script called a "workflow". The workflow is a declarative description of what must be done with a particular file, expressed as a set of discrete steps. Each step type is implemented as a separate stateless service; in pattern terms, these are Event Processors, or "Converters".

The result was a highly scalable system that is easy to extend, modify, and test, with good observability and cost-effectiveness. Kram noted that they did have some failures and incidents, most of them related to defects in third-party libraries that appeared only in edge cases under heavy load.
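The Mediator topology described above can be illustrated with a minimal sketch. This is not the iDeals implementation (which uses separate stateless .NET services driven by messages); the step names, handlers, and data shapes here are hypothetical, chosen only to show how a declarative workflow maps discrete steps to independent processors:

```python
# Each step type corresponds to a stateless handler
# (an Event Processor, or "Converter" in the article's terms).
# All names below are illustrative assumptions.
def convert_to_base_format(file):
    file["format"] = "pdf"
    return file

def generate_preview(file):
    file["preview"] = file["name"] + ".thumb.png"
    return file

def extract_metadata(file):
    file["metadata"] = {"pages": 1}
    return file

HANDLERS = {
    "convert": convert_to_base_format,
    "preview": generate_preview,
    "metadata": extract_metadata,
}

# The workflow is a declarative description of the work:
# an ordered set of discrete steps, not imperative code.
WORKFLOW = ["convert", "preview", "metadata"]

def orchestrate(file, workflow=WORKFLOW):
    """The Event Mediator ("Orchestrator"): receives the initial
    processing request and executes each workflow step in turn."""
    for step in workflow:
        file = HANDLERS[step](file)
    return file

result = orchestrate({"name": "deal.docx"})
```

Because the workflow is plain data and each handler is stateless, new step types can be added or reordered without touching the others, which is what makes the system easy to extend, modify, and test.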
As responsible engineers, they did their best to contribute diagnostic information and fixes, and they were grateful to the OSS maintainers for their support and responsiveness. One lesson learned was to have easily configurable limits, so that when something goes wrong it is easy to reduce the load, let the system stabilize and recover, and continue operating at degraded throughput while working on the fix.

This content is an excerpt from a recent InfoQ article by Sergii Kram, "A Case for Event-Driven Architecture with Mediator Topology". To get notifications when InfoQ publishes content on these topics, follow "Event-Driven Architecture", "Distributed Systems", and "Asynchronous Programming" on InfoQ. Missed a newsletter? You can find all of the previous issues on InfoQ.
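The lesson about easily configurable limits can be sketched as follows. The article does not describe the actual mechanism; this sketch assumes a hypothetical `MAX_CONCURRENT_FILES` environment variable capping a worker pool, so that operators can reduce load at runtime without a redeploy:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limit, read from the environment so it can be changed
# during an incident to let the system stabilize and recover.
MAX_CONCURRENT_FILES = int(os.environ.get("MAX_CONCURRENT_FILES", "8"))

def process(file_name):
    # Placeholder for one file-processing step.
    return "processed " + file_name

def process_batch(files):
    # The pool size caps concurrency; lowering the limit trades
    # throughput for stability (degraded but continuing operation).
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_FILES) as pool:
        return list(pool.map(process, files))

results = process_batch(["a.docx", "b.zip"])
```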