Real-time data streaming and event processing are critical components of modern distributed system architectures. Apache Kafka has become a leading platform for building real-time data pipelines and enabling asynchronous communication between microservices and applications. However, operating and managing Kafka clusters at scale can be challenging, requiring specialized expertise and significant operational overhead.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that lets you build and run production Kafka applications. With Amazon MSK, you can rely on AWS to handle the heavy lifting of provisioning and managing Kafka clusters while you focus on building innovative applications and real-time data processing pipelines.
In this post, we explore how Fitch Group, a leading credit rating company, used Amazon MSK and Amazon MSK Replicator to achieve multi-Region resiliency for its mission-critical Kafka infrastructure.
About Fitch Group and its need for multi-Region resilience
As a leading global provider of financial information services, Fitch Group delivers vital credit and risk insights, robust data, and dynamic tools to champion more efficient and transparent financial markets. With employees in more than 30 countries, Fitch Group’s culture of credibility, independence, and transparency is embedded throughout its structure, which includes Fitch Ratings, one of the world’s top three credit rating agencies, and Fitch Solutions, a leading provider of insights, data, and analytics.
To remain competitive and efficient in the fast-paced financial industry, Fitch Group strategically adopted an event-driven microservices architecture. At the heart of this ecosystem is Kafka, specifically Amazon MSK, which serves as the backbone of its data integration systems.
Fitch Group uses Kafka to let applications publish ratings-related business events, facilitating automation within its ratings workflow systems and providing real-time or near-real-time processing. This architectural choice has significantly reduced time to market for end-user-facing systems such as the Fitch Ratings Pro and Fitch Group Ratings websites. Additionally, Kafka’s robust capabilities enable the seamless aggregation and distribution of data from many disparate systems across its data platform, improving data consistency, reliability, and accessibility across the organization.
Given the critical role Kafka plays in Fitch Group’s architecture, providing robust disaster recovery (DR) mechanisms became paramount. Any disruption to its Kafka infrastructure could have significant repercussions on ratings workflow automation, real-time processing, and end-user-facing systems, potentially exposing Fitch Group to regulatory, financial, and reputational risks.
To achieve the desired levels of resilience, Fitch Group had the following key requirements:
- Multi-Region deployment – Deploy MSK clusters in multiple AWS Regions to provide business continuity and maintain service availability during regional or service events.
- Automated replication – Replicate Kafka data across Regions in near real time with minimal latency and data loss.
- Consistent topic namespaces – Maintain the same Kafka topic names and structures on the source and target clusters to minimize application changes.
- Rapid recovery – In the event of a failover, allow applications to seamlessly begin consuming from the replicated cluster with a minimal recovery time objective (RTO) and recovery point objective (RPO).
Solution overview
Fitch Group chose to implement its multi-Region Kafka deployment using Amazon MSK and MSK Replicator. MSK Replicator is a fully managed replication service that enables continuous, automated data replication between MSK clusters in the same Region or across different Regions. It supports data replication between clusters with different configurations, including different broker counts, storage volumes, and Kafka versions. Here’s how Fitch Group used MSK Replicator to achieve its multi-Region resilience goals:
- MSK clusters were deployed in two separate Regions, with the primary cluster in the primary Region and the secondary cluster in a different Region for disaster recovery.
- MSK Replicator was configured to continuously replicate data from the primary cluster to the secondary cluster, maintaining the same topic names and structures on both clusters.
- Application failover logic was implemented to automatically switch to consuming from the secondary cluster if the primary cluster becomes unavailable, with minimal recovery time and data loss.
The following diagram illustrates this architecture.
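This post does not describe the application failover logic in detail. As a rough sketch only, the following Python shows one way a client could decide which cluster to connect to at startup or after a connection failure. The broker addresses, the function names, and the plain TCP reachability probe are all illustrative assumptions, not Fitch Group's implementation.

```python
import socket
from typing import Callable, List, Sequence

# Hypothetical bootstrap broker addresses; real values come from each MSK cluster.
PRIMARY_BROKERS = ["b-1.primary.example.internal:9098"]
SECONDARY_BROKERS = ["b-1.secondary.example.internal:9098"]


def broker_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within the timeout."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def select_bootstrap_servers(
    primary: Sequence[str],
    secondary: Sequence[str],
    probe: Callable[[str], bool] = broker_reachable,
) -> List[str]:
    """Prefer the primary cluster; fall back to the secondary if no primary broker responds."""
    if any(probe(broker) for broker in primary):
        return list(primary)
    if any(probe(broker) for broker in secondary):
        return list(secondary)
    raise ConnectionError("no reachable broker in either Region")
```

Because the topics carry the same names on both clusters, a consumer could pass whichever list `select_bootstrap_servers(PRIMARY_BROKERS, SECONDARY_BROKERS)` returns to its Kafka client without changing its subscriptions. A production version would likely rely on richer health signals than a single TCP probe.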
Benefits achieved
By implementing Amazon MSK and MSK Replicator, Fitch Group achieved several key benefits:
- Enhanced disaster recovery – The multi-Region deployment provides business continuity even in the face of regional or service events.
- Simplified operations – The managed nature of MSK Replicator offloads the operational complexity of self-managed, custom replication solutions, reducing the burden on Fitch Group’s IT team.
- Scalability – The solution can scale to handle varying data loads, so disaster recovery capabilities grow alongside business needs.
- Minimal application changes – MSK Replicator supports replicating topics under the same name, eliminating the need to modify consumer applications and reducing development effort and potential errors.
- Seamless failover and failback – Bidirectional replication enables rapid switching of operations to the standby Region with minimal disruption, and straightforward failback after the primary Region is restored.
- Enhanced testing capabilities – The setup facilitates regular DR exercises without impacting production systems, allowing Fitch Group to validate its DR plans consistently.
Conclusion
Using Amazon MSK and MSK Replicator, Fitch Group has successfully deployed a highly resilient and scalable Kafka infrastructure that meets its stringent business continuity and disaster recovery requirements. This multi-Region deployment allows Fitch Group to process mission-critical financial data at scale while keeping downtime and data loss to a minimum during service events or disasters. As Fitch Group continues to innovate and grow, its robust Kafka infrastructure provides a solid foundation for future expansion and the development of new data-driven services, ultimately enhancing its ability to deliver timely and accurate financial information to its clients.
About the authors
Kalyan Janaki is a Senior Analytics and Big Data Specialist at Amazon Web Services. He helps customers design and build highly scalable, efficient, and secure cloud-based solutions on AWS.
Venu Nemallikanti is the Enterprise Architect and Event Streaming Lead at Fitch Group, a globally recognized financial information services provider operating in more than 30 countries. His primary responsibilities include overseeing the architecture and implementation of event streaming solutions, ensuring seamless integration and performance of the systems that deliver credit ratings, research, data, and analytics to a global clientele.
Chaitanya Shah is a Senior Technical Account Manager at AWS, based in New York. He loves to code and actively contributes to AWS Solutions Labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their cloud migrations, and he specializes in the AWS data transfer and data and analytics domains.
Oleg Chugaev is a Principal Solutions Architect and Serverless Evangelist with over 20 years in IT and multiple AWS certifications. At AWS, he guides customers on their cloud transformation journeys by turning complex challenges into actionable roadmaps for technical and business audiences.