The Silent Migration: How Salesforce Moved 760+ Kafka Nodes Without a Single Drop
This blog recounts Salesforce’s massive engineering feat of migrating 760+ Kafka nodes handling 1 million+ messages per second, all with zero downtime and no data loss. Told in a story-like war-room style, it highlights the challenges of moving from CentOS to RHEL and consolidating onto Salesforce’s Ajna Kafka platform. The narrative walks through how the team orchestrated the migration with mixed-mode clusters, strict validations, checksum-based integrity checks, and live dashboards. In the end, it showcases how a seemingly impossible migration was achieved smoothly proving that large-scale infrastructure upgrades are less about brute force and more about meticulous planning, safety nets, and engineering discipline.