Kafka: Building scalable, secure, fast applications.

Your System Designs won’t be the same.🔐

Today, we will talk about Apache Kafka, its architecture, and how this will affect your System Designs.

Photo by Florian Krumm on Unsplash

Welcome to the Crypto Element: We talk about Crypto, Digital Economy, Blockchains, dApps and Distributed Systems. Let’s get started!

What is Kafka?

Apache Kafka is essentially a distributed messaging platform that provides a fast, distributed, scalable and highly available PubSub system. As a result, Kafka is great for low-latency, real-time data handling. A famous example is using Kafka for driver matching in ride-hailing.

This is how your system would work if you used Kafka

Kafka’s architecture has 4 major components: Topics, Producers, Consumers and Brokers. Each of these components depends on the others in some way. Here’s a breakdown of all four.

  1. Producers: These are apps responsible for pushing data into the Kafka system. A producer sends data asynchronously to Topics.
  2. Brokers: Instances of Kafka that handle message exchange are called Brokers.
  3. Topics: Topics are the table equivalent of Kafka systems. All inputs to the system are stored as part of some topic.
  4. Consumers: The published messages are then used by consumer apps, provided the consumer has subscribed to the required topic.
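The four roles above can be sketched in a few lines of Python. This is not real Kafka — just a toy in-memory broker (all class and variable names are made up for illustration) showing how producers append messages to a topic's log and subscribed consumers receive them:

```python
from collections import defaultdict

class ToyBroker:
    """Toy stand-in for a Kafka broker: keeps an append-only log per topic."""
    def __init__(self):
        self.topics = defaultdict(list)       # topic name -> message log
        self.subscribers = defaultdict(list)  # topic name -> consumer callbacks

    def subscribe(self, topic, callback):
        # A consumer registers interest in a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # A producer pushes a message; the broker appends it to the topic's
        # log and delivers it to every subscribed consumer.
        self.topics[topic].append(message)
        for deliver in self.subscribers[topic]:
            deliver(message)

broker = ToyBroker()
received = []
broker.subscribe("rides", received.append)                 # consumer side
broker.publish("rides", {"driver": "d1", "rider": "r9"})   # producer side
```

Real Kafka adds persistence, partitioning and replication on top of this basic publish/subscribe shape, but the division of labor between producer, broker, topic and consumer is the same.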

So where do we use Kafka?

Kafka provides you with some distinct advantages.

Beautiful drawing of a Kafka system, innit?


  1. Scalability: You can amp up the number of producers (multiple producers on one topic), partition your topics, and group consumers so that each partition is consumed by a single member of the group.
  2. High Throughput: Kafka can handle very high volumes of data.
  3. Fault Tolerance: Clusters are fault-tolerant because if a node fails, the other nodes take over its load.
  4. Durability: Data is persisted, and message replication makes it even more durable.
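The scalability point is easiest to see with partitioning. Kafka's default partitioner hashes a message's key (using murmur2) modulo the partition count, so all messages with the same key keep their order within one partition. Here's a simplified sketch — `crc32` stands in for the real hash, and the consumer names are hypothetical:

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Simplified version of Kafka's keyed partitioning: hash the key,
    # take it modulo the number of partitions. Same key -> same partition,
    # which preserves per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Within a consumer group, each partition is assigned to exactly one
# member, so adding consumers (up to the partition count) adds parallelism.
consumers = ["consumer-0", "consumer-1", "consumer-2"]
assignment = {p: consumers[p % len(consumers)] for p in range(NUM_PARTITIONS)}
```

This is why the number of partitions caps the parallelism of a consumer group: a fourth consumer on a three-partition topic would simply sit idle.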

If you have ever had the m̶i̶s̶f̶o̶r̶t̶u̶n̶e̶ chance to design the flow of a big website, you must’ve come across the term CAP Theorem.

Courtesy: Lourenço, João & Cabral, Bruno & Carreiro, Paulo & Vieira, Marco & Bernardino, Jorge. (2015). Choosing the right NoSQL database for the job: a quality attribute evaluation. Journal of Big Data.

The CAP Theorem, in simple English, says that in a distributed system you can only choose two of the three: Consistency, Availability and Partition Tolerance. According to the developers at LinkedIn, who originally built the system, Kafka is consistent and available, but has issues with partition tolerance (this is, of course, highly dependent on how you configure your systems). So when do you actually need Kafka? There are some big dogs out there using it.

  1. LinkedIn: LinkedIn developed Kafka internally as part of its infrastructure. I am not sure exactly how LinkedIn uses Kafka, but the newsfeed and jobs features all look like huge pubsub workloads.
  2. Twitter: Twitter uses Kafka to stream posts, a typical pubsub architecture.
  3. Netflix: Event processing and real-time monitoring systems.

Do I need Kafka?

Kafka is a tremendous project for messaging systems. You can use it to track website activity and summarize that data into metrics; it removes the need for centralized log storage by abstracting logging into the system; and it supports Event Sourcing and Stream Processing. Kafka is a great tool for abstraction, and if your website needs anything along those lines, Kafka helps.
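To make the "track website activity and summarize it into metrics" use case concrete, here is a minimal sketch of the consumer side. The event stream is hypothetical sample data, and a real deployment would use something like Kafka Streams with windowed aggregations — this just shows the shape of the computation:

```python
from collections import Counter

# Hypothetical website-activity events, as a producer might publish
# them to an "activity" topic.
events = [
    {"page": "/home",    "user": "a"},
    {"page": "/home",    "user": "b"},
    {"page": "/pricing", "user": "a"},
]

# A consumer aggregates the stream into a metric: page views per page.
page_views = Counter(event["page"] for event in events)
# page_views == Counter({"/home": 2, "/pricing": 1})
```

The key point is that the metrics job is just another consumer of the activity topic — the website itself never needs to know the aggregation exists.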

🌺 Hey, hope you enjoyed reading that article. I am Abhinav, editor @ The Crypto Element. It takes a lot of work to research for and write such an article, and a clap or a follow 👏 from you means the entire world 🌍 to me. It takes less than 10 seconds for you, and it helps me with reach! You can also ask me any questions, or point out anything, or just drop a “Hey” 👇 down there. I 💓making new friends!

Research Intern @ Persistence.one