Apache Kafka is a powerful event streaming platform that is used particularly in highly scalable architectures. Kafka’s architecture was designed with throughput, scalability and durability of data streams in mind. These properties are making Kafka increasingly popular in the Industrial IoT.
The emergence of Kafka: from LinkedIn to open source
Kafka was originally developed at LinkedIn and later donated to the Apache Software Foundation as an open source project. It was designed to address LinkedIn’s challenges in processing large amounts of data and delivering it in real time across the company’s rapidly growing network. The name refers to the Prague-born writer Franz Kafka: co-creator Jay Kreps has explained that, because Kafka is a system optimized for writing, naming it after a writer seemed fitting.
Event-based communication
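Kafka reduces bandwidth usage by replacing the conventional “request-response” pattern, in which a client cyclically polls a server for current values (such as in the OPC UA client-server architecture), with event-based “publish-subscribe” communication. Event-based means that data is transmitted when an event such as a value change occurs, and consumers receive these events instead of repeatedly querying the current state. (Strictly speaking, Kafka consumers fetch records from the broker using long polling, but each fetch returns only new events.) This enables real-time, event-driven communication between devices and applications within the organization and beyond.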
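To illustrate the bandwidth argument, the following minimal Python sketch compares cyclic polling with change-driven publish-subscribe for a sensor value that changes only twice in 100 cycles. It is a toy model under stated assumptions, not the Kafka or OPC UA API: the names `polling_client` and `ChangePublisher` and the sample readings are hypothetical, and each poll is assumed to cost exactly one request plus one response.

```python
from typing import Callable, List, Optional

# Hypothetical sample data: one sensor value read over 100 cycles.
# It changes only twice (20.0 -> 21.5 -> 22.0).
readings: List[float] = [20.0] * 40 + [21.5] * 30 + [22.0] * 30


def polling_client(values: List[float]) -> int:
    """Request-response: the client asks for the current value every cycle.

    Returns the number of messages exchanged, assuming one request and one
    response per cycle (a simplification of real protocol traffic).
    """
    return 2 * len(values)


class ChangePublisher:
    """Event-based: the data source publishes only when its value changes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[float], None]] = []
        self._last: Optional[float] = None

    def subscribe(self, callback: Callable[[float], None]) -> None:
        self._subscribers.append(callback)

    def update(self, value: float) -> None:
        if value != self._last:  # the first value also counts as a change
            self._last = value
            for callback in self._subscribers:
                callback(value)


received: List[float] = []
publisher = ChangePublisher()
publisher.subscribe(received.append)
for value in readings:
    publisher.update(value)

print(polling_client(readings))  # 200
print(received)                  # [20.0, 21.5, 22.0]
```

In this toy run, polling exchanges 200 messages while the subscriber receives only 3 change events. Real savings depend on how often values change and on the actual protocol overhead.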
The advantages of Kafka
Apache Kafka has the following advantages in the industrial environment:
- Processing large amounts of data: Kafka handles very large data volumes efficiently and offers high throughput. This is particularly important when processing large data streams in industrial environments.
- Reliability and fault tolerance: Kafka’s distributed architecture can replicate data across multiple nodes (brokers), which makes it highly fault tolerant. Data transmission therefore continues even if individual system components fail.
- Scalability: Kafka scales horizontally with growing data volumes and requirements, simply by adding further nodes to the Kafka cluster.
- Real-time communication: Kafka makes it possible to process data streams in real time and ensure that important information reaches its destination quickly.
- Interoperability: Kafka can easily interact with different data sources and sinks and transfer data between different technology stacks.
Kafka in the Industrial IoT
Manufacturing companies use Kafka to transfer data streams between systems and applications, particularly in environments with high scaling requirements. The platform forwards this data to all subscribers reliably and fault-tolerantly. Kafka is therefore gaining importance in the following use cases:
- Communication across different layers: Large manufacturing companies (e.g. automotive OEMs) increasingly use Kafka as the communication backbone for the Enterprise and Connected World layers of the ISA-95 model. The aim is to minimize integration costs and increase the scalability of the system architecture. This is particularly important in architectures such as the Unified Namespace (UNS).
- Unified Namespace (UNS): The UNS architecture provides a central, non-hierarchical system architecture in which all factory data is accessible through a uniform naming convention and data structure in a central message broker. Kafka supports this approach by allowing data producers to continuously publish data to the central broker. This follows the principle of “publish once, distribute everywhere”: data is published once and can then be consumed by any number of systems and applications. You can find more information about UNS in our blog.
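The “publish once, distribute everywhere” principle can be sketched with a toy in-memory broker in Python. This is an illustration under stated assumptions, not Kafka’s API: the `CentralBroker` class, the prefix-based subscriptions and the ISA-95-style topic names (enterprise/site/area/line/cell/tag, for a hypothetical company “acme”) are all invented for the example. A single publish reaches every subscriber whose topic prefix matches.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical handler signature: (topic, payload) -> None
Handler = Callable[[str, dict], None]


class CentralBroker:
    """Toy central broker for a UNS-style namespace (not a real client API)."""

    def __init__(self) -> None:
        self._subscriptions: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        """Register interest in a topic and everything below it."""
        self._subscriptions[prefix].append(handler)

    def publish(self, topic: str, payload: dict) -> int:
        """Deliver one published message to all matching subscribers.

        Returns the number of deliveries made.
        """
        deliveries = 0
        for prefix, handlers in self._subscriptions.items():
            if topic == prefix or topic.startswith(prefix + "/"):
                for handler in handlers:
                    handler(topic, payload)
                    deliveries += 1
        return deliveries


# Hypothetical consumers, each interested in a different part of the namespace.
broker = CentralBroker()
inbox: DefaultDict[str, List[str]] = defaultdict(list)
consumers = {
    "mes": "acme/plant1/assembly/line3",
    "analytics": "acme",
    "maintenance": "acme/plant1/assembly/line3/robot7",
    "plant2_dashboard": "acme/plant2",
}
for name, prefix in consumers.items():
    # Bind `name` via a default argument so each lambda keeps its own value.
    broker.subscribe(prefix, lambda topic, payload, name=name: inbox[name].append(topic))

# The producer publishes once; the broker fans the message out.
deliveries = broker.publish(
    "acme/plant1/assembly/line3/robot7/temperature",
    {"value": 41.2, "unit": "degC"},
)
print(deliveries)     # 3
print(sorted(inbox))  # ['analytics', 'maintenance', 'mes']
```

Real brokers differ in how subscriptions match: MQTT uses the `+` and `#` wildcards, while Kafka consumers subscribe to topics by name or by regular-expression pattern. The prefix match above only stands in for either.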
MQTT or Kafka: what’s right for Industrial IoT?
Both MQTT and Kafka are widely used for data transmission and processing and have proven themselves as broker-based publish/subscribe systems in various use cases. Both make it possible to transfer data between different systems or components. However, they differ in focus and functionality.
What do MQTT and Kafka have in common?
- Data transfer: Both send data from producers to consumers via a message broker.
- Middleware: Both Kafka and MQTT act as middleware to transport data between senders and receivers.
- Distributed systems: Both technologies support distributed systems and can be deployed in environments with multiple servers and clients.
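- Reliability: Both Kafka and MQTT offer mechanisms for reliable data transmission, but they implement them differently and provide different guarantees (MQTT through its Quality of Service levels 0, 1 and 2; Kafka through producer acknowledgements and replication).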
What are the differences between MQTT and Kafka?
- Throughput and latency:
  - MQTT: Offers lower latency and is ideal for use cases where fast message delivery is critical.
  - Kafka: Can handle extremely high message throughput and is well suited for applications that process large data streams.
- Data storage:
  - MQTT: Is designed primarily to transmit messages with low latency. Apart from retained messages (the most recent retained message per topic) and messages queued for persistent sessions, it offers no built-in data storage, and in particular no long-term storage.
  - Kafka: Persists messages and retains them for a configurable period, so they can be reprocessed or referenced later, for example after an error.
- Fault tolerance and recovery:
  - MQTT: Has mechanisms for message acknowledgement, but no native support for replaying data or storing it long-term for later recovery.
  - Kafka: Provides strong recovery capabilities thanks to its persistent storage architecture and the ability to “replay” data streams.
- Implementation and maintenance effort:
  - MQTT: Implementing clients with existing libraries, configuring an MQTT broker, and managing and monitoring it tend to be relatively simple.
  - Kafka: Introducing Kafka is significantly more complex (e.g. setting up and configuring the cluster, planning hardware or cloud infrastructure). Managing data storage and operating and monitoring the cluster robustly are also much more demanding.
- Application:
  - MQTT: Is a lightweight publish/subscribe protocol designed specifically for resource-constrained environments and is ideal for communicating with Industrial IoT devices.
  - Kafka: Is a distributed streaming platform designed specifically for streaming and processing large amounts of event data in real time. Kafka is well suited for big data processing, data analytics, aggregation and other applications that process data streams on a large scale.
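The storage and replay differences can be made concrete with a toy model of a Kafka-style log in Python. This is a simplified sketch, not Kafka’s client API: the `ToyLog` class and its methods are hypothetical. Records get increasing offsets, each consumer group tracks its own position, and a group can be rewound to replay data. Retention by record count is an assumption made for brevity; Kafka itself deletes old log segments based on time (`retention.ms`) or size (`retention.bytes`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyLog:
    """Toy model of a Kafka-style append-only log (hypothetical, not Kafka's API)."""

    retention: int  # simplification: keep at most this many records
    records: List[Tuple[int, str]] = field(default_factory=list)
    next_offset: int = 0
    positions: Dict[str, int] = field(default_factory=dict)  # per consumer group

    def append(self, value: str) -> int:
        """Append a record, assign the next offset and apply retention."""
        offset = self.next_offset
        self.records.append((offset, value))
        self.next_offset += 1
        if len(self.records) > self.retention:
            # Oldest records expire; their offsets are never reused.
            self.records = self.records[-self.retention:]
        return offset

    def poll(self, group: str) -> List[str]:
        """Return all retained records at or after the group's position.

        New groups start at the oldest retained record in this toy model
        (comparable to Kafka's auto.offset.reset=earliest).
        """
        start = self.positions.get(group, 0)
        batch = [(offset, value) for offset, value in self.records if offset >= start]
        if batch:
            self.positions[group] = batch[-1][0] + 1
        return [value for _, value in batch]

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position, e.g. back in time to replay data."""
        self.positions[group] = offset


log = ToyLog(retention=5)
for i in range(7):
    log.append(f"temp={20 + i}")  # offsets 0..6; offsets 0 and 1 expire

first = log.poll("analytics")    # offsets 2..6
again = log.poll("analytics")    # nothing new since the last poll
log.seek("analytics", 4)         # e.g. reprocess after a bug fix
replay = log.poll("analytics")   # offsets 4..6 again
late = log.poll("ml-training")   # a new group still sees the retained history
print(first)   # ['temp=22', 'temp=23', 'temp=24', 'temp=25', 'temp=26']
print(again)   # []
print(replay)  # ['temp=24', 'temp=25', 'temp=26']
print(late)    # ['temp=22', 'temp=23', 'temp=24', 'temp=25', 'temp=26']
```

By contrast, a new MQTT subscriber connecting after these messages were published would receive at most the topic’s retained message, not the earlier history.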
Conclusion
Whether to use MQTT or Kafka depends on the requirements of the use case. For Kafka to pay off, its scaling benefits should outweigh the additional implementation and maintenance costs. In practice, MQTT and Kafka are often used in complementary ways, for example MQTT at the edge for communicating with Industrial IoT devices and Kafka for processing data streams in the cloud. Both have specific strengths and can be combined to create robust, scalable and efficient Industrial IoT architectures.