Introduction to Apache Kafka in Event Hubs on Azure Cloud - Azure Event Hubs (2024)

This article explains how you can use Azure Event Hubs to stream data from Apache Kafka applications without setting up a Kafka cluster on your own.

Note

This feature is supported only in the standard, premium and dedicated tiers.

Overview

Azure Event Hubs provides an Apache Kafka endpoint on an event hub, which enables users to connect to the event hub using the Kafka protocol. You can often use an event hub's Kafka endpoint from your applications without any code changes. You modify only the configuration: update the connection string so that it points to the Kafka endpoint exposed by your event hub instead of to a Kafka cluster. Then, you can start streaming events from your applications that use the Kafka protocol into event hubs, which are equivalent to Kafka topics.

Note

Event Hubs for Kafka Ecosystems supports Apache Kafka version 1.0 and later.

Apache Kafka and Azure Event Hubs conceptual mapping

Conceptually, Kafka and Event Hubs are very similar. They're both partitioned logs built for streaming data, whereby the client controls which part of the retained log it wants to read. The following table maps concepts between Kafka and Event Hubs.

Kafka Concept     Event Hubs Concept
Cluster           Namespace
Topic             An event hub
Partition         Partition
Consumer Group    Consumer Group
Offset            Offset
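
To make the shared model concrete, here is a minimal in-memory sketch of a partitioned log in which the reader, not the service, chooses the offset to read from. The class name `PartitionedLog` and the CRC32-based partition choice are illustrative assumptions; they aren't how Kafka or Event Hubs actually assign partitions.

```python
import zlib


class PartitionedLog:
    """Toy stand-in for a Kafka topic / event hub (hypothetical, for illustration)."""

    def __init__(self, partition_count):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(partition_count)]

    def append(self, key, value):
        # Events with the same key always land in the same partition.
        # CRC32 is used only because it is stable across runs.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset, max_events=10):
        # The client decides where in the retained log to start reading.
        return self.partitions[partition][offset:offset + max_events]
```

A consumer that remembers the last offset it processed can resume from that point, or rewind to an earlier offset to replay events that are still retained.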

Key differences between Apache Kafka and Azure Event Hubs

While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace.

For more information about Event Hubs and namespaces, see Event Hubs features. As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster.

Scale in Event Hubs is controlled by how many throughput units (TUs) or processing units you purchase. If you enable the Auto-Inflate feature for a standard tier namespace, Event Hubs automatically scales up TUs when you reach the throughput limit. This feature also works with the Apache Kafka protocol support. For a premium tier namespace, you can increase the number of processing units assigned to the namespace.
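
As a rough sizing aid, the following sketch estimates how many throughput units an ingress workload might need. The per-TU limits used as defaults (1 MB per second or 1,000 events per second of ingress) are assumptions that aren't stated in this article; check them against the current Event Hubs quotas before relying on the result. The function name is hypothetical.

```python
import math


def required_throughput_units(ingress_mb_per_s, ingress_events_per_s,
                              mb_per_tu=1.0, events_per_tu=1000):
    """Estimate TUs for an ingress workload.

    mb_per_tu and events_per_tu are assumed per-TU ingress limits,
    not values taken from this article.
    """
    by_bytes = math.ceil(ingress_mb_per_s / mb_per_tu)
    by_events = math.ceil(ingress_events_per_s / events_per_tu)
    # Whichever limit is hit first determines the TU count; at least 1 TU.
    return max(1, by_bytes, by_events)
```

For example, a workload of 3.5 MB per second at 2,000 events per second is bound by bytes under these assumptions, while 0.5 MB per second at 4,500 events per second is bound by event count.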

Is Apache Kafka the right solution for your workload?

If you're coming from building applications with Apache Kafka, it's also useful to understand that Azure Event Hubs is part of a fleet of services that also includes Azure Service Bus and Azure Event Grid.

While some providers of commercial distributions of Apache Kafka might suggest that Apache Kafka is a one-stop shop for all your messaging platform needs, the reality is different. For instance, Apache Kafka doesn't implement the competing-consumer queue pattern, doesn't support publish-subscribe at a level that gives subscribers access to incoming messages based on server-evaluated rules other than plain offsets, and has no facilities to track the lifecycle of a job initiated by a message or to sideline faulty messages into a dead-letter queue. All of these capabilities are foundational for many enterprise messaging scenarios.

To understand the differences between patterns and which pattern is best covered by which service, see the Asynchronous messaging options in Azure guidance. As an Apache Kafka user, you may find that communication paths you have so far realized with Kafka can be realized with far less basic complexity, and with more powerful capabilities, using either Event Grid or Service Bus.

If you need specific features of Apache Kafka that aren't available through the Event Hubs for Apache Kafka interface or if your implementation pattern exceeds the Event Hubs quotas, you can also run a native Apache Kafka cluster in Azure HDInsight.

Security and authentication

Every time you publish or consume events from Event Hubs for Kafka, your client is trying to access Event Hubs resources. You want to ensure that the resources are accessed by an authorized entity. When using the Apache Kafka protocol with your clients, you can configure authentication and encryption using the SASL mechanisms. Event Hubs for Kafka requires TLS encryption (all data in transit with Event Hubs is TLS encrypted), which you enable by specifying the SASL_SSL option in your configuration file.

Azure Event Hubs provides multiple options to authorize access to your secure resources.

  • OAuth 2.0
  • Shared access signature (SAS)

OAuth 2.0

Event Hubs integrates with Microsoft Entra ID, which provides an OAuth 2.0 compliant centralized authorization server. With Microsoft Entra ID, you can use Azure role-based access control (Azure RBAC) to grant fine-grained permissions to your client identities. You can use this feature with your Kafka clients by specifying SASL_SSL for the protocol and OAUTHBEARER for the mechanism. For details about Azure roles and levels for scoping access, see Authorize access with Microsoft Entra ID.

bootstrap.servers=NAMESPACENAME.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;
sasl.login.callback.handler.class=CustomAuthenticateCallbackHandler

Note

The above configuration properties are for the Java programming language. For samples that show how to use OAuth with Event Hubs for Kafka using different programming languages, see samples on GitHub.
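
For a non-Java client, the same settings might be assembled as in the sketch below. It assumes a librdkafka-based Python client such as confluent-kafka, where the token callback is passed under the `oauth_cb` key and returns a token together with its expiry time in epoch seconds. The function name `build_oauth_config` and the injected `token_provider` are hypothetical; in practice the provider would wrap whatever obtains a Microsoft Entra ID token for your namespace.

```python
def build_oauth_config(namespace, token_provider):
    """Build a Kafka client config dict for OAUTHBEARER (illustrative sketch).

    token_provider is a hypothetical callable returning
    (access_token, expiry_epoch_seconds).
    """

    def oauth_cb(_config_str):
        # Called by the client whenever it needs a fresh token
        # (assumption: confluent-kafka style callback contract).
        token, expires_on = token_provider()
        return token, expires_on

    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "OAUTHBEARER",
        "oauth_cb": oauth_cb,
    }
```

Keeping the token source behind a callable makes the configuration easy to test with a fake provider and keeps credential handling out of the connection settings.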

Shared access signature (SAS)

Event Hubs also provides shared access signatures (SAS) for delegated access to Event Hubs for Kafka resources. Authorizing access with the OAuth 2.0 token-based mechanism provides superior security and ease of use over SAS, and the built-in roles can eliminate the need for ACL-based authorization, which has to be maintained and managed by the user. To use SAS with your Kafka clients, specify SASL_SSL for the protocol and PLAIN for the mechanism.

bootstrap.servers=NAMESPACENAME.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

Important

Replace {YOUR.EVENTHUBS.CONNECTION.STRING} with the connection string for your Event Hubs namespace. For instructions on getting the connection string, see Get an Event Hubs connection string. Here's an example configuration:

sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXXXXX";
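
For clients that take settings as key-value pairs rather than a JAAS string, the SAS settings can be derived from the connection string itself. The sketch below is one way to do that. The function name `build_sas_config` is hypothetical, and the `sasl.username` and `sasl.password` keys assume a librdkafka-based client.

```python
from urllib.parse import urlparse


def build_sas_config(connection_string):
    """Derive Kafka client settings from an Event Hubs connection string (sketch)."""
    # Split "Key=Value;Key=Value" pairs; split on the first "=" only,
    # because key values may themselves contain "=" characters.
    parts = dict(
        segment.split("=", 1)
        for segment in connection_string.strip().split(";")
        if segment
    )
    # Endpoint looks like sb://mynamespace.servicebus.windows.net/
    host = urlparse(parts["Endpoint"]).hostname
    return {
        "bootstrap.servers": f"{host}:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal string "$ConnectionString".
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }
```

Deriving the bootstrap server from the `Endpoint` value avoids keeping the namespace name in two places that can drift apart.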

Note

When using SAS authentication with Kafka clients, established connections aren't disconnected when the SAS key is regenerated.

Note

Generated shared access signature tokens are not supported when using the Event Hubs for Apache Kafka endpoint.

Samples

For a tutorial with step-by-step instructions to create an event hub and access it using SAS or OAuth, see Quickstart: Data streaming with Event Hubs using the Kafka protocol.

Other Azure Event Hubs features

The Event Hubs for Apache Kafka feature is one of three protocols concurrently available on Azure Event Hubs, complementing HTTP and AMQP. You can write with any of these protocols and read with any other, so your current Apache Kafka producers can continue publishing via Apache Kafka while your readers benefit from services that integrate natively with the Event Hubs AMQP interface, such as Azure Stream Analytics or Azure Functions. Conversely, you can readily integrate Azure Event Hubs into AMQP routing networks as a target endpoint, and yet read data through Apache Kafka integrations.

Additionally, Event Hubs features such as Capture, which enables cost-efficient long-term archival via Azure Blob Storage and Azure Data Lake Storage, and Geo Disaster-Recovery also work with the Event Hubs for Kafka feature.

Idempotency

Azure Event Hubs for Apache Kafka supports both idempotent producers and idempotent consumers.

One of the core tenets of Azure Event Hubs is the concept of at-least-once delivery. This approach ensures that events are always delivered. It also means that events can be received more than once, even repeatedly, by consumers such as a function. For this reason, it's important that the consumer supports the idempotent consumer pattern.
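
The idempotent consumer pattern can be sketched as follows: the handler remembers which (partition, offset) pairs it has already applied and skips redeliveries. The class name `IdempotentHandler` is hypothetical, and the in-memory set is an assumption made for brevity; a real consumer would keep this record in durable storage, ideally updated together with the side effect.

```python
class IdempotentHandler:
    """Apply each event's side effect at most once (illustrative sketch)."""

    def __init__(self, apply_effect):
        self._apply_effect = apply_effect
        # In-memory for illustration only; use durable storage in practice.
        self._seen = set()

    def handle(self, partition, offset, payload):
        key = (partition, offset)
        if key in self._seen:
            # Redelivered event: already processed, so do nothing.
            return False
        self._apply_effect(payload)
        self._seen.add(key)
        return True
```

With this in place, receiving the same event twice changes nothing the second time, which is what at-least-once delivery requires of the consumer.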

Feature differences with Apache Kafka

The goal of Event Hubs for Apache Kafka is to provide access to Azure Event Hubs capabilities to applications that are locked into the Apache Kafka API and would otherwise have to be backed by an Apache Kafka cluster.

As explained above, the Azure Messaging fleet provides rich and robust coverage for a multitude of messaging scenarios, and although the following features aren't currently supported through Event Hubs' support for the Apache Kafka API, we point out where and how the desired capability is available.

Transactions

Azure Service Bus has robust transaction support that allows receiving and settling messages and sessions while sending outbound messages resulting from message processing to multiple target entities under the consistency protection of a transaction. The feature set not only allows for exactly-once processing of each message in a sequence, but also avoids the risk of another consumer inadvertently reprocessing the same messages, as would be the case with Apache Kafka. Service Bus is the recommended service for transactional message workloads.

Compression

The client-side compression feature of Apache Kafka compresses a batch of multiple messages into a single message on the producer side and decompresses the batch on the consumer side. The Apache Kafka broker treats the batch as a special message.

Kafka producer application developers can enable message compression by setting the compression.type property. In the public preview, the only compression algorithm supported is gzip.

compression.type = none | gzip

The feature is currently supported only for Apache Kafka producer and consumer traffic. AMQP consumers can consume compressed Kafka traffic as decompressed messages. The payload of any Event Hubs event is a byte stream, and the content can be compressed with an algorithm of your choosing, though in public preview the only option is gzip. The benefits of using Kafka compression are smaller message size, a larger payload that you can transmit, and lower message broker resource consumption.
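
To illustrate why batch compression pays off, the sketch below gzips a batch of similar JSON events as one blob and restores it on the reading side. It demonstrates the idea only. It isn't the Kafka record-batch format, and real Kafka clients do this internally once compression.type is set. The function names are hypothetical.

```python
import gzip
import json


def compress_batch(events):
    """Serialize events as JSON lines and gzip them as a single blob."""
    raw = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return raw, gzip.compress(raw)


def decompress_batch(blob):
    """Reverse compress_batch: gunzip and parse each JSON line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Events in a batch tend to share field names and similar values, so compressing them together usually shrinks the payload much more than compressing each event on its own.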

Kafka Streams

Kafka Streams is a client library for stream analytics that is part of the Apache Kafka open-source project, but is separate from the Apache Kafka event stream broker.

The most common reason Azure Event Hubs customers ask for Kafka Streams support is because they're interested in Confluent's "ksqlDB" product. "ksqlDB" is a proprietary shared source project that is licensed such that no vendor "offering software-as-a-service, platform-as-a-service, infrastructure-as-a-service, or other similar online services that compete with Confluent products or services" is permitted to use or offer "ksqlDB" support. Practically, if you use ksqlDB, you must either operate Kafka yourself or you must use Confluent’s cloud offerings. The licensing terms might also affect Azure customers who offer services for a purpose excluded by the license.

Standalone and without ksqlDB, Kafka Streams has fewer capabilities than many alternative frameworks and services, most of which have built-in streaming SQL interfaces, and all of which integrate with Azure Event Hubs today:

  • Azure Stream Analytics
  • Azure Synapse Analytics (via Event Hubs Capture)
  • Azure Databricks
  • Apache Samza
  • Apache Storm
  • Apache Spark
  • Apache Flink
  • Apache Flink on HDInsight on AKS
  • Akka Streams

The listed services and frameworks can generally acquire event streams and reference data directly from a diverse set of sources through adapters. Kafka Streams can only acquire data from Apache Kafka and your analytics projects are therefore locked into Apache Kafka. To use data from other sources, you're required to first import data into Apache Kafka with the Kafka Connect framework.

If you must use the Kafka Streams framework on Azure, Apache Kafka on HDInsight will provide you with that option. Apache Kafka on HDInsight provides full control over all configuration aspects of Apache Kafka, while being fully integrated with various aspects of the Azure platform, from fault/update domain placement to network isolation to monitoring integration.

Next steps

This article provided an introduction to Event Hubs for Kafka. To learn more, see Apache Kafka developer guide for Azure Event Hubs.

For a tutorial with step-by-step instructions to create an event hub and access it using SAS or OAuth, see Quickstart: Data streaming with Event Hubs using the Kafka protocol.

Also, see the OAuth samples on GitHub.

FAQs

What is Azure Event Hubs for Apache Kafka?

Azure Event Hubs is a cloud native data streaming service that can stream millions of events per second, with low latency, from any source to any destination. Event Hubs is compatible with Apache Kafka, and it enables you to run existing Kafka workloads without any code changes.

What is Apache Kafka in Azure?

Apache Kafka is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.

What is Kafka's equivalent in Azure?

Event Hubs provides an endpoint that's compatible with the Apache Kafka producer and consumer APIs. This endpoint can be used by most Apache Kafka client applications, so it's an alternative to running a Kafka cluster on Azure.

What is the difference between Apache Kafka and Azure Service Bus?

Kafka enables real-time data processing by using a publish-subscribe messaging system. Azure Service Bus is used for event-driven data processing, where data is transmitted between applications asynchronously.

What is the difference between Event Hubs and Kafka?

Managed Service vs. Self-Managed: Event Hubs is a fully managed service, while Kafka requires manual setup and management. Integration with Ecosystem: Event Hubs offers native integration with Azure services, while Kafka is more platform-agnostic.

What is the difference between Azure Event Grid and Kafka?

Azure Event Grid follows a consumption-based pricing model, which means that you only pay for the number of events delivered and the number of operations performed. Kafka is highly scalable and can handle large volumes of data in real-time.

What are the 4 major Kafka APIs?

The Admin API for inspecting and managing Kafka objects like topics and brokers. The Producer API for writing (publishing) to topics. The Consumer API for reading (subscribing to) topics. The Kafka Streams API to provide access for applications and microservices to higher-level stream processing functions.

What is the difference between Apache Kafka and Confluent Kafka?

Although both platforms include essential features that power data systems, Apache's framework is mainly used in data operations while Confluent Kafka is used in data processing.

What is Apache Kafka used for?

Kafka is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.

Which service is equivalent to Apache Kafka?

Other similar apps like Apache Kafka are Google Cloud Pub/Sub, MuleSoft Anypoint Platform, IBM MQ, and Amazon Kinesis Data Streams. Apache Kafka alternatives can be found in Event Stream Processing Software but may also be in Message Queue (MQ) Software or Stream Analytics Software.

How does RabbitMQ compare with Kafka?

Your decision depends on your specific use case. While Kafka is best suited for big data use cases requiring the best throughput, RabbitMQ is well suited to low-latency message delivery and complex routing. There are some common use cases for both Kafka and RabbitMQ.

What is the equivalent of Azure Event Hubs in AWS?

Amazon Kinesis

What is the difference between Event Grid and Event Hubs?

Another difference is the delivery mechanism that they use. Azure Event Grid uses push delivery, which means that it pushes the events to the subscribers as soon as they are available. Azure Event Hubs uses pull delivery, which means that the subscribers have to pull the events from the service at their own pace.

What is Azure IoT Hub?

Azure IoT Hub provides a cloud-hosted solution back end to connect virtually any device. Extend your solution from the cloud to the edge with per-device authentication, built-in device management, and scaled provisioning. It also provides a security-enhanced communication channel for sending and receiving data from IoT devices.

Author: Pres. Lawanda Wiegand
