What is Apache Kafka and How Can We Install It on Linux?
Apache Kafka is an open-source platform designed to handle the collection and processing of streaming data in real-time.
Kafka offers a distributed messaging system that enables seamless data transfer between various sources and consumers.
It operates on a publish-subscribe model, allowing data to be written and read from streams of events. This unique architecture, along with its high fault tolerance and scalability, makes Apache Kafka a popular choice for organizations dealing with high-volume data streams.
How to Install Apache Kafka on Linux?
Apache Kafka is a powerful distributed message broker widely used for handling large volumes of real-time data. It offers high scalability, fault tolerance, and higher throughput than many traditional message brokers.
Let’s see the step-by-step process of installing Apache Kafka on a Linux system.
Prerequisites:
Before installing Apache Kafka, ensure that your Linux system meets the following prerequisites:
Pre-installed Ubuntu 22.04:
Ensure that you have a fresh installation of Ubuntu 22.04 on your server. This guide assumes a working Ubuntu environment, but any other compatible Linux distribution will also work; see https://kafka.apache.org/documentation/ for details.
Sudo user with admin rights: You should have a user account with sudo privileges to execute administrative commands.
Java Version: Apache Kafka is written in Java, so a compatible Java runtime environment is required.
Recommended Java version: Java 8 or later.
Disk Space: Ensure sufficient disk space for storing Kafka data and logs. Kafka requires disk space for data retention, logs, and other operational aspects.
Memory (RAM): A minimum of 4 to 8 GB of RAM is recommended for optimal performance. More memory may be required based on the volume of data and the scale of your Kafka deployment.
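You can quickly verify these prerequisites from the terminal; a minimal check (exact output varies by system) –
lsb_release -a
free -h
df -h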
Steps to Install Apache Kafka on Ubuntu
Step 1: Install and Check the Java Version
First, check the version of Java installed on your system; Kafka requires Java 8 or above to run properly. Open the terminal and type the following command:
java -version
Make sure the Java version displayed is 8 or above. If an older version is installed, upgrade to at least Java 8 for compatibility. If Java is not present on the server at all, you need to install it, since Apache Kafka is written in Java and requires a Java runtime. Begin by updating the local package index:
sudo apt update
Next, install OpenJDK 11, an LTS (Long-Term Support) release. You can install a different Java version to suit your needs.
sudo apt install openjdk-11-jdk -y
Verify the installation by checking the Java version:
java -version
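If OpenJDK 11 was installed, the output should look similar to the following (exact version and build strings will differ by system) –
openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9)
OpenJDK 64-Bit Server VM (build 11.0.21+9, mixed mode, sharing)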
Step 2: Download Apache Kafka
Head over to the Apache Kafka download page and locate the binary release in tarball format. Apache Kafka 3.6.1 was the latest version at the time of writing (January 2024); this guide uses the 3.4.1 release, but you can install whichever version matches your requirements.
wget https://downloads.apache.org/kafka/3.4.1/kafka_2.12-3.4.1.tgz
The downloaded file will be in “.tgz” format.
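Optionally, verify the integrity of the download. Apache publishes a SHA-512 checksum alongside each release; assuming the usual Apache download layout, you can fetch it and compare it against the hash of your file –
wget https://downloads.apache.org/kafka/3.4.1/kafka_2.12-3.4.1.tgz.sha512
sha512sum kafka_2.12-3.4.1.tgz
The hash printed by sha512sum should match the value in the downloaded .sha512 file.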
Step 3: Extract the Kafka Tarball
After downloading the Kafka installation file, run the following command to extract it-
tar xvf kafka_2.12-3.4.1.tgz
Now, move the extracted directory to /usr/local/kafka with the following command –
sudo mv kafka_2.12-3.4.1 /usr/local/kafka
This moves the extracted kafka_2.12-3.4.1 folder into /usr/local/ and renames it to kafka.
Step 4: Configure the Kafka Server
Next, you need to configure Kafka by modifying the server.properties file. Navigate to the Kafka config directory using the following command –
cd /usr/local/kafka/config
Open the server.properties file in a text editor –
nano server.properties
In this file, you can modify various configuration options according to your requirements. For example, you can change the Kafka log directory, the advertised listeners, and more.
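For instance, a minimal set of commonly adjusted settings might look like this (the broker ID, log directory, and listener addresses below are illustrative; adjust them to your environment) –
broker.id=0
log.dirs=/usr/local/kafka/logs
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://localhost:9092
zookeeper.connect=localhost:2181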
Save the changes and exit the text editor.
Step 5: Create Kafka and ZooKeeper Systemd Unit Files
In this step, we will create systemd unit files for Kafka and ZooKeeper services. These unit files will allow us to manage the services using the systemctl command.
First, create the ZooKeeper systemd file using the nano editor –
sudo nano /etc/systemd/system/zookeeper.service
Paste the following lines of code into the file –
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save the changes and exit the editor.
Next, create the Kafka systemd file –
sudo nano /etc/systemd/system/kafka.service
Paste the following lines of code into the file –
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Save the changes and exit the editor.
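Note that the JAVA_HOME path in the unit file above is the default OpenJDK 11 location on amd64 Ubuntu; if your system differs (for example, an ARM server), you can locate the correct path with –
readlink -f "$(which java)"
Strip the trailing /bin/java from the output and use the remaining directory as JAVA_HOME.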
Step 6: Start Kafka and ZooKeeper Systemd Services
After creating the systemd unit files, notify systemd of the changes made –
sudo systemctl daemon-reload
Next, start the ZooKeeper and Kafka services –
sudo systemctl start zookeeper
sudo systemctl start kafka
To check the status of the services, use the following commands –
sudo systemctl status zookeeper
sudo systemctl status kafka
If everything is set up correctly, you should see the services running without errors.
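Optionally, enable both services so they start automatically at boot –
sudo systemctl enable zookeeper
sudo systemctl enable kafka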
Step 7: Create a Kafka Topic
With Kafka and all its components installed, let’s create a topic and send a message. In Kafka, a topic is the fundamental unit used to organize messages, and each topic must have a unique name within a cluster.
To create a topic called “sampleTopic” on localhost port 9092 with a replication factor of 1 and a single partition, run the following commands –
cd /usr/local/kafka
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic sampleTopic
Make sure to replace localhost:9092 with the appropriate address and port if needed. Upon running the command, you will receive a confirmation that the topic was successfully created.
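For example, with Kafka 3.4.x the confirmation looks similar to this line –
Created topic sampleTopic.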
To list all the topics created, use the following command –
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
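You can also inspect the partition and replica layout of a topic with the --describe option –
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic sampleTopic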
Step 8: Send and Receive Messages in Kafka
In Kafka, a ‘producer’ is an application that writes data into topics across different partitions. To send messages to the sample topic created earlier, execute the following command:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic sampleTopic
You will be prompted to type messages. Enter a few lines and press “Enter” after each message. For example –
> Hello World!
> Welcome to Apache Kafka
> This is the first topic of Apache Kafka
To consume the messages and verify that they were successfully sent, open a new terminal and run the following command –
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sampleTopic --from-beginning
You should see the messages you typed earlier displayed in the terminal.
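If you want several consumers to share the work of reading a topic, you can start them with a common consumer group; for example (the group name myGroup is illustrative) –
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sampleTopic --group myGroup
Kafka distributes the topic's partitions among the members of a group, so each message is delivered to only one consumer within the group.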
Step 9: Secure Kafka Server
To secure your Kafka server, you can enable SSL encryption and configure authentication and authorization mechanisms. The exact steps for securing Kafka vary depending on your specific requirements and environment. It is recommended to consult the official Kafka documentation for detailed instructions on securing your Kafka deployment.
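As a rough illustration, enabling SSL on the broker involves settings along these lines in server.properties (the keystore paths and passwords below are placeholders; you must generate your own certificates and keystores first) –
listeners=SSL://localhost:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/usr/local/kafka/ssl/kafka.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/usr/local/kafka/ssl/kafka.server.truststore.jks
ssl.truststore.password=changeit
The official documentation covers the full procedure, including certificate generation and client-side configuration.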
How Does Apache Kafka Boost Streaming Platforms?
Kafka plays a crucial role in boosting the capabilities of streaming platforms. By providing a reliable and scalable infrastructure, Kafka enables organizations to process streaming data with ease. Its built-in partitioning and replication features ensure high throughput and fault tolerance, making it ideal for handling large volumes of data. Moreover, Kafka’s ability to process data in real-time allows organizations to gain valuable insights and make informed decisions faster than ever before.
Top 5 Kafka Use Cases
With Apache Kafka, organizations can leverage the power of streaming data across different industries. Let’s explore some of the top use cases where Kafka shines:
Tracking Website Activity:
Kafka can be used to build real-time pipelines for tracking user activity on websites, including pages viewed, user searches, registrations, transactions, and more. The captured data can be processed in real-time for analytics and insights, enabling organizations to optimize their websites and improve user experiences.
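As a toy sketch of such a pipeline, page-view events could be published as JSON lines into a dedicated topic using the console producer (the page-views topic name and event fields are purely illustrative) –
echo '{"user":"u123","page":"/pricing","ts":"2024-01-15T10:30:00Z"}' | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic page-views
A downstream consumer or stream processor can then read from this topic to compute real-time analytics.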
Operational Metrics:
Kafka is a powerful tool for collecting operational metrics from distributed applications. By aggregating data from various sources, Kafka allows organizations to monitor key metrics and generate alerts for operational issues. These metrics help in identifying performance bottlenecks, optimizing resource allocation, and ensuring smooth operations.
Aggregating Logs:
Kafka excels at collecting logs from multiple services and making them available in a standardized format to multiple consumers. Its support for low-latency processing and integration with various data sources makes it an ideal choice for distributed log aggregation. Organizations can leverage this capability to gain real-time insights into system health, troubleshoot issues, and analyze application logs.
Stream Processing:
With Kafka, organizations can process streaming data in real time, enabling real-time analytics and decision-making. Streams of data are continuously transformed, analyzed, and enriched before they are delivered to downstream applications. This capability is especially valuable in scenarios where real-time data processing is critical, such as fraud detection, anomaly detection, and recommendation systems.
Data Integration:
Kafka acts as a powerful hub for data integration, allowing organizations to connect various systems and applications seamlessly. By leveraging Kafka connectors, organizations can easily extract data from external systems, transform it, and load it into Kafka for further processing. This enables data integration across diverse sources, facilitates the creation of unified data pipelines, and supports business intelligence and reporting.
Which Industries Benefit from Kafka?
Kafka’s versatility makes it applicable to a wide range of industries; some of the industries that can benefit from Kafka include:
Financial Services: Kafka enables real-time transaction processing, fraud detection, and risk management in the financial sector.
Manufacturing: Kafka supports real-time monitoring of production lines, supply chain management, and predictive maintenance.
Retailing: Kafka facilitates real-time inventory management, personalized marketing, and real-time analytics for customer insights.
Gaming: Kafka powers real-time multiplayer gaming experiences, in-game analytics, and event-driven gaming architectures.
Transportation and Logistics: Kafka enables real-time tracking of shipments, route optimization, and supply chain visibility.
Telecom: Kafka supports real-time call detail record (CDR) processing, network monitoring, and personalized marketing campaigns.
Healthcare: Kafka helps in real-time patient monitoring, healthcare analytics, and integration of medical devices.
Automotive: Kafka enables real-time data exchange between vehicles, smart traffic management, and fleet tracking.
Insurance: Kafka facilitates real-time claims processing, fraud detection, and personalized customer experiences.
In essence, any industry dealing with high-volume streaming data can benefit from Kafka’s capabilities and use it to gain a competitive edge.
What is Kafka Used For?
Kafka is a distributed streaming platform used for handling real-time data feeds and stream processing. It acts as a robust and scalable backbone for building applications that involve the continuous flow of data between different systems or components.
It helps to create live data pipelines and applications that operate in real-time; a data pipeline ensures the consistent processing and transfer of data between different systems. On the other hand, a streaming application is designed to handle continuous streams of data.
To illustrate, if you aim to establish a data pipeline tracking real-time user activity on your website, Kafka would be employed to efficiently collect and store streaming data. Additionally, Kafka is commonly utilized as a message broker, acting as a platform that facilitates communication between two applications by processing and mediating the exchange of messages.
Conclusion
Congratulations!
You have successfully installed Apache Kafka on Ubuntu 22.04 and created a Kafka topic.
Furthermore, you’ve learned to send and receive messages within a Kafka cluster. Apache Kafka is a powerful tool for handling large amounts of data and is widely used in big data and real-time analytics applications.
You can now leverage the power of Kafka to handle large volumes of real-time data efficiently. Remember to explore the various configuration options and advanced features of Kafka to fully utilize its capabilities.

Jilesh Patadiya is the Founder and Chief Technology Officer (CTO) of AccuWeb.Cloud and AccuWebHosting.com. He shares his web hosting insights on the AccuWeb.Cloud blog, writing mostly about the latest web hosting trends, WordPress, storage technologies, and Windows and Linux hosting platforms.