2021/01/15
Example to ingest data from Apache Kafka to Google Cloud Pub/Sub
Artur Khanin, Ilya Kozyrev & Alex Kosolapov
In this blog post we present an example that creates a pipeline to read data from a single topic or multiple topics in Apache Kafka and write data into a topic in Google Pub/Sub. The example provides code samples to implement simple yet powerful pipelines, and also offers an out-of-the-box solution that you can just "plug'n'play".
This end-to-end example is included in Apache Beam release 2.27 and can be downloaded here.
We hope you will find this example useful for setting up data pipelines between Kafka and Pub/Sub.
Example specs
Supported data formats:
- Serializable plain text formats, such as JSON
- PubSubMessage
Supported input source configurations:
- Single or multiple Apache Kafka bootstrap servers
- Apache Kafka SASL/SCRAM authentication over plaintext or SSL connection
- Secrets vault service HashiCorp Vault
Supported destination configuration:
- Single Google Pub/Sub topic
In a simple scenario, the example creates an Apache Beam pipeline that reads messages from a source Kafka server and topic, and streams the text messages into a specified Pub/Sub destination topic. Other scenarios may need Kafka SASL/SCRAM authentication, which can be performed over a plaintext or SSL-encrypted connection. The example supports using a single Kafka user account to authenticate against all of the provided source Kafka servers and topics. To support SASL authentication over SSL, the example needs an SSL certificate location and access to a secrets vault service with the Kafka username and password; HashiCorp Vault is currently supported. A minimal sketch of such a pipeline is shown below.
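To illustrate the basic scenario, here is a minimal sketch of a Beam pipeline that reads from Kafka with KafkaIO and writes to Pub/Sub with PubsubIO. The class name, bootstrap servers, topic names, and project ID are placeholders rather than values from the released example, and the commented-out SASL/SCRAM settings show an assumption of how such a configuration might look using standard Kafka consumer properties:

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToPubsubSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Read string messages from one or more Kafka topics.
        .apply("Read from Kafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker-1:9092,broker-2:9092") // placeholder servers
                .withTopics(Arrays.asList("source-topic-1", "source-topic-2")) // placeholder topics
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Assumption: SASL/SCRAM over SSL could be enabled through
                // standard Kafka consumer properties, for example:
                // .withConsumerConfigUpdates(ImmutableMap.of(
                //     "security.protocol", "SASL_SSL",
                //     "sasl.mechanism", "SCRAM-SHA-512",
                //     "sasl.jaas.config",
                //     "org.apache.kafka.common.security.scram.ScramLoginModule required"
                //         + " username=\"<user>\" password=\"<password>\";"))
                .withoutMetadata())
        // Keep only the message values; keys are not needed for Pub/Sub.
        .apply("Extract values", Values.<String>create())
        // Publish each value as a message to the destination Pub/Sub topic.
        .apply("Write to Pub/Sub",
            PubsubIO.writeStrings().to("projects/<project-id>/topics/<topic>"));

    pipeline.run().waitUntilFinish();
  }
}
```

In the released example, values such as the bootstrap servers and topics come from pipeline options rather than being hard-coded as they are in this sketch.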
Where can I run this example?
There are two ways to execute the pipeline.
- Locally. This way has many options: run directly from your IntelliJ, or create a .jar file and run it in the terminal, or use your favourite method of running Beam pipelines (see the sketch after this list).
- In Google Cloud using Google Cloud Dataflow:
  - With the gcloud command-line tool you can create a Flex Template out of this Beam example and execute it in Google Cloud Platform. This requires corresponding modifications of the example to turn it into a template.
  - This example exists as a Flex Template version within the Google Cloud Dataflow Template Pipelines repository and can be run with no additional code modifications.
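For the local option, one approach (a sketch, assuming the DirectRunner dependency is on the classpath) is to set the runner programmatically; equivalently, you can pass --runner=DirectRunner on the command line:

```java
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LocalRunSketch {
  public static void main(String[] args) {
    // Force local execution with the DirectRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setRunner(DirectRunner.class);
    // ... build and run the pipeline as in the sketch above ...
  }
}
```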
Next Steps
Give this Beam end-to-end example a try. If you are new to Beam, we hope this example will give you a better understanding of how pipelines work and what they look like. If you are already using Beam, we hope some of the code samples in it will be useful for your use cases.
Please let us know if you encounter any issues.