2021/01/15
Example to ingest data from Apache Kafka to Google Cloud Pub/Sub
Artur Khanin, Ilya Kozyrev & Alex Kosolapov
In this blog post we present an example that creates a pipeline to read data from a single topic or multiple topics in Apache Kafka and write data into a topic in Google Pub/Sub. The example provides code samples to implement simple yet powerful pipelines, and also offers an out-of-the-box solution that you can just "plug'n'play".
This end-to-end example is included in Apache Beam release 2.27 and can be downloaded here.
We hope you will find this example useful for setting up data pipelines between Kafka and Pub/Sub.
Example specs
Supported data formats:
- Serializable plain text formats, such as JSON
- PubSubMessage
Supported input source configurations:
- Single or multiple Apache Kafka bootstrap servers
- Apache Kafka SASL/SCRAM authentication over plaintext or SSL connection
- Secrets vault service HashiCorp Vault
Supported destination configuration:
- Single Google Pub/Sub topic
In a simple scenario, the example creates an Apache Beam pipeline that reads messages from a source Kafka server and topic, and streams the text messages into a specified Pub/Sub destination topic. Other scenarios may need Kafka SASL/SCRAM authentication, which can be performed over a plaintext or SSL-encrypted connection. The example supports using a single Kafka user account to authenticate against all of the provided source Kafka servers and topics. To support SASL authentication over SSL, the example needs an SSL certificate location and access to a secrets vault service with the Kafka username and password; HashiCorp Vault is currently supported. A minimal sketch of such a pipeline is shown below.
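To illustrate the basic scenario, here is a minimal sketch of a Beam pipeline that reads from Kafka with KafkaIO and writes to Pub/Sub with PubsubIO. The class name, bootstrap servers, topic names, and project ID are placeholders rather than values from the released example, and the commented-out SASL/SCRAM settings show an assumption of how such a configuration might look using standard Kafka consumer properties:

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToPubsubSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Read string messages from one or more Kafka topics.
        .apply("Read from Kafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker-1:9092,broker-2:9092") // placeholder servers
                .withTopics(Arrays.asList("source-topic-1", "source-topic-2")) // placeholder topics
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Assumption: SASL/SCRAM over SSL could be enabled through
                // standard Kafka consumer properties, for example:
                // .withConsumerConfigUpdates(ImmutableMap.of(
                //     "security.protocol", "SASL_SSL",
                //     "sasl.mechanism", "SCRAM-SHA-512",
                //     "sasl.jaas.config",
                //     "org.apache.kafka.common.security.scram.ScramLoginModule required"
                //         + " username=\"<user>\" password=\"<password>\";"))
                .withoutMetadata())
        // Keep only the message values; keys are not needed for Pub/Sub.
        .apply("Extract values", Values.<String>create())
        // Publish each value as a message to the destination Pub/Sub topic.
        .apply("Write to Pub/Sub",
            PubsubIO.writeStrings().to("projects/<project-id>/topics/<topic>"));

    pipeline.run().waitUntilFinish();
  }
}
```

In the released example, values such as the bootstrap servers and topics come from pipeline options rather than being hard-coded as they are in this sketch.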
Where can I run this example?
There are two ways to execute the pipeline.
- Locally. This way has many options: run directly from your IntelliJ, or create a .jar file and run it in the terminal, or use your favourite method of running Beam pipelines (see the sketch after this list).
- In Google Cloud using Google Cloud Dataflow:
  - With the gcloud command-line tool you can create a Flex Template out of this Beam example and execute it in Google Cloud Platform. This requires corresponding modifications of the example to turn it into a template.
  - This example exists as a Flex Template version within the Google Cloud Dataflow Template Pipelines repository and can be run with no additional code modifications.
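For the local option, one approach (a sketch, assuming the DirectRunner dependency is on the classpath) is to set the runner programmatically; equivalently, you can pass --runner=DirectRunner on the command line:

```java
import org.apache.beam.runners.direct.DirectRunner;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LocalRunSketch {
  public static void main(String[] args) {
    // Force local execution with the DirectRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setRunner(DirectRunner.class);
    // ... build and run the pipeline as in the sketch above ...
  }
}
```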
Next Steps
Give this Beam end-to-end example a try. If you are new to Beam, we hope this example will give you a better understanding of how pipelines work and what they look like. If you are already using Beam, we hope some of the code samples in it will be useful for your use cases.
Please let us know if you encounter any issues.