apache_beam.io.external.kafka module

PTransforms for supporting Kafka in Python pipelines. These transforms do not run a Kafka client in Python. Instead, they expand to ExternalTransforms which the Expansion Service resolves to the Java SDK’s KafkaIO. In other words: they are cross-language transforms.

Note: These transforms require a running Java Expansion Service; refer to the portability documentation for how to start one. Flink users can use the Expansion Service built into the Flink Runner’s Job Server. The expansion service address has to be provided when instantiating the transforms.

If you start Flink’s Job Server, the expansion service will be started on port 8097. This is also the configured default for this transform. For a different address, please set the expansion_service parameter.
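Since the transforms depend on a reachable expansion service, it can help to check the address before constructing a pipeline. The following is a minimal sketch using only the standard library. The helpers `parse_address` and `expansion_service_reachable` are hypothetical names, not part of Beam, and the check only confirms that something accepts TCP connections at the address, not that it is actually an expansion service.

```python
import socket


def parse_address(address):
    """Split a 'host:port' string into a (host, port) tuple."""
    host, _, port = address.rpartition(':')
    if not host or not port.isdigit():
        raise ValueError('expected host:port, got %r' % (address,))
    return host, int(port)


def expansion_service_reachable(address='localhost:8097', timeout=2.0):
    """Return True if a TCP connection to `address` can be opened.

    The default matches the documented default of the transforms.
    This is a connectivity check only; it does not speak the
    expansion service protocol.
    """
    host, port = parse_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the check fails, either start the Job Server first or pass the correct address through the expansion_service parameter.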

For more information see:
  • https://beam.apache.org/documentation/runners/flink/
  • https://beam.apache.org/roadmap/portability/

class apache_beam.io.external.kafka.ReadFromKafka(consumer_config, topics, key_deserializer='org.apache.kafka.common.serialization.ByteArrayDeserializer', value_deserializer='org.apache.kafka.common.serialization.ByteArrayDeserializer', expansion_service='localhost:8097')

Bases: apache_beam.transforms.ptransform.PTransform

An external PTransform which reads from Kafka and returns a KV pair for each item in the specified Kafka topics. If no Kafka Deserializer for key/value is provided, then the data will be returned as a raw byte array.

Note: Runners need to support translating Read operations in order to use this source. At the moment only the Flink Runner supports this.

Experimental; no backwards compatibility guarantees.

Initializes a read operation from Kafka.

Parameters:
  • consumer_config – A dictionary containing the consumer configuration.
  • topics – A list of topic strings.
  • key_deserializer – A fully-qualified Java class name of a Kafka Deserializer for the topic’s key, e.g. ‘org.apache.kafka.common.serialization.LongDeserializer’. Default: ‘org.apache.kafka.common.serialization.ByteArrayDeserializer’.
  • value_deserializer – A fully-qualified Java class name of a Kafka Deserializer for the topic’s value, e.g. ‘org.apache.kafka.common.serialization.LongDeserializer’. Default: ‘org.apache.kafka.common.serialization.ByteArrayDeserializer’.
  • expansion_service – The address (host:port) of the ExpansionService.

byte_array_deserializer = 'org.apache.kafka.common.serialization.ByteArrayDeserializer'

expand(pbegin)
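
As an illustration of the parameters above, here is a minimal sketch of how a read might be wired into a pipeline. The helper names (`kafka_read_kwargs`, `build_read_pipeline`), the broker address, and the topic names are assumptions made for this example. `bootstrap.servers` is the standard Kafka consumer setting for the broker list. The Beam imports are kept inside the function so the argument helper can be used without Beam installed.

```python
BYTE_ARRAY_DESERIALIZER = (
    'org.apache.kafka.common.serialization.ByteArrayDeserializer')


def kafka_read_kwargs(bootstrap_servers, topics,
                      expansion_service='localhost:8097'):
    """Collect keyword arguments for ReadFromKafka (hypothetical helper)."""
    return {
        'consumer_config': {'bootstrap.servers': bootstrap_servers},
        'topics': list(topics),
        # Spelled out for clarity; these are the documented defaults.
        'key_deserializer': BYTE_ARRAY_DESERIALIZER,
        'value_deserializer': BYTE_ARRAY_DESERIALIZER,
        'expansion_service': expansion_service,
    }


def build_read_pipeline(pipeline_options, **read_kwargs):
    """Build (but do not run) a pipeline that reads from Kafka."""
    # Imported lazily; requires a Beam release that ships this module.
    import apache_beam as beam
    from apache_beam.io.external.kafka import ReadFromKafka

    pipeline = beam.Pipeline(options=pipeline_options)
    _ = (
        pipeline
        | 'ReadFromKafka' >> ReadFromKafka(**read_kwargs)
        # Each element is a (key, value) pair; keep only the value.
        | 'Values' >> beam.Map(lambda kv: kv[1])
    )
    return pipeline
```

A call such as `kafka_read_kwargs('localhost:9092', ['topic1'])` yields the arguments to pass on. Running the resulting pipeline still requires a runner that can translate the read, as noted above.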
class apache_beam.io.external.kafka.WriteToKafka(producer_config, topic, key_serializer='org.apache.kafka.common.serialization.ByteArraySerializer', value_serializer='org.apache.kafka.common.serialization.ByteArraySerializer', expansion_service='localhost:8097')

Bases: apache_beam.transforms.ptransform.PTransform

An external PTransform which writes KV data to a specified Kafka topic. If no Kafka Serializer for key/value is provided, then key/value are assumed to be byte arrays.

Experimental; no backwards compatibility guarantees.

Initializes a write operation to Kafka.

Parameters:
  • producer_config – A dictionary containing the producer configuration.
  • topic – A Kafka topic name.
  • key_serializer – A fully-qualified Java class name of a Kafka Serializer for the topic’s key, e.g. ‘org.apache.kafka.common.serialization.LongSerializer’. Default: ‘org.apache.kafka.common.serialization.ByteArraySerializer’.
  • value_serializer – A fully-qualified Java class name of a Kafka Serializer for the topic’s value, e.g. ‘org.apache.kafka.common.serialization.LongSerializer’. Default: ‘org.apache.kafka.common.serialization.ByteArraySerializer’.
  • expansion_service – The address (host:port) of the ExpansionService.

byte_array_serializer = 'org.apache.kafka.common.serialization.ByteArraySerializer'

expand(pvalue)
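
With the default serializers, keys and values are expected to be byte arrays, so string data has to be encoded before it reaches the transform. The sketch below shows one way this might look. The helper names (`to_kafka_record`, `build_write_pipeline`), the broker address, and the explicit `Tuple[bytes, bytes]` type hint are assumptions for illustration; the hint is there on the assumption that the runner needs it to select a bytes coder for the cross-language boundary.

```python
import typing

BYTE_ARRAY_SERIALIZER = (
    'org.apache.kafka.common.serialization.ByteArraySerializer')


def to_kafka_record(key, value, encoding='utf-8'):
    """Encode a (str, str) pair as a (bytes, bytes) KV tuple."""
    return key.encode(encoding), value.encode(encoding)


def build_write_pipeline(pipeline_options, records, bootstrap_servers,
                         topic, expansion_service='localhost:8097'):
    """Build (but do not run) a pipeline that writes `records` to Kafka."""
    # Imported lazily; requires a Beam release that ships this module.
    import apache_beam as beam
    from apache_beam.io.external.kafka import WriteToKafka

    pipeline = beam.Pipeline(options=pipeline_options)
    _ = (
        pipeline
        | 'Create' >> beam.Create(records)
        | 'Encode' >> beam.Map(lambda kv: to_kafka_record(*kv))
            .with_output_types(typing.Tuple[bytes, bytes])
        | 'WriteToKafka' >> WriteToKafka(
            producer_config={'bootstrap.servers': bootstrap_servers},
            topic=topic,
            # Spelled out for clarity; these are the documented defaults.
            key_serializer=BYTE_ARRAY_SERIALIZER,
            value_serializer=BYTE_ARRAY_SERIALIZER,
            expansion_service=expansion_service)
    )
    return pipeline
```

For keys or values of another type, a matching serializer class such as the LongSerializer mentioned above would be passed instead of the byte-array default.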