KafkaIO.Read (Apache Beam 2.13.0)

java.lang.Object
- org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<KafkaRecord<K,V>>>
- - org.apache.beam.sdk.io.kafka.KafkaIO.Read<K,V>

All Implemented Interfaces:

java.io.Serializable, HasDisplayData

Enclosing class:

KafkaIO
```
public abstract static class KafkaIO.Read<K,V>
extends PTransform<PBegin,PCollection<KafkaRecord<K,V>>>
```
A PTransform to read from Kafka topics. See KafkaIO for more information on usage and configuration.

See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`KafkaIO.Read.External` Exposes `KafkaIO.TypedWithoutMetadata` as an external transform for cross-language usage.

Field Summary
- Fields inherited from class org.apache.beam.sdk.transforms.PTransform
  name

Constructor Summary

Constructors
Constructor and Description
`Read()`

Method Summary

Modifier and Type	Method and Description
`KafkaIO.Read<K,V>`	`commitOffsetsInFinalize()` Finalized offsets are committed to Kafka.
`PCollection<KafkaRecord<K,V>>`	`expand(PBegin input)` Override this method to specify how this `PTransform` should be expanded on the given `InputT`.
`void`	`populateDisplayData(DisplayData.Builder builder)` Register display data for the given transform or component.
`KafkaIO.Read<K,V>`	`updateConsumerProperties(java.util.Map<java.lang.String,java.lang.Object> configUpdates)` Update consumer configuration with new properties.
`KafkaIO.Read<K,V>`	`withBootstrapServers(java.lang.String bootstrapServers)` Sets the bootstrap servers for the Kafka consumer.
`KafkaIO.Read<K,V>`	`withConsumerFactoryFn(SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.consumer.Consumer<byte[],byte[]>> consumerFactoryFn)` A factory to create Kafka `Consumer` from consumer configuration.
`KafkaIO.Read<K,V>`	`withCreateTime(Duration maxDelay)` Sets the timestamps policy based on `KafkaTimestampType.CREATE_TIME` timestamp of the records.
`KafkaIO.Read<K,V>`	`withKeyDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer)` Sets a Kafka `Deserializer` to interpret key bytes read from Kafka.
`KafkaIO.Read<K,V>`	`withKeyDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer, Coder<K> keyCoder)` Sets a Kafka `Deserializer` for interpreting key bytes read from Kafka along with a `Coder` for helping the Beam runner materialize key objects at runtime if necessary.
`KafkaIO.Read<K,V>`	`withLogAppendTime()` Sets `TimestampPolicy` to `TimestampPolicyFactory.LogAppendTimePolicy`.
`KafkaIO.Read<K,V>`	`withMaxNumRecords(long maxNumRecords)` Similar to `Read.Unbounded.withMaxNumRecords(long)`.
`KafkaIO.Read<K,V>`	`withMaxReadTime(Duration maxReadTime)` Similar to `Read.Unbounded.withMaxReadTime(Duration)`.
`KafkaIO.Read<K,V>`	`withOffsetConsumerConfigOverrides(java.util.Map<java.lang.String,java.lang.Object> offsetConsumerConfig)` Set additional configuration for the backend offset consumer.
`PTransform<PBegin,PCollection<KV<K,V>>>`	`withoutMetadata()` Returns a `PTransform` for PCollection of `KV`, dropping Kafka metadata.
`KafkaIO.Read<K,V>`	`withProcessingTime()` Sets `TimestampPolicy` to `TimestampPolicyFactory.ProcessingTimePolicy`.
`KafkaIO.Read<K,V>`	`withReadCommitted()` Sets the Kafka consumer property "isolation.level" to "read_committed".
`KafkaIO.Read<K,V>`	`withStartReadTime(Instant startReadTime)` Use timestamp to set up start offset.
`KafkaIO.Read<K,V>`	`withTimestampFn(SerializableFunction<KV<K,V>,Instant> timestampFn)` Deprecated. as of version 2.4. Use `withTimestampPolicyFactory(TimestampPolicyFactory)` instead.
`KafkaIO.Read<K,V>`	`withTimestampFn2(SerializableFunction<KafkaRecord<K,V>,Instant> timestampFn)` Deprecated. as of version 2.4. Use `withTimestampPolicyFactory(TimestampPolicyFactory)` instead.
`KafkaIO.Read<K,V>`	`withTimestampPolicyFactory(TimestampPolicyFactory<K,V> timestampPolicyFactory)` Provide custom `TimestampPolicyFactory` to set event times and watermark for each partition.
`KafkaIO.Read<K,V>`	`withTopic(java.lang.String topic)` Sets the topic to read from.
`KafkaIO.Read<K,V>`	`withTopicPartitions(java.util.List<org.apache.kafka.common.TopicPartition> topicPartitions)` Sets a list of partitions to read from.
`KafkaIO.Read<K,V>`	`withTopics(java.util.List<java.lang.String> topics)` Sets a list of topics to read from.
`KafkaIO.Read<K,V>`	`withValueDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer)` Sets a Kafka `Deserializer` to interpret value bytes read from Kafka.
`KafkaIO.Read<K,V>`	`withValueDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer, Coder<V> valueCoder)` Sets a Kafka `Deserializer` for interpreting value bytes read from Kafka along with a `Coder` for helping the Beam runner materialize value objects at runtime if necessary.
`KafkaIO.Read<K,V>`	`withWatermarkFn(SerializableFunction<KV<K,V>,Instant> watermarkFn)` Deprecated. as of version 2.4. Use `withTimestampPolicyFactory(TimestampPolicyFactory)` instead.
`KafkaIO.Read<K,V>`	`withWatermarkFn2(SerializableFunction<KafkaRecord<K,V>,Instant> watermarkFn)` Deprecated. as of version 2.4. Use `withTimestampPolicyFactory(TimestampPolicyFactory)` instead.

Methods inherited from class org.apache.beam.sdk.transforms.PTransform
compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - Read
```
public Read()
```
- Method Detail
  - withBootstrapServers
```
public KafkaIO.Read<K,V> withBootstrapServers(java.lang.String bootstrapServers)
```
    Sets the bootstrap servers for the Kafka consumer.
  - withTopic
```
public KafkaIO.Read<K,V> withTopic(java.lang.String topic)
```
    Sets the topic to read from.
    See KafkaUnboundedSource.split(int, PipelineOptions) for description of how the partitions are distributed among the splits.
  - withTopics
```
public KafkaIO.Read<K,V> withTopics(java.util.List<java.lang.String> topics)
```
    Sets a list of topics to read from. All the partitions from each of the topics are read.
    See KafkaUnboundedSource.split(int, PipelineOptions) for description of how the partitions are distributed among the splits.
  - withTopicPartitions
```
public KafkaIO.Read<K,V> withTopicPartitions(java.util.List<org.apache.kafka.common.TopicPartition> topicPartitions)
```
    Sets a list of partitions to read from. This allows reading only a subset of partitions for one or more topics when (if ever) needed.
    See KafkaUnboundedSource.split(int, PipelineOptions) for description of how the partitions are distributed among the splits.
  - withKeyDeserializer
```
public KafkaIO.Read<K,V> withKeyDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer)
```
    Sets a Kafka Deserializer to interpret key bytes read from Kafka.
    In addition, Beam needs a Coder to serialize and deserialize key objects at runtime. KafkaIO tries to infer a coder for the key based on the Deserializer class; if that fails, you can use withKeyDeserializerAndCoder(Class, Coder) to provide the key coder explicitly.
  - withKeyDeserializerAndCoder
```
public KafkaIO.Read<K,V> withKeyDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<K>> keyDeserializer,
                                                     Coder<K> keyCoder)
```
    Sets a Kafka Deserializer for interpreting key bytes read from Kafka along with a Coder for helping the Beam runner materialize key objects at runtime if necessary.
    Use this method only if your pipeline doesn't work with plain withKeyDeserializer(Class).
  - withValueDeserializer
```
public KafkaIO.Read<K,V> withValueDeserializer(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer)
```
    Sets a Kafka Deserializer to interpret value bytes read from Kafka.
    In addition, Beam needs a Coder to serialize and deserialize value objects at runtime. KafkaIO tries to infer a coder for the value based on the Deserializer class; if that fails, you can use withValueDeserializerAndCoder(Class, Coder) to provide the value coder explicitly.
  - withValueDeserializerAndCoder
```
public KafkaIO.Read<K,V> withValueDeserializerAndCoder(java.lang.Class<? extends org.apache.kafka.common.serialization.Deserializer<V>> valueDeserializer,
                                                       Coder<V> valueCoder)
```
    Sets a Kafka Deserializer for interpreting value bytes read from Kafka along with a Coder for helping the Beam runner materialize value objects at runtime if necessary.
    Use this method only if your pipeline doesn't work with plain withValueDeserializer(Class).
  - withConsumerFactoryFn
```
public KafkaIO.Read<K,V> withConsumerFactoryFn(SerializableFunction<java.util.Map<java.lang.String,java.lang.Object>,org.apache.kafka.clients.consumer.Consumer<byte[],byte[]>> consumerFactoryFn)
```
    A factory to create Kafka Consumer from consumer configuration. This is useful for supporting another version of Kafka consumer. Default is KafkaConsumer.
  - updateConsumerProperties
```
public KafkaIO.Read<K,V> updateConsumerProperties(java.util.Map<java.lang.String,java.lang.Object> configUpdates)
```
    Update consumer configuration with new properties.
  - withMaxNumRecords
```
public KafkaIO.Read<K,V> withMaxNumRecords(long maxNumRecords)
```
    Similar to Read.Unbounded.withMaxNumRecords(long). Mainly used for tests and demo applications.
  - withStartReadTime
```
public KafkaIO.Read<K,V> withStartReadTime(Instant startReadTime)
```
    Use timestamp to set up start offset. It is only supported by Kafka Client 0.10.1.0 onwards and the message format version after 0.10.0.
    Note that this takes priority over start offset configuration ConsumerConfig.AUTO_OFFSET_RESET_CONFIG and any auto committed offsets.
    This results in hard failures in either of the following two cases: 1. If one or more partitions do not contain any messages with timestamp larger than or equal to desired timestamp. 2. If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps.
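    The first failure case can be illustrated with a minimal, stdlib-only sketch. It is not KafkaIO's implementation: the class StartReadTimeSketch and its method are hypothetical, a partition is modeled as a list of record timestamps indexed by offset, and the lookup returns the first offset whose timestamp is at or after the requested start time.

```java
import java.util.List;

/** Hypothetical model of resolving a start offset from a timestamp. */
final class StartReadTimeSketch {

    /**
     * Returns the first offset whose timestamp is >= startReadTimeMillis.
     * Throws if the partition has no such record, mirroring the hard
     * failure described above.
     */
    static long startOffset(List<Long> recordTimestampsMillis, long startReadTimeMillis) {
        for (int offset = 0; offset < recordTimestampsMillis.size(); offset++) {
            if (recordTimestampsMillis.get(offset) >= startReadTimeMillis) {
                return offset;
            }
        }
        throw new IllegalStateException(
            "No record with timestamp >= " + startReadTimeMillis + " in this partition");
    }
}
```

    In this model, a partition whose newest record is older than the requested time fails immediately rather than waiting for newer records.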
  - withMaxReadTime
```
public KafkaIO.Read<K,V> withMaxReadTime(Duration maxReadTime)
```
    Similar to Read.Unbounded.withMaxReadTime(Duration). Mainly used for tests and demo applications.
  - withLogAppendTime
```
public KafkaIO.Read<K,V> withLogAppendTime()
```
    Sets TimestampPolicy to TimestampPolicyFactory.LogAppendTimePolicy. The policy assigns Kafka's log append time (server side ingestion time) to each record. The watermark for each Kafka partition is the timestamp of the last record read. If a partition is idle, the watermark advances to a couple of seconds behind wall time. Every record consumed from Kafka is expected to have its timestamp type set to 'LOG_APPEND_TIME'.
    In Kafka, log append time needs to be enabled for each topic, and all the subsequent records will have their timestamp set to log append time. If a record does not have its timestamp type set to 'LOG_APPEND_TIME' for any reason, its timestamp is set to previous record timestamp or latest watermark, whichever is larger.
    The watermark for the entire source is the oldest of each partition's watermark. If one of the readers falls behind possibly due to uneven distribution of records among Kafka partitions, it ends up holding the watermark for the entire source.
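    The rules above can be sketched with plain Java types. This is an illustration under stated assumptions, not the LogAppendTimePolicy source: the class LogAppendTimeSketch and its members are hypothetical, java.time.Instant stands in for the Instant type used by Beam, and idle-partition advancement is left out.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical model of the log-append-time rules described above. */
final class LogAppendTimeSketch {

    /** State kept for one Kafka partition. */
    static final class PartitionState {
        Instant lastRecordTimestamp = Instant.EPOCH; // assumed initial value
        Instant watermark = Instant.EPOCH;           // assumed initial value

        /**
         * Assigns a timestamp to a record. Records that are not stamped with
         * LOG_APPEND_TIME fall back to the larger of the previous record
         * timestamp and the latest watermark. The partition watermark becomes
         * the timestamp of the last record read.
         */
        Instant assign(Instant kafkaTimestamp, boolean isLogAppendTime) {
            Instant fallback =
                lastRecordTimestamp.isAfter(watermark) ? lastRecordTimestamp : watermark;
            Instant assigned = isLogAppendTime ? kafkaTimestamp : fallback;
            lastRecordTimestamp = assigned;
            watermark = assigned;
            return assigned;
        }
    }

    /** The source watermark is the oldest partition watermark (input must be non-empty). */
    static Instant sourceWatermark(Collection<Instant> partitionWatermarks) {
        return Collections.min(partitionWatermarks);
    }
}
```

    Because the source watermark is a minimum, a single slow partition holds it back for every partition read by the same source.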
  - withProcessingTime
```
public KafkaIO.Read<K,V> withProcessingTime()
```
    Sets TimestampPolicy to TimestampPolicyFactory.ProcessingTimePolicy. This is the default timestamp policy. It assigns processing time to each record. Specifically, this is the timestamp when the record becomes 'current' in the reader. The watermark always advances to current time. If server side time (log append time) is enabled in Kafka, withLogAppendTime() is recommended over this.
  - withCreateTime
```
public KafkaIO.Read<K,V> withCreateTime(Duration maxDelay)
```
    Sets the timestamps policy based on KafkaTimestampType.CREATE_TIME timestamp of the records. It is an error if a record's timestamp type is not KafkaTimestampType.CREATE_TIME. The timestamps within a partition are expected to be roughly monotonically increasing with a cap on out of order delays (e.g. 'max delay' of 1 minute). The watermark at any time is '(Min(now(), Max(event timestamp so far)) - max delay)'. That is, the watermark is never set in the future and is capped at 'now - max delay'. In addition, the watermark advances to 'now - max delay' when a partition is idle.
    
    Parameters:
    
    maxDelay - For any record in the Kafka partition, the timestamp of any subsequent record is expected to be after current record timestamp - maxDelay.
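    The watermark formula above can be written out as a small stdlib-only sketch. It is not Beam's implementation: the class CreateTimeWatermarkSketch is hypothetical, java.time types stand in for the Instant and Duration types used by Beam, and the partitionIdle flag is an assumed input rather than something the sketch detects.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the create-time watermark rule for one partition. */
final class CreateTimeWatermarkSketch {
    private final Duration maxDelay;
    private Instant maxEventTimestamp = Instant.EPOCH; // assumed initial value

    CreateTimeWatermarkSketch(Duration maxDelay) {
        this.maxDelay = maxDelay;
    }

    /** Tracks the largest event timestamp seen so far in the partition. */
    void onRecord(Instant eventTimestamp) {
        if (eventTimestamp.isAfter(maxEventTimestamp)) {
            maxEventTimestamp = eventTimestamp;
        }
    }

    /**
     * Computes Min(now, Max(event timestamp so far)) - maxDelay.
     * An idle partition uses now - maxDelay instead.
     */
    Instant watermark(Instant now, boolean partitionIdle) {
        boolean useNow = partitionIdle || maxEventTimestamp.isAfter(now);
        Instant base = useNow ? now : maxEventTimestamp;
        return base.minus(maxDelay);
    }
}
```

    With a max delay of one minute and a newest event timestamp of 11:58:00, the sketch yields a watermark of 11:57:00 at noon; if the partition is idle, it yields 11:59:00.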
  - withTimestampPolicyFactory
```
public KafkaIO.Read<K,V> withTimestampPolicyFactory(TimestampPolicyFactory<K,V> timestampPolicyFactory)
```
    Provide custom TimestampPolicyFactory to set event times and watermark for each partition. TimestampPolicyFactory.createTimestampPolicy(TopicPartition, Optional) is invoked for each partition when the reader starts.
    
    See Also:
    
    withLogAppendTime(), withCreateTime(Duration), withProcessingTime()
  - withTimestampFn2
```
@Deprecated
public KafkaIO.Read<K,V> withTimestampFn2(SerializableFunction<KafkaRecord<K,V>,Instant> timestampFn)
```
    Deprecated. as of version 2.4. Use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
    
    A function to assign a timestamp to a record. Default is processing timestamp.
  - withWatermarkFn2
```
@Deprecated
public KafkaIO.Read<K,V> withWatermarkFn2(SerializableFunction<KafkaRecord<K,V>,Instant> watermarkFn)
```
    Deprecated. as of version 2.4. Use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
    
    A function to calculate watermark after a record. Default is last record timestamp.
    
    See Also:
    
    withTimestampFn(SerializableFunction)
  - withTimestampFn
```
@Deprecated
public KafkaIO.Read<K,V> withTimestampFn(SerializableFunction<KV<K,V>,Instant> timestampFn)
```
    Deprecated. as of version 2.4. Use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
    
    A function to assign a timestamp to a record. Default is processing timestamp.
  - withWatermarkFn
```
@Deprecated
public KafkaIO.Read<K,V> withWatermarkFn(SerializableFunction<KV<K,V>,Instant> watermarkFn)
```
    Deprecated. as of version 2.4. Use withTimestampPolicyFactory(TimestampPolicyFactory) instead.
    
    A function to calculate watermark after a record. Default is last record timestamp.
    
    See Also:
    
    withTimestampFn(SerializableFunction)
  - withReadCommitted
```
public KafkaIO.Read<K,V> withReadCommitted()
```
    Sets the Kafka consumer property "isolation.level" to "read_committed". This ensures that the consumer does not read uncommitted messages. Kafka version 0.11 introduced transactional writes. Applications requiring end-to-end exactly-once semantics should only read committed messages. See JavaDoc for KafkaConsumer for more description.
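    As a rough illustration of the resulting consumer configuration, the sketch below applies the setting to a plain map. The class ReadCommittedConfigSketch is hypothetical and does not use the Beam or Kafka APIs; the only Kafka-specific detail is the property key, which Kafka spells "isolation.level".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the consumer property set by withReadCommitted(). */
final class ReadCommittedConfigSketch {

    /** Returns a copy of the configuration with read_committed isolation. */
    static Map<String, Object> withReadCommitted(Map<String, Object> consumerConfig) {
        Map<String, Object> updated = new HashMap<>(consumerConfig);
        updated.put("isolation.level", "read_committed");
        return updated;
    }
}
```

    The sketch copies the map rather than mutating it, in the same spirit as the builder-style methods on this class that return a KafkaIO.Read.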
  - commitOffsetsInFinalize
```
public KafkaIO.Read<K,V> commitOffsetsInFinalize()
```
    Finalized offsets are committed to Kafka. See UnboundedSource.CheckpointMark.finalizeCheckpoint(). It helps with minimizing gaps or duplicate processing of records while restarting a pipeline from scratch. But it does not provide hard processing guarantees. There could be a short delay to commit after UnboundedSource.CheckpointMark.finalizeCheckpoint() is invoked, as reader might be blocked on reading from Kafka. Note that it is independent of 'AUTO_COMMIT' Kafka consumer configuration. Usually either this or AUTO_COMMIT in Kafka consumer is enabled, but not both.
  - withOffsetConsumerConfigOverrides
```
public KafkaIO.Read<K,V> withOffsetConsumerConfigOverrides(java.util.Map<java.lang.String,java.lang.Object> offsetConsumerConfig)
```
    Set additional configuration for the backend offset consumer. It may be required for a secured Kafka cluster, especially when you see a WARN log message similar to 'exception while fetching latest offset for partition {}. will be retried'.
    In KafkaIO.read(), there are actually two consumers running in the backend:
    1. the main consumer, which reads data from Kafka;
    2. the secondary offset consumer, which is used to estimate backlog by fetching the latest offset.
    
    By default, the offset consumer inherits the configuration from the main consumer, with an auto-generated ConsumerConfig.GROUP_ID_CONFIG. This may not work in a secured Kafka cluster that requires additional configuration.
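    How the offset consumer's configuration might be derived can be sketched with plain maps. This is an assumption-laden illustration, not KafkaIO's code: the class OffsetConsumerConfigSketch is hypothetical, the generated group id is passed in rather than produced, and applying the overrides last is an assumption about ordering. "group.id" is the key behind ConsumerConfig.GROUP_ID_CONFIG.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of building the offset consumer configuration. */
final class OffsetConsumerConfigSketch {

    /**
     * Starts from the main consumer configuration, replaces the group id
     * with a generated one, then applies user-supplied overrides
     * (assumed here to take precedence).
     */
    static Map<String, Object> offsetConsumerConfig(
            Map<String, Object> mainConsumerConfig,
            String generatedGroupId,
            Map<String, Object> overrides) {
        Map<String, Object> config = new HashMap<>(mainConsumerConfig);
        config.put("group.id", generatedGroupId);
        if (overrides != null) {
            config.putAll(overrides);
        }
        return config;
    }
}
```

    On a secured cluster, the overrides map is where settings the offset consumer needs but cannot inherit as-is would go.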
  - withoutMetadata
```
public PTransform<PBegin,PCollection<KV<K,V>>> withoutMetadata()
```
    Returns a PTransform for PCollection of KV, dropping Kafka metadata.
  - expand
```
public PCollection<KafkaRecord<K,V>> expand(PBegin input)
```
    Description copied from class: PTransform
    
    Override this method to specify how this PTransform should be expanded on the given InputT.
    NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
    Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
    
    Specified by:
    
    expand in class PTransform<PBegin,PCollection<KafkaRecord<K,V>>>
  - populateDisplayData
```
public void populateDisplayData(DisplayData.Builder builder)
```
    Description copied from class: PTransform
    
    Register display data for the given transform or component.
    populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
    By default, does not register any display data. Implementors may override this method to provide their own display data.
    
    Specified by:
    
    populateDisplayData in interface HasDisplayData
    
    Overrides:
    
    populateDisplayData in class PTransform<PBegin,PCollection<KafkaRecord<K,V>>>
    
    Parameters:
    
    builder - The builder to populate with display data.
    
    See Also:
    
    HasDisplayData
