Class DebeziumIO.Read<T>

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<T>>
org.apache.beam.io.debezium.DebeziumIO.Read<T>
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
DebeziumIO

public abstract static class DebeziumIO.Read<T> extends PTransform<PBegin,PCollection<T>>
Implementation of DebeziumIO.read().
See Also:
  • Constructor Details

    • Read

      public Read()
  • Method Details

    • withConnectorConfiguration

      public DebeziumIO.Read<T> withConnectorConfiguration(DebeziumIO.ConnectorConfiguration config)
      Applies the given configuration to the connector. It cannot be null.
      Parameters:
      config - Configuration to be used within the connector.
      Returns:
      PTransform DebeziumIO.read()
    • withFormatFunction

      public DebeziumIO.Read<T> withFormatFunction(SourceRecordMapper<T> mapperFn)
      Applies a SourceRecordMapper to the connector. It cannot be null.
      Parameters:
      mapperFn - the mapper function to be used on each SourceRecord.
      Returns:
      PTransform DebeziumIO.read()
    • withCoder

      public DebeziumIO.Read<T> withCoder(Coder<T> coder)
      Applies a Coder to the connector. It cannot be null
      Parameters:
      coder - The Coder to be used over the data.
      Returns:
      PTransform DebeziumIO.read()
    • withMaxNumberOfRecords

      public DebeziumIO.Read<T> withMaxNumberOfRecords(Integer maxNumberOfRecords)
      Once the specified number of records has been reached, it will stop fetching them. The value can be null (default) which means it will not stop.
      Parameters:
      maxNumberOfRecords - The maximum number of records to be fetched before stop.
      Returns:
      PTransform DebeziumIO.read()
    • withMaxTimeToRun

      public DebeziumIO.Read<T> withMaxTimeToRun(Long miliseconds)
      Once the connector has run for the determined amount of time, it will stop. The value can be null (default) which means it will not stop. This parameter is mainly intended for testing.
      Parameters:
      miliseconds - The maximum number of miliseconds to run before stopping the connector.
      Returns:
      PTransform DebeziumIO.read()
    • withPollingTimeout

      public DebeziumIO.Read<T> withPollingTimeout(Long miliseconds)
      Sets the timeout in milliseconds for consumer polling request in the KafkaSourceConsumerFn. A lower timeout optimizes for latency. Increase the timeout if the consumer is not fetching any records. The default is 1000 milliseconds.
    • withStartOffset

      public DebeziumIO.Read<T> withStartOffset(Map<String,Object> startOffset)
      Sets a starting offset so the connector resumes consuming changes from a previously seen position rather than from the beginning of the change stream.

      The offset format is connector-specific. You can capture the current offset for each processed record inside your SourceRecordMapper via SourceRecord.sourceOffset() and persist it externally (for example in Cloud Storage, a database, or a local file). On the next pipeline run, pass the last saved offset here.

      Example (PostgreSQL):

      
       // Capture the offset inside the SourceRecordMapper:
       Map<String, Object> offset = sourceRecord.sourceOffset();
       // Persist 'offset' externally, then on restart:
       DebeziumIO.read()
           .withConnectorConfiguration(config)
           .withStartOffset(savedOffset)
           .withFormatFunction(myMapper);
       
      Parameters:
      startOffset - A map representing the resumption point, as returned by SourceRecord#sourceOffset().
      Returns:
      PTransform DebeziumIO.read()
    • withOffsetRetainer

      public DebeziumIO.Read<T> withOffsetRetainer(OffsetRetainer retainer)
      Sets an OffsetRetainer that automatically saves and restores the connector offset, allowing the pipeline to resume from where it left off after a restart without any manual offset management.

      When a retainer is configured:

      1. At pipeline startup, OffsetRetainer.loadOffset() is called. If a saved offset is found, the connector resumes from that position; otherwise it starts from the beginning of the change stream.
      2. After each successful checkpoint (task.commit()), OffsetRetainer.saveOffset(Map) is called with the latest committed offset.

      The built-in FileSystemOffsetRetainer persists the offset as a JSON file on any Beam-compatible filesystem (local, GCS, S3, etc.):

      
       DebeziumIO.read()
           .withConnectorConfiguration(config)
           .withOffsetRetainer(
               new FileSystemOffsetRetainer("gs://my-bucket/debezium/orders-offset.json"))
           .withFormatFunction(myMapper);
       

      When both a retainer and withStartOffset(Map) are set, the retainer takes precedence. Use withStartOffset(Map) alone for a one-time manual override.

      Parameters:
      retainer - The OffsetRetainer to use for loading and saving offsets.
      Returns:
      PTransform DebeziumIO.read()
    • getRecordSchema

      protected Schema getRecordSchema()
    • expand

      public PCollection<T> expand(PBegin input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead apply the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PBegin,PCollection<T>>