Class DebeziumIO.Read<T>
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
DebeziumIO
- See Also:
DebeziumIO.read()
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Constructor Summary
Constructors -
Method Summary
- expand(PBegin input): Override this method to specify how this PTransform should be expanded on the given InputT.
- protected Schema getRecordSchema()
- withCoder(Coder<T> coder): Applies a Coder to the connector.
- withConnectorConfiguration(DebeziumIO.ConnectorConfiguration config): Applies the given configuration to the connector.
- withFormatFunction(SourceRecordMapper<T> mapperFn): Applies a SourceRecordMapper to the connector.
- withMaxNumberOfRecords(Integer maxNumberOfRecords): Once the specified number of records has been reached, it will stop fetching them.
- withMaxTimeToRun(Long miliseconds): Once the connector has run for the given amount of time, it will stop.
- withOffsetRetainer(OffsetRetainer retainer): Sets an OffsetRetainer that automatically saves and restores the connector offset, allowing the pipeline to resume from where it left off after a restart without any manual offset management.
- withPollingTimeout(Long miliseconds): Sets the timeout in milliseconds for the consumer polling request in the KafkaSourceConsumerFn.
- withStartOffset(Map<String, Object> startOffset): Sets a starting offset so the connector resumes consuming changes from a previously seen position rather than from the beginning of the change stream.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
Read
public Read()
-
-
Method Details
-
withConnectorConfiguration
Applies the given configuration to the connector. It cannot be null.
- Parameters:
config - Configuration to be used within the connector.
- Returns:
- PTransform
DebeziumIO.read()
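As an illustration, a configuration is typically built with the DebeziumIO.ConnectorConfiguration builder and then handed to the read transform. This is a minimal sketch; the connector class and all connection values below are placeholder assumptions, not values from this documentation:

```java
// Sketch: build a connector configuration and apply it to the read transform.
// MySqlConnector and all connection values are placeholders for illustration.
DebeziumIO.ConnectorConfiguration config =
    DebeziumIO.ConnectorConfiguration.create()
        .withConnectorClass(MySqlConnector.class)
        .withUsername("debezium")
        .withPassword("dbz")
        .withHostName("127.0.0.1")
        .withPort("3306");

DebeziumIO.Read<String> read =
    DebeziumIO.<String>read().withConnectorConfiguration(config);
```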
-
withFormatFunction
Applies a SourceRecordMapper to the connector. It cannot be null.
- Parameters:
mapperFn - the mapper function to be used on each SourceRecord.
- Returns:
- PTransform
DebeziumIO.read()
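For example, a mapper that serializes each change record to a JSON string could be written as a lambda, assuming SourceRecordMapper exposes a single mapSourceRecord method and that SourceRecordJson is available as a JSON helper (both assumptions; check the package's Javadoc for the exact types):

```java
// Illustrative mapper: convert each Debezium SourceRecord to a JSON string.
// SourceRecordJson is assumed here as the JSON conversion helper.
SourceRecordMapper<String> toJson =
    sourceRecord -> new SourceRecordJson(sourceRecord).toJson();

DebeziumIO.Read<String> read =
    DebeziumIO.<String>read()
        .withFormatFunction(toJson)
        .withCoder(StringUtf8Coder.of());
```

Pairing the mapper with a matching Coder, as shown, is needed because the output element type T is determined by the mapper.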
-
withCoder
Applies a Coder to the connector. It cannot be null.
- Parameters:
coder - The Coder to be used over the data.
- Returns:
- PTransform
DebeziumIO.read()
-
withMaxNumberOfRecords
Once the specified number of records has been reached, the connector will stop fetching them. The value can be null (the default), which means it will not stop.
- Parameters:
maxNumberOfRecords - The maximum number of records to fetch before stopping.
- Returns:
- PTransform
DebeziumIO.read()
-
withMaxTimeToRun
Once the connector has run for the given amount of time, it will stop. The value can be null (the default), which means it will not stop. This parameter is mainly intended for testing.
- Parameters:
miliseconds - The maximum number of milliseconds to run before stopping the connector.
- Returns:
- PTransform
DebeziumIO.read()
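The two bounds can be combined for a test pipeline; presumably the connector stops when the first limit is reached. A hedged sketch, with arbitrary limit values and placeholder config and mapper:

```java
// Bounded read for testing: stop after 100 records or after 60 seconds.
// 'config' and 'mapperFn' are placeholders defined elsewhere.
PCollection<String> changes =
    pipeline.apply(
        DebeziumIO.<String>read()
            .withConnectorConfiguration(config)
            .withFormatFunction(mapperFn)
            .withCoder(StringUtf8Coder.of())
            .withMaxNumberOfRecords(100)
            .withMaxTimeToRun(60_000L));
```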
-
withPollingTimeout
Sets the timeout in milliseconds for the consumer polling request in the KafkaSourceConsumerFn. A lower timeout optimizes for latency. Increase the timeout if the consumer is not fetching any records. The default is 1000 milliseconds.
-
withStartOffset
Sets a starting offset so the connector resumes consuming changes from a previously seen position rather than from the beginning of the change stream.
The offset format is connector-specific. You can capture the current offset for each processed record inside your SourceRecordMapper via SourceRecord.sourceOffset() and persist it externally (for example in Cloud Storage, a database, or a local file). On the next pipeline run, pass the last saved offset here.
Example (PostgreSQL):

```java
// Capture the offset inside the SourceRecordMapper:
Map<String, Object> offset = sourceRecord.sourceOffset();
// Persist 'offset' externally, then on restart:
DebeziumIO.read()
    .withConnectorConfiguration(config)
    .withStartOffset(savedOffset)
    .withFormatFunction(myMapper);
```

- Parameters:
startOffset - A map representing the resumption point, as returned by SourceRecord#sourceOffset().
- Returns:
- PTransform
DebeziumIO.read()
-
withOffsetRetainer
Sets an OffsetRetainer that automatically saves and restores the connector offset, allowing the pipeline to resume from where it left off after a restart without any manual offset management.
When a retainer is configured:
- At pipeline startup, OffsetRetainer.loadOffset() is called. If a saved offset is found, the connector resumes from that position; otherwise it starts from the beginning of the change stream.
- After each successful checkpoint (task.commit()), OffsetRetainer.saveOffset(Map) is called with the latest committed offset.
The built-in FileSystemOffsetRetainer persists the offset as a JSON file on any Beam-compatible filesystem (local, GCS, S3, etc.):

```java
DebeziumIO.read()
    .withConnectorConfiguration(config)
    .withOffsetRetainer(
        new FileSystemOffsetRetainer("gs://my-bucket/debezium/orders-offset.json"))
    .withFormatFunction(myMapper);
```

When both a retainer and withStartOffset(Map) are set, the retainer takes precedence. Use withStartOffset(Map) alone for a one-time manual override.
- Parameters:
retainer - The OffsetRetainer to use for loading and saving offsets.
- Returns:
- PTransform
DebeziumIO.read()
-
getRecordSchema
protected Schema getRecordSchema()
-
expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PBegin, PCollection<T>>
-