Class QueryChangeStreamAction

java.lang.Object
org.apache.beam.sdk.io.gcp.spanner.changestreams.action.QueryChangeStreamAction

public class QueryChangeStreamAction extends Object
Main action class for querying a partition change stream. This class performs the change stream query and, depending on the type of each record received, dispatches its processing to one of the following: ChildPartitionsRecordAction, HeartbeatRecordAction, DataChangeRecordAction, PartitionStartRecordAction, PartitionEndRecordAction or PartitionEventRecordAction.

This class also mirrors the current watermark (the event timestamp processed so far) in the Connector's metadata tables by registering a callback that runs after the bundle is committed.

When the change stream query for the partition finishes, this class updates the state of the partition in the metadata tables to FINISHED, indicating completion.

  • Method Details

    • run

      public DoFn.ProcessContinuation run(PartitionMetadata partition, RestrictionTracker<TimestampRange,com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<DataChangeRecord> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator, DoFn.BundleFinalizer bundleFinalizer)
      This method dispatches a change stream query for the given partition, delegates the processing of each record to the corresponding registered action class, and keeps the state of the partition up to date in the Connector's metadata table.

      The algorithm is as follows:

      1. A change stream query for the partition is performed.
      2. For each record, we check the type of the record and dispatch its processing to one of the registered actions.
      3. If an action returns an Optional containing a DoFn.ProcessContinuation.stop(), we stop processing and return.
      4. Before returning we register a bundle finalizer callback to update the watermark of the partition in the metadata tables to the latest processed timestamp.
      5. When a change stream query finishes successfully (no more records) we update the partition state to FINISHED.
      In some cases, a split at the exact end timestamp of a partition's change stream query can cause this function to process a residual with an invalid timestamp. In this case, the error is ignored and no work is done for the residual.
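
      The five steps above can be illustrated with a minimal, dependency-free sketch. It is not the connector's implementation: QueryLoopSketch, Continuation, Record, RecordAction, PartitionRow and actionsClaimingBefore are hypothetical stand-ins for the Beam and Spanner types, the query result is modelled as an in-memory list, the metadata table as a single in-memory row, and the bundle finalizer as a list of callbacks that the caller runs to simulate a bundle commit. Residual handling is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified sketch; every type here is a hypothetical stand-in, not part of the Beam API.
final class QueryLoopSketch {

  /** Stand-in for DoFn.ProcessContinuation. */
  enum Continuation { STOP }

  /** Stand-in for a change stream record: a type tag plus an event timestamp. */
  static final class Record {
    final String type;
    final long timestamp;

    Record(String type, long timestamp) {
      this.type = type;
      this.timestamp = timestamp;
    }
  }

  /** Stand-in for a record action; a present Optional asks the loop to stop. */
  interface RecordAction {
    Optional<Continuation> run(Record record);
  }

  /** In-memory stand-in for one partition's row in the metadata tables. */
  static final class PartitionRow {
    long watermark = -1L;
    String state = "RUNNING";
  }

  /** Builds one action per record type; each stops once a timestamp cannot be claimed. */
  static Map<String, RecordAction> actionsClaimingBefore(long endExclusive) {
    RecordAction claim =
        r -> r.timestamp < endExclusive
            ? Optional.<Continuation>empty()
            : Optional.of(Continuation.STOP);
    Map<String, RecordAction> actions = new HashMap<>();
    actions.put("DATA_CHANGE", claim);
    actions.put("HEARTBEAT", claim);
    actions.put("CHILD_PARTITIONS", claim);
    return actions;
  }

  static Continuation run(
      List<Record> queryResults,
      Map<String, RecordAction> actions,
      PartitionRow row,
      List<Runnable> bundleFinalizers) {
    long lastProcessed = row.watermark;
    for (Record record : queryResults) { // step 1: consume the query results
      RecordAction action = actions.get(record.type); // step 2: dispatch on record type
      if (action == null) {
        throw new IllegalArgumentException("Unknown record type: " + record.type);
      }
      Optional<Continuation> maybeStop = action.run(record);
      if (maybeStop.isPresent()) { // step 3: an action asked to stop
        registerWatermarkUpdate(bundleFinalizers, row, lastProcessed); // step 4
        return maybeStop.get();
      }
      lastProcessed = record.timestamp;
    }
    registerWatermarkUpdate(bundleFinalizers, row, lastProcessed); // step 4
    row.state = "FINISHED"; // step 5: the query ran out of records
    return Continuation.STOP;
  }

  /** Defers the watermark write until the simulated bundle commit. */
  private static void registerWatermarkUpdate(
      List<Runnable> finalizers, PartitionRow row, long watermark) {
    finalizers.add(() -> row.watermark = watermark);
  }

  public static void main(String[] args) {
    List<Record> results =
        Arrays.asList(new Record("DATA_CHANGE", 10L), new Record("HEARTBEAT", 20L));
    PartitionRow row = new PartitionRow();
    List<Runnable> finalizers = new ArrayList<>();
    Continuation result = run(results, actionsClaimingBefore(100L), row, finalizers);
    finalizers.forEach(Runnable::run); // simulate the runner committing the bundle
    System.out.println(result + " " + row.state + " watermark=" + row.watermark);
  }
}
```

      In this sketch the watermark in the row changes only when the registered callback runs, which mirrors the idea that the metadata tables are updated after the bundle is committed and not while records are still being processed.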
      Parameters:
      partition - the current partition being processed
      tracker - the restriction tracker of the ReadChangeStreamPartitionDoFn SDF
      receiver - the output receiver of the ReadChangeStreamPartitionDoFn SDF
      watermarkEstimator - the watermark estimator of the ReadChangeStreamPartitionDoFn SDF
      bundleFinalizer - the bundle finalizer for ReadChangeStreamPartitionDoFn SDF bundles
      Returns:
      a DoFn.ProcessContinuation.stop() if a record timestamp could not be claimed or if the partition processing has finished