Class DetectNewPartitionsAction
java.lang.Object
org.apache.beam.sdk.io.gcp.spanner.changestreams.action.DetectNewPartitionsAction
This class is responsible for scheduling partitions. It obtains the partitions to be scheduled from the partition metadata table. The full algorithm is described in run(RestrictionTracker, OutputReceiver, ManualWatermarkEstimator).
Constructor Summary
Constructor: DetectNewPartitionsAction(PartitionMetadataDao dao, PartitionMetadataMapper mapper, WatermarkCache cache, ChangeStreamMetrics metrics, Duration resumeDuration)
Description: Constructs an action class for detecting / scheduling new partitions.
Method Summary
Modifier and Type: DoFn.ProcessContinuation
Method: run(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator)
Description: Executes the main logic to schedule new partitions.
Constructor Details
DetectNewPartitionsAction
public DetectNewPartitionsAction(PartitionMetadataDao dao, PartitionMetadataMapper mapper, WatermarkCache cache, ChangeStreamMetrics metrics, Duration resumeDuration)
Constructs an action class for detecting / scheduling new partitions.
Method Details
run
public DoFn.ProcessContinuation run(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator)
Executes the main logic to schedule new partitions. It follows this procedure periodically:
- Fetches the min watermark from all the unfinished partitions in the metadata table.
- If there are no unfinished partitions, this function stops and is not re-scheduled.
- Updates the component's watermark to the min fetched.
- Fetches the read timestamp from the restriction.
- Fetches all the partitions with a createdAt timestamp > read timestamp.
- Groups the partitions by createdAt timestamp.
- Processes the groups in ascending order of createdAt timestamp (oldest first).
- For each group, updates the state to PartitionMetadata.State.SCHEDULED.
- Tries to claim the createdAt timestamp of the group within the restriction.
- If it is possible to claim the timestamp, outputs each partition to the next stage. It then proceeds to process the next group. When there are no more groups to process, schedules the function to resume after the configured resume duration.
- If it is not possible to claim the timestamp, stops.
Parameters:
tracker - an instance of DetectNewPartitionsRangeTracker
receiver - a PartitionMetadata DoFn.OutputReceiver
watermarkEstimator - a ManualWatermarkEstimator of Instant
Returns:
a DoFn.ProcessContinuation.stop() if there are no more partitions to process or DoFn.ProcessContinuation.resume() to re-schedule the function after the configured interval.
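The procedure described for run can be illustrated with a minimal, self-contained sketch. It is not the Beam implementation: every type here (Partition, Tracker, Watermark, Continuation) is a hypothetical plain-Java stand-in, timestamps are modeled as long values, and the metadata table is an in-memory list. The sketch also assumes that only partitions still in the CREATED state are candidates for scheduling, which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain-Java stand-ins, not the Beam or Spanner classes.
public class DetectNewPartitionsSketch {
  enum State { CREATED, SCHEDULED, FINISHED }
  enum Continuation { STOP, RESUME }

  static final class Partition {
    final String token;
    final long createdAt; // hypothetical: timestamps modeled as longs
    final long watermark;
    State state;

    Partition(String token, long createdAt, long watermark, State state) {
      this.token = token;
      this.createdAt = createdAt;
      this.watermark = watermark;
      this.state = state;
    }
  }

  // Hypothetical stand-in for a restriction tracker over the range [from, to).
  static final class Tracker {
    long position;
    final long to;

    Tracker(long from, long to) {
      this.position = from;
      this.to = to;
    }

    boolean tryClaim(long timestamp) {
      if (timestamp >= to) {
        return false;
      }
      position = timestamp;
      return true;
    }
  }

  // Holds the last watermark set, standing in for a manual watermark estimator.
  static final class Watermark {
    long value = Long.MIN_VALUE;
  }

  static Continuation run(
      List<Partition> table, Tracker tracker, List<Partition> receiver, Watermark watermark) {
    // Min watermark over unfinished partitions; stop if none remain.
    long min = Long.MAX_VALUE;
    boolean anyUnfinished = false;
    for (Partition p : table) {
      if (p.state != State.FINISHED) {
        anyUnfinished = true;
        min = Math.min(min, p.watermark);
      }
    }
    if (!anyUnfinished) {
      return Continuation.STOP;
    }
    watermark.value = min;

    // Partitions created after the read timestamp, grouped by createdAt.
    // Assumption: only partitions still in CREATED state are candidates.
    long readTimestamp = tracker.position;
    TreeMap<Long, List<Partition>> groups = new TreeMap<>();
    for (Partition p : table) {
      if (p.state == State.CREATED && p.createdAt > readTimestamp) {
        groups.computeIfAbsent(p.createdAt, k -> new ArrayList<>()).add(p);
      }
    }

    // TreeMap iterates in ascending key order, so the oldest group comes first.
    for (Map.Entry<Long, List<Partition>> group : groups.entrySet()) {
      for (Partition p : group.getValue()) {
        p.state = State.SCHEDULED;
      }
      if (!tracker.tryClaim(group.getKey())) {
        return Continuation.STOP;
      }
      receiver.addAll(group.getValue());
    }
    return Continuation.RESUME;
  }

  public static void main(String[] args) {
    List<Partition> table = new ArrayList<>();
    table.add(new Partition("a", 10, 10, State.CREATED));
    table.add(new Partition("b", 20, 20, State.CREATED));
    List<Partition> out = new ArrayList<>();
    Continuation result = run(table, new Tracker(0, 100), out, new Watermark());
    System.out.println(result + ", scheduled partitions: " + out.size());
  }
}
```

In this sketch the state update happens before the claim attempt, mirroring the order of the steps listed above, so a group whose timestamp cannot be claimed is marked SCHEDULED but never emitted. Whether the real implementation behaves the same way in that case is not something the description settles.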