Class ReadChangeStreamPartitionDoFn
java.lang.Object
org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,DataChangeRecord>
org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn
- All Implemented Interfaces:
Serializable
,HasDisplayData
@UnboundedPerElement
public class ReadChangeStreamPartitionDoFn
extends DoFn<PartitionMetadata,DataChangeRecord>
implements Serializable
A SDF (Splittable DoFn) class which is responsible for performing a change stream query for a
given partition. A different action will be taken depending on the type of record received from
the query. This component will also reflect the partition state in the partition metadata tables.
The processing of a partition is delegated to the QueryChangeStreamAction
.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn
DoFn.AlwaysFetched, DoFn.BoundedPerElement, DoFn.BundleFinalizer, DoFn.Element, DoFn.FieldAccess, DoFn.FinishBundle, DoFn.FinishBundleContext, DoFn.GetInitialRestriction, DoFn.GetInitialWatermarkEstimatorState, DoFn.GetRestrictionCoder, DoFn.GetSize, DoFn.GetWatermarkEstimatorStateCoder, DoFn.Key, DoFn.MultiOutputReceiver, DoFn.NewTracker, DoFn.NewWatermarkEstimator, DoFn.OnTimer, DoFn.OnTimerContext, DoFn.OnTimerFamily, DoFn.OnWindowExpiration, DoFn.OnWindowExpirationContext, DoFn.OutputReceiver<T>, DoFn.ProcessContext, DoFn.ProcessContinuation, DoFn.ProcessElement, DoFn.RequiresStableInput, DoFn.RequiresTimeSortedInput, DoFn.Restriction, DoFn.Setup, DoFn.SideInput, DoFn.SplitRestriction, DoFn.StartBundle, DoFn.StartBundleContext, DoFn.StateId, DoFn.Teardown, DoFn.TimerFamily, DoFn.TimerId, DoFn.Timestamp, DoFn.TruncateRestriction, DoFn.UnboundedPerElement, DoFn.WatermarkEstimatorState, DoFn.WindowedContext
-
Constructor Summary
ConstructorsConstructorDescriptionReadChangeStreamPartitionDoFn
(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics) This class needs aDaoFactory
to build DAOs to access the partition metadata tables and to perform the change streams query. -
Method Summary
Modifier and TypeMethodDescriptiondouble
getSize
(PartitionMetadata partition, TimestampRange range) initialRestriction
(PartitionMetadata partition) The restriction for a partition will be defined from the start and end timestamp to query the partition for.newTracker
(PartitionMetadata partition, TimestampRange range) newWatermarkEstimator
(Instant watermarkEstimatorState) processElement
(PartitionMetadata partition, RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<DataChangeRecord> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator, DoFn.BundleFinalizer bundleFinalizer) Performs a change stream query for a given partition.void
setThroughputEstimator
(BytesThroughputEstimator<DataChangeRecord> throughputEstimator) Sets the estimator to calculate the backlog of this function.void
setup()
Constructs instances for thePartitionMetadataDao
,ChangeStreamDao
,ChangeStreamRecordMapper
,PartitionMetadataMapper
,DataChangeRecordAction
,HeartbeatRecordAction
,ChildPartitionsRecordAction
,PartitionStartRecordAction
,PartitionEndRecordAction
,PartitionEventRecordAction
andQueryChangeStreamAction
.Methods inherited from class org.apache.beam.sdk.transforms.DoFn
getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, populateDisplayData, prepareForProcessing
-
Constructor Details
-
ReadChangeStreamPartitionDoFn
public ReadChangeStreamPartitionDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, ChangeStreamMetrics metrics) This class needs aDaoFactory
to build DAOs to access the partition metadata tables and to perform the change streams query. It uses mappers to transform database rows into theChangeStreamRecord
model. It uses theActionFactory
to construct the action dispatchers, which will perform the change stream query and process each type of record received. It emits metrics for the partition using theChangeStreamMetrics
.- Parameters:
daoFactory
- theDaoFactory
to constructPartitionMetadataDao
s andChangeStreamDao
smapperFactory
- theMapperFactory
to constructChangeStreamRecordMapper
sactionFactory
- theActionFactory
to construct actionsmetrics
- theChangeStreamMetrics
to emit partition related metrics
-
-
Method Details
-
getInitialWatermarkEstimatorState
@GetInitialWatermarkEstimatorState public Instant getInitialWatermarkEstimatorState(@Element PartitionMetadata partition) -
newWatermarkEstimator
@NewWatermarkEstimator public ManualWatermarkEstimator<Instant> newWatermarkEstimator(@WatermarkEstimatorState Instant watermarkEstimatorState) -
initialRestriction
@GetInitialRestriction public TimestampRange initialRestriction(@Element PartitionMetadata partition) The restriction for a partition will be defined from the start and end timestamp to query the partition for. TheTimestampRange
restriction represents a closed-open interval, while the start / end timestamps represent a closed-closed interval, so we add 1 nanosecond to the end timestamp to convert it to closed-open.In this function we also update the partition state to
PartitionMetadata.State.RUNNING
.- Parameters:
partition
- the partition to be queried- Returns:
- the timestamp range from the partition start timestamp to the partition end timestamp + 1 nanosecond
-
getSize
@GetSize public double getSize(@Element PartitionMetadata partition, @Restriction TimestampRange range) throws Exception - Throws:
Exception
-
newTracker
@NewTracker public ReadChangeStreamPartitionRangeTracker newTracker(@Element PartitionMetadata partition, @Restriction TimestampRange range) -
setup
Constructs instances for thePartitionMetadataDao
,ChangeStreamDao
,ChangeStreamRecordMapper
,PartitionMetadataMapper
,DataChangeRecordAction
,HeartbeatRecordAction
,ChildPartitionsRecordAction
,PartitionStartRecordAction
,PartitionEndRecordAction
,PartitionEventRecordAction
andQueryChangeStreamAction
. -
processElement
@ProcessElement public DoFn.ProcessContinuation processElement(@Element PartitionMetadata partition, RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<DataChangeRecord> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator, DoFn.BundleFinalizer bundleFinalizer) Performs a change stream query for a given partition. A different action will be taken depending on the type of record received from the query. This component will also reflect the partition state in the partition metadata tables.The processing of a partition is delegated to the
QueryChangeStreamAction
.- Parameters:
partition
- the partition to be queriedtracker
- an instance ofReadChangeStreamPartitionRangeTracker
receiver
- aDataChangeRecord
DoFn.OutputReceiver
watermarkEstimator
- aManualWatermarkEstimator
ofInstant
bundleFinalizer
- the bundle finalizer- Returns:
- a
DoFn.ProcessContinuation.stop()
if a record timestamp could not be claimed or if the partition processing has finished
-
setThroughputEstimator
Sets the estimator to calculate the backlog of this function. Must be called after the initialization of this DoFn.- Parameters:
throughputEstimator
- an estimator to calculate local throughput.
-