Class DetectNewPartitionsDoFn
java.lang.Object
org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.DetectNewPartitionsDoFn
- All Implemented Interfaces:
Serializable
,HasDisplayData
@UnboundedPerElement
public class DetectNewPartitionsDoFn
extends DoFn<PartitionMetadata,PartitionMetadata>
A SplittableDoFn (SDF) that is responsible for scheduling partitions to be queried. This
component will periodically scan the partition metadata table looking for partitions in the
PartitionMetadata.State.CREATED
, update their state to PartitionMetadata.State.SCHEDULED
and output them to the next
stage in the pipeline.- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn
DoFn.AlwaysFetched, DoFn.BoundedPerElement, DoFn.BundleFinalizer, DoFn.Element, DoFn.FieldAccess, DoFn.FinishBundle, DoFn.FinishBundleContext, DoFn.GetInitialRestriction, DoFn.GetInitialWatermarkEstimatorState, DoFn.GetRestrictionCoder, DoFn.GetSize, DoFn.GetWatermarkEstimatorStateCoder, DoFn.Key, DoFn.MultiOutputReceiver, DoFn.NewTracker, DoFn.NewWatermarkEstimator, DoFn.OnTimer, DoFn.OnTimerContext, DoFn.OnTimerFamily, DoFn.OnWindowExpiration, DoFn.OnWindowExpirationContext, DoFn.OutputReceiver<T>, DoFn.ProcessContext, DoFn.ProcessContinuation, DoFn.ProcessElement, DoFn.RequiresStableInput, DoFn.RequiresTimeSortedInput, DoFn.Restriction, DoFn.Setup, DoFn.SideInput, DoFn.SplitRestriction, DoFn.StartBundle, DoFn.StartBundleContext, DoFn.StateId, DoFn.Teardown, DoFn.TimerFamily, DoFn.TimerId, DoFn.Timestamp, DoFn.TruncateRestriction, DoFn.UnboundedPerElement, DoFn.WatermarkEstimatorState, DoFn.WindowedContext
-
Constructor Summary
ConstructorsConstructorDescriptionDetectNewPartitionsDoFn
(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics) This class needs aDaoFactory
to build DAOs to access the partition metadata tables. -
Method Summary
Modifier and TypeMethodDescriptiondouble
getSize
(TimestampRange restriction) initialRestriction
(PartitionMetadata partition) Uses anTimestampRange
with a max range.newTracker
(TimestampRange restriction) newWatermarkEstimator
(Instant watermarkEstimatorState) processElement
(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator) Main processing function for theDetectNewPartitionsDoFn
function.void
setAveragePartitionBytesSize
(long averagePartitionBytesSize) Sets the average partition bytes size to estimate the backlog of this DoFn.void
setup()
Obtains the instance ofDetectNewPartitionsAction
.Methods inherited from class org.apache.beam.sdk.transforms.DoFn
getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, populateDisplayData, prepareForProcessing
-
Constructor Details
-
DetectNewPartitionsDoFn
public DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics) This class needs aDaoFactory
to build DAOs to access the partition metadata tables. It uses mappers to transform database rows into thePartitionMetadata
model. It builds the delegating action class using theActionFactory
. It emits metrics for the partitions read using theChangeStreamMetrics
. It re-schedules the process element function to be executed according to the default resume interval as inDEFAULT_RESUME_DURATION
(best effort).- Parameters:
daoFactory
- theDaoFactory
to constructPartitionMetadataDao
smapperFactory
- theMapperFactory
to constructPartitionMetadataMapper
sactionFactory
- theActionFactory
to construct actionsmetrics
- theChangeStreamMetrics
to emit partition related metrics
-
-
Method Details
-
getInitialWatermarkEstimatorState
@GetInitialWatermarkEstimatorState public Instant getInitialWatermarkEstimatorState(@Element PartitionMetadata partition) -
newWatermarkEstimator
@NewWatermarkEstimator public ManualWatermarkEstimator<Instant> newWatermarkEstimator(@WatermarkEstimatorState Instant watermarkEstimatorState) -
initialRestriction
@GetInitialRestriction public TimestampRange initialRestriction(@Element PartitionMetadata partition) Uses anTimestampRange
with a max range. This is because it does not know beforehand how many partitions it will schedule.- Returns:
- the timestamp range for the component
-
getSize
-
newTracker
@NewTracker public DetectNewPartitionsRangeTracker newTracker(@Restriction TimestampRange restriction) -
setup
Obtains the instance ofDetectNewPartitionsAction
. -
processElement
@ProcessElement public DoFn.ProcessContinuation processElement(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator) Main processing function for theDetectNewPartitionsDoFn
function. It will delegate to theDetectNewPartitionsAction
class. -
setAveragePartitionBytesSize
public void setAveragePartitionBytesSize(long averagePartitionBytesSize) Sets the average partition bytes size to estimate the backlog of this DoFn. Must be called after the initialization of this DoFn.- Parameters:
averagePartitionBytesSize
- the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize
)
-