Class DetectNewPartitionsDoFn
java.lang.Object
org.apache.beam.sdk.transforms.DoFn<PartitionMetadata,PartitionMetadata>
org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.DetectNewPartitionsDoFn
- All Implemented Interfaces:
Serializable,HasDisplayData
@UnboundedPerElement
public class DetectNewPartitionsDoFn
extends DoFn<PartitionMetadata,PartitionMetadata>
A SplittableDoFn (SDF) that is responsible for scheduling partitions to be queried. This
component will periodically scan the partition metadata table looking for partitions in the
PartitionMetadata.State.CREATED, update their state to PartitionMetadata.State.SCHEDULED and output them to the next
stage in the pipeline.- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.DoFn
DoFn.AlwaysFetched, DoFn.BoundedPerElement, DoFn.BundleFinalizer, DoFn.Element, DoFn.FieldAccess, DoFn.FinishBundle, DoFn.FinishBundleContext, DoFn.GetInitialRestriction, DoFn.GetInitialWatermarkEstimatorState, DoFn.GetRestrictionCoder, DoFn.GetSize, DoFn.GetWatermarkEstimatorStateCoder, DoFn.Key, DoFn.MultiOutputReceiver, DoFn.NewTracker, DoFn.NewWatermarkEstimator, DoFn.OnTimer, DoFn.OnTimerContext, DoFn.OnTimerFamily, DoFn.OnWindowExpiration, DoFn.OnWindowExpirationContext, DoFn.OutputReceiver<T>, DoFn.ProcessContext, DoFn.ProcessContinuation, DoFn.ProcessElement, DoFn.RequiresStableInput, DoFn.RequiresTimeSortedInput, DoFn.Restriction, DoFn.Setup, DoFn.SideInput, DoFn.SplitRestriction, DoFn.StartBundle, DoFn.StartBundleContext, DoFn.StateId, DoFn.Teardown, DoFn.TimerFamily, DoFn.TimerId, DoFn.Timestamp, DoFn.TruncateRestriction, DoFn.UnboundedPerElement, DoFn.WatermarkEstimatorState, DoFn.WindowedContext -
Constructor Summary
ConstructorsConstructorDescriptionDetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics) This class needs aDaoFactoryto build DAOs to access the partition metadata tables. -
Method Summary
Modifier and TypeMethodDescriptiondoublegetSize(TimestampRange restriction) initialRestriction(PartitionMetadata partition) Uses anTimestampRangewith a max range.newTracker(TimestampRange restriction) newWatermarkEstimator(Instant watermarkEstimatorState) processElement(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator) Main processing function for theDetectNewPartitionsDoFnfunction.voidsetAveragePartitionBytesSize(long averagePartitionBytesSize) Sets the average partition bytes size to estimate the backlog of this DoFn.voidsetup()Obtains the instance ofDetectNewPartitionsAction.Methods inherited from class org.apache.beam.sdk.transforms.DoFn
getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, populateDisplayData, prepareForProcessing
-
Constructor Details
-
DetectNewPartitionsDoFn
public DetectNewPartitionsDoFn(DaoFactory daoFactory, MapperFactory mapperFactory, ActionFactory actionFactory, CacheFactory cacheFactory, ChangeStreamMetrics metrics) This class needs aDaoFactoryto build DAOs to access the partition metadata tables. It uses mappers to transform database rows into thePartitionMetadatamodel. It builds the delegating action class using theActionFactory. It emits metrics for the partitions read using theChangeStreamMetrics. It re-schedules the process element function to be executed according to the default resume interval as inDEFAULT_RESUME_DURATION(best effort).- Parameters:
daoFactory- theDaoFactoryto constructPartitionMetadataDaosmapperFactory- theMapperFactoryto constructPartitionMetadataMappersactionFactory- theActionFactoryto construct actionsmetrics- theChangeStreamMetricsto emit partition related metrics
-
-
Method Details
-
getInitialWatermarkEstimatorState
@GetInitialWatermarkEstimatorState public Instant getInitialWatermarkEstimatorState(@Element PartitionMetadata partition) -
newWatermarkEstimator
@NewWatermarkEstimator public ManualWatermarkEstimator<Instant> newWatermarkEstimator(@WatermarkEstimatorState Instant watermarkEstimatorState) -
initialRestriction
@GetInitialRestriction public TimestampRange initialRestriction(@Element PartitionMetadata partition) Uses anTimestampRangewith a max range. This is because it does not know beforehand how many partitions it will schedule.- Returns:
- the timestamp range for the component
-
getSize
-
newTracker
@NewTracker public DetectNewPartitionsRangeTracker newTracker(@Restriction TimestampRange restriction) -
setup
Obtains the instance ofDetectNewPartitionsAction. -
processElement
@ProcessElement public DoFn.ProcessContinuation processElement(RestrictionTracker<TimestampRange, com.google.cloud.Timestamp> tracker, DoFn.OutputReceiver<PartitionMetadata> receiver, ManualWatermarkEstimator<Instant> watermarkEstimator) Main processing function for theDetectNewPartitionsDoFnfunction. It will delegate to theDetectNewPartitionsActionclass. -
setAveragePartitionBytesSize
public void setAveragePartitionBytesSize(long averagePartitionBytesSize) Sets the average partition bytes size to estimate the backlog of this DoFn. Must be called after the initialization of this DoFn.- Parameters:
averagePartitionBytesSize- the estimated average size of a partition record used in the backlog bytes calculation (DoFn.GetSize)
-