Annotation for the method that returns the corresponding size for an element and restriction.
double getSize(InputT element, RestrictionT restriction);
Returns a double representing the size of the element and restriction.
The size is a representation of the amount of known work. Size representations
should preferably represent a linear space and be comparable within the same partition (see
DoFn.GetPartition for details on partition identifiers).
Splittable DoFns should only provide this method if the default implementation within the
RestrictionTracker is an inaccurate representation of known work.
It is up to each splittable DoFn to convert between its natural representation of
outstanding work and this representation. For example:
- Block-based file source (e.g. Avro): From the end of the current block, the remaining
number of bytes to the end of the restriction.
- Pull-based, queue-based source (e.g. Pubsub): The local/global size available, in number of
messages or number of message bytes, that have not been processed.
- Key-range-based source (e.g. Shuffle, Bigtable, ...): Scale the start key to be one and
end key to be zero and interpolate the position of the next splittable key as the size.
If information about the probability density function or cumulative distribution function
is available, size interpolation can be improved. Alternatively, if the number of encoded
bytes for the keys and values is known for the key range, the number of remaining bytes
can be used.
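The block-based and key-range conversions above can be sketched in plain Java. This is a minimal illustration only, not the Beam implementation: the class SizeSketch, the nested OffsetRange type, and both method names are hypothetical, and the key-range version assumes keys are longs that are uniformly distributed (no density information available).

```java
/** Hypothetical helper; none of these names come from the Beam API. */
final class SizeSketch {

  /** Hypothetical restriction: a half-open byte range [from, to). */
  static final class OffsetRange {
    final long from;
    final long to;

    OffsetRange(long from, long to) {
      this.from = from;
      this.to = to;
    }
  }

  /**
   * Block-based file source: the number of bytes remaining from the end of the
   * current block to the end of the restriction, never negative.
   */
  static double remainingBytes(long currentBlockEnd, OffsetRange restriction) {
    long start = Math.max(currentBlockEnd, restriction.from);
    return (double) Math.max(0L, restriction.to - start);
  }

  /**
   * Key-range source: the start key maps to 1.0 and the end key maps to 0.0.
   * The size is the linear interpolation of the next splittable key between them.
   * Assumption: keys are longs with a uniform distribution.
   */
  static double keyRangeSize(long startKey, long endKey, long nextSplittableKey) {
    if (endKey <= startKey) {
      return 0.0; // empty or inverted range: no known work
    }
    long clamped = Math.min(Math.max(nextSplittableKey, startKey), endKey);
    // Convert to double before subtracting to avoid long overflow on wide ranges.
    return ((double) endKey - (double) clamped) / ((double) endKey - (double) startKey);
  }
}
```

For instance, with a restriction covering bytes [0, 100) and the current block ending at byte 40, remainingBytes yields 60.0; with keys 0 to 100 and the next splittable key at 25, keyRangeSize yields 0.75. Both values are linear in the remaining work, which keeps sizes comparable within a partition.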