Annotation for the method that returns the corresponding size for an element and restriction
pair.
Signature: double getSize(InputT element, RestrictionT restriction);
Returns a double representing the size of the element and restriction.
Splittable DoFns should only provide this method if the default implementation within the
RestrictionTracker is an inaccurate representation of known work.
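As a minimal sketch of when such a method helps, consider a hypothetical element that is a file of records with known byte lengths, and a restriction over record indices. If the tracker's estimate counted records, it would weigh a 10-byte record the same as a 1,000-byte one. A getSize that reports the remaining bytes describes the known work more accurately. `OffsetRange` and `FileElement` below are local stand-ins defined only for this sketch. In a real splittable DoFn, the method would carry the annotation described here.

```java
import java.util.List;

/** Hypothetical sketch of a getSize-style method; not Beam's own types. */
public class GetSizeSketch {
    // Hypothetical restriction: half-open range of record indices [from, to).
    record OffsetRange(long from, long to) {}

    // Hypothetical element: a file whose records have known byte lengths.
    record FileElement(String name, List<Long> recordBytes) {}

    // Reports the total bytes of the records in the restriction, so that a
    // large record counts for more work than a small one.
    static double getSize(FileElement element, OffsetRange restriction) {
        double bytes = 0;
        for (long i = restriction.from(); i < restriction.to(); i++) {
            bytes += element.recordBytes().get((int) i);
        }
        return bytes;
    }

    public static void main(String[] args) {
        FileElement file = new FileElement("f", List.of(10L, 1000L, 20L, 5L));
        System.out.println(getSize(file, new OffsetRange(1, 3))); // 1020.0
    }
}
```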
It is up to each splittable DoFn to convert between its natural representation of
outstanding work and this representation. For example:
- Block-based file source (e.g. Avro): the number of bytes remaining from the end of the
current block to the end of the restriction.
- Pull-based, queue-based source (e.g. Pubsub): the local or global backlog of messages that
have not been processed, measured either in number of messages or in message bytes.
- Key-range-based source (e.g. Shuffle, Bigtable, ...): scale the key range so that the start
key maps to one and the end key maps to zero, and use the interpolated position of the next
splittable key as the size.
If the probability density function or cumulative distribution function of the keys is
known, the size interpolation can be improved. Alternatively, if the number of encoded bytes
for the keys and values in the key range is known, the number of remaining bytes can be
used as the size.
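The block-based and key-range strategies above can be sketched as plain helper functions. This is a minimal sketch under stated assumptions: `blockBasedSize`, `keyToFraction`, and `keyRangeSize` are hypothetical names, keys are mapped to [0, 1) using only their first six bytes, and an empty end key is assumed to mean the end of the keyspace. The queue-based case is left out because it is a direct backlog count.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketches of the sizing strategies; not Beam APIs. */
public class SizeStrategies {

    // Block-based file source: bytes from the end of the current block to the
    // end of the restriction (byte offsets, end exclusive), never negative.
    static double blockBasedSize(long currentBlockEnd, long restrictionEnd) {
        return Math.max(0L, restrictionEnd - currentBlockEnd);
    }

    // Maps a key to [0, 1) by reading its first 6 bytes as an unsigned
    // big-endian fraction (48 bits fit exactly in a double); missing bytes are 0.
    static double keyToFraction(byte[] key) {
        double fraction = 0;
        double scale = 1.0 / 256;
        for (int i = 0; i < Math.min(key.length, 6); i++) {
            fraction += (key[i] & 0xFF) * scale;
            scale /= 256;
        }
        return fraction;
    }

    // Key-range source: the start key maps to 1, the end key to 0, and the
    // next key's interpolated position is the size. The cdf reshapes positions
    // when the key distribution is known; identity assumes uniform keys.
    static double keyRangeSize(byte[] startKey, byte[] endKey, byte[] nextKey,
                               DoubleUnaryOperator cdf) {
        double start = cdf.applyAsDouble(keyToFraction(startKey));
        // Assumption: an empty end key means "end of the keyspace".
        double end = cdf.applyAsDouble(endKey.length == 0 ? 1.0 : keyToFraction(endKey));
        double next = cdf.applyAsDouble(keyToFraction(nextKey));
        if (end <= start) {
            return 0; // start and end are indistinguishable at this precision
        }
        double size = (end - next) / (end - start);
        return Math.min(1.0, Math.max(0.0, size));
    }

    public static void main(String[] args) {
        System.out.println(blockBasedSize(4096, 10240)); // 6144.0
        byte[] start = {0x00};
        byte[] end = {(byte) 0x80};
        byte[] next = {0x40};
        System.out.println(keyRangeSize(start, end, next, x -> x)); // 0.5
    }
}
```

Passing a non-identity `cdf` is one way to apply known density information. With the identity function, the interpolation assumes keys are spread uniformly across the range.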