apache_beam.transforms.xlang.io module
Cross-language transforms in this module can be imported from the apache_beam.io package.
- class apache_beam.transforms.xlang.io.GenerateSequence(start, end=None, rate=None, expansion_service=None)[source]
Bases: ExternalTransform
Outputs a PCollection of Beam Rows, each containing a single INT64 field named “value”. The sequence starts at the given “start” value and runs either up to the given “end” or, if no “end” is given, up to 2^63 - 1. To produce an unbounded PCollection, omit the “end” value. Unbounded sequences can specify a “rate” for output elements. In all cases, the numbers are generated in parallel, so there is no inherent ordering between the generated values.
- Parameters:
start – (int64) The minimum number to generate (inclusive).
end – (int64) The maximum number to generate (exclusive). Will be an unbounded sequence if left unspecified.
rate – (Row(elements=<class ‘int64’>, seconds=typing.Optional[int64])) Specifies the rate at which elements are generated, expressed as a number of elements per number of seconds. Applicable only to unbounded sequences.
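The following is a minimal sketch of a bounded GenerateSequence, not a definitive usage pattern. It assumes the Beam Python SDK is installed and that the environment can start the default expansion service for cross-language transforms. The helper names expected_values and check_bounded_sequence are hypothetical and exist only for this illustration. The sketch relies on the documented semantics: “start” is inclusive, “end” is exclusive, each element is a Row with a “value” field, and output order is not guaranteed.

```python
def expected_values(start, end):
    """Values a bounded sequence should contain: start inclusive, end exclusive."""
    return list(range(start, end))


def check_bounded_sequence(start=0, end=10):
    """Run a bounded GenerateSequence and compare its output, ignoring order.

    Hypothetical helper: assumes apache_beam is installed and that a
    cross-language expansion service is available in this environment.
    """
    # Imported lazily so expected_values() stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.testing.util import assert_that, equal_to
    from apache_beam.transforms.xlang.io import GenerateSequence

    with beam.Pipeline() as pipeline:
        values = (
            pipeline
            | "Generate" >> GenerateSequence(start=start, end=end)
            # Each element is a Row with a single field named "value".
            | "ExtractValue" >> beam.Map(lambda row: row.value)
        )
        # equal_to compares contents without regard to element order,
        # which matters because the sequence is generated in parallel.
        assert_that(values, equal_to(expected_values(start, end)))
```

Calling check_bounded_sequence(2, 5) would be expected to pass when the pipeline emits 2, 3 and 4 in any order. The “rate” parameter is left out because it applies only to unbounded sequences.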
- class apache_beam.transforms.xlang.io.TfrecordRead(compression, file_pattern, validate, error_handling=None, expansion_service=None)[source]
Bases: ExternalTransform
- Parameters:
compression – (str) Decompression type to use when reading input files.
file_pattern – (str) Filename or file pattern used to find input files.
validate – (boolean) Validate file pattern.
error_handling – (Row(output=<class ‘str’>)) This option specifies whether and where to output unwritable rows.
- class apache_beam.transforms.xlang.io.TfrecordWrite(compression, num_shards, output_prefix, error_handling=None, filename_suffix=None, no_spilling=None, shard_template=None, expansion_service=None)[source]
Bases: ExternalTransform
- Parameters:
compression – (str) Option to indicate the output sink’s compression type. Default is NONE.
num_shards – (int32) The number of shards to use, or 0 for automatic.
output_prefix – (str) The directory to which files will be written.
error_handling – (Row(output=<class ‘str’>)) This option specifies whether and where to output unwritable rows.
filename_suffix – (str) The suffix of each file written, combined with prefix and shardTemplate.
no_spilling – (boolean) Whether to skip the spilling of data caused by having maxNumWritersPerBundle.
shard_template – (str) The shard template of each file written, combined with prefix and suffix.
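As a rough sketch of how TfrecordRead and TfrecordWrite might be combined, the example below copies TFRecord files from one location to another. Several things here are assumptions rather than documented behavior: that the rows produced by TfrecordRead are accepted as input by TfrecordWrite, that "NONE" is a valid compression value for reading as well as writing, and that an expansion service is available. The helper names tfrecord_read_args, tfrecord_write_args and copy_tfrecords are hypothetical.

```python
def tfrecord_read_args(file_pattern, compression="NONE", validate=True):
    """Collect the three required TfrecordRead arguments.

    "NONE" as a read-side compression value is an assumption.
    """
    return {
        "compression": compression,
        "file_pattern": file_pattern,
        "validate": validate,
    }


def tfrecord_write_args(output_prefix, compression="NONE", num_shards=0):
    """Collect the three required TfrecordWrite arguments.

    num_shards=0 requests automatic sharding, per the parameter description.
    """
    return {
        "compression": compression,
        "num_shards": num_shards,
        "output_prefix": output_prefix,
    }


def copy_tfrecords(file_pattern, output_prefix):
    """Read TFRecord files and write them back out under a new prefix.

    Hypothetical helper: assumes apache_beam is installed, an expansion
    service is available, and the read output matches the write input.
    """
    # Imported lazily so the argument helpers work without Beam installed.
    import apache_beam as beam
    from apache_beam.transforms.xlang.io import TfrecordRead, TfrecordWrite

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "Read" >> TfrecordRead(**tfrecord_read_args(file_pattern))
            | "Write" >> TfrecordWrite(**tfrecord_write_args(output_prefix))
        )
```

Keeping the arguments in plain dictionaries makes the required parameters of each transform explicit. Optional parameters such as error_handling, filename_suffix or shard_template could be added to those dictionaries in the same way.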