apache_beam.transforms.xlang.io module

Cross-language transforms in this module can be imported from the apache_beam.io package.

class apache_beam.transforms.xlang.io.GenerateSequence(start, end=None, rate=None, expansion_service=None)[source]

Bases: ExternalTransform

Outputs a PCollection of Beam Rows, each containing a single INT64 number called “value”. The sequence starts at the given “start” value and runs either up to the given “end” or until 2^63 - 1. To produce an unbounded PCollection, do not specify an “end” value. Unbounded sequences can specify a “rate” for output elements. In all cases, the sequence of numbers is generated in parallel, so there is no inherent ordering between the generated values.

Parameters:
  • start – (int64) The minimum number to generate (inclusive).

  • end – (int64) The maximum number to generate (exclusive). If left unspecified, the sequence is unbounded.

  • rate – (Row(elements=<class ‘int64’>, seconds=typing.Optional[int64])) Specifies the rate at which to generate elements, as a number of elements per number of seconds. Applicable only to unbounded sequences.

identifier: str = 'beam:schematransform:org.apache.beam:generate_sequence:v1'
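The parameters above can be exercised with a small sketch. The helper `generate_sequence_kwargs` and the function `run_bounded_example` are hypothetical names introduced here for illustration; they are not part of Beam. The helper only encodes the documented rule that “rate” applies to unbounded sequences, and it represents the rate as a plain dict placeholder, since how the nested Row should be passed from Python is not stated in this reference. Running the pipeline is assumed to require a Beam installation and, typically, a Java runtime for the default expansion service.

```python
from typing import Any, Dict, Optional


def generate_sequence_kwargs(
    start: int,
    end: Optional[int] = None,
    elements: Optional[int] = None,
    seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect GenerateSequence arguments as a plain dict."""
    kwargs: Dict[str, Any] = {"start": start}
    if end is not None:
        kwargs["end"] = end
    if elements is not None:
        if end is not None:
            # The parameter docs say "rate" applies only to unbounded sequences.
            raise ValueError("rate applies only when end is left unspecified")
        # Placeholder representation of the rate Row (an assumption).
        kwargs["rate"] = {"elements": elements, "seconds": seconds}
    elif seconds is not None:
        raise ValueError("seconds requires elements")
    return kwargs


def run_bounded_example() -> None:
    """Build and run a small bounded pipeline that prints the values 0..9."""
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io import GenerateSequence

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "Generate" >> GenerateSequence(**generate_sequence_kwargs(0, 10))
            # Each element is a Row with a single field named "value".
            | "ExtractValue" >> beam.Map(lambda row: row.value)
            | "Print" >> beam.Map(print)
        )
```

Calling `run_bounded_example()` should print ten numbers in no guaranteed order, because the sequence is generated in parallel.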
class apache_beam.transforms.xlang.io.TfrecordRead(compression, file_pattern, validate, error_handling=None, expansion_service=None)[source]

Bases: ExternalTransform

Parameters:
  • compression – (str) Decompression type to use when reading input files.

  • file_pattern – (str) Filename or file pattern used to find input files.

  • validate – (boolean) Whether to validate the file pattern.

  • error_handling – (Row(output=<class ‘str’>)) This option specifies whether and where to output unwritable rows.

identifier: str = 'beam:schematransform:org.apache.beam:tfrecord_read:v1'

class apache_beam.transforms.xlang.io.TfrecordWrite(compression, num_shards, output_prefix, error_handling=None, filename_suffix=None, no_spilling=None, shard_template=None, expansion_service=None)[source]

Bases: ExternalTransform

Parameters:
  • compression – (str) Option to indicate the output sink’s compression type. Default is NONE.

  • num_shards – (int32) The number of shards to use, or 0 for automatic.

  • output_prefix – (str) The directory to which files will be written.

  • error_handling – (Row(output=<class ‘str’>)) This option specifies whether and where to output unwritable rows.

  • filename_suffix – (str) The suffix of each file written, combined with prefix and shardTemplate.

  • no_spilling – (boolean) Whether to skip the spilling of data caused by having maxNumWritersPerBundle.

  • shard_template – (str) The shard template of each file written, combined with prefix and suffix.

identifier: str = 'beam:schematransform:org.apache.beam:tfrecord_write:v1'
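To illustrate how output_prefix, shard_template, and filename_suffix might combine into a file name, here is a minimal pure-Python sketch. It assumes the placeholder convention commonly used by Beam file sinks, where a run of `S` characters is replaced by the zero-padded shard index and a run of `N` characters by the zero-padded shard count. The function `shard_filename` and the example paths are hypothetical and are not part of this module.

```python
import re


def shard_filename(
    prefix: str,
    shard_template: str,
    suffix: str,
    shard_index: int,
    num_shards: int,
) -> str:
    """Hypothetical sketch: build one output file name from its parts."""

    def fill(match: "re.Match[str]") -> str:
        run = match.group(0)
        # Runs of "S" take the shard index, runs of "N" the shard count;
        # each is zero-padded to the length of the run (assumed convention).
        value = shard_index if run[0] == "S" else num_shards
        return str(value).zfill(len(run))

    return prefix + re.sub(r"S+|N+", fill, shard_template) + suffix


# Example with made-up values: shard 3 of 10.
name = shard_filename("/tmp/out/records", "-SSSSS-of-NNNNN", ".tfrecord", 3, 10)
print(name)  # /tmp/out/records-00003-of-00010.tfrecord
```

Under this assumption, the template controls only the middle part of the name, so changing filename_suffix or output_prefix leaves the shard numbering untouched.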