apache_beam.io.tfrecordio module¶
TFRecord sources and sinks.
-
class
apache_beam.io.tfrecordio.
ReadAllFromTFRecord
(coder=BytesCoder, compression_type='auto', with_filename=False)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
A
PTransform
for reading aPCollection
of TFRecord files.Initialize the
ReadAllFromTFRecord
transform.Parameters: - coder – Coder used to decode each record.
- compression_type – Used to handle compressed input files. Default value is CompressionTypes.AUTO, in which case the file_path’s extension will be used to detect the compression.
- with_filename – If True, returns a Key Value with the key being the file name and the value being the actual data. If False, it only returns the data.
-
class
apache_beam.io.tfrecordio.
ReadFromTFRecord
(file_pattern, coder=BytesCoder, compression_type='auto', validate=True)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
Transform for reading TFRecord sources.
Initialize a ReadFromTFRecord transform.
Parameters: - file_pattern – A file glob pattern to read TFRecords from.
- coder – Coder used to decode each record.
- compression_type – Used to handle compressed input files. Default value is CompressionTypes.AUTO, in which case the file_path’s extension will be used to detect the compression.
- validate – Boolean flag to verify that the files exist during the pipeline creation time.
Returns: A ReadFromTFRecord transform object.
-
class
apache_beam.io.tfrecordio.
WriteToTFRecord
(file_path_prefix, coder=BytesCoder, file_name_suffix='', num_shards=0, shard_name_template=None, compression_type='auto')[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
Transform for writing to TFRecord sinks.
Initialize WriteToTFRecord transform.
Parameters: - file_path_prefix – The file path to write to. The files written will begin with this prefix, followed by a shard identifier (see num_shards), and end in a common extension, if given by file_name_suffix.
- coder – Coder used to encode each record.
- file_name_suffix – Suffix for the files written.
- num_shards – The number of files (shards) used for output. If not set, the default value will be used.
- shard_name_template – A template string containing placeholders for the shard number and shard count. When constructing a filename for a particular shard number, the upper-case letters ‘S’ and ‘N’ are replaced with the 0-padded shard number and shard count respectively. This argument can be ‘’ in which case it behaves as if num_shards was set to 1 and only one file will be generated. The default pattern used is ‘-SSSSS-of-NNNNN’ if None is passed as the shard_name_template.
- compression_type – Used to handle compressed output files. Typical value is CompressionTypes.AUTO, in which case the file_path’s extension will be used to detect the compression.
Returns: A WriteToTFRecord transform object.