apache_beam.io.fileio module
PTransforms for manipulating files in Apache Beam.
Provides the reading PTransforms MatchFiles and MatchAll, which produce a PCollection of records, each representing a file and its metadata; and ReadMatches, which takes a PCollection of file metadata records and produces a PCollection of ReadableFile objects.
These transforms do not currently support splitting on their own.
No backward compatibility guarantees. Everything in this module is experimental.

class apache_beam.io.fileio.EmptyMatchTreatment
Bases: object
How to treat empty matches in MatchAll and MatchFiles transforms.
If empty matches are disallowed, an error will be thrown if a pattern does not match any files.
ALLOW = 'ALLOW'
DISALLOW = 'DISALLOW'
ALLOW_IF_WILDCARD = 'ALLOW_IF_WILDCARD'
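
The three settings above can be read as a small decision rule. The following is a standalone sketch of that rule in plain Python, not the module's own implementation: the helper names `empty_match_allowed` and `check_match` are hypothetical, and treating `*`, `?` and `[` as the wildcard characters is an assumption of this sketch.

```python
ALLOW = "ALLOW"
DISALLOW = "DISALLOW"
ALLOW_IF_WILDCARD = "ALLOW_IF_WILDCARD"

# Assumption: these glob metacharacters are what makes a pattern a "wildcard".
WILDCARD_CHARS = "*?["


def empty_match_allowed(pattern, treatment):
    """Return True if a pattern that matched no files should be tolerated."""
    if treatment == ALLOW:
        return True
    if treatment == DISALLOW:
        return False
    if treatment == ALLOW_IF_WILDCARD:
        # An empty result is acceptable only for patterns containing a wildcard.
        return any(char in pattern for char in WILDCARD_CHARS)
    raise ValueError("unknown empty match treatment: %r" % (treatment,))


def check_match(pattern, matched_files, treatment):
    """Return the matched files, or raise if an empty match is disallowed."""
    if not matched_files and not empty_match_allowed(pattern, treatment):
        raise ValueError("pattern %r did not match any files" % (pattern,))
    return list(matched_files)
```

Under this reading, an exact path that matches nothing is an error with ALLOW_IF_WILDCARD, while a glob such as `/data/*.csv` that matches nothing simply yields no files.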

class apache_beam.io.fileio.MatchFiles(file_pattern, empty_match_treatment='ALLOW_IF_WILDCARD')
Bases: apache_beam.transforms.ptransform.PTransform
Matches a file pattern using FileSystems.match.
This PTransform returns a PCollection of matching files in the form of FileMetadata objects.

class apache_beam.io.fileio.MatchAll(empty_match_treatment='ALLOW')
Bases: apache_beam.transforms.ptransform.PTransform
Matches file patterns from the input PCollection via FileSystems.match.
This PTransform returns a PCollection of matching files in the form of FileMetadata objects.

class apache_beam.io.fileio.ReadableFile(metadata)
Bases: object
A utility class for accessing files.

class apache_beam.io.fileio.ReadMatches(compression=None, skip_directories=True)
Bases: apache_beam.transforms.ptransform.PTransform
Converts each result of MatchFiles() or MatchAll() to a ReadableFile.
This helps read in a file’s contents or obtain a file descriptor.