apache_beam.io.fileio module
PTransforms for manipulating files in Apache Beam.
Provides the reading PTransforms MatchFiles and MatchAll, which produce a PCollection of records representing a file and its metadata, and ReadMatches, which takes a PCollection of file metadata records and produces a PCollection of ReadableFile objects.
These transforms currently do not support splitting by themselves.
No backward compatibility guarantees. Everything in this module is experimental.
class apache_beam.io.fileio.EmptyMatchTreatment
Bases: object
How to treat empty matches in MatchAll and MatchFiles transforms.
If empty matches are disallowed, an error will be thrown if a pattern does not match any files.
ALLOW = 'ALLOW'
DISALLOW = 'DISALLOW'
ALLOW_IF_WILDCARD = 'ALLOW_IF_WILDCARD'
class apache_beam.io.fileio.MatchFiles(file_pattern, empty_match_treatment='ALLOW_IF_WILDCARD')
Bases: apache_beam.transforms.ptransform.PTransform
Matches a file pattern using FileSystems.match.
This PTransform returns a PCollection of matching files in the form of FileMetadata objects.
class apache_beam.io.fileio.MatchAll(empty_match_treatment='ALLOW')
Bases: apache_beam.transforms.ptransform.PTransform
Matches file patterns from the input PCollection via FileSystems.match.
This PTransform returns a PCollection of matching files in the form of FileMetadata objects.
class apache_beam.io.fileio.ReadableFile(metadata)
Bases: object
A utility class for accessing files.
class apache_beam.io.fileio.ReadMatches(compression=None, skip_directories=True)
Bases: apache_beam.transforms.ptransform.PTransform
Converts each result of MatchFiles() or MatchAll() to a ReadableFile.
This helps read in a file’s contents or obtain a file descriptor.