apache_beam.io.fileio module

PTransforms for manipulating files in Apache Beam.

Provides the reading PTransforms MatchFiles and MatchAll, which produce a PCollection of records representing a file and its metadata, and ReadMatches, which takes a PCollection of file metadata records and produces a PCollection of ReadableFile objects. These transforms currently do not support splitting by themselves.

There are no backward-compatibility guarantees: everything in this module is experimental.

class apache_beam.io.fileio.EmptyMatchTreatment

Bases: object

How to treat empty matches in MatchAll and MatchFiles transforms.

If empty matches are disallowed, an error is raised when a pattern does not match any files. ALLOW_IF_WILDCARD permits an empty match only when the pattern contains a wildcard.

ALLOW = 'ALLOW'
DISALLOW = 'DISALLOW'
ALLOW_IF_WILDCARD = 'ALLOW_IF_WILDCARD'
static allow_empty_match(pattern, setting)
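The three settings can be illustrated with a small pure-Python sketch. It does not call Beam and is not the library's source: `allow_empty_match_sketch` and `check_match` are hypothetical helpers, and treating a pattern as a wildcard pattern whenever it contains `*` is an assumption.

```python
# Hypothetical sketch of the decision described for EmptyMatchTreatment.
# Assumption: a pattern counts as a wildcard pattern if it contains '*'.
ALLOW = 'ALLOW'
DISALLOW = 'DISALLOW'
ALLOW_IF_WILDCARD = 'ALLOW_IF_WILDCARD'


def allow_empty_match_sketch(pattern, setting):
    """Return True if a pattern that matched no files should be accepted."""
    if setting == ALLOW:
        return True
    if setting == DISALLOW:
        return False
    if setting == ALLOW_IF_WILDCARD:
        return '*' in pattern
    raise ValueError('Unknown empty-match setting: %r' % (setting,))


def check_match(pattern, matched_paths, setting):
    """Raise IOError if nothing matched and the setting forbids that."""
    if not matched_paths and not allow_empty_match_sketch(pattern, setting):
        raise IOError('No files found matching %s' % pattern)
    return matched_paths
```

Note that the defaults differ in the signatures: MatchFiles uses ALLOW_IF_WILDCARD, while MatchAll uses ALLOW.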
class apache_beam.io.fileio.MatchFiles(file_pattern, empty_match_treatment='ALLOW_IF_WILDCARD')

Bases: apache_beam.transforms.ptransform.PTransform

Matches a file pattern using FileSystems.match.

This PTransform returns a PCollection of matching files in the form of FileMetadata objects.

expand(pcoll)
class apache_beam.io.fileio.MatchAll(empty_match_treatment='ALLOW')

Bases: apache_beam.transforms.ptransform.PTransform

Matches file patterns from the input PCollection via FileSystems.match.

This PTransform returns a PCollection of matching files in the form of FileMetadata objects.

expand(pcoll)
class apache_beam.io.fileio.ReadableFile(metadata)

Bases: object

A utility class for accessing files.

open(mime_type='text/plain')
read()
read_utf8()
class apache_beam.io.fileio.ReadMatches(compression=None, skip_directories=True)

Bases: apache_beam.transforms.ptransform.PTransform

Converts each result of MatchFiles() or MatchAll() to a ReadableFile.

This makes it possible to read a file’s contents or to open a file-like handle to it.

expand(pcoll)