apache_beam.io.filesystems module

FileSystems interface class for accessing the correct filesystem for a given path.

class apache_beam.io.filesystems.FileSystems[source]

Bases: object

A class that defines the functions that can be performed on a filesystem. All methods are static and access the underlying registered filesystems.

URI_SCHEMA_PATTERN = <_sre.SRE_Pattern object>
classmethod set_options(pipeline_options)[source]

Set filesystem options.

Parameters: pipeline_options – instance of PipelineOptions.

static get_scheme(path)[source]
static get_filesystem(path)[source]

Get the correct filesystem for the specified path

static join(basepath, *paths)[source]

Join two or more pathname components for the filesystem

Parameters:
  • basepath – string path of the first component of the path
  • paths – path components to be added

Returns: full path after combining all the passed components

static split(path)[source]

Splits the given path into two parts.

Splits the path into a pair (head, tail) such that tail contains the last component of the path and head contains everything up to that.

For file-systems other than the local file-system, head should include the prefix.

Parameters: path – path as a string
Returns: a pair of path components as strings.
static mkdirs(path)[source]

Recursively create directories for the provided path.

Parameters: path – string path of the directory structure that should be created
Raises: IOError if the leaf directory already exists.
static match(patterns, limits=None)[source]

Find all matching paths to the patterns provided.

Parameters:
  • patterns – list of strings giving the file path patterns to match against
  • limits – list of the maximum number of results to fetch, one entry per pattern

Returns: list of MatchResult objects.

Raises: BeamIOError if any of the pattern match operations fail
static create(path, mime_type='application/octet-stream', compression_type='auto')[source]

Returns a write channel for the given file path.

Parameters:
  • path – string path of the file object to be written to the system
  • mime_type – MIME type to specify the type of content in the file object
  • compression_type – Type of compression to be used for this object. See CompressionTypes for possible values.

Returns: file handle with a close function for the user to use.

static open(path, mime_type='application/octet-stream', compression_type='auto')[source]

Returns a read channel for the given file path.

Parameters:
  • path – string path of the file object to be read
  • mime_type – MIME type to specify the type of content in the file object
  • compression_type – Type of compression to be used for this object. See CompressionTypes for possible values.

Returns: file handle with a close function for the user to use.

static copy(source_file_names, destination_file_names)[source]

Recursively copy the file list from the source to the destination

Parameters:
  • source_file_names – list of source file objects that need to be copied
  • destination_file_names – list of destination paths for the new objects

Raises: BeamIOError if any of the copy operations fail

static rename(source_file_names, destination_file_names)[source]

Rename the files at the source list to the destination list. Source and destination lists should be of the same size.

Parameters:
  • source_file_names – list of file paths that need to be moved
  • destination_file_names – list of destination paths for the files

Raises: BeamIOError if any of the rename operations fail

static exists(path)[source]

Check if the provided path exists on the FileSystem.

Parameters: path – string path that needs to be checked.

Returns: boolean flag indicating if path exists

static delete(paths)[source]

Deletes files or directories at the provided paths. Directories will be deleted recursively.

Parameters: paths – list of paths that give the file objects to be deleted
Raises: BeamIOError if any of the delete operations fail
static get_chunk_size(path)[source]

Get the correct chunk size for the FileSystem.

Parameters: path – string path that needs to be checked.

Returns: integer size for parallelization in the FS operations.