apache_beam.io.aws.s3filesystem module

S3 file system implementation for accessing files on AWS S3.

class apache_beam.io.aws.s3filesystem.S3FileSystem(pipeline_options)[source]

Bases: FileSystem

An S3 FileSystem implementation for accessing files on AWS S3

Initializes a connection to S3.

Connection configuration is done by passing pipeline options. See S3Options.

CHUNK_SIZE = 100
S3_PREFIX = 's3://'
classmethod scheme()[source]

URI scheme for the FileSystem

join(basepath, *paths)[source]

Join two or more pathname components for the filesystem

Parameters:
  • basepath – string path of the first component of the path

  • paths – path components to be added

Returns: full path after combining all of the return nulled components

split(path)[source]

Splits the given path into two parts.

Splits the path into a pair (head, tail) such that tail contains the last component of the path and head contains everything up to that.

Head will include the S3 prefix (‘s3://’).

Parameters:

path – path as a string

Returns:

a pair of path components as strings.

mkdirs(path)[source]

Recursively create directories for the provided path.

Parameters:

path – string path of the directory structure that should be created

Raises:

IOError – if leaf directory already exists.

has_dirs()[source]

Whether this FileSystem supports directories.

create(path, mime_type='application/octet-stream', compression_type='auto')[source]

Returns a write channel for the given file path.

Parameters:
  • path – string path of the file object to be written to the system

  • mime_type – MIME type to specify the type of content in the file object

  • compression_type – Type of compression to be used for this object

Returns: file handle with a close function for the user to use

open(path, mime_type='application/octet-stream', compression_type='auto')[source]

Returns a read channel for the given file path.

Parameters:
  • path – string path of the file object to be written to the system

  • mime_type – MIME type to specify the type of content in the file object

  • compression_type – Type of compression to be used for this object

Returns: file handle with a close function for the user to use

copy(source_file_names, destination_file_names)[source]

Recursively copy the file tree from the source to the destination

Parameters:
  • source_file_names – list of source file objects that needs to be copied

  • destination_file_names – list of destination of the new object

Raises:

BeamIOError – if any of the copy operations fail

rename(source_file_names, destination_file_names)[source]

Rename the files at the source list to the destination list. Source and destination lists should be of the same size.

Parameters:
  • source_file_names – List of file paths that need to be moved

  • destination_file_names – List of destination_file_names for the files

Raises:

BeamIOError – if any of the rename operations fail

exists(path)[source]

Check if the provided path exists on the FileSystem.

Parameters:

path – string path that needs to be checked.

Returns: boolean flag indicating if path exists

size(path)[source]

Get size of path on the FileSystem.

Parameters:

path – string path in question.

Returns: int size of path according to the FileSystem.

Raises:

BeamIOError – if path doesn’t exist.

last_updated(path)[source]

Get UNIX Epoch time in seconds on the FileSystem.

Parameters:

path – string path of file.

Returns: float UNIX Epoch time

Raises:

BeamIOError – if path doesn’t exist.

checksum(path)[source]

Fetch checksum metadata of a file on the FileSystem.

Parameters:

path – string path of a file.

Returns: string containing checksum

Raises:

BeamIOError – if path isn’t a file or doesn’t exist.

metadata(path)[source]

Fetch metadata fields of a file on the FileSystem.

Parameters:

path – string path of a file.

Returns:

FileMetadata.

Raises:

BeamIOError – if path isn’t a file or doesn’t exist.

delete(paths)[source]

Deletes files or directories at the provided paths. Directories will be deleted recursively.

Parameters:

paths – list of paths that give the file objects to be deleted

report_lineage(path, lineage, level=None)[source]