apache_beam.io.gcp.gcsfilesystem module
GCS file system implementation for accessing files on GCS.
Updates to the I/O connector code
For any significant updates to this I/O connector, please consider involving corresponding code reviewers mentioned in https://github.com/apache/beam/blob/master/sdks/python/OWNERS
- class apache_beam.io.gcp.gcsfilesystem.GCSFileSystem(pipeline_options)[source]
 Bases: FileSystem
A GCS FileSystem implementation for accessing files on GCS.
- CHUNK_SIZE = 100
 
- GCS_PREFIX = 'gs://'
 
- join(basepath, *paths)[source]
 Join two or more pathname components for the filesystem
- Parameters:
 basepath – string path of the first component of the path
paths – path components to be added
Returns: full path after combining all the passed components
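The joining behaviour described above can be illustrated with a small pure-Python sketch. It does not call Beam; `join_gcs_path` is a hypothetical stand-in, and both the requirement that the base path start with the GCS prefix and the single-slash joining rule are assumptions made for illustration, not behaviour confirmed by this page.

```python
GCS_PREFIX = 'gs://'


def join_gcs_path(basepath, *paths):
    """Hypothetical stand-in that mimics the documented join() behaviour."""
    # Assumption: the base path must already be a GCS path.
    if not basepath.startswith(GCS_PREFIX):
        raise ValueError('Basepath %r must be a GCS path.' % basepath)
    path = basepath
    for component in paths:
        # Assumption: exactly one '/' separates adjacent components.
        path = path.rstrip('/') + '/' + component.lstrip('/')
    return path


print(join_gcs_path('gs://bucket/dir', 'sub', 'file.txt'))
# prints: gs://bucket/dir/sub/file.txt
```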
- split(path)[source]
 Splits the given path into two parts.
Splits the path into a pair (head, tail) such that tail contains the last component of the path and head contains everything up to that.
Head will include the GCS prefix (‘gs://’).
- Parameters:
 path – path as a string
- Returns:
 a pair of path components as strings.
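To make the (head, tail) contract concrete, here is a minimal pure-Python sketch of a split that keeps the GCS prefix in the head. `split_gcs_path` is a hypothetical helper, not Beam's implementation; returning an empty tail for a bare bucket path and rejecting an empty bucket name are assumptions.

```python
GCS_PREFIX = 'gs://'


def split_gcs_path(path):
    """Hypothetical stand-in that mimics the documented split() behaviour."""
    path = path.strip()
    if not path.startswith(GCS_PREFIX):
        raise ValueError('Path %r must be a GCS path.' % path)
    # Search for the last separator only after the prefix, so the
    # slashes in 'gs://' are never treated as component boundaries.
    remainder = path[len(GCS_PREFIX):]
    last_sep = remainder.rfind('/')
    if last_sep < 0:
        # Assumption: a bare bucket path has an empty tail.
        return (path, '')
    if last_sep == 0:
        # Assumption: an empty bucket name is treated as invalid.
        raise ValueError('Invalid path: %s' % path)
    cut = len(GCS_PREFIX) + last_sep
    return (path[:cut], path[cut + 1:])


print(split_gcs_path('gs://bucket/dir/file.txt'))
# prints: ('gs://bucket/dir', 'file.txt')
```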
- mkdirs(path)[source]
 Recursively create directories for the provided path.
- Parameters:
 path – string path of the directory structure that should be created
- Raises:
 IOError – if leaf directory already exists.
- create(path, mime_type='application/octet-stream', compression_type='auto') BinaryIO[source]
 Returns a write channel for the given file path.
- Parameters:
 path – string path of the file object to be written to the system
mime_type – MIME type to specify the type of content in the file object
compression_type – Type of compression to be used for this object
Returns: file handle with a close function for the user to use
- open(path, mime_type='application/octet-stream', compression_type='auto') BinaryIO[source]
 Returns a read channel for the given file path.
- Parameters:
 path – string path of the file object to be read from the system
mime_type – MIME type to specify the type of content in the file object
compression_type – Type of compression to be used for this object
Returns: file handle with a close function for the user to use
- copy(source_file_names, destination_file_names)[source]
 Recursively copy the file tree from the source to the destination
- Parameters:
 source_file_names – list of source file objects that need to be copied
destination_file_names – list of destinations for the new objects
- Raises:
 BeamIOError – if any of the copy operations fail
- rename(source_file_names, destination_file_names)[source]
 Rename the files at the source list to the destination list. Source and destination lists should be of the same size.
- Parameters:
 source_file_names – List of file paths that need to be moved
destination_file_names – List of destination paths for the files
- Raises:
 BeamIOError – if any of the rename operations fail
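The pairing of the two lists can be sketched against an in-memory dict instead of GCS. Everything here is hypothetical: `rename_all` and `BatchRenameError` are illustrative stand-ins (the latter for BeamIOError), and the choice to attempt every pair before raising a single error is an assumption, not something this page states.

```python
class BatchRenameError(IOError):
    """Hypothetical stand-in for BeamIOError; carries per-pair failures."""

    def __init__(self, message, failures):
        super().__init__(message)
        self.failures = failures


def rename_all(store, source_file_names, destination_file_names):
    """Rename keys of an in-memory dict, pairing sources with destinations."""
    if len(source_file_names) != len(destination_file_names):
        raise ValueError('Source and destination lists must be the same size.')
    failures = {}
    for src, dest in zip(source_file_names, destination_file_names):
        try:
            # pop() runs first, so a missing source leaves the store unchanged.
            store[dest] = store.pop(src)
        except KeyError as exc:
            failures[(src, dest)] = exc
    if failures:
        # Assumption: every pair is attempted before one error is raised.
        raise BatchRenameError('Rename operation failed', failures)


store = {'gs://bucket/a.tmp': b'1', 'gs://bucket/b.tmp': b'2'}
rename_all(store,
           ['gs://bucket/a.tmp', 'gs://bucket/b.tmp'],
           ['gs://bucket/a.txt', 'gs://bucket/b.txt'])
print(sorted(store))
# prints: ['gs://bucket/a.txt', 'gs://bucket/b.txt']
```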
- exists(path)[source]
 Check if the provided path exists on the FileSystem.
- Parameters:
 path – string path that needs to be checked.
Returns: boolean flag indicating if path exists
- size(path)[source]
 Get size of path on the FileSystem.
- Parameters:
 path – string path in question.
Returns: int size of path according to the FileSystem.
- Raises:
 BeamIOError – if path doesn’t exist.
- last_updated(path)[source]
 Get the last-updated time of path on the FileSystem, as UNIX Epoch time in seconds.
- Parameters:
 path – string path of file.
Returns: float UNIX Epoch time
- Raises:
 BeamIOError – if path doesn’t exist.
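Because the return value is a float count of seconds since the UNIX Epoch, callers will often want a timezone-aware datetime. The sketch below uses only the standard library; `epoch_to_utc` is a hypothetical helper and the timestamp is a made-up sample value, not one returned by GCS.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert float UNIX Epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Made-up sample standing in for a last_updated(path) result.
sample_last_updated = 1700000000.0
print(epoch_to_utc(sample_last_updated).isoformat())
# prints: 2023-11-14T22:13:20+00:00
```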
- checksum(path)[source]
 Fetch checksum metadata of a file on the FileSystem.
- Parameters:
 path – string path of a file.
Returns: string containing checksum
- Raises:
 BeamIOError – if path isn’t a file or doesn’t exist.
- metadata(path)[source]
 Fetch metadata fields of a file on the FileSystem.
- Parameters:
 path – string path of a file.
- Returns:
 FileMetadata.
- Raises:
 BeamIOError – if path isn’t a file or doesn’t exist.