apache_beam.io.gcp.gcsfilesystem module
GCS file system implementation for accessing files on GCS.
Updates to the I/O connector code
For any significant updates to this I/O connector, please consider involving corresponding code reviewers mentioned in https://github.com/apache/beam/blob/master/sdks/python/OWNERS
- class apache_beam.io.gcp.gcsfilesystem.GCSFileSystem(pipeline_options)[source]
- Bases: - FileSystem- A GCS - FileSystemimplementation for accessing files on GCS.- CHUNK_SIZE = 100
 - GCS_PREFIX = 'gs://'
 - join(basepath, *paths)[source]
- Join two or more pathname components for the filesystem - Parameters:
- basepath – string path of the first component of the path 
- paths – path components to be added 
 
 - Returns: full path after combining all the passed components 
 - split(path)[source]
- Splits the given path into two parts. - Splits the path into a pair (head, tail) such that tail contains the last component of the path and head contains everything up to that. - Head will include the GCS prefix (‘gs://’). - Parameters:
- path – path as a string 
- Returns:
- a pair of path components as strings. 
 
 - mkdirs(path)[source]
- Recursively create directories for the provided path. - Parameters:
- path – string path of the directory structure that should be created 
- Raises:
- IOError – if leaf directory already exists. 
 
 - create(path, mime_type='application/octet-stream', compression_type='auto') BinaryIO[source]
- Returns a write channel for the given file path. - Parameters:
- path – string path of the file object to be written to the system 
- mime_type – MIME type to specify the type of content in the file object 
- compression_type – Type of compression to be used for this object 
 
 - Returns: file handle with a close function for the user to use 
 - open(path, mime_type='application/octet-stream', compression_type='auto') BinaryIO[source]
- Returns a read channel for the given file path. - Parameters:
- path – string path of the file object to be written to the system 
- mime_type – MIME type to specify the type of content in the file object 
- compression_type – Type of compression to be used for this object 
 
 - Returns: file handle with a close function for the user to use 
 - copy(source_file_names, destination_file_names)[source]
- Recursively copy the file tree from the source to the destination - Parameters:
- source_file_names – list of source file objects that needs to be copied 
- destination_file_names – list of destination of the new object 
 
- Raises:
- BeamIOError – if any of the copy operations fail 
 
 - rename(source_file_names, destination_file_names)[source]
- Rename the files at the source list to the destination list. Source and destination lists should be of the same size. - Parameters:
- source_file_names – List of file paths that need to be moved 
- destination_file_names – List of destination_file_names for the files 
 
- Raises:
- BeamIOError – if any of the rename operations fail 
 
 - exists(path)[source]
- Check if the provided path exists on the FileSystem. - Parameters:
- path – string path that needs to be checked. 
 - Returns: boolean flag indicating if path exists 
 - size(path)[source]
- Get size of path on the FileSystem. - Parameters:
- path – string path in question. 
 - Returns: int size of path according to the FileSystem. - Raises:
- BeamIOError – if path doesn’t exist. 
 
 - last_updated(path)[source]
- Get UNIX Epoch time in seconds on the FileSystem. - Parameters:
- path – string path of file. 
 - Returns: float UNIX Epoch time - Raises:
- BeamIOError – if path doesn’t exist. 
 
 - checksum(path)[source]
- Fetch checksum metadata of a file on the - FileSystem.- Parameters:
- path – string path of a file. 
 - Returns: string containing checksum - Raises:
- BeamIOError – if path isn’t a file or doesn’t exist. 
 
 - metadata(path)[source]
- Fetch metadata fields of a file on the FileSystem. - Parameters:
- path – string path of a file. 
- Returns:
- Raises:
- BeamIOError – if path isn’t a file or doesn’t exist.