apache_beam.io.gcp.gcsio module
Google Cloud Storage client.
This library evolved from the Google App Engine GCS client available at https://github.com/GoogleCloudPlatform/appengine-gcs-client.
Updates to the I/O connector code
For any significant updates to this I/O connector, please consider involving the corresponding code reviewers listed in https://github.com/apache/beam/blob/master/sdks/python/OWNERS
class apache_beam.io.gcp.gcsio.GcsIO(storage_client=None, pipeline_options=None)
Bases: object

Google Cloud Storage I/O client.

get_bucket(bucket_name)
Returns the bucket object with the given name, or None if the bucket does not exist.

create_bucket(bucket_name, project, kms_key=None, location=None)
Creates and returns a GCS bucket in the given project.

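Because get_bucket returns None for a missing bucket, the two bucket methods combine naturally into a "get or create" step. The following is a minimal sketch, assuming `gcs` is an already constructed GcsIO instance (or any object exposing the same two methods); the helper name get_or_create_bucket is hypothetical.

```python
def get_or_create_bucket(gcs, bucket_name, project, location=None):
    """Return the named bucket, creating it in `project` if it is missing.

    Hypothetical helper; `gcs` is assumed to be a GcsIO instance.
    """
    bucket = gcs.get_bucket(bucket_name)  # None when the bucket does not exist
    if bucket is None:
        # Only reached when the lookup found nothing.
        bucket = gcs.create_bucket(bucket_name, project, location=location)
    return bucket
```

The check-then-create sequence is not atomic: if another process creates the same bucket between the two calls, create_bucket can be expected to raise, so callers that run concurrently may want to catch that error and call get_bucket again.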
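open(filename, mode='r', read_buffer_size=16777216, mime_type='application/octet-stream')
Opens a GCS file path for reading or writing.
Parameters:
- filename – GCS file path in the form gs://<bucket>/<name>.
- mode – 'r' for reading or 'w' for writing.
- read_buffer_size – buffer size in bytes used during read operations (default 16777216, i.e. 16 MiB).
- mime_type – MIME type set on the object for write operations.
Returns: GCS file object.
Raises: ValueError – invalid open file mode.
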
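The object returned by open(path, 'r') is consumed like a binary file. A minimal sketch of streaming it in fixed-size chunks follows; it assumes only that the reader has a read(n) method returning bytes (b'' at end of file), so it is demonstrated with io.BytesIO as a stand-in. The helper iter_chunks and the gs://my-bucket/data.bin path are hypothetical.

```python
import io


def iter_chunks(fileobj, chunk_size=16 * 1024 * 1024):
    """Yield successive byte chunks from a readable binary file object.

    Assumes read(n) returns b'' at end of file, as Python binary files do.
    The default chunk size mirrors open()'s default read_buffer_size.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# With apache-beam[gcp] installed and `gcs` a GcsIO instance, one might write
# (hypothetical object path):
#   with gcs.open('gs://my-bucket/data.bin', 'r') as f:
#       for chunk in iter_chunks(f):
#           process(chunk)
sample = io.BytesIO(b"abcdefghij")  # in-memory stand-in for a GCS reader
chunks = list(iter_chunks(sample, chunk_size=4))
```

Reading in bounded chunks keeps memory use flat regardless of object size; the `with` form in the comment assumes the returned object supports the context-manager protocol, as standard Python file objects do.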
delete(path)
Deletes the object at the given GCS path.
Parameters:
- path – GCS file path pattern in the form gs://<bucket>/<name>.

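Most methods in this class take paths in the form gs://<bucket>/<name>. When validating user input before calling them, a small parser can reject malformed paths early. This is a sketch; split_gcs_path is a hypothetical helper, not part of this module's documented API.

```python
def split_gcs_path(path):
    """Split 'gs://<bucket>/<name>' into (bucket, name).

    Hypothetical helper; raises ValueError when the path is not in the
    gs://<bucket>/<name> form with a non-empty bucket and object name.
    """
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError(f"not a GCS path: {path!r}")
    # partition() splits at the first '/', so nested names stay intact.
    bucket, sep, name = path[len(prefix):].partition("/")
    if not bucket or not sep or not name:
        raise ValueError(f"expected gs://<bucket>/<name>, got {path!r}")
    return bucket, name
```

Splitting only at the first slash matters because object names may themselves contain slashes, as in gs://my-bucket/dir/file.txt.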
delete_batch(paths)
Deletes the objects at the given GCS paths.
Parameters:
- paths – list of GCS file path patterns, or a dict with GCS file path patterns as keys, in the form gs://<bucket>/<name>. The number of paths must not exceed MAX_BATCH_OPERATION_SIZE.
Returns: list of (path, exception) tuples in the same order as the paths argument, where exception is None if the operation succeeded or the relevant exception if it failed.

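Since a single call must stay within MAX_BATCH_OPERATION_SIZE and reports failures per path instead of raising, deleting a long list of paths usually means chunking the input and collecting the non-None exceptions. A minimal sketch follows; chunked and failed_paths are hypothetical helpers, and the default batch size of 100 is an assumption, so check MAX_BATCH_OPERATION_SIZE in your Beam version.

```python
from itertools import islice

# Assumed limit; verify against MAX_BATCH_OPERATION_SIZE in your Beam version.
DEFAULT_BATCH_SIZE = 100


def chunked(items, size=DEFAULT_BATCH_SIZE):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def failed_paths(results):
    """Keep only the (path, exception) pairs whose exception is not None."""
    return [(path, exc) for path, exc in results if exc is not None]


# Hypothetical usage with a GcsIO instance `gcs`:
#   failures = []
#   for batch in chunked(paths):
#       failures.extend(failed_paths(gcs.delete_batch(batch)))
```

Collecting failures rather than stopping at the first one matches the per-path result contract, and lets the caller decide whether to retry or report.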
copy(src, dest)
Copies the given GCS object from src to dest.
Parameters:
- src – GCS file path pattern in the form gs://<bucket>/<name>.
- dest – GCS file path pattern in the form gs://<bucket>/<name>.
Raises: TimeoutError – on timeout.

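Because copy can raise TimeoutError, callers copying large objects may want to retry. The sketch below wraps any callable with exponential backoff on TimeoutError only; it is a generic pattern, not something this module provides, and call_with_timeout_retries is a hypothetical name.

```python
import time


def call_with_timeout_retries(fn, *args, attempts=3, base_delay=1.0):
    """Call fn(*args), retrying with exponential backoff on TimeoutError.

    Hypothetical helper; `fn` would typically be gcs.copy. Exceptions other
    than TimeoutError propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn(*args)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...


# Hypothetical usage with a GcsIO instance `gcs`:
#   call_with_timeout_retries(gcs.copy, 'gs://src-bucket/a.txt', 'gs://dest-bucket/a.txt')
```

Retrying a copy is generally reasonable because repeating it to the same destination simply overwrites dest with the same source content.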
copy_batch(src_dest_pairs)
Copies the given GCS objects from src to dest.
Parameters:
- src_dest_pairs – list of (src, dest) tuples of gs://<bucket>/<name> file paths to copy from src to dest. The list must not exceed MAX_BATCH_OPERATION_SIZE in length.
Returns: list of (src, dest, exception) tuples in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if it failed.

copytree(src, dest)
Copies the given GCS “directory” recursively from src to dest.
Parameters:
- src – GCS file path pattern in the form gs://<bucket>/<name>/.
- dest – GCS file path pattern in the form gs://<bucket>/<name>/.

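GCS has no real directories, so copying a “directory” amounts to copying every object whose name starts with the src prefix to the same relative name under dest. The sketch below computes that mapping; the input names could come from list_files(src), and the resulting pairs could be fed to copy_batch in chunks. It illustrates the idea and is not necessarily how copytree is implemented; copytree_pairs is a hypothetical helper.

```python
def copytree_pairs(names, src, dest):
    """Map object paths under the `src` prefix to the same paths under `dest`.

    Hypothetical helper. Both prefixes must end with '/', matching the
    gs://<bucket>/<name>/ form copytree expects. Returns (src_path, dest_path)
    tuples in input order.
    """
    if not (src.endswith("/") and dest.endswith("/")):
        raise ValueError("src and dest must both end with '/'")
    pairs = []
    for name in names:
        if not name.startswith(src):
            raise ValueError(f"{name!r} is not under {src!r}")
        # Keep the path relative to src, including any nested "subdirectories".
        pairs.append((name, dest + name[len(src):]))
    return pairs
```

Requiring the trailing slash avoids a subtle prefix bug: without it, gs://a/in would also match objects under gs://a/inbox/.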
rename(src, dest)
Renames the given GCS object from src to dest.
Parameters:
- src – GCS file path pattern in the form gs://<bucket>/<name>.
- dest – GCS file path pattern in the form gs://<bucket>/<name>.

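rename works on one object at a time. For many objects, a rename can be expressed with copy_batch followed by delete_batch, deleting a source only after its copy succeeded so that a failed copy never loses data. This is a minimal sketch; rename_batch is a hypothetical helper, `gcs` is assumed to be a GcsIO instance, and src_dest_pairs must already respect MAX_BATCH_OPERATION_SIZE.

```python
def rename_batch(gcs, src_dest_pairs):
    """Copy each (src, dest) pair, then delete only the successfully copied sources.

    Hypothetical helper built on copy_batch and delete_batch. Returns a list of
    (src, dest, exception) tuples for pairs that failed at either step.
    """
    failures = []
    copied = []
    for src, dest, exc in gcs.copy_batch(src_dest_pairs):
        if exc is None:
            copied.append((src, dest))
        else:
            failures.append((src, dest, exc))  # source is left untouched
    if copied:
        dest_by_src = dict(copied)
        for src, exc in gcs.delete_batch([src for src, _ in copied]):
            if exc is not None:
                # The copy exists, but the source could not be removed.
                failures.append((src, dest_by_src[src], exc))
    return failures
```

A failed delete leaves both copies in place, which is usually easier to recover from than a missing object.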
exists(path)
Returns whether the given GCS object exists.
Parameters:
- path – GCS file path pattern in the form gs://<bucket>/<name>.

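A common use of exists is waiting for an object written by another process, such as a marker file signalling that an upstream job has finished. The sketch below polls until the object appears or a deadline passes; wait_for_object is a hypothetical helper that takes the existence check as a callable (typically gcs.exists), which keeps it independent of how the client is built.

```python
import time


def wait_for_object(exists, path, timeout=60.0, interval=2.0):
    """Poll exists(path) until it returns True or `timeout` seconds elapse.

    Hypothetical helper; `exists` would typically be gcs.exists. Returns True
    if the object appeared in time and False otherwise.
    """
    deadline = time.monotonic() + timeout  # monotonic clock: immune to wall-clock jumps
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with a GcsIO instance `gcs`:
#   ready = wait_for_object(gcs.exists, 'gs://my-bucket/output/_SUCCESS', timeout=600)
```

Each poll is a separate metadata request, so a modest interval keeps request volume low.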
checksum(path)
Looks up the checksum of a GCS object.
Parameters:
- path – GCS file path pattern in the form gs://<bucket>/<name>.

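Combined with exists, checksum makes it possible to skip copies whose destination already holds the same content, which is useful for re-runnable sync jobs. A minimal sketch follows, assuming `gcs` is a GcsIO instance and treating equal checksum values as equal content; needs_copy is a hypothetical helper, and the sketch makes no assumption about which checksum algorithm is returned.

```python
def needs_copy(gcs, src, dest):
    """Return True unless dest already exists with the same checksum as src.

    Hypothetical helper built on exists() and checksum(); `gcs` is assumed
    to be a GcsIO instance.
    """
    if not gcs.exists(dest):
        return True
    # Different checksums mean different content; equal ones are taken as a match.
    return gcs.checksum(src) != gcs.checksum(dest)


# Hypothetical usage:
#   if needs_copy(gcs, 'gs://src-bucket/a.txt', 'gs://dest-bucket/a.txt'):
#       gcs.copy('gs://src-bucket/a.txt', 'gs://dest-bucket/a.txt')
```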
size(path)
Returns the size of a single GCS object.
This method does not perform glob expansion, so the given path must refer to a single GCS object.
Returns: size of the GCS object in bytes.

kms_key(path)
Returns the KMS key of a single GCS object.
This method does not perform glob expansion, so the given path must refer to a single GCS object.
Returns: KMS key name of the GCS object as a string, or None if it does not have one.

last_updated(path)
Returns the last updated epoch time of a single GCS object.
This method does not perform glob expansion, so the given path must refer to a single GCS object.
Returns: last updated time of the GCS object, in seconds since the Unix epoch.

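Since last_updated returns seconds since the Unix epoch, converting the value to a timezone-aware datetime makes age checks straightforward, for example when cleaning up stale temporary files. The helpers updated_at and is_stale below are hypothetical and operate only on the numeric value, which may be fractional.

```python
from datetime import datetime, timedelta, timezone


def updated_at(epoch_seconds):
    """Convert a last_updated() value to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def is_stale(epoch_seconds, max_age, now=None):
    """Return True if the object was last updated more than `max_age` ago.

    `max_age` is a timedelta; `now` defaults to the current UTC time and can
    be passed explicitly to make the check deterministic.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - updated_at(epoch_seconds) > max_age


# Hypothetical usage with a GcsIO instance `gcs`:
#   if is_stale(gcs.last_updated('gs://my-bucket/tmp/part-0'), timedelta(days=7)):
#       gcs.delete('gs://my-bucket/tmp/part-0')
```

Using aware UTC datetimes on both sides avoids mixing naive local times with epoch-based values.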
list_prefix(path, with_metadata=False)
Lists files matching the prefix.
Deprecated: list_prefix has been deprecated. Use list_files instead, which returns a generator of file information instead of a dict.
Parameters:
- path – GCS file path pattern in the form gs://<bucket>/[name].
- with_metadata – Experimental. Specifies whether to return file metadata.
Returns: if with_metadata is False, a dict of file name -> size; if with_metadata is True, a dict of file name -> tuple(size, timestamp).

list_files(path, with_metadata=False)
Lists files matching the prefix.
Parameters:
- path – GCS file path pattern in the form gs://<bucket>/[name].
- with_metadata – Experimental. Specifies whether to return file metadata.
Returns: if with_metadata is False, a generator of tuple(file name, size); if with_metadata is True, a generator of tuple(file name, tuple(size, timestamp)).

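Because list_files returns a generator, a listing can be aggregated one entry at a time without holding every name in memory. The sketch below counts matching objects and sums their sizes from the (name, size) tuples yielded when with_metadata is False; summarize_listing and the .parquet suffix are hypothetical.

```python
def summarize_listing(entries, suffix=""):
    """Count objects and total bytes from list_files() output.

    `entries` is an iterable of (name, size) tuples, as yielded by list_files
    with with_metadata=False. Only names ending in `suffix` are counted, and
    the iterable is consumed once, one entry at a time.
    """
    count = 0
    total_bytes = 0
    for name, size in entries:
        if name.endswith(suffix):
            count += 1
            total_bytes += size
    return count, total_bytes


# Hypothetical usage with a GcsIO instance `gcs`:
#   n, nbytes = summarize_listing(gcs.list_files('gs://my-bucket/data/'), '.parquet')
```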