apache_beam.io.gcp.gcsio module

Google Cloud Storage client.

This library evolved from the Google App Engine GCS client available at https://github.com/GoogleCloudPlatform/appengine-gcs-client.

class apache_beam.io.gcp.gcsio.GcsIO(storage_client=None)

Bases: object

Google Cloud Storage I/O client.

open(filename, mode='r', read_buffer_size=16777216, mime_type='application/octet-stream')

Open a GCS file path for reading or writing.

Parameters:
  • filename (str) – GCS file path in the form gs://<bucket>/<object>.
  • mode (str) – 'r' for reading or 'w' for writing.
  • read_buffer_size (int) – Buffer size to use during read operations.
  • mime_type (str) – MIME type to set for write operations.
Returns: GCS file object.
Raises: ValueError – Invalid open file mode.
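To illustrate the documented contract of open() without network access, the sketch below uses a hypothetical in-memory stand-in, InMemoryGcs, which is not part of Beam. It accepts only 'r' and 'w', raises ValueError for any other mode (as GcsIO.open is documented to do), and checks that the path has the gs://<bucket>/<object> form. The real client streams data to and from GCS; the read_buffer_size (default 16777216 bytes, i.e. 16 MiB) and mime_type parameters have no counterpart here.

```python
import io


class _Writer(io.BytesIO):
    """Hypothetical writer: buffers bytes and stores them when closed."""

    def __init__(self, objects, filename):
        super().__init__()
        self._objects = objects
        self._filename = filename

    def close(self):
        if not self.closed:
            # Commit the buffered bytes as the object's contents.
            self._objects[self._filename] = self.getvalue()
        super().close()


class InMemoryGcs:
    """Hypothetical stand-in that mimics the documented open() contract."""

    def __init__(self):
        # Maps 'gs://<bucket>/<object>' paths to their stored bytes.
        self._objects = {}

    def open(self, filename, mode='r'):
        bucket, _, name = filename[len('gs://'):].partition('/')
        if not filename.startswith('gs://') or not bucket or not name:
            raise ValueError('Expected gs://<bucket>/<object>, got %r' % filename)
        if mode == 'r':
            # Reading returns a binary file-like object over the stored bytes.
            return io.BytesIO(self._objects[filename])
        if mode == 'w':
            return _Writer(self._objects, filename)
        raise ValueError('Invalid file open mode: %s.' % mode)


gcs = InMemoryGcs()
with gcs.open('gs://my-bucket/data.txt', 'w') as f:
    f.write(b'hello')
with gcs.open('gs://my-bucket/data.txt') as f:
    print(f.read())  # b'hello'
```

As with the real client, a written object only becomes visible once the file is closed, so using the returned object as a context manager is the safest pattern.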

glob(*args, **kwargs)
delete(*args, **kwargs)
delete_batch(paths)

Deletes the objects at the given GCS paths.

Parameters: paths – List of GCS file path patterns in the form gs://<bucket>/<name>, not to exceed MAX_BATCH_OPERATION_SIZE in length.
Returns: List of tuples of (path, exception) in the same order as the paths argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
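Because each call is limited to MAX_BATCH_OPERATION_SIZE paths, a caller with a longer list has to split it. The sketch below is a hypothetical helper, delete_all, that chunks the list and collects failures from the returned (path, exception) tuples. It takes the batch function as a parameter so it can be exercised with a fake; with a real client one would pass something like GcsIO().delete_batch together with the MAX_BATCH_OPERATION_SIZE limit.

```python
def delete_all(delete_batch, paths, max_batch_size):
    """Hypothetical helper: deletes paths in chunks of at most
    max_batch_size and returns {path: exception} for failed deletions.

    delete_batch is any callable that behaves like GcsIO.delete_batch:
    it takes a list of paths and returns (path, exception) tuples.
    """
    failures = {}
    for start in range(0, len(paths), max_batch_size):
        chunk = paths[start:start + max_batch_size]
        for path, exception in delete_batch(chunk):
            # exception is None when the deletion succeeded.
            if exception is not None:
                failures[path] = exception
    return failures


# Fake batch function (hypothetical) that records each chunk it receives
# and reports a failure for any path ending in 'missing'.
calls = []


def fake_delete_batch(chunk):
    calls.append(list(chunk))
    return [(p, IOError('not found') if p.endswith('missing') else None)
            for p in chunk]


paths = ['gs://b/a', 'gs://b/missing', 'gs://b/c']
failures = delete_all(fake_delete_batch, paths, max_batch_size=2)
print(sorted(failures))  # ['gs://b/missing']
```

Collecting failures instead of raising on the first one matches the per-path result shape that delete_batch returns, and leaves the retry or reporting decision to the caller.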
copy(*args, **kwargs)
copy_batch(src_dest_pairs)

Copies the given GCS objects from src to dest.

Parameters: src_dest_pairs – List of (src, dest) tuples of gs://<bucket>/<name> file paths to copy from src to dest, not to exceed MAX_BATCH_OPERATION_SIZE in length.
Returns: List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
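Since copy_batch reports failures per pair rather than raising, callers that need all-or-nothing behavior must inspect the results themselves. The sketch below is a hypothetical helper, raise_on_copy_failures, that turns a list of (src, dest, exception) tuples into a single error; the sample results are made up for illustration.

```python
def raise_on_copy_failures(results):
    """Hypothetical helper: raises one RuntimeError summarizing every
    failed copy in copy_batch-style results.

    results is a list of (src, dest, exception) tuples, where exception
    is None for copies that succeeded.
    """
    failed = [(src, dest, exc) for src, dest, exc in results if exc is not None]
    if failed:
        details = '; '.join('%s -> %s: %s' % (src, dest, exc)
                            for src, dest, exc in failed)
        raise RuntimeError('%d of %d copies failed: %s'
                           % (len(failed), len(results), details))


# Made-up results in the documented (src, dest, exception) shape.
results = [
    ('gs://b/in/a', 'gs://b/out/a', None),
    ('gs://b/in/b', 'gs://b/out/b', IOError('403 Forbidden')),
]
try:
    raise_on_copy_failures(results)
except RuntimeError as e:
    print(e)  # 1 of 2 copies failed: gs://b/in/b -> gs://b/out/b: 403 Forbidden
```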
copytree(src, dest)

Copies the given GCS “directory” recursively from src to dest.

Parameters:
  • src – GCS file path pattern in the form gs://<bucket>/<name>/.
  • dest – GCS file path pattern in the form gs://<bucket>/<name>/.
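GCS has no real directories, so a recursive copy amounts to finding every object whose name starts with the src prefix and copying it to the same relative path under dest. The hypothetical helper below, copytree_pairs, computes those (src, dest) pairs from a list of object paths; in practice the list would come from a prefix listing of the bucket, and the pairs would then be copied one by one or in batches. This is an assumed way of doing such a copy, not a description of Beam's internals.

```python
def copytree_pairs(object_paths, src, dest):
    """Hypothetical helper: maps every object under the src "directory"
    to the same relative path under dest and returns (src, dest) pairs.

    Both src and dest must end with '/', as in gs://<bucket>/<name>/.
    """
    if not (src.endswith('/') and dest.endswith('/')):
        raise ValueError('src and dest must end with "/"')
    pairs = []
    for path in object_paths:
        # An object is "inside" src if its full path starts with the prefix.
        if path.startswith(src):
            pairs.append((path, dest + path[len(src):]))
    return pairs


objects = ['gs://b/logs/2024/a.txt', 'gs://b/logs/b.txt', 'gs://b/other/c.txt']
for pair in copytree_pairs(objects, 'gs://b/logs/', 'gs://b/backup/'):
    print(pair)
```

Requiring the trailing '/' avoids treating gs://b/logs-old/ as part of gs://b/logs, which a bare prefix match would otherwise do.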
rename(src, dest)

Renames the given GCS object from src to dest.

Parameters:
  • src – GCS file path pattern in the form gs://<bucket>/<name>.
  • dest – GCS file path pattern in the form gs://<bucket>/<name>.
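Classically, GCS has not offered an in-place rename for objects; a rename is commonly performed as a copy to the new name followed by a delete of the original, which means the two steps are not atomic. The sketch below models that copy-then-delete pattern on a hypothetical in-memory dict of objects; it illustrates the pattern and is not Beam's implementation.

```python
def rename(objects, src, dest):
    """Hypothetical copy-then-delete rename on a dict mapping paths to bytes.

    If the process stopped between the two steps, both src and dest
    would exist, which is why such a rename is not atomic.
    """
    if src == dest:
        # Copying then deleting would otherwise destroy the object.
        return
    objects[dest] = objects[src]  # Step 1: copy the contents to the new name.
    del objects[src]              # Step 2: delete the original object.


store = {'gs://b/tmp/part-0': b'data'}
rename(store, 'gs://b/tmp/part-0', 'gs://b/out/part-0')
print(store)  # {'gs://b/out/part-0': b'data'}
```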
exists(*args, **kwargs)
size(*args, **kwargs)
size_of_files_in_glob(*args, **kwargs)