apache_beam.io.azure.blobstorageio module

Azure Blob Storage client.

apache_beam.io.azure.blobstorageio.parse_azfs_path(azfs_path, blob_optional=False, get_account=False)[source]

Return the storage account, the container and blob names of the given azfs:// path.
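
The path layout can be illustrated with a minimal sketch. The helper `split_azfs_path` below is hypothetical and is not Beam's implementation; it only assumes the azfs://<storage-account>/<container>/<blob> form used throughout this page and ignores the blob_optional and get_account flags.

```python
import re

# Hypothetical pattern for azfs://<storage-account>/<container>/<blob>.
# The blob part may itself contain "/" characters.
_AZFS_PATTERN = re.compile(r"^azfs://([^/]+)/([^/]+)/(.+)$")


def split_azfs_path(azfs_path):
    """Split an azfs:// path into (storage_account, container, blob)."""
    match = _AZFS_PATTERN.match(azfs_path)
    if match is None:
        raise ValueError(
            "Path must be in the form "
            "azfs://<storage-account>/<container>/<blob>: %r" % azfs_path)
    return match.groups()


account, container, blob = split_azfs_path(
    "azfs://myaccount/mycontainer/dir/data.csv")
print(account, container, blob)
```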

apache_beam.io.azure.blobstorageio.get_azfs_url(storage_account, container, blob='')[source]

Returns the URL in the form https://account.blob.core.windows.net/container/blob-name.
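
As a rough illustration of that URL shape, the sketch below builds the same kind of string. `build_blob_url` is a hypothetical stand-in, not the library function, and it assumes the blob name needs no URL encoding.

```python
def build_blob_url(storage_account, container, blob=""):
    """Hypothetical stand-in that mirrors the documented URL form."""
    return "https://%s.blob.core.windows.net/%s/%s" % (
        storage_account, container, blob)


url = build_blob_url("myaccount", "mycontainer", "dir/data.csv")
print(url)
```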

class apache_beam.io.azure.blobstorageio.Blob(etag, name, last_updated, size, mime_type)[source]

Bases: object

A Blob in Azure Blob Storage.

exception apache_beam.io.azure.blobstorageio.BlobStorageIOError[source]

Bases: OSError, apache_beam.utils.retry.PermanentException

Blob Storage IO error that should not be retried.

exception apache_beam.io.azure.blobstorageio.BlobStorageError(message=None, code=None)[source]

Bases: Exception

Blob Storage client error.

class apache_beam.io.azure.blobstorageio.BlobStorageIO(client=None, pipeline_options=None)[source]

Bases: object

Azure Blob Storage I/O client.

open(filename, mode='r', read_buffer_size=16777216, mime_type='application/octet-stream')[source]

Open an Azure Blob Storage file path for reading or writing.

Parameters:
  • filename (str) – Azure Blob Storage file path in the form azfs://<storage-account>/<container>/<path>.
  • mode (str) – 'r' for reading or 'w' for writing.
  • read_buffer_size (int) – Buffer size to use during read operations.
  • mime_type (str) – Mime type to set for write operations.
Returns:

Azure Blob Storage file object.

Raises:

ValueError – Invalid open file mode.

copy(src, dest)[source]

Copies a single Azure Blob Storage blob from src to dest.

Parameters:
  • src – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
  • dest – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
Raises:

TimeoutError – on timeout.

copy_tree(src, dest)[source]

Copies the given Azure Blob Storage directory and its contents recursively from src to dest.

Parameters:
  • src – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
  • dest – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
Returns:

List of tuples of (src, dest, exception) where exception is None if the operation succeeded or the relevant exception if the operation failed.

copy_paths(src_dest_pairs)[source]

Copies the given Azure Blob Storage blobs from src to dest. This can handle directory or file paths.

Parameters:
  • src_dest_pairs – List of (src, dest) tuples of azfs://<storage-account>/<container>/[name] file paths to copy from src to dest.
Returns:

List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.

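
Because failures are reported in the returned list rather than raised, callers usually need to inspect it. The following is a minimal sketch of doing so; `split_results` and the sample data are hypothetical and only assume the (src, dest, exception) tuple shape described above.

```python
def split_results(results):
    """Separate (src, dest, exception) tuples into successes and failures."""
    succeeded = [(src, dest) for src, dest, exc in results if exc is None]
    failed = [(src, dest, exc) for src, dest, exc in results if exc is not None]
    return succeeded, failed


# Hand-written sample shaped like the documented return value.
sample_results = [
    ("azfs://acct/c/a.txt", "azfs://acct/c/backup/a.txt", None),
    ("azfs://acct/c/b.txt", "azfs://acct/c/backup/b.txt",
     OSError("copy failed")),
]

succeeded, failed = split_results(sample_results)
for src, dest, exc in failed:
    print("failed to copy %s -> %s: %s" % (src, dest, exc))
```
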
rename(src, dest)[source]

Renames the given Azure Blob Storage blob from src to dest.

Parameters:
  • src – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
  • dest – Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].

rename_files(src_dest_pairs)[source]

Renames the given Azure Blob Storage blobs from src to dest.

Parameters:
  • src_dest_pairs – List of (src, dest) tuples of azfs://<storage-account>/<container>/[name] file paths to rename from src to dest.
Returns:

List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.

exists(path)[source]

Returns whether the given Azure Blob Storage blob exists.

Parameters:
  • path – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].

size(path)[source]

Returns the size of a single Blob Storage blob.

This method does not perform glob expansion. Hence the given path must be for a single Blob Storage blob.

Returns: size of the Blob Storage blob in bytes.

last_updated(path)[source]

Returns the last updated epoch time of a single Azure Blob Storage blob.

This method does not perform glob expansion. Hence the given path must be for a single Azure Blob Storage blob.

Returns: last updated time of the Azure Blob Storage blob in seconds.
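
Since the value is expressed in epoch seconds, it can be converted to a timezone-aware datetime with the standard library. This is a small sketch; `epoch_to_utc` is a hypothetical helper and the sample value is made up rather than read from a real blob.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Stand-in for a value returned by last_updated(path).
last_updated_seconds = 86400.0
print(epoch_to_utc(last_updated_seconds).isoformat())
```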

checksum(path)[source]

Looks up the checksum of an Azure Blob Storage blob.

Parameters:
  • path – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].

delete(path)[source]

Deletes a single blob at the given Azure Blob Storage path.

Parameters:
  • path – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].

delete_paths(paths)[source]

Deletes the given Azure Blob Storage blobs. This can handle directory or file paths.

Parameters:
  • paths – List of Azure Blob Storage paths in the form azfs://<storage-account>/<container>/[name] that give the blobs to be deleted.
Returns:

A (path, exception) result for each deleted blob, where exception is None if the operation succeeded or the relevant exception if the operation failed.

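
A mixed list of directory and file paths has to be told apart somehow. The sketch below assumes the convention that a virtual directory path ends with "/", as the delete_tree documentation requires; `partition_paths` is a hypothetical helper, not part of the module.

```python
def partition_paths(paths):
    """Split paths into (directories, blobs) using a trailing "/" as marker."""
    directories = [path for path in paths if path.endswith("/")]
    blobs = [path for path in paths if not path.endswith("/")]
    return directories, blobs


directories, blobs = partition_paths([
    "azfs://acct/container/logs/",
    "azfs://acct/container/report.csv",
])
print(directories, blobs)
```
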
delete_tree(root)[source]

Deletes all blobs under the given Azure Blob Storage virtual directory.

Parameters:
  • root – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name] (ending with a “/”).
Returns:

List of tuples of (path, exception), where each path is a blob under the given root. exception is None if the operation succeeded or the relevant exception if the operation failed.

delete_files(paths)[source]

Deletes the given Azure Blob Storage blobs.

Parameters:
  • paths – List of Azure Blob Storage paths in the form azfs://<storage-account>/<container>/[name] that give the blobs to be deleted.
Returns:

List of tuples of (path, exception) in the same order as the paths argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.

list_prefix(path, with_metadata=False)[source]

Lists files matching the prefix.

Parameters:
  • path – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
  • with_metadata – Experimental. Whether to return file metadata.
Returns:

If with_metadata is False: dict of file name -> size; if with_metadata is True: dict of file name -> tuple(size, timestamp).
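
The two return shapes need slightly different handling by callers. As a minimal sketch, the hypothetical helper `total_size` below sums blob sizes from either shape; the sample dicts are hand-written and only assume the documented structure.

```python
def total_size(listing, with_metadata=False):
    """Sum blob sizes from a listing shaped like the documented return value."""
    if with_metadata:
        # Values are (size, timestamp) tuples.
        return sum(size for size, _timestamp in listing.values())
    # Values are plain sizes.
    return sum(listing.values())


plain = {"azfs://acct/c/a.txt": 10, "azfs://acct/c/b.txt": 32}
detailed = {
    "azfs://acct/c/a.txt": (10, 1700000000.0),
    "azfs://acct/c/b.txt": (32, 1700000100.0),
}
print(total_size(plain), total_size(detailed, with_metadata=True))
```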

list_files(path, with_metadata=False)[source]

Lists files matching the prefix.

Parameters:
  • path – Azure Blob Storage file path pattern in the form azfs://<storage-account>/<container>/[name].
  • with_metadata – Experimental. Whether to return file metadata.
Returns:

If with_metadata is False: generator of tuple(file name, size); if with_metadata is True: generator of tuple(file name, tuple(size, timestamp)).

class apache_beam.io.azure.blobstorageio.BlobStorageDownloader(client, path, buffer_size)[source]

Bases: apache_beam.io.filesystemio.Downloader

size

get_range(start, end)[source]

class apache_beam.io.azure.blobstorageio.BlobStorageUploader(client, path, mime_type='application/octet-stream')[source]

Bases: apache_beam.io.filesystemio.Uploader

put(data)[source]

finish()[source]

apache_beam.io.azure.blobstorageio.deprecated(*, label='deprecated', since, current=None, extra_message=None, custom_message=None)

Decorates an API with a deprecated or experimental annotation.

Parameters:
  • label – the kind of annotation (‘deprecated’ or ‘experimental’).
  • since – the version that causes the annotation.
  • current – the suggested replacement function.
  • extra_message – an optional additional message.
  • custom_message – if the default message does not suffice, the message can be changed using this argument. It is a string with replacement tokens, each of which marks where one of the previous arguments is placed in the custom message. The following replacement tokens can be used:
      %name% -> API.__name__
      %since% -> since (mandatory for the deprecated annotation)
      %current% -> current
      %extra% -> extra_message
Returns:

The decorator for the API.
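
The replacement tokens accepted by custom_message can be pictured as plain string substitution. The sketch below is a hypothetical illustration, not the decorator's actual code; `render_message` and its handling of missing values (empty string) are assumptions.

```python
def render_message(template, name, since, current=None, extra_message=None):
    """Fill %name%, %since%, %current% and %extra% tokens in a template."""
    replacements = {
        "%name%": name,
        "%since%": since,
        # Assumption: a missing optional value becomes an empty string.
        "%current%": current or "",
        "%extra%": extra_message or "",
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


message = render_message(
    "%name% is deprecated since %since%. Use %current% instead.",
    name="old_fn", since="2.30.0", current="new_fn")
print(message)
```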