apache_beam.io.aws.s3io module
AWS S3 client
-
apache_beam.io.aws.s3io.
parse_s3_path
(s3_path, object_optional=False) Return the bucket and object names of the given s3:// path.
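The splitting described above can be illustrated with a minimal standard-library sketch. The helper name `split_s3_path` is hypothetical and this is not Beam's implementation; in particular, the assumption that `object_optional=True` permits an empty object name is inferred from the parameter name only.

```python
import re


def split_s3_path(s3_path, object_optional=False):
    """Hypothetical stand-in for parse_s3_path (not Beam's code).

    Splits 's3://<bucket>/<object>' into (bucket, object).
    Assumption: object_optional=True allows an empty object name.
    """
    match = re.match(r"^s3://([^/]+)/(.*)$", s3_path)
    if match is None or (match.group(2) == "" and not object_optional):
        raise ValueError(
            "S3 path must be in the form s3://<bucket>/<object>: %r" % s3_path)
    return match.group(1), match.group(2)


bucket, key = split_s3_path("s3://my-bucket/logs/2024/part-0.txt")
```

Rejecting malformed paths with `ValueError` up front keeps later calls from sending a bad bucket or key to the client.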
-
class
apache_beam.io.aws.s3io.
S3IO
(client=None, options=None) Bases:
object
S3 I/O client.
-
open
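(filename, mode='r', read_buffer_size=16777216, mime_type='application/octet-stream') Open an S3 file path for reading or writing.
Parameters: - filename – S3 file path in the form s3://<bucket>/<object>.
- mode – 'r' for reading or 'w' for writing.
- read_buffer_size – Buffer size to use during read operations.
- mime_type – Mime type to set for write operations.
Returns: S3 file object.
Raises: ValueError – Invalid open file mode.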
-
list_prefix
(path) Lists files matching the prefix.
Parameters: path – S3 file path pattern in the form s3://<bucket>/[name]. Returns: Dictionary of file name -> size.
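To show the shape of the returned dictionary (file name -> size), here is a small sketch that runs against an in-memory stand-in for a bucket. The function `list_prefix_sketch` and the `fake_bucket` data are hypothetical; the real method queries S3 through the client.

```python
def list_prefix_sketch(objects, path):
    """Hypothetical illustration of prefix listing (not Beam's code).

    objects: dict mapping full s3:// paths to sizes in bytes,
             standing in for the contents of a bucket.
    path:    prefix in the form s3://<bucket>/[name].
    Returns a dict of file name -> size for names starting with the prefix.
    """
    return {name: size for name, size in objects.items()
            if name.startswith(path)}


fake_bucket = {
    "s3://my-bucket/logs/a.txt": 120,
    "s3://my-bucket/logs/b.txt": 80,
    "s3://my-bucket/data/c.bin": 4096,
}

sizes = list_prefix_sketch(fake_bucket, "s3://my-bucket/logs/")
total_bytes = sum(sizes.values())  # total size of everything under the prefix
```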
-
checksum
(path) Looks up the checksum of an S3 object.
Parameters: path – S3 file path pattern in the form s3://<bucket>/<name>.
-
copy
(src, dest) Copies a single S3 file object from src to dest.
Parameters: - src – S3 file path pattern in the form s3://<bucket>/<name>.
- dest – S3 file path pattern in the form s3://<bucket>/<name>.
Raises: TimeoutError
– on timeout.
-
copy_paths
(src_dest_pairs) Copies the given S3 objects from src to dest. This can handle directory or file paths.
Parameters: src_dest_pairs – List of (src, dest) tuples of s3://<bucket>/<name> file paths to copy from src to dest.
Returns: List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
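Because failures are reported in the returned tuples rather than raised, callers usually need to inspect the result list. The sketch below shows one way to separate successes from failures; `split_results` and the sample `results` list are hypothetical, and the same pattern applies to the other methods here that return (src, dest, exception) tuples.

```python
def split_results(results):
    """Partition (src, dest, exception) tuples into successes and failures.

    Hypothetical helper: it only relies on the documented convention that
    exception is None when the operation succeeded.
    """
    succeeded = [(src, dest) for src, dest, exc in results if exc is None]
    failed = [(src, dest, exc) for src, dest, exc in results if exc is not None]
    return succeeded, failed


# Sample data shaped like the documented return value (not real output).
results = [
    ("s3://b/a.txt", "s3://b/copy/a.txt", None),
    ("s3://b/missing.txt", "s3://b/copy/missing.txt", IOError("not found")),
]

succeeded, failed = split_results(results)
for src, dest, exc in failed:
    print("copy failed: %s -> %s (%s)" % (src, dest, exc))
```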
-
copy_tree
(src, dest) Copies the given S3 directory and its contents recursively from src to dest.
Parameters: - src – S3 file path pattern in the form s3://<bucket>/<name>/.
- dest – S3 file path pattern in the form s3://<bucket>/<name>/.
Returns: List of tuples of (src, dest, exception) where exception is None if the operation succeeded or the relevant exception if the operation failed.
-
delete
(path) Deletes a single S3 file object.
Parameters: path – S3 file path in the form s3://<bucket>/<name>.
-
delete_paths
(paths) Deletes the given S3 objects. This can handle directory or file paths.
Parameters: paths – List of S3 paths, each in the form s3://<bucket>/<name> for a file or s3://<bucket>/<name>/ for a directory.
Returns: Per-path results, where exception is None if the operation succeeded or the relevant exception if the operation failed.
-
delete_files
(paths, max_batch_size=1000) Deletes the given S3 file objects.
Parameters: - paths – List of S3 file paths in the form s3://<bucket>/<name>.
- max_batch_size – Largest number of keys to send to the client to be deleted simultaneously.
Returns: List of tuples of (path, exception) in the same order as the paths argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
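The role of max_batch_size can be pictured as chunking the path list before each client call. The sketch below is an assumption about the general technique, not Beam's code, and the generator name `batches` is hypothetical. The default of 1000 matches the per-request key limit of S3's multi-object delete.

```python
def batches(paths, max_batch_size=1000):
    """Yield consecutive slices of paths, each at most max_batch_size long.

    Hypothetical helper illustrating batching; order is preserved so that
    results can be reported in the same order as the input.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(paths), max_batch_size):
        yield paths[start:start + max_batch_size]


paths = ["s3://my-bucket/tmp/part-%d" % i for i in range(2500)]
batch_sizes = [len(batch) for batch in batches(paths)]
```

With 2500 paths and the default batch size, this yields three batches of 1000, 1000, and 500 paths.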
-
delete_tree
(root) Deletes all objects under the given S3 directory.
Parameters: root – S3 root path in the form s3://<bucket>/<name>/ (ending with a “/”).
Returns: List of tuples of (path, exception), where each path is an object under the given root. exception is None if the operation succeeded or the relevant exception if the operation failed.
-
size
(path) Returns the size of a single S3 object.
This method does not perform glob expansion. Hence the given path must be for a single S3 object.
Returns: size of the S3 object in bytes.
-
rename
(src, dest) Renames the given S3 object from src to dest.
Parameters: - src – S3 file path pattern in the form s3://<bucket>/<name>.
- dest – S3 file path pattern in the form s3://<bucket>/<name>.
-
last_updated
(path) Returns the last updated epoch time of a single S3 object.
This method does not perform glob expansion. Hence the given path must be for a single S3 object.
Returns: last updated time of the S3 object in seconds.
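Since the value is an epoch time in seconds, it can be converted to a timezone-aware datetime with the standard library. The helper name `to_utc_datetime` and the sample timestamp are illustrative only.

```python
from datetime import datetime, timezone


def to_utc_datetime(epoch_seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Illustrative value, not taken from a real S3 object.
example = to_utc_datetime(1700000000.0)
print(example.isoformat())  # 2023-11-14T22:13:20+00:00
```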
-
exists
(path) Returns whether the given S3 object exists.
Parameters: path – S3 file path pattern in the form s3://<bucket>/<name>.
-
rename_files
(src_dest_pairs) Renames the given S3 objects from src to dest.
Parameters: src_dest_pairs – List of (src, dest) tuples of s3://<bucket>/<name> file paths to rename from src to dest.
Returns: List of tuples of (src, dest, exception) in the same order as the src_dest_pairs argument, where exception is None if the operation succeeded or the relevant exception if the operation failed.
-
-
class
apache_beam.io.aws.s3io.
S3Downloader
(client, path, buffer_size) Bases:
apache_beam.io.filesystemio.Downloader
-
size
-