apache_beam.io.filesystemio module

Utilities for FileSystem implementations.

class apache_beam.io.filesystemio.Downloader[source]

Bases: object

Download interface for a single file.

Implementations should support random access reads.

size

Size of file to download.

get_range(start, end)[source]

Retrieve a given byte range [start, end) from this download.

Range must be in this form:
0 <= start < end: Fetch the bytes from start to end.
Parameters:
  • start – (int) Initial byte offset.
  • end – (int) Final byte offset, exclusive.
Returns:

(string) A buffer containing the requested data.

class apache_beam.io.filesystemio.Uploader[source]

Bases: object

Upload interface for a single file.

put(data)[source]

Write data to file sequentially.

Parameters:data – (memoryview) Data to write.
finish()[source]

Signal to upload any remaining data and close the file.

File should be fully written upon return from this method.

Raises:Any error encountered during the upload.
class apache_beam.io.filesystemio.DownloaderStream(downloader, read_buffer_size=8192, mode='rb')[source]

Bases: io.RawIOBase

Provides a stream interface for Downloader objects.

Initializes the stream.

Parameters:
  • downloader – (Downloader) Filesystem dependent implementation.
  • read_buffer_size – (int) Buffer size to use during read operations.
  • mode – (string) Python mode attribute for this stream.
readinto(b)[source]

Read up to len(b) bytes into b.

Returns number of bytes read (0 for EOF).

Parameters:b – (bytearray/memoryview) Buffer to read into.
seek(offset, whence=0)[source]

Set the stream’s current offset.

Note if the new offset is out of bound, it is adjusted to either 0 or EOF.

Parameters:
  • offset – seek offset as number.
  • whence – seek mode. Supported modes are os.SEEK_SET (absolute seek), os.SEEK_CUR (seek relative to the current position), and os.SEEK_END (seek relative to the end, offset should be negative).
Raises:

ValueError – When this stream is closed or if whence is invalid.

tell()[source]

Tell the stream’s current offset.

Returns:current offset in reading this stream.
Raises:ValueError – When this stream is closed.
seekable()[source]
readable()[source]
readall()[source]

Read until EOF, using multiple read() call.

class apache_beam.io.filesystemio.UploaderStream(uploader, mode='wb')[source]

Bases: io.RawIOBase

Provides a stream interface for Uploader objects.

Initializes the stream.

Parameters:
  • uploader – (Uploader) Filesystem dependent implementation.
  • mode – (string) Python mode attribute for this stream.
tell()[source]
write(b)[source]

Write bytes from b.

Returns number of bytes written (<= len(b)).

Parameters:b – (memoryview) Buffer with data to write.
close()[source]

Complete the upload and close this stream.

This method has no effect if the stream is already closed.

Raises:Any error encountered by the uploader.
writable()[source]
class apache_beam.io.filesystemio.PipeStream(recv_pipe)[source]

Bases: object

A class that presents a pipe connection as a readable stream.

Not thread-safe.

Remembers the last size bytes read and allows rewinding the stream by that amount exactly. See BEAM-6380 for more.

read(size)[source]

Read data from the wrapped pipe connection.

Parameters:size – Number of bytes to read. Actual number of bytes read is always equal to size unless EOF is reached.
Returns:data read as str.
tell()[source]

Tell the file’s current offset.

Returns:current offset in reading this file.
Raises:ValueError – When this stream is closed.
seek(offset, whence=0)[source]