apache_beam.runners.dataflow.native_io.iobase module

Dataflow native sources and sinks.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.dataflow.native_io.iobase.NativeSource[source]

Bases: apache_beam.io.iobase.SourceBase

A source implemented by Dataflow service.

This class is to be only inherited by sources natively implemented by Cloud Dataflow service, hence should not be sub-classed by users.

This class is deprecated and should not be used to define new sources.

coder = None
reader()[source]

Returns a NativeSourceReader instance associated with this source.

is_bounded()[source]
class apache_beam.runners.dataflow.native_io.iobase.NativeSourceReader[source]

Bases: object

A reader for a source implemented by Dataflow service.

returns_windowed_values

Returns whether this reader returns windowed values.

get_progress()[source]

Returns a representation of how far the reader has read.

Returns:A SourceReaderProgress object that gives the current progress of the reader.
request_dynamic_split(dynamic_split_request)[source]

Attempts to split the input in two parts.

The two parts are named the “primary” part and the “residual” part. The current ‘NativeSourceReader’ keeps processing the primary part, while the residual part will be processed elsewhere (e.g. perhaps on a different worker).

The primary and residual parts, if concatenated, must represent the same input as the current input of this ‘NativeSourceReader’ before this call.

The boundary between the primary part and the residual part is specified in a framework-specific way using ‘DynamicSplitRequest’ e.g., if the framework supports the notion of positions, it might be a position at which the input is asked to split itself (which is not necessarily the same position at which it will split itself); it might be an approximate fraction of input, or something else.

This function returns a ‘DynamicSplitResult’, which encodes, in a framework-specific way, the information sufficient to construct a description of the resulting primary and residual inputs. For example, it might, again, be a position demarcating these parts, or it might be a pair of fully-specified input descriptions, or something else.

After a successful call to ‘request_dynamic_split()’, subsequent calls should be interpreted relative to the new primary.

Parameters:dynamic_split_request – A ‘DynamicSplitRequest’ describing the split request.
Returns:‘None’ if the ‘DynamicSplitRequest’ cannot be honored (in that case the input represented by this ‘NativeSourceReader’ stays the same), or a ‘DynamicSplitResult’ describing how the input was split into a primary and residual part.
class apache_beam.runners.dataflow.native_io.iobase.ReaderProgress(position=None, percent_complete=None, remaining_time=None, consumed_split_points=None, remaining_split_points=None)[source]

Bases: object

A representation of how far a NativeSourceReader has read.

position

Returns progress, represented as a ReaderPosition object.

percent_complete

Returns progress, represented as a percentage of total work.

Progress range from 0.0 (beginning, nothing complete) to 1.0 (end of the work range, entire WorkItem complete).

Returns:Progress represented as a percentage of total work.
remaining_time

Returns progress, represented as an estimated time remaining.

consumed_split_points
remaining_split_points
class apache_beam.runners.dataflow.native_io.iobase.ReaderPosition(end=None, key=None, byte_offset=None, record_index=None, shuffle_position=None, concat_position=None)[source]

Bases: object

A representation of position in an iteration of a ‘NativeSourceReader’.

Initializes ReaderPosition.

A ReaderPosition may get instantiated for one of these position types. Only one of these should be specified.

Parameters:
  • end – position is past all other positions. For example, this may be used to represent the end position of an unbounded range.
  • key – position is a string key.
  • byte_offset – position is a byte offset.
  • record_index – position is a record index
  • shuffle_position – position is a base64 encoded shuffle position.
  • concat_position – position is a ‘ConcatPosition’.
class apache_beam.runners.dataflow.native_io.iobase.ConcatPosition(index, position)[source]

Bases: object

A position that encapsulate an inner position and an index.

This is used to represent the position of a source that encapsulate several other sources.

Initializes ConcatPosition.

Parameters:
  • index – index of the source currently being read.
  • position – inner position within the source currently being read.
class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitRequest(progress)[source]

Bases: object

Specifies how ‘NativeSourceReader.request_dynamic_split’ should split.

class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResult[source]

Bases: object

class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResultWithPosition(stop_position)[source]

Bases: apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResult

class apache_beam.runners.dataflow.native_io.iobase.NativeSink[source]

Bases: apache_beam.transforms.display.HasDisplayData

A sink implemented by Dataflow service.

This class is to be only inherited by sinks natively implemented by Cloud Dataflow service, hence should not be sub-classed by users.

writer()[source]

Returns a SinkWriter for this source.

class apache_beam.runners.dataflow.native_io.iobase.NativeSinkWriter[source]

Bases: object

A writer for a sink implemented by Dataflow service.

takes_windowed_values

Returns whether this writer takes windowed values.

Write(o)[source]

Writes a record to the sink associated with this writer.