apache_beam.runners.dataflow.native_io package

Submodules

apache_beam.runners.dataflow.native_io.iobase module

Dataflow native sources and sinks.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.dataflow.native_io.iobase.ConcatPosition(index, position)[source]

Bases: object

A position that encapsulate an inner position and an index.

This is used to represent the position of a source that encapsulate several other sources.

class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitRequest(progress)[source]

Bases: object

Specifies how ‘NativeSourceReader.request_dynamic_split’ should split.

class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResult[source]

Bases: object

class apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResultWithPosition(stop_position)[source]

Bases: apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResult

class apache_beam.runners.dataflow.native_io.iobase.NativeSink[source]

Bases: apache_beam.transforms.display.HasDisplayData

A sink implemented by Dataflow service.

This class is to be only inherited by sinks natively implemented by Cloud Dataflow service, hence should not be sub-classed by users.

writer()[source]

Returns a SinkWriter for this source.

class apache_beam.runners.dataflow.native_io.iobase.NativeSinkWriter[source]

Bases: object

A writer for a sink implemented by Dataflow service.

Write(o)[source]

Writes a record to the sink associated with this writer.

takes_windowed_values

Returns whether this writer takes windowed values.

class apache_beam.runners.dataflow.native_io.iobase.NativeSource[source]

Bases: apache_beam.transforms.display.HasDisplayData

A source implemented by Dataflow service.

This class is to be only inherited by sources natively implemented by Cloud Dataflow service, hence should not be sub-classed by users.

This class is deprecated and should not be used to define new sources.

reader()[source]

Returns a NativeSourceReader instance associated with this source.

class apache_beam.runners.dataflow.native_io.iobase.NativeSourceReader[source]

Bases: object

A reader for a source implemented by Dataflow service.

get_progress()[source]

Returns a representation of how far the reader has read.

Returns:A SourceReaderProgress object that gives the current progress of the reader.
request_dynamic_split(dynamic_split_request)[source]

Attempts to split the input in two parts.

The two parts are named the “primary” part and the “residual” part. The current ‘NativeSourceReader’ keeps processing the primary part, while the residual part will be processed elsewhere (e.g. perhaps on a different worker).

The primary and residual parts, if concatenated, must represent the same input as the current input of this ‘NativeSourceReader’ before this call.

The boundary between the primary part and the residual part is specified in a framework-specific way using ‘DynamicSplitRequest’ e.g., if the framework supports the notion of positions, it might be a position at which the input is asked to split itself (which is not necessarily the same position at which it will split itself); it might be an approximate fraction of input, or something else.

This function returns a ‘DynamicSplitResult’, which encodes, in a framework-specific way, the information sufficient to construct a description of the resulting primary and residual inputs. For example, it might, again, be a position demarcating these parts, or it might be a pair of fully-specified input descriptions, or something else.

After a successful call to ‘request_dynamic_split()’, subsequent calls should be interpreted relative to the new primary.

Parameters:dynamic_split_request – A ‘DynamicSplitRequest’ describing the split request.
Returns:‘None’ if the ‘DynamicSplitRequest’ cannot be honored (in that case the input represented by this ‘NativeSourceReader’ stays the same), or a ‘DynamicSplitResult’ describing how the input was split into a primary and residual part.
returns_windowed_values

Returns whether this reader returns windowed values.

class apache_beam.runners.dataflow.native_io.iobase.ReaderPosition(end=None, key=None, byte_offset=None, record_index=None, shuffle_position=None, concat_position=None)[source]

Bases: object

A representation of position in an iteration of a ‘NativeSourceReader’.

class apache_beam.runners.dataflow.native_io.iobase.ReaderProgress(position=None, percent_complete=None, remaining_time=None, consumed_split_points=None, remaining_split_points=None)[source]

Bases: object

A representation of how far a NativeSourceReader has read.

consumed_split_points
percent_complete

Returns progress, represented as a percentage of total work.

Progress range from 0.0 (beginning, nothing complete) to 1.0 (end of the work range, entire WorkItem complete).

Returns:Progress represented as a percentage of total work.
position

Returns progress, represented as a ReaderPosition object.

remaining_split_points
remaining_time

Returns progress, represented as an estimated time remaining.

apache_beam.runners.dataflow.native_io.streaming_create module

Create transform for streaming.

class apache_beam.runners.dataflow.native_io.streaming_create.StreamingCreate(values, coder)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A specialized implementation for Create transform in streaming mode.

Note: There is no unbounded source API in python to wrap the Create source, so we map this to composite of Impulse primitive and an SDF.

class DecodeAndEmitDoFn(encoded_values, coder)[source]

Bases: apache_beam.transforms.core.DoFn

A DoFn which stores encoded versions of elements.

It also stores a Coder to decode and emit those elements. TODO: BEAM-2422 - Make this a SplittableDoFn.

process(unused_element)[source]
class Impulse(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

The Dataflow specific override for the impulse primitive.

expand(pbegin)[source]
get_windowing(inputs)[source]
infer_output_type(unused_input_type)[source]
expand(pbegin)[source]

Module contents