PTransform
s for transforming
data in a pipeline.See: Description
Interface | Description |
---|---|
Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT> |
The type of mutable accumulator values used by this
AccumulatingCombineFn . |
CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT> |
For internal use only; no backwards-compatibility guarantees.
|
CombineWithContext.RequiresContextInternal |
An internal interface for signaling that a
GloballyCombineFn
or a PerKeyCombineFn needs to access CombineWithContext.Context . |
DoFn.OutputReceiver<T> |
Receives values of the given type.
|
Materialization<T> |
For internal use only; no backwards-compatibility guarantees.
|
Partition.PartitionFn<T> |
A function object that chooses an output partition for an element.
|
SerializableComparator<T> |
A
Comparator that is also Serializable . |
SerializableFunction<InputT,OutputT> |
A function that computes an output value of type
OutputT from an input value of type
InputT and is Serializable . |
Watch.Growth.PollFn<InputT,OutputT> |
A function that computes the current set of outputs for the given input (given as a
TimestampedValue ), in the form of a Watch.Growth.PollResult . |
Watch.Growth.TerminationCondition<InputT,StateT> |
A strategy for determining whether it is time to stop polling the current input regardless of
whether its output is complete or not.
|
Class | Description |
---|---|
ApproximateQuantiles |
PTransform s for getting an idea of a PCollection 's
data distribution using approximate N -tiles (e.g. |
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> |
The
ApproximateQuantilesCombineFn combiner gives an idea
of the distribution of a collection of values using approximate
N -tiles. |
ApproximateUnique |
PTransform s for estimating the number of distinct elements
in a PCollection , or the number of distinct values
associated with each key in a PCollection of KV s. |
ApproximateUnique.ApproximateUniqueCombineFn<T> |
CombineFn that computes an estimate of the number of
distinct values that were combined. |
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique |
A heap utility class to efficiently track the largest added elements.
|
Combine |
PTransform s for combining PCollection elements
globally and per-key. |
Combine.AccumulatingCombineFn<InputT,AccumT extends Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT>,OutputT> |
A
CombineFn that uses a subclass of
Combine.AccumulatingCombineFn.Accumulator as its accumulator
type. |
Combine.BinaryCombineDoubleFn |
An abstract subclass of
Combine.CombineFn for implementing combiners that are more
easily and efficiently expressed as binary operations on double s. |
Combine.BinaryCombineFn<V> |
An abstract subclass of
Combine.CombineFn for implementing combiners that are more
easily expressed as binary operations. |
Combine.BinaryCombineIntegerFn |
An abstract subclass of
Combine.CombineFn for implementing combiners that are more
easily and efficiently expressed as binary operations on int s |
Combine.BinaryCombineLongFn |
An abstract subclass of
Combine.CombineFn for implementing combiners that are more
easily and efficiently expressed as binary operations on long s. |
Combine.CombineFn<InputT,AccumT,OutputT> |
A
CombineFn<InputT, AccumT, OutputT> specifies how to combine a
collection of input values of type InputT into a single
output value of type OutputT . |
Combine.Globally<InputT,OutputT> |
Combine.Globally<InputT, OutputT> takes a PCollection<InputT>
and returns a PCollection<OutputT> whose elements are the result of
combining all the elements in each window of the input PCollection ,
using a specified CombineFn<InputT, AccumT, OutputT> . |
Combine.GloballyAsSingletonView<InputT,OutputT> |
Combine.GloballyAsSingletonView<InputT, OutputT> takes a PCollection<InputT>
and returns a PCollectionView<OutputT> whose elements are the result of
combining all the elements in each window of the input PCollection ,
using a specified CombineFn<InputT, AccumT, OutputT> . |
Combine.GroupedValues<K,InputT,OutputT> |
GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>> ,
such as the result of GroupByKey , applies a specified CombineFn<InputT, AccumT, OutputT> to each of the input KV<K,
Iterable<InputT>> elements to produce a combined output KV<K, OutputT> element, and
returns a PCollection<KV<K, OutputT>> containing all the combined output elements. |
Combine.Holder<V> |
Holds a single value value of type
V which may or may not be present. |
Combine.IterableCombineFn<V> | |
Combine.PerKey<K,InputT,OutputT> |
PerKey<K, InputT, OutputT> takes a
PCollection<KV<K, InputT>> , groups it by key, applies a
combining function to the InputT values associated with each
key to produce a combined OutputT value, and returns a
PCollection<KV<K, OutputT>> representing a map from each
distinct key of the input PCollection to the corresponding
combined value. |
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> |
Like
Combine.PerKey , but sharding the combining of hot keys. |
Combine.SimpleCombineFn<V> | Deprecated |
CombineFnBase |
For internal use only; no backwards-compatibility guarantees.
|
CombineFns |
Static utility methods that create combine function instances.
|
CombineFns.CoCombineResult |
A tuple of outputs produced by a composed combine functions.
|
CombineFns.ComposeCombineFnBuilder |
A builder class to construct a composed
CombineFnBase.GlobalCombineFn . |
CombineFns.ComposedCombineFn<DataT> |
A composed
Combine.CombineFn that applies multiple CombineFns . |
CombineFns.ComposedCombineFnWithContext<DataT> |
A composed
CombineWithContext.CombineFnWithContext that applies multiple
CombineFnWithContexts . |
CombineWithContext |
This class contains combine functions that have access to
PipelineOptions and side inputs
through CombineWithContext.Context . |
CombineWithContext.CombineFnWithContext<InputT,AccumT,OutputT> |
A combine function that has access to
PipelineOptions and side inputs through
CombineWithContext.Context . |
CombineWithContext.Context |
Information accessible to all methods in
CombineFnWithContext
and KeyedCombineFnWithContext . |
Count |
PTransforms to count the elements in a PCollection . |
Create<T> |
Create<T> takes a collection of elements of type T
known when the pipeline is constructed and returns a
PCollection<T> containing the elements. |
Create.OfValueProvider<T> | |
Create.TimestampedValues<T> |
A
PTransform that creates a PCollection whose elements have
associated timestamps. |
Create.Values<T> |
A
PTransform that creates a PCollection from a set of in-memory objects. |
Distinct<T> |
Distinct<T> takes a PCollection<T> and
returns a PCollection<T> that has all distinct elements of the
input. |
Distinct.WithRepresentativeValues<T,IdT> |
A
Distinct PTransform that uses a SerializableFunction to
obtain a representative value for each input element. |
DoFn<InputT,OutputT> |
The argument to
ParDo providing the code to use to process
elements of the input
PCollection . |
DoFn.ProcessContinuation |
When used as a return value of
DoFn.ProcessElement , indicates whether there is more work to
be done for the current element. |
DoFnTester<InputT,OutputT> |
A harness for unit-testing a
DoFn . |
Filter<T> |
PTransform s for filtering from a PCollection the
elements satisfying a predicate, or satisfying an inequality with
a given value based on the elements' natural ordering. |
FlatMapElements<InputT,OutputT> |
PTransform s for mapping a simple function that returns iterables over the elements of a
PCollection and merging the results. |
Flatten |
Flatten<T> takes multiple PCollection<T> s bundled
into a PCollectionList<T> and returns a single
PCollection<T> containing all the elements in all the input
PCollection s. |
Flatten.Iterables<T> |
FlattenIterables<T> takes a PCollection<Iterable<T>> and returns a
PCollection<T> that contains all the elements from each iterable. |
Flatten.PCollections<T> |
A
PTransform that flattens a PCollectionList
into a PCollection containing all the elements of all
the PCollection s in its input. |
GroupByKey<K,V> |
GroupByKey<K, V> takes a PCollection<KV<K, V>> ,
groups the values by key and windows, and returns a
PCollection<KV<K, Iterable<V>>> representing a map from
each distinct key and window of the input PCollection to an
Iterable over all the values associated with that key in
the input per window. |
GroupIntoBatches<K,InputT> |
A
PTransform that batches inputs to a desired batch size. |
Keys<K> |
Keys<K> takes a PCollection of KV<K, V> s and
returns a PCollection<K> of the keys. |
KvSwap<K,V> |
KvSwap<K, V> takes a PCollection<KV<K, V>> and
returns a PCollection<KV<V, K>> , where all the keys and
values have been swapped. |
Latest | |
MapElements<InputT,OutputT> |
PTransform s for mapping a simple function over the elements of a PCollection . |
Materializations |
For internal use only; no backwards-compatibility guarantees.
|
Max |
PTransform s for computing the maximum of the elements in a PCollection , or the
maximum of the values associated with each key in a PCollection of KV s. |
Mean |
PTransform s for computing the arithmetic mean
(a.k.a. |
Min |
PTransform s for computing the minimum of the elements in a PCollection , or the
minimum of the values associated with each key in a PCollection of KV s. |
ParDo |
ParDo is the core element-wise transform in Apache Beam, invoking a user-specified
function on each of the elements of the input PCollection to produce zero or more output
elements, all of which are collected into the output PCollection . |
ParDo.MultiOutput<InputT,OutputT> |
A
PTransform that, when applied to a PCollection<InputT> , invokes a
user-specified DoFn<InputT, OutputT> on all its elements, which can emit elements to
any of the PTransform 's output PCollection s, which are bundled into a result
PCollectionTuple . |
ParDo.SingleOutput<InputT,OutputT> |
A
PTransform that, when applied to a PCollection<InputT> ,
invokes a user-specified DoFn<InputT, OutputT> on all its elements,
with all its outputs collected into an output
PCollection<OutputT> . |
Partition<T> |
Partition takes a PCollection<T> and a
PartitionFn , uses the PartitionFn to split the
elements of the input PCollection into N partitions, and
returns a PCollectionList<T> that bundles N
PCollection<T> s containing the split elements. |
PTransform<InputT extends PInput,OutputT extends POutput> | |
Regex |
PTransorm s to use Regular Expressions to process elements in a PCollection . |
Regex.AllMatches |
Regex.MatchesName<String> takes a PCollection<String> and returns a PCollection<List<String>> representing the value extracted from all the
Regex groups of the input
PCollection to the number of times that element occurs in the input. |
Regex.Find |
Regex.Find<String> takes a PCollection<String> and returns a PCollection<String> representing the value extracted from the Regex groups of the input PCollection to the number of times that element occurs in the input. |
Regex.FindAll |
Regex.Find<String> takes a PCollection<String> and returns a PCollection<List<String>> representing the value extracted from the
Regex groups of the input PCollection to the number of times that element occurs in the input. |
Regex.FindKV |
Regex.MatchesKV<KV<String, String>> takes a PCollection<String> and returns a
PCollection<KV<String, String>> representing the key and value extracted from the Regex
groups of the input PCollection to the number of times that element occurs in the
input. |
Regex.FindName |
Regex.Find<String> takes a PCollection<String> and returns a PCollection<String> representing the value extracted from the Regex groups of the input PCollection to the number of times that element occurs in the input. |
Regex.FindNameKV |
Regex.MatchesKV<KV<String, String>> takes a PCollection<String> and returns a
PCollection<KV<String, String>> representing the key and value extracted from the Regex
groups of the input PCollection to the number of times that element occurs in the
input. |
Regex.Matches |
Regex.Matches<String> takes a PCollection<String> and returns a PCollection<String> representing the value extracted from the Regex groups of the input PCollection to the number of times that element occurs in the input. |
Regex.MatchesKV |
Regex.MatchesKV<KV<String, String>> takes a PCollection<String> and returns a
PCollection<KV<String, String>> representing the key and value extracted from the Regex
groups of the input PCollection to the number of times that element occurs in the
input. |
Regex.MatchesName |
Regex.MatchesName<String> takes a PCollection<String> and returns a PCollection<String> representing the value extracted from the Regex groups of the input PCollection to the number of times that element occurs in the input. |
Regex.MatchesNameKV |
Regex.MatchesNameKV<KV<String, String>> takes a PCollection<String> and returns
a PCollection<KV<String, String>> representing the key and value extracted from the
Regex groups of the input PCollection to the number of times that element occurs in the
input. |
Regex.ReplaceAll |
Regex.ReplaceAll<String> takes a PCollection<String> and returns a PCollection<String> with all Strings that matched the Regex being replaced with the
replacement string. |
Regex.ReplaceFirst |
Regex.ReplaceFirst<String> takes a PCollection<String> and returns a PCollection<String> with the first Strings that matched the Regex being replaced with the
replacement string. |
Regex.Split |
Regex.Split<String> takes a PCollection<String> and returns a PCollection<String> with the input string split into individual items in a list. |
Reshuffle<K,V> | Deprecated
this transform's intended side effects are not portable; it will likely be removed
|
Reshuffle.ViaRandomKey<T> |
Implementation of
Reshuffle.viaRandomKey() . |
Sample |
PTransform s for taking samples of the elements in a
PCollection , or samples of the values associated with each
key in a PCollection of KV s. |
Sample.FixedSizedSampleFn<T> |
CombineFn that computes a fixed-size sample of a
collection of values. |
SerializableFunctions |
Useful
SerializableFunction overrides. |
SimpleFunction<InputT,OutputT> |
A
SerializableFunction which is not a functional interface. |
Sum |
PTransform s for computing the sum of the elements in a
PCollection , or the sum of the values associated with
each key in a PCollection of KV s. |
Top |
PTransform s for finding the largest (or smallest) set
of elements in a PCollection , or the largest (or smallest)
set of values associated with each key in a PCollection of
KV s. |
Top.Largest<T extends java.lang.Comparable<? super T>> | Deprecated
use
Top.Natural instead |
Top.Natural<T extends java.lang.Comparable<? super T>> |
A
Serializable Comparator that that uses the compared elements' natural
ordering. |
Top.Reversed<T extends java.lang.Comparable<? super T>> |
Serializable Comparator that that uses the reverse of the compared elements'
natural ordering. |
Top.Smallest<T extends java.lang.Comparable<? super T>> | Deprecated
use
Top.Reversed instead |
Top.TopCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> |
CombineFn for Top transforms that combines a
bunch of T s into a single count -long
List<T> , using compareFn to choose the largest
T s. |
ToString |
PTransforms for converting a PCollection<?> ,
PCollection<KV<?,?>> , or
PCollection<Iterable<?>>
to a PCollection<String> . |
Values<V> |
Values<V> takes a PCollection of KV<K, V> s and
returns a PCollection<V> of the values. |
View |
Transforms for creating
PCollectionViews from
PCollections (to read them as side inputs). |
View.AsIterable<T> |
For internal use only; no backwards-compatibility guarantees.
|
View.AsList<T> |
For internal use only; no backwards-compatibility guarantees.
|
View.AsMap<K,V> |
For internal use only; no backwards-compatibility guarantees.
|
View.AsMultimap<K,V> |
For internal use only; no backwards-compatibility guarantees.
|
View.AsSingleton<T> |
For internal use only; no backwards-compatibility guarantees.
|
View.CreatePCollectionView<ElemT,ViewT> |
For internal use only; no backwards-compatibility guarantees.
|
ViewFn<PrimitiveViewT,ViewT> |
For internal use only; no backwards-compatibility guarantees.
|
Watch |
Given a "poll function" that produces a potentially growing set of outputs for an input, this
transform simultaneously continuously watches the growth of output sets of all inputs, until a
per-input termination condition is reached.
|
Watch.Growth<InputT,OutputT> | |
Watch.Growth.PollResult<OutputT> |
The result of a single invocation of a
Watch.Growth.PollFn . |
WithKeys<K,V> |
WithKeys<K, V> takes a PCollection<V> , and either a
constant key of type K or a function from V to
K , and returns a PCollection<KV<K, V>> , where each
of the values in the input PCollection has been paired with
either the constant key or a key computed from the value. |
WithTimestamps<T> |
A
PTransform for assigning timestamps to all the elements of a PCollection . |
Enum | Description |
---|---|
DoFnTester.CloningBehavior |
When a
DoFnTester should clone the DoFn under test and how it should manage
the lifecycle of the DoFn . |
Annotation Type | Description |
---|---|
DoFn.BoundedPerElement |
Annotation on a splittable
DoFn
specifying that the DoFn performs a bounded amount of work per input element, so
applying it to a bounded PCollection will produce also a bounded PCollection . |
DoFn.FinishBundle |
Annotation for the method to use to finish processing a batch of elements.
|
DoFn.GetInitialRestriction |
Annotation for the method that maps an element to an initial restriction for a splittable
DoFn . |
DoFn.GetRestrictionCoder |
Annotation for the method that returns the coder to use for the restriction of a splittable
DoFn . |
DoFn.NewTracker |
Annotation for the method that creates a new
RestrictionTracker for the restriction of
a splittable DoFn . |
DoFn.OnTimer |
Annotation for registering a callback for a timer.
|
DoFn.ProcessElement |
Annotation for the method to use for processing elements.
|
DoFn.Setup |
Annotation for the method to use to prepare an instance for processing bundles of elements.
|
DoFn.SplitRestriction |
Annotation for the method that splits restriction of a splittable
DoFn into multiple parts to
be processed in parallel. |
DoFn.StartBundle |
Annotation for the method to use to prepare an instance for processing a batch of elements.
|
DoFn.StateId |
Annotation for declaring and dereferencing state cells.
|
DoFn.Teardown |
Annotation for the method to use to clean up this instance after processing bundles of
elements.
|
DoFn.TimerId |
Annotation for declaring and dereferencing timers.
|
DoFn.UnboundedPerElement |
Annotation on a splittable
DoFn
specifying that the DoFn performs an unbounded amount of work per input element, so
applying it to a bounded PCollection will produce an unbounded PCollection . |
PTransform
s for transforming
data in a pipeline.
A PTransform
is an operation that takes an
InputT
(some subtype of PInput
)
and produces an
OutputT
(some subtype of POutput
).
Common PTransforms include root PTransforms like
TextIO.Read
and
Create
, processing and
conversion operations like ParDo
,
GroupByKey
,
CoGroupByKey
,
Combine
, and
Count
, and outputting
PTransforms like
TextIO.Write
.
New PTransforms can be created by composing existing PTransforms. Most PTransforms in this package are composites, and users can also create composite PTransforms for their own application-specific logic.