Package org.apache.beam.sdk.transforms
@DefaultAnnotation(org.checkerframework.checker.nullness.qual.NonNull.class)
package org.apache.beam.sdk.transforms
Defines
PTransforms for transforming data in a pipeline.
A PTransform is an operation that takes an
InputT (some subtype of PInput) and produces an
OutputT (some subtype of POutput).
Common PTransforms include root PTransforms like TextIO.Read
and Create, processing and conversion operations like
ParDo, GroupByKey,
CoGroupByKey, Combine, and Count, and
outputting PTransforms like TextIO.Write.
New PTransforms can be created by composing existing PTransforms. Most PTransforms in this package are composites, and users can also create composite PTransforms for their own application-specific logic.
-
ClassDescription
PTransforms for getting an idea of aPCollection's data distribution using approximateN-tiles (e.g.ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends Comparator<T> & Serializable> TheApproximateQuantilesCombineFncombiner gives an idea of the distribution of a collection of values using approximateN-tiles.Deprecated.CombineFnthat computes an estimate of the number of distinct values that were combined.A heap utility class to efficiently track the largest added elements.PTransformfor estimating the number of distinct elements in aPCollection.PTransformfor estimating the number of distinct values associated with each key in aPCollectionofKVs.PTransforms for combiningPCollectionelements globally and per-key.Combine.AccumulatingCombineFn<InputT,AccumT extends Combine.AccumulatingCombineFn.Accumulator<InputT, AccumT, OutputT>, OutputT> ACombineFnthat uses a subclass ofCombine.AccumulatingCombineFn.Accumulatoras its accumulator type.Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT, OutputT> The type of mutable accumulator values used by thisAccumulatingCombineFn.An abstract subclass ofCombine.CombineFnfor implementing combiners that are more easily and efficiently expressed as binary operations ondoubles.An abstract subclass ofCombine.CombineFnfor implementing combiners that are more easily expressed as binary operations.An abstract subclass ofCombine.CombineFnfor implementing combiners that are more easily and efficiently expressed as binary operations onintsAn abstract subclass ofCombine.CombineFnfor implementing combiners that are more easily and efficiently expressed as binary operations onlongs.Combine.CombineFn<InputT extends @Nullable Object,AccumT extends @Nullable Object, OutputT extends @Nullable Object> ACombineFn<InputT, AccumT, OutputT>specifies how to combine a collection of input values of typeInputTinto a single output value of typeOutputT.Combine.Globally<InputT,OutputT> Combine.Globally<InputT, OutputT>takes aPCollection<InputT>and returns aPCollection<OutputT>whose elements are the result of combining all the elements in each window of the inputPCollection, using a specifiedCombineFn<InputT, AccumT, OutputT>.Combine.GloballyAsSingletonView<InputT,OutputT> Combine.GloballyAsSingletonView<InputT, OutputT>takes aPCollection<InputT>and returns aPCollectionView<OutputT>whose elements are the result of combining all the elements in each window of the inputPCollection, using a specifiedCombineFn<InputT, AccumT, OutputT>.Combine.GroupedValues<K,InputT, OutputT> GroupedValues<K, InputT, OutputT>takes aPCollection<KV<K, Iterable<InputT>>>, such as the result ofGroupByKey, applies a specifiedCombineFn<InputT, AccumT, OutputT>to each of the inputKV<K, Iterable<InputT>>elements to produce a combined outputKV<K, OutputT>element, and returns aPCollection<KV<K, OutputT>>containing all the combined output elements.Holds a single value value of typeVwhich may or may not be present.Combine.PerKey<K,InputT, OutputT> PerKey<K, InputT, OutputT>takes aPCollection<KV<K, InputT>>, groups it by key, applies a combining function to theInputTvalues associated with each key to produce a combinedOutputTvalue, and returns aPCollection<KV<K, OutputT>>representing a map from each distinct key of the inputPCollectionto the corresponding combined value.Combine.PerKeyWithHotKeyFanout<K,InputT, OutputT> LikeCombine.PerKey, but sharding the combining of hot keys.Deprecated.For internal use only; no backwards-compatibility guarantees.CombineFnBase.GlobalCombineFn<InputT,AccumT, OutputT> For internal use only; no backwards-compatibility guarantees.Static utility methods that create combine function instances.A tuple of outputs produced by a composed combine functions.A builder class to construct a composedCombineFnBase.GlobalCombineFn.CombineFns.ComposedCombineFn<DataT>A composedCombine.CombineFnthat applies multipleCombineFns.A composedCombineWithContext.CombineFnWithContextthat applies multipleCombineFnWithContexts.This class contains combine functions that have access toPipelineOptionsand side inputs throughCombineWithContext.Context.CombineWithContext.CombineFnWithContext<InputT,AccumT, OutputT> A combine function that has access toPipelineOptionsand side inputs throughCombineWithContext.Context.Information accessible to all methods inCombineFnWithContextandKeyedCombineFnWithContext.An internal interface for signaling that aGloballyCombineFnor aPerKeyCombineFnneeds to accessCombineWithContext.Context.Contextful<ClosureT>Pair of a bit of user code (a "closure") and theRequirementsneeded to run it.Contextful.Fn<InputT,OutputT> A function from an input to an output that may additionally accessContextful.Fn.Contextwhen computing the result.An accessor for additional capabilities available inContextful.Fn.apply(InputT, org.apache.beam.sdk.transforms.Contextful.Fn.Context).PTransformsto count the elements in aPCollection.Create<T>Create<T>takes a collection of elements of typeTknown when the pipeline is constructed and returns aPCollection<T>containing the elements.APTransformthat creates aPCollectionwhose elements have associated timestamps.APTransformthat creates aPCollectionfrom a set of in-memory objects.APTransformthat creates aPCollectionwhose elements have associated windowing metadata.A set ofPTransforms which deduplicate input records over a time domain and threshold.Deduplicates keyed values using the key over a specified time domain and threshold.Deduplicates values over a specified time domain and threshold.APTransformthat uses aSerializableFunctionto obtain a representative value for each input element used for deduplication.Distinct<T>Distinct<T>takes aPCollection<T>and returns aPCollection<T>that has all distinct elements of the input.ADistinctPTransformthat uses aSerializableFunctionto obtain a representative value for each input element.The argument toParDoproviding the code to use to process elements of the inputPCollection.Annotation for declaring that a state parameter is always fetched.Annotation on a splittableDoFnspecifying that theDoFnperforms a bounded amount of work per input element, so applying it to a boundedPCollectionwill produce also a boundedPCollection.A parameter that is accessible during@StartBundle,@ProcessElementand@FinishBundlethat allows the caller to register a callback that will be invoked after the bundle has been successfully completed and the runner has commit the output.An instance of a function that will be invoked after bundle finalization.Parameter annotation for the input element forDoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.Annotation for specifying specific fields that are accessed in a Schema PCollection.Annotation for the method to use to finish processing a batch of elements.Annotation for the method that maps an element to an initial restriction for a splittableDoFn.Annotation for the method that maps an element and restriction to initial watermark estimator state for a splittableDoFn.Annotation for the method that returns the coder to use for the restriction of a splittableDoFn.Annotation for the method that returns the corresponding size for an element and restriction pair.Annotation for the method that returns the coder to use for the watermark estimator state of a splittableDoFn.Parameter annotation for dereferencing input element key inKVpair.Receives tagged output for a multi-output function.Annotation for the method that creates a newRestrictionTrackerfor the restriction of a splittableDoFn.Annotation for the method that creates a newWatermarkEstimatorfor the watermark state of a splittableDoFn.Annotation for registering a callback for a timer.Annotation for registering a callback for a timerFamily.Annotation for the method to use for performing actions on window expiration.Receives values of the given type.When used as a return value ofDoFn.ProcessElement, indicates whether there is more work to be done for the current element.Annotation for the method to use for processing elements.Annotation that may be added to aDoFn.ProcessElement,DoFn.OnTimer, orDoFn.OnWindowExpirationmethod to indicate that the runner must ensure that the observable contents of the inputPCollectionor mutable state must be stable upon retries.Annotation that may be added to aDoFn.ProcessElementmethod to indicate that the runner must ensure that the observable contents of the inputPCollectionis sorted by time, in ascending order.Parameter annotation for the restriction forDoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.Annotation for the method to use to prepare an instance for processing bundles of elements.Parameter annotation for the SideInput for aDoFn.ProcessElementmethod.Annotation for the method that splits restriction of a splittableDoFninto multiple parts to be processed in parallel.Annotation for the method to use to prepare an instance for processing a batch of elements.Annotation for declaring and dereferencing state cells.Annotation for the method to use to clean up this instance before it is discarded.Parameter annotation for the TimerMap for aDoFn.ProcessElementmethod.Annotation for declaring and dereferencing timers.Parameter annotation for the input element timestamp forDoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.Annotation for the method that truncates the restriction of a splittableDoFninto a bounded one.Annotation on a splittableDoFnspecifying that theDoFnperforms an unbounded amount of work per input element, so applying it to a boundedPCollectionwill produce an unboundedPCollection.Parameter annotation for the watermark estimator state for theDoFn.NewWatermarkEstimatormethod.CommonDoFn.OutputReceiverandDoFn.MultiOutputReceiverclasses.Represents information about how a DoFn extracts schemas.The builder object.DoFnTester<InputT,OutputT> Deprecated.UseTestPipelinewith theDirectRunner.Deprecated.UseTestPipelinewith theDirectRunner.ExternalTransformBuilder<ConfigT,InputT extends PInput, OutputT extends POutput> An interface for building a transform from an externally provided configuration.Filter<T>PTransforms for filtering from aPCollectionthe elements satisfying a predicate, or satisfying an inequality with a given value based on the elements' natural ordering.FlatMapElements<InputT,OutputT> PTransforms for mapping a simple function that returns iterables over the elements of aPCollectionand merging the results.FlatMapElements.FlatMapWithFailures<InputT,OutputT, FailureT> APTransformthat adds exception handling toFlatMapElements.Flatten<T>takes multiplePCollection<T>s bundled into aPCollectionList<T>and returns a singlePCollection<T>containing all the elements in all the inputPCollections.FlattenIterables<T>takes aPCollection<Iterable<T>>and returns aPCollection<T>that contains all the elements from each iterable.APTransformthat flattens aPCollectionListinto aPCollectioncontaining all the elements of all thePCollections in its input.GroupByEncryptedKey<K,V> APTransformthat provides a secure alternative toGroupByKey.GroupByKey<K,V> GroupByKey<K, V>takes aPCollection<KV<K, V>>, groups the values by key and windows, and returns aPCollection<KV<K, Iterable<V>>>representing a map from each distinct key and window of the inputPCollectionto anIterableover all the values associated with that key in the input per window.GroupIntoBatches<K,InputT> APTransformthat batches inputs to a desired batch size.GroupIntoBatches.BatchingParams<InputT>Wrapper class for batching parameters supplied by users.For internal use only; no backwards-compatibility guarantees.InferableFunction<InputT,OutputT> AProcessFunctionwhich is not a functional interface.The result of aJsonToRow.withExceptionReporting(Schema)transform.Keys<K>Keys<K>takes aPCollectionofKV<K, V>s and returns aPCollection<K>of the keys.KvSwap<K,V> KvSwap<K, V>takes aPCollection<KV<K, V>>and returns aPCollection<KV<V, K>>, where all the keys and values have been swapped.MapElements<InputT,OutputT> PTransforms for mapping a simple function over the elements of aPCollection.MapElements.MapWithFailures<InputT,OutputT, FailureT> APTransformthat adds exception handling toMapElements.MapKeys<K1,K2, V> MapKeysmaps aSerializableFunction<K1,K2>over keys of aPCollection<KV<K1,V>>and returns aPCollection<KV<K2, V>>.MapValues<K,V1, V2> MapValuesmaps aSerializableFunction<V1,V2>over values of aPCollection<KV<K,V1>>and returns aPCollection<KV<K, V2>>.For internal use only; no backwards-compatibility guarantees.For internal use only; no backwards-compatibility guarantees.Represents thePrimitiveViewTsupplied to theViewFnwhen it declares to use theiterable materialization.Represents thePrimitiveViewTsupplied to theViewFnwhen it declares to use themultimap materialization.PTransforms for computing the maximum of the elements in aPCollection, or the maximum of the values associated with each key in aPCollectionofKVs.PTransforms for computing the arithmetic mean (a.k.a.PTransforms for computing the minimum of the elements in aPCollection, or the minimum of the values associated with each key in aPCollectionofKVs.ParDois the core element-wise transform in Apache Beam, invoking a user-specified function on each of the elements of the inputPCollectionto produce zero or more output elements, all of which are collected into the outputPCollection.ParDo.MultiOutput<InputT,OutputT> APTransformthat, when applied to aPCollection<InputT>, invokes a user-specifiedDoFn<InputT, OutputT>on all its elements, which can emit elements to any of thePTransform's outputPCollections, which are bundled into a resultPCollectionTuple.ParDo.SingleOutput<InputT,OutputT> APTransformthat, when applied to aPCollection<InputT>, invokes a user-specifiedDoFn<InputT, OutputT>on all its elements, with all its outputs collected into an outputPCollection<OutputT>.Partition<T>Partitiontakes aPCollection<T>and aPartitionFn, uses thePartitionFnto split the elements of the inputPCollectionintoNpartitions, and returns aPCollectionList<T>that bundlesNPCollection<T>s containing the split elements.A function object that chooses an output partition for an element.A function object that chooses an output partition for an element.APTransformwhich produces a sequence of elements at fixed runtime intervals.APTransformwhich generates a sequence of timestamped elements at given runtime intervals.ProcessFunction<InputT,OutputT> A function that computes an output value of typeOutputTfrom an input value of typeInputTand isSerializable.A family ofPTransformsthat returns aPCollectionequivalent to its input but functions as an operational hint to a runner that redistributing the data in some way is likely useful.Noop transform that hints to the runner to try to redistribute the work evenly, or via whatever clever strategy the runner comes up with.Registers translators for the Redistribute family of transforms.PTransforms to use Regular Expressions to process elements in aPCollection.Regex.MatchesName<String>takes aPCollection<String>and returns aPCollection<List<String>>representing the value extracted from all the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.Find<String>takes aPCollection<String>and returns aPCollection<String>representing the value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.Find<String>takes aPCollection<String>and returns aPCollection<List<String>>representing the value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.MatchesKV<KV<String, String>>takes aPCollection<String>and returns aPCollection<KV<String, String>>representing the key and value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.Find<String>takes aPCollection<String>and returns aPCollection<String>representing the value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.MatchesKV<KV<String, String>>takes aPCollection<String>and returns aPCollection<KV<String, String>>representing the key and value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.Matches<String>takes aPCollection<String>and returns aPCollection<String>representing the value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.MatchesKV<KV<String, String>>takes aPCollection<String>and returns aPCollection<KV<String, String>>representing the key and value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.MatchesName<String>takes aPCollection<String>and returns aPCollection<String>representing the value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.MatchesNameKV<KV<String, String>>takes aPCollection<String>and returns aPCollection<KV<String, String>>representing the key and value extracted from the Regex groups of the inputPCollectionto the number of times that element occurs in the input.Regex.ReplaceAll<String>takes aPCollection<String>and returns aPCollection<String>with all Strings that matched the Regex being replaced with the replacement string.Regex.ReplaceFirst<String>takes aPCollection<String>and returns aPCollection<String>with the first Strings that matched the Regex being replaced with the replacement string.Regex.Split<String>takes aPCollection<String>and returns aPCollection<String>with the input string split into individual items in a list.PTransformsfor converting between explicit and implicit form of various Beam values.Describes the run-time requirements of aContextful, such as access to side inputs.Reshuffle<K,V> For internal use only; no backwards compatibility guarantees.Implementation ofReshuffle.viaRandomKey().PTransforms for taking samples of the elements in aPCollection, or samples of the values associated with each key in aPCollectionofKVs.CombineFnthat computes a fixed-size sample of a collection of values.SerializableBiConsumer<FirstInputT,SecondInputT> A union of theBiConsumerandSerializableinterfaces.SerializableBiFunction<FirstInputT,SecondInputT, OutputT> A union of theBiFunctionandSerializableinterfaces.AComparatorthat is alsoSerializable.SerializableFunction<InputT,OutputT> A function that computes an output value of typeOutputTfrom an input value of typeInputT, isSerializable, and does not allow checked exceptions to be declared.UsefulSerializableFunctionoverrides.ThePTransforms that allow to compute different set functions acrossPCollections.SimpleFunction<InputT,OutputT> ASerializableFunctionwhich is not a functional interface.PTransforms for computing the sum of the elements in aPCollection, or the sum of the values associated with each key in aPCollectionofKVs.Tee<T>A PTransform that returns its input, but also applies its input to an auxiliary PTransform, akin to the shellteecommand, which is named after the T-splitter used in plumbing.ToJson<T>Creates aPTransformthat serializes UTF-8 JSON objects from aSchema-aware PCollection (i.e.PTransforms for finding the largest (or smallest) set of elements in aPCollection, or the largest (or smallest) set of values associated with each key in aPCollectionofKVs.Top.Largest<T extends Comparable<? super T>>Deprecated.useTop.NaturalinsteadTop.Natural<T extends Comparable<? super T>>ASerializableComparatorthat that uses the compared elements' natural ordering.Top.Reversed<T extends Comparable<? super T>>SerializableComparatorthat that uses the reverse of the compared elements' natural ordering.Top.Smallest<T extends Comparable<? super T>>Deprecated.useTop.ReversedinsteadTop.TopCombineFn<T,ComparatorT extends Comparator<T> & Serializable> CombineFnforToptransforms that combines a bunch ofTs into a singlecount-longList<T>, usingcompareFnto choose the largestTs.PTransformsfor converting aPCollection<?>,PCollection<KV<?,?>>, orPCollection<Iterable<?>>to aPCollection<String>.Values<V>Values<V>takes aPCollectionofKV<K, V>s and returns aPCollection<V>of the values.Transforms for creatingPCollectionViewsfromPCollections(to read them as side inputs).For internal use only; no backwards-compatibility guarantees.View.AsList<T>For internal use only; no backwards-compatibility guarantees.View.AsMap<K,V> For internal use only; no backwards-compatibility guarantees.View.AsMultimap<K,V> For internal use only; no backwards-compatibility guarantees.For internal use only; no backwards-compatibility guarantees.View.CreatePCollectionView<ElemT,ViewT> For internal use only; no backwards-compatibility guarantees.Provides an index to value mapping using a random starting index and also provides an offset range for each window seen.ViewFn<PrimitiveViewT,ViewT> For internal use only; no backwards-compatibility guarantees.Delays processing of each window in aPCollectionuntil signaled.Implementation ofWait.on(org.apache.beam.sdk.values.PCollection<?>...).Given a "poll function" that produces a potentially growing set of outputs for an input, this transform simultaneously continuously watches the growth of output sets of all inputs, until a per-input termination condition is reached.Watch.Growth<InputT,OutputT, KeyT> Watch.Growth.PollFn<InputT,OutputT> A function that computes the current set of outputs for the given input, in the form of aWatch.Growth.PollResult.Watch.Growth.PollResult<OutputT>The result of a single invocation of aWatch.Growth.PollFn.Watch.Growth.TerminationCondition<InputT,StateT> A strategy for determining whether it is time to stop polling the current input regardless of whether its output is complete or not.Watch.WatchGrowthFn<InputT,OutputT, KeyT, TerminationStateT> A collection of utilities for writing transforms that can handle exceptions raised during processing of elements.A simple handler that extracts information from an exception to aMap<String, String>and returns aKVwhere the key is the input element that failed processing, and the value is the map of exception attributes.The value type passed as input to exception handlers.An intermediate output type for PTransforms that allows an output collection to live alongside a collection of elements that failed the transform.WithKeys<K,V> WithKeys<K, V>takes aPCollection<V>, and either a constant key of typeKor a function fromVtoK, and returns aPCollection<KV<K, V>>, where each of the values in the inputPCollectionhas been paired with either the constant key or a key computed from the value.APTransformfor assigning timestamps to all the elements of aPCollection.
ApproximateCountDistinctin thezetasketchextension module, which makes use of theHllCountimplementation.