Class EvaluationContext
java.lang.Object
org.apache.beam.runners.spark.translation.EvaluationContext
The EvaluationContext allows us to define pipeline instructions and translate between
PObject<T>
s or PCollection<T>
s and Ts or DStreams/RDDs of Ts.-
Constructor Summary
ConstructorsConstructorDescriptionEvaluationContext
(org.apache.spark.api.java.JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options) EvaluationContext
(org.apache.spark.api.java.JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options, org.apache.spark.streaming.api.java.JavaStreamingContext jssc) -
Method Summary
Modifier and TypeMethodDescriptionborrowDataset
(PTransform<? extends PValue, ?> transform) borrowDataset
(PValue pvalue) void
Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e.<T> T
Retrieve an object of Type T associated with the PValue passed in.Get the map of cache candidates hold by the evaluation context.Map
<GroupByKey<?, ?>, String> Get the map of GBK transforms to their full names, which are candidates for group by key and window translation which aims to reduce memory usage.org.apache.beam.sdk.runners.AppliedPTransform
<?, ?, ?> <T extends PValue>
TgetInput
(PTransform<T, ?> transform) <T> Map
<TupleTag<?>, PCollection<?>> getInputs
(PTransform<?, ?> transform) <T extends PValue>
TgetOutput
(PTransform<?, T> transform) Map
<TupleTag<?>, PCollection<?>> getOutputs
(PTransform<?, ?> transform) Map
<PCollection<?>, Integer> Get the map ofPCollection
to the number ofPTransform
consuming it.Return the current views creates in the pipeline.org.apache.beam.runners.core.construction.SerializablePipelineOptions
org.apache.spark.api.java.JavaSparkContext
org.apache.spark.streaming.api.java.JavaStreamingContext
<K,
V> boolean isCandidateForGroupByKeyAndWindow
(GroupByKey<K, V> transform) Returns if given GBK transform can be considered as candidate for group by key and window translation aiming to reduce memory usage.boolean
isLeaf
(PCollection<?> pCollection) Check if givenPCollection
is a leaf or not.boolean
Checks if any of the side inputs in the pipeline are streaming side inputs.void
putDataset
(PTransform<?, ? extends PValue> transform, Dataset dataset) Add single output of transform to context map and possibly cache if it conformsshouldCache(PTransform, PValue)
.void
putDataset
(PValue pvalue, Dataset dataset) Add output of transform to context map and possibly cache if it conformsshouldCache(PTransform, PValue)
.void
putPView
(PCollectionView<?> view, Iterable<WindowedValue<?>> value, Coder<Iterable<WindowedValue<?>>> coder) Adds/Replaces a view to the current views creates in the pipeline.void
reportPCollectionConsumed
(PCollection<?> pCollection) Reports that givenPCollection
is consumed by aPTransform
in the pipeline.void
reportPCollectionProduced
(PCollection<?> pCollection) Reports that givenPCollection
is consumed by aPTransform
in the pipeline.void
setCurrentTransform
(org.apache.beam.sdk.runners.AppliedPTransform<?, ?, ?> transform) boolean
shouldCache
(PTransform<?, ? extends PValue> transform, PValue pvalue) Cache PCollection if SparkPipelineOptions.isCacheDisabled is false or transform isn't GroupByKey transformation and PCollection is used more then once in Pipeline.void
Marks that the pipeline contains at least one streaming side input.
-
Constructor Details
-
EvaluationContext
public EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options) -
EvaluationContext
public EvaluationContext(org.apache.spark.api.java.JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options, org.apache.spark.streaming.api.java.JavaStreamingContext jssc)
-
-
Method Details
-
getSparkContext
public org.apache.spark.api.java.JavaSparkContext getSparkContext() -
getStreamingContext
public org.apache.spark.streaming.api.java.JavaStreamingContext getStreamingContext() -
getPipeline
-
getOptions
-
getSerializableOptions
public org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions() -
setCurrentTransform
public void setCurrentTransform(org.apache.beam.sdk.runners.AppliedPTransform<?, ?, ?> transform) -
getCurrentTransform
public org.apache.beam.sdk.runners.AppliedPTransform<?,?, getCurrentTransform()?> -
getInput
-
getInputs
-
getOutput
-
getOutputs
-
getOutputCoders
-
shouldCache
Cache PCollection if SparkPipelineOptions.isCacheDisabled is false or transform isn't GroupByKey transformation and PCollection is used more then once in Pipeline.PCollection is not cached in GroupByKey transformation, because Spark automatically persists some intermediate data in shuffle operations, even without users calling persist.
- Parameters:
transform
- the transform to checkpvalue
- output of transform- Returns:
- if PCollection will be cached
-
putDataset
Add single output of transform to context map and possibly cache if it conformsshouldCache(PTransform, PValue)
.- Parameters:
transform
- from which Dataset was createddataset
- created Dataset from transform
-
putDataset
Add output of transform to context map and possibly cache if it conformsshouldCache(PTransform, PValue)
. Used when PTransform has multiple outputs.- Parameters:
pvalue
- one of multiple outputs of transformdataset
- created Dataset from transform
-
borrowDataset
-
borrowDataset
-
computeOutputs
public void computeOutputs()Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like saving to a file) registered on them (i.e. they are performed for side effects). -
get
Retrieve an object of Type T associated with the PValue passed in.- Type Parameters:
T
- Type of object to return.- Parameters:
value
- PValue to retrieve associated data for.- Returns:
- Native object.
-
getPViews
Return the current views creates in the pipeline.- Returns:
- SparkPCollectionView
-
putPView
public void putPView(PCollectionView<?> view, Iterable<WindowedValue<?>> value, Coder<Iterable<WindowedValue<?>>> coder) Adds/Replaces a view to the current views creates in the pipeline.- Parameters:
view
- - Identifier of the viewvalue
- - Actual value of the viewcoder
- - Coder of the value
-
getCacheCandidates
Get the map of cache candidates hold by the evaluation context.- Returns:
- The current
Map
of cache candidates.
-
getCandidatesForGroupByKeyAndWindowTranslation
Get the map of GBK transforms to their full names, which are candidates for group by key and window translation which aims to reduce memory usage.- Returns:
- The current
Map
of candidates
-
isCandidateForGroupByKeyAndWindow
Returns if given GBK transform can be considered as candidate for group by key and window translation aiming to reduce memory usage.- Type Parameters:
K
- type of GBK keyV
- type of GBK value- Parameters:
transform
- to evaluate- Returns:
- true if given transform is a candidate; false otherwise
-
reportPCollectionConsumed
Reports that givenPCollection
is consumed by aPTransform
in the pipeline.- See Also:
-
reportPCollectionProduced
Reports that givenPCollection
is consumed by aPTransform
in the pipeline.- See Also:
-
getPCollectionConsumptionMap
Get the map ofPCollection
to the number ofPTransform
consuming it.- Returns:
-
isLeaf
Check if givenPCollection
is a leaf or not.PCollection
is a leaf when there is no otherPTransform
consuming it / depending on it.- Parameters:
pCollection
- to be checked if it is a leaf- Returns:
- true if pCollection is leaf; otherwise false
-
storageLevel
-
isStreamingSideInput
public boolean isStreamingSideInput()Checks if any of the side inputs in the pipeline are streaming side inputs.If at least one of the side inputs is a streaming side input, this method returns true. When streaming side inputs are present, the
CachedSideInputReader
will not be used.- Returns:
- true if any of the side inputs in the pipeline are streaming side inputs, false otherwise
-
useStreamingSideInput
public void useStreamingSideInput()Marks that the pipeline contains at least one streaming side input.When this method is called, it sets the streamingSideInput flag to true, indicating that the
CachedSideInputReader
should not be used for processing side inputs.
-