
What is being computed?

ParDo
GroupByKey
Flatten
Combine
Composite Transforms
Side Inputs
Source API
Metrics
Stateful Processing
Runners: Google Cloud Dataflow, Apache Flink, Apache Spark (RDD/DStream based), Apache Spark Structured Streaming (Dataset based), Apache Samza, Apache Nemo, Hazelcast Jet, Twister2, Python Direct FnRunner, Go Direct Runner

ParDo

Google Cloud Dataflow: Yes, fully supported. Batch mode uses large bundle sizes; streaming uses smaller bundle sizes.
Apache Flink: Yes, fully supported. ParDo itself, as a per-element transformation with UDFs, is fully supported by Flink for both batch and streaming.
Apache Spark (RDD/DStream based): Yes, fully supported. ParDo applies per-element transformations as a Spark FlatMapFunction.
Apache Spark Structured Streaming (Dataset based): Partially, fully supported in batch mode. ParDo applies per-element transformations as a Spark FlatMapFunction.
Apache Samza: Yes, fully supported. Supported with per-element transformations.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Yes, fully supported.
Twister2: Yes, fully supported.
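As a concrete illustration of the per-element processing described above, here is a minimal ParDo sketch using the Beam Python SDK on the default direct runner; the SplitWordsFn class and the input strings are illustrative only.

    import apache_beam as beam

    class SplitWordsFn(beam.DoFn):
        def process(self, element):
            # A ParDo may emit zero, one, or many outputs per input element.
            for word in element.split():
                yield word

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create(["the quick brown fox", "jumps over"])
            | beam.ParDo(SplitWordsFn())
            | beam.Map(print)
        )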




GroupByKey

Google Cloud Dataflow: Yes, fully supported.
Apache Flink: Yes, fully supported. Uses Flink's keyBy for key grouping. When grouping by window in streaming (creating the panes), the Flink runner uses the Beam code, which guarantees support for all windowing and triggering mechanisms.
Apache Spark (RDD/DStream based): Partially, fully supported in batch mode. Uses Spark's groupByKey. GroupByKey with multiple trigger firings in streaming mode is a work in progress.
Apache Spark Structured Streaming (Dataset based): Partially, fully supported in batch mode. Uses Spark's groupByKey.
Apache Samza: Yes, fully supported. Uses Samza's partitionBy for key grouping and Beam's logic for window aggregation and triggering.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Yes, fully supported.
Twister2: Yes, fully supported.
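For reference, a minimal GroupByKey sketch in the Beam Python SDK; the keys and values are made up for illustration.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([("cat", 1), ("dog", 5), ("cat", 9)])
            | beam.GroupByKey()
            | beam.MapTuple(lambda key, values: (key, sorted(values)))
            | beam.Map(print)  # e.g. ('cat', [1, 9]) and ('dog', [5])
        )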




Flatten

Google Cloud Dataflow: Yes, fully supported.
Apache Flink: Yes, fully supported.
Apache Spark (RDD/DStream based): Yes, fully supported.
Apache Spark Structured Streaming (Dataset based): Partially, fully supported in batch mode. Some corner cases, such as Flatten on empty collections, are not yet supported.
Apache Samza: Yes, fully supported.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Yes, fully supported.
Twister2: Yes, fully supported.
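A minimal Flatten sketch in the Beam Python SDK, merging two PCollections created in the same pipeline; the step names and values are illustrative.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        first = pipeline | "CreateFirst" >> beam.Create([1, 2, 3])
        second = pipeline | "CreateSecond" >> beam.Create([4, 5])
        merged = (first, second) | beam.Flatten()
        merged | beam.Map(print)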




Combine

Google Cloud Dataflow: Yes, efficient execution.
Apache Flink: Yes, fully supported. Uses a combiner for pre-aggregation in both batch and streaming.
Apache Spark (RDD/DStream based): Yes, fully supported. Uses Spark's combineByKey and aggregate functions.
Apache Spark Structured Streaming (Dataset based): Partially, fully supported in batch mode. Uses Spark's Aggregator and the agg function.
Apache Samza: Yes, fully supported. Uses a combiner for efficient pre-aggregation.
Apache Nemo: Yes, fully supported. Batch mode uses pre-aggregation.
Hazelcast Jet: Yes, fully supported. Batch mode uses pre-aggregation.
Twister2: Yes, fully supported.
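The pre-aggregation mentioned for several runners relies on the structure of a CombineFn: partial accumulators can be built before the shuffle and merged afterwards. Below is a minimal sketch in the Beam Python SDK; the MeanFn class and the sample data are illustrative only.

    import apache_beam as beam

    class MeanFn(beam.CombineFn):
        def create_accumulator(self):
            return (0.0, 0)  # (running sum, element count)

        def add_input(self, accumulator, value):
            total, count = accumulator
            return total + value, count + 1

        def merge_accumulators(self, accumulators):
            # Accumulators from different workers can be merged after pre-aggregation.
            totals, counts = zip(*accumulators)
            return sum(totals), sum(counts)

        def extract_output(self, accumulator):
            total, count = accumulator
            return total / count if count else float("nan")

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([("a", 1), ("a", 3), ("b", 10)])
            | beam.CombinePerKey(MeanFn())
            | beam.Map(print)  # ('a', 2.0) and ('b', 10.0)
        )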




Composite Transforms

Google Cloud Dataflow: Partially, supported via inlining. Currently, composite transforms are inlined during execution. The structure is later recreated from the names, but other transform-level information (if added to the model) will be lost.
Apache Flink: Partially, supported via inlining.
Apache Spark (RDD/DStream based): Partially, supported via inlining.
Apache Spark Structured Streaming (Dataset based): Partially, supported via inlining only in batch mode.
Apache Samza: Partially, supported via inlining.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Partially, supported via inlining.
Twister2: Partially, supported via inlining.
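A composite transform is a PTransform whose expand() method is built from other transforms; runners that support composites only "via inlining" execute the inner transforms and keep just the names for structure. A minimal sketch in the Beam Python SDK, with an illustrative CountWords composite:

    import apache_beam as beam

    class CountWords(beam.PTransform):
        def expand(self, lines):
            # The composite is defined entirely in terms of other transforms.
            return (
                lines
                | "Split" >> beam.FlatMap(lambda line: line.split())
                | "Count" >> beam.combiners.Count.PerElement()
            )

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create(["to be or not to be"])
            | CountWords()
            | beam.Map(print)
        )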




Side Inputs

Google Cloud Dataflow: Yes, some size restrictions in streaming. Batch mode supports a distributed implementation, but streaming mode may force some size restrictions. Neither mode is able to push lookups directly up into key-based sources.
Apache Flink: Yes, some size restrictions in streaming. Batch mode supports a distributed implementation, but streaming mode may force some size restrictions. Neither mode is able to push lookups directly up into key-based sources.
Apache Spark (RDD/DStream based): Yes, fully supported. Uses Spark's broadcast variables. In streaming mode, side inputs may update, but only between micro-batches.
Apache Spark Structured Streaming (Dataset based): Partially, fully supported in batch mode. Uses Spark's broadcast variables.
Apache Samza: Yes, fully supported. Uses Samza's broadcast operator to distribute the side inputs.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Partially, with restrictions. Supported only when the side input source is bounded and windowing uses the global window.
Twister2: Yes, fully supported.
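A minimal side-input sketch in the Beam Python SDK: a small PCollection is broadcast to every worker processing the main input. The threshold value and step names are illustrative.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        threshold = pipeline | "Threshold" >> beam.Create([5])
        words = pipeline | "Words" >> beam.Create(["hi", "hello", "bonjour"])
        (
            words
            | beam.Filter(
                lambda word, limit: len(word) <= limit,
                limit=beam.pvalue.AsSingleton(threshold),  # broadcast side input
            )
            | beam.Map(print)  # keeps 'hi' and 'hello'
        )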




Source API

Google Cloud Dataflow: Yes, fully supported. Support includes autotuning features (https://cloud.google.com/dataflow/service/dataflow-service-desc#autotuning-features).
Apache Flink: Yes, fully supported.
Apache Spark (RDD/DStream based): Yes, fully supported.
Apache Spark Structured Streaming (Dataset based): Partially, bounded sources only. Uses Spark's DataSourceV2 API in micro-batch mode (continuous streaming mode is tagged experimental in Spark and does not support aggregation).
Apache Samza: Yes, fully supported.
Apache Nemo: Yes, fully supported.
Hazelcast Jet: Yes, fully supported.
Twister2: Yes, fully supported.
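For orientation, pipelines typically consume sources through built-in I/O transforms layered on this source machinery, as in the Beam Python sketch below; the file path is a hypothetical placeholder.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.io.ReadFromText("gs://example-bucket/input-*.txt")  # hypothetical path
            | beam.combiners.Count.Globally()
            | beam.Map(print)
        )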




Metrics

Google Cloud Dataflow: Partially. Gauge metrics are not supported; all other metric types are supported.
Apache Flink: Partially, all metric types are supported. Only attempted values are supported; no committed values for metrics.
Apache Spark (RDD/DStream based): Partially, all metric types are supported. Only attempted values are supported; no committed values for metrics.
Apache Spark Structured Streaming (Dataset based): Partially, all metric types are supported in batch mode. Only attempted values are supported; no committed values for metrics.
Apache Samza: Partially, Counter and Gauge are supported. Only attempted values are supported; no committed values for metrics.
Apache Nemo: No, not implemented.
Hazelcast Jet: Partially, all metric types are supported in both batch and streaming mode. Does not differentiate between committed and attempted values.
Twister2: No, not implemented.
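User-defined metrics are declared inside a DoFn and reported by the runner; whether committed or only attempted values are exposed depends on the runner, as noted above. A minimal sketch in the Beam Python SDK with an illustrative counter and distribution:

    import apache_beam as beam
    from apache_beam.metrics import Metrics

    class CountShortWordsFn(beam.DoFn):
        def __init__(self):
            super().__init__()
            # Metric names and namespaces here are illustrative.
            self.short_words = Metrics.counter(self.__class__, "short_words")
            self.word_lengths = Metrics.distribution(self.__class__, "word_lengths")

        def process(self, word):
            self.word_lengths.update(len(word))
            if len(word) < 4:
                self.short_words.inc()
            yield word

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | beam.Create(["a", "bb", "hello"])
            | beam.ParDo(CountShortWordsFn())
        )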




Stateful Processing

Google Cloud Dataflow: Partially, non-merging windows. State is supported for non-merging windows. SetState and MapState are not yet supported.
Apache Flink: Partially, non-merging windows. State is supported for non-merging windows. SetState and MapState are not yet supported.
Apache Spark (RDD/DStream based): Partially, full support in batch mode.
Apache Spark Structured Streaming (Dataset based): No, not implemented.
Apache Samza: Partially, non-merging windows. State is backed by either a RocksDB KV store or an in-memory hash map, and persisted using a changelog.
Apache Nemo: No, not implemented.
Hazelcast Jet: Partially, non-merging windows.
Twister2: No, not implemented.
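A minimal sketch of a stateful DoFn in the Beam Python SDK: state is partitioned per key and window, here a running count per key. The IndexPerKeyFn name and the elements are illustrative only.

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.userstate import ReadModifyWriteStateSpec

    class IndexPerKeyFn(beam.DoFn):
        COUNT = ReadModifyWriteStateSpec("count", VarIntCoder())

        def process(self, element, count=beam.DoFn.StateParam(COUNT)):
            key, value = element
            current = count.read() or 0  # state is scoped to this key and window
            count.write(current + 1)
            yield key, value, current

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([("a", "x"), ("a", "y"), ("b", "z")])
            | beam.ParDo(IndexPerKeyFn())
            | beam.Map(print)
        )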




Last updated on 2024/10/14
