Additional common features not yet part of the Beam model
Feature | Google Cloud Dataflow | Apache Flink | Apache Spark (RDD/DStream based) | Apache Spark Structured Streaming (Dataset based) | Apache Samza | Apache Nemo | Hazelcast Jet | Twister2 | Python Direct FnRunner | Go Direct Runner |
---|---|---|---|---|---|---|---|---|---|---|
Drain | Partially: Dataflow has a native drain operation, but it does not work in the presence of event-time timer loops. Final implementation pending model support. | Partially: Flink supports taking a "savepoint" of the pipeline and shutting the pipeline down once the savepoint completes. | | | | | | | | |
Checkpoint | No | Partially: Flink has a native savepoint capability. | Partially: Spark has a native checkpoint capability. | No: not implemented | Partially: Samza has a native checkpoint capability. | | | | | |
Key-ordered delivery | Partially: Dataflow uses different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch. | Partially: Flink may use different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch. | Unverified | Unverified | Partially: Samza may use different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch. | Unverified | Unverified | Unverified | Unverified | |
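Key-ordered delivery means that, for any single key, elements reach the downstream step in the order they were sent, while elements belonging to different keys may interleave arbitrarily. The sketch below is a toy, in-memory model of that property, not runner code: `is_key_ordered` is a hypothetical helper written for illustration (it is not part of the Beam SDK or any runner), and it assumes elements are `(key, value)` pairs and that no elements are lost or duplicated in transit.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

# Assumption for illustration: an element is a (key, value) pair.
Element = Tuple[Hashable, object]


def _group_in_order(seq: Iterable[Element]) -> dict:
    """Collect each key's values in the order they appear in `seq`."""
    groups: defaultdict = defaultdict(list)
    for key, value in seq:
        groups[key].append(value)
    return dict(groups)


def is_key_ordered(sent: Iterable[Element], delivered: Iterable[Element]) -> bool:
    """Hypothetical helper (not a Beam API): True if `delivered` keeps,
    for every key, the same values in the same relative order as `sent`.
    Interleaving across different keys is allowed."""
    return _group_in_order(sent) == _group_in_order(delivered)


sent: List[Element] = [("a", 1), ("b", 1), ("a", 2), ("b", 2)]

# Cross-key order changed, per-key order preserved: key-ordered.
interleaved: List[Element] = [("b", 1), ("a", 1), ("b", 2), ("a", 2)]

# Key "a" arrives as 2 then 1: per-key order broken.
reordered: List[Element] = [("a", 2), ("a", 1), ("b", 1), ("b", 2)]

print(is_key_ordered(sent, interleaved))  # True
print(is_key_ordered(sent, reordered))    # False
```

A runner that guarantees key-ordered delivery only in streaming, as described for Dataflow, Flink, and Samza, would satisfy this check for streaming pipelines but could produce the `reordered` pattern in batch.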
Last updated on 2025/01/20