Portability Framework


Interoperability between SDKs and runners is a key aspect of Apache Beam. So far, however, the reality is that most runners support the Java SDK only, because each SDK-runner combination requires non-trivial work on both sides. All runners are also currently written in Java, which makes support of non-Java SDKs far more expensive. The portability framework aims to rectify this situation and provide full interoperability across the Beam ecosystem.

The portability framework introduces well-defined, language-neutral data structures and protocols between the SDK and runner. This interop layer – called the portability API – ensures that SDKs and runners can work with each other uniformly, reducing the interoperability burden for both SDKs and runners to a constant effort. It notably ensures that new SDKs automatically work with existing runners and vice versa. The framework introduces a new runner, the Universal Local Runner (ULR), as a practical reference implementation that complements the direct runners. Finally, it enables cross-language pipelines (sharing I/O or transformations across SDKs) and user-customized execution environments (“custom containers”).

The portability API consists of a set of smaller contracts that isolate SDKs and runners for job submission, management and execution. These contracts use protobufs and gRPC for broad language support.

The goal is that all (non-direct) runners and SDKs eventually support the portability API, perhaps exclusively.


The model protos contain all aspects of the portability API and is the truth on the ground. The proto definitions supercede any design documents. The main design documents are the following:

In discussion:


The portability framework is a substantial effort that touches every Beam component. In addition to the sheer magnitude, a major challenge is engineering an interop layer that does not significantly compromise performance due to the additional serialization overhead of a language-neutral protocol.


The proposed project phases are roughly as follows and are not strictly sequential, as various components will likely move at different speeds. Additionally, there have been (and continues to be) supporting refactorings that are not always tracked as part of the portability effort. Work already done is not tracked here either.


The portability effort touches every component, so the “portability” label is used to identify all portability-related issues. Pure design or proto definitions should use the “beam-model” component. A common pattern for new portability features is that the overall feature is in “beam-model” with subtasks for each SDK and runner in their respective components.

JIRA: query


MVP in progress (near completion for Flink runner). See the Portability support table for details.

The Flink runner is currently the only runner to support portable pipeline execution. To run a basic Python wordcount (in batch mode) with embedded Flink:

  1. Run once to build the SDK harness container: ./gradlew -p sdks/python/container docker
  2. Start the Flink portable JobService endpoint: ./gradlew :beam-runners-flink_2.11-job-server:runShadow
  3. Submit the wordcount pipeline to above endpoint: ./gradlew :beam-sdks-python:portableWordCount

To run on a separate Flink cluster:

  1. Start local Flink cluster
  2. Create shaded JobService jar: ./gradlew :beam-runners-flink_2.11-job-server:installShadowDist
  3. Start JobService with Flink web service endpoint: java -jar ./runners/flink/job-server/build/install/beam-runners-flink_2.11-job-server-shadow/lib/beam-runners-flink_2.11-job-server-*.jar "--job-host=localhost:8099" "--artifacts-dir=/tmp/flink-artifacts" "--flink-master-url=localhost:8081"
  4. Submit the pipeline.

Note: A subset of the functionality is also supported in streaming mode; use --streaming in the command line to enable it.