Connector-related efforts that will benefit multiple SDKs.
Splittable DoFn is the next-generation source framework for Beam and will replace the current frameworks for developing bounded and unbounded sources. Splittable DoFn is being developed alongside the current Beam portability efforts. See the Beam portability framework roadmap for more details.
Last updated in May 2020.
As an added benefit of the Beam portability effort, Beam transforms can be used across SDKs. This has several benefits.
- Connector sharing across SDKs. For example,
- Beam pipelines written using the Python and Go SDKs will be able to utilize the vast selection of connectors that are currently implemented for the Java SDK.
- The Java SDK will be able to utilize connectors for systems that only offer a Python API.
- The Go SDK will be able to utilize connectors currently available for the Java and Python SDKs.
- Ease of developing and maintaining Beam transforms - in general, with cross-language transforms, Beam transform authors will be able to implement new Beam transforms in a language of their choice and utilize these transforms from other languages, reducing the maintenance and support overhead.
- Beam SQL, which is currently only available to the Java SDK, will become available to the Python and Go SDKs.
- Beam TFX transforms, which are currently only available to Beam Python SDK pipelines, will become available to the Java and Go SDKs.
Completed and Ongoing Efforts
Many efforts related to cross-language transforms are currently in flux. Some of the completed and ongoing efforts are given below.
Cross-language transforms API and expansion service
Work related to developing/updating the cross-language transforms API for Java/Python/Go SDKs and work related to cross-language transform expansion services.
- Basic API for Java SDK - completed
- Basic API for Python SDK - completed
- Basic API for Go SDK - In progress
- Basic cross-language transform expansion service for Java and Python SDKs - completed
- Artifact staging - mostly completed - email thread, doc
Support for Flink runner
Work related to making cross-language transforms available for Flink runner.
- Basic support for executing cross-language transforms on portable Flink runner - completed
Support for Dataflow runner
Work related to making cross-language transforms available for Dataflow runner.
- Basic support for executing cross-language transforms on Dataflow runner
- This work requires updates to Dataflow service’s job submission and job execution logic. This is currently being developed at Google.
Support for Direct runner
Work related to making cross-language transforms available on Direct runner.
- Basic support for executing cross-language transforms on Python Direct runner - completed
- Basic support for executing cross-language transforms on Java Direct runner - Not started
Ongoing and planned work related to making existing connectors/transforms available to other SDKs through the cross-language transforms framework.
- Java JdbcIO - completed - BEAM-10135, BEAM-10136
- Java KafkaIO - completed - BEAM-7029
- Java KinesisIO - completed - BEAM-10137, BEAM-10138
- Java PubSubIO - In progress - BEAM-7738
- Java SnowflakeIO - completed - BEAM-9897, BEAM-9898
- Java SpannerIO - In progress - BEAM-10139, BEAM-10140
- Java SQL - completed - BEAM-8603
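As an illustration of what using one of the connectors listed above from another SDK might look like, here is a minimal Python sketch that reads from Kafka through the Java KafkaIO cross-language wrapper. It assumes an `apache_beam` release that ships `apache_beam.io.kafka.ReadFromKafka`, a Java runtime for the expansion service, and a runner that supports cross-language transforms. The helper names `kafka_consumer_config` and `build_pipeline`, the broker address, and the topic name are hypothetical and chosen only for this example.

```python
def kafka_consumer_config(bootstrap_servers):
    """Build the consumer settings that are passed through to the Java Kafka consumer."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "auto.offset.reset": "earliest",
    }


def build_pipeline(pipeline, bootstrap_servers, topic):
    """Attach a cross-language Kafka read to an existing Beam pipeline object."""
    # Imported lazily so this module still loads where Beam is not installed
    # (an assumption made for this sketch, not a Beam convention).
    from apache_beam.io.kafka import ReadFromKafka

    return (
        pipeline
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config=kafka_consumer_config(bootstrap_servers),
            topics=[topic],
        )
    )
```

A pipeline would call something like `build_pipeline(p, "localhost:9092", "events")` inside a `with beam.Pipeline(options=...) as p:` block. Whether this runs end to end depends on the runner and Beam version in use, per the status lists on this page.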
Portable Beam schema
Portable Beam schema support will provide a generalized mechanism for serializing and transferring data across language boundaries which will be extremely useful for pipelines that employ cross-language transforms.
- Make row coder a standard coder and implement in python - completed - BEAM-7886
- Add an integration test suite for cross-language transforms on Flink runner - In progress - BEAM-6683
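The idea behind a portable schema can be sketched with a toy example: if both sides agree on an ordered list of typed fields, a row can be turned into bytes and back without either side knowing the other's language. The sketch below uses only the Python standard library. The schema, the field names, and the byte layout are assumptions made for illustration and are not the wire format of Beam's row coder.

```python
import struct

# Hypothetical schema: an ordered list of (field name, field type) pairs.
SCHEMA = [("user_id", "int64"), ("name", "string")]


def encode_row(schema, row):
    """Serialize a dict into bytes, field by field, in schema order."""
    out = bytearray()
    for name, ftype in schema:
        value = row[name]
        if ftype == "int64":
            # Fixed-width big-endian signed 64-bit integer.
            out += struct.pack(">q", value)
        elif ftype == "string":
            # 4-byte big-endian length prefix followed by UTF-8 bytes.
            data = value.encode("utf-8")
            out += struct.pack(">I", len(data)) + data
        else:
            raise ValueError(f"unsupported field type: {ftype}")
    return bytes(out)


def decode_row(schema, payload):
    """Rebuild the dict from bytes using the same schema."""
    row, offset = {}, 0
    for name, ftype in schema:
        if ftype == "int64":
            (row[name],) = struct.unpack_from(">q", payload, offset)
            offset += 8
        elif ftype == "string":
            (length,) = struct.unpack_from(">I", payload, offset)
            offset += 4
            row[name] = payload[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unsupported field type: {ftype}")
    return row
```

Because the layout depends only on the schema, a decoder written in Java or Go could read the same bytes, which is the property that makes a standard row coder useful for cross-language transforms.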
Work related to adding documentation on cross-language transforms to the Beam website.
- Document cross-language transforms API for Java/Python - Not started
- Document API for making existing transforms available as cross-language transforms for Java/Python - Not started