Apache Beam 2.50.0

We are happy to present the new 2.50.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.50.0, check out the detailed release notes.

Highlights

  • Spark 3.2.2 is used as default version for Spark runner (#23804).
  • The Go SDK has a new default local runner, called Prism (#24789).
  • All Beam released container images are now multi-arch images that support both x86 and ARM CPU architectures.

I/Os

  • Java KafkaIO now supports picking up topics via topicPattern (#26948)
  • Support for read from Cosmos DB Core SQL API (#23604)
  • Upgraded to HBase 2.5.5 for HBaseIO. (Java) (#27711)
  • Added support for GoogleAdsIO source (Java) (#27681).

New Features / Improvements

  • The Go SDK now requires Go 1.20 to build. (#27558)
  • The Go SDK has a new default local runner, Prism. (#24789).
  • Hugging Face Model Handler for RunInference added to Python SDK. (#26632)
  • Hugging Face Pipelines support for RunInference added to Python SDK. (#27399)
  • Vertex AI Model Handler for RunInference now supports private endpoints (#27696)
  • MLTransform transform added with support for common ML pre/postprocessing operations (#26795)
  • Upgraded the Kryo extension for the Java SDK to Kryo 5.5.0. This brings in bug fixes, performance improvements, and serialization of Java 14 records. (#27635)
  • All Beam released container images are now multi-arch images that support both x86 and ARM CPU architectures. (#27674). The multi-arch container images include:
    • All versions of Go, Python, Java and Typescript SDK containers.
    • All versions of Flink job server containers.
    • Java and Python expansion service containers.
    • Transform service controller container.
    • Spark3 job server container.
  • Added support for batched writes to AWS SQS for improved throughput (Java, AWS 2).(#21429)

Breaking Changes

  • Python SDK: Legacy runner support removed from Dataflow, all pipelines must use runner v2.
  • Python SDK: Dataflow Runner will no longer stage Beam SDK from PyPI in the --staging_location at pipeline submission. Custom container images that are not based on Beam’s default image must include Apache Beam installation.(#26996)

Deprecations

  • The Go Direct Runner is now Deprecated. It remains available to reduce migration churn.
    • Tests can be set back to the direct runner by overriding TestMain: func TestMain(m *testing.M) { ptest.MainWithDefault(m, "direct") }
    • It’s recommended to fix issues seen in tests using Prism, as they can also happen on any portable runner.
    • Use the generic register package for your pipeline DoFns to ensure pipelines function on portable runners, like prism.
    • Do not rely on closures or using package globals for DoFn configuration. They don’t function on portable runners.

Bugfixes

Known Issues

  • Long-running Python pipelines might experience a memory leak: #28246.
  • Python Pipelines using BigQuery IO or orjson dependency might experience segmentation faults or get stuck: #28318.
  • Python SDK’s cross-language Bigtable sink mishandles records that don’t have an explicit timestamp set: #28632. To avoid this issue, set explicit timestamps for all records before writing to Bigtable.

List of Contributors

According to git shortlog, the following people contributed to the 2.50.0 release. Thank you to all contributors!

Abacn

acejune

AdalbertMemSQL

ahmedabu98

Ahmed Abualsaud

al97

Aleksandr Dudko

Alexey Romanenko

Anand Inguva

Andrey Devyatkin

Anton Shalkovich

ArjunGHUB

Bjorn Pedersen

BjornPrime

Brett Morgan

Bruno Volpato

Buqian Zheng

Burke Davison

Byron Ellis

bzablocki

case-k

Celeste Zeng

Chamikara Jayalath

Clay Johnson

Connor Brett

Damon

Damon Douglas

Dan Hansen

Danny McCormick

Darkhan Nausharipov

Dip Patel

Dmytro Sadovnychyi

Florent Biville

Gabriel Lacroix

Hai Joey Tran

Hong Liang Teoh

Jack McCluskey

James Fricker

Jeff Kinard

Jeff Zhang

Jing

johnjcasey

jon esperanza

Josef Šimánek

Kenneth Knowles

Laksh

Liam Miller-Cushon

liferoad

magicgoody

Mahmud Ridwan

Manav Garg

Marco Vela

martin trieu

Mattie Fu

Michel Davit

Moritz Mack

mosche

Peter Sobot

Pranav Bhandari

Reeba Qureshi

Reuven Lax

Ritesh Ghorse

Robert Bradshaw

Robert Burke

RyuSA

Saba Sathya

Sam Whittle

Steven Niemitz

Steven van Rossum

Svetak Sundhar

Tony Tang

Valentyn Tymofieiev

Vitaly Terentyev

Vlado Djerek

Yichi Zhang

Yi Hu

Zechen Jiang