Apache Beam 2.56.0

We are happy to present the new 2.56.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.56.0, check out the detailed release notes.

Highlights

  • Added FlinkRunner support for Flink 1.17 and removed support for Flink 1.12 and 1.13. A pipeline running on Flink 1.16 or below can be upgraded to Flink 1.17 in two steps: first update the pipeline to Beam 2.56.0 while keeping the same Flink version; once the pipeline runs with Beam 2.56.0, it should be possible to upgrade to the FlinkRunner for Flink 1.17 (#29939).
  • New Managed I/O Java API (#30830).
  • New Ordered Processing PTransform added for processing order-sensitive stateful data (#30735).

I/Os

  • Upgraded Avro version to 1.11.3, and kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) (#30638). The newer Avro package is known to have breaking changes. If you are affected, you can stay pinned to an older Avro version; older versions are also tested with Beam.
  • Iceberg read/write support is available through the new Managed I/O Java API (#30830).

New Features / Improvements

  • Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines (#30938).
  • Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) (#30974).
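
To illustrate the composite row key feature above, here is a minimal sketch of what such a custom function might look like. The field names `customer_id` and `region` and the `#` separator are hypothetical, and the keyword `row_key_fn` mentioned in the comment is an assumption about the handler's signature; check the Bigtable enrichment handler documentation before relying on it. A `SimpleNamespace` stands in for a Beam row so the function can be tried without Beam installed.

```python
from types import SimpleNamespace


def composite_row_key(row, encoding="utf-8"):
    """Join two fields of the input row into one Bigtable row key (bytes)."""
    # `customer_id` and `region` are hypothetical field names for illustration.
    return f"{row.customer_id}#{row.region}".encode(encoding)


# Stand-in for a Beam row; any object with these attributes works here.
example = SimpleNamespace(customer_id=42, region="us-west")
key = composite_row_key(example)
print(key)  # b'42#us-west'

# With Beam installed, the function would be handed to the handler,
# assuming the keyword argument is named `row_key_fn`:
#   BigTableEnrichmentHandler(project_id=..., instance_id=..., table_id=...,
#                             row_key_fn=composite_row_key)
```

Returning bytes keeps the encoding decision inside the function, so the same key layout can be reused wherever the table is read.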

Breaking Changes

  • Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary (#30870).
  • Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.

Bugfixes

  • Fixed a locking issue when shutting down inactive bundle processors. Symptoms of this issue include slow or stuck long-running jobs (Python) (#30679).
  • Fixed a logging issue that silenced the pip output when installing dependencies provided in --requirements_file (Python).

Known Issues

  • Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
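
The workaround above can be checked in an environment with a short script. The following is a rough sketch using only the standard library; the helper names are hypothetical, and treating every SDK version from 2.53.0 up to (but not including) 2.59.0 as affected is an assumption based on the range stated above.

```python
from importlib import metadata


def parse_version(version):
    """Turn '2.18.2' into (2, 18, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(beam_version, gcs_version):
    """True if the SDK is in the affected range and google-cloud-storage is too old."""
    # Assumption: everything from 2.53.0 up to, not including, 2.59.0 is affected.
    in_range = (2, 53, 0) <= parse_version(beam_version) < (2, 59, 0)
    return in_range and parse_version(gcs_version) < (2, 18, 2)


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    beam = installed_version("apache-beam")
    gcs = installed_version("google-cloud-storage")
    if beam and gcs and is_affected(beam, gcs):
        print(f"Beam {beam} with google-cloud-storage {gcs}: upgrade to 2.18.2 or newer")
```

The check only inspects installed package metadata; it does not detect whether a given pipeline actually reads from GCS.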

For the most up-to-date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.56.0 release. Thank you to all contributors!

Abacn

Ahmed Abualsaud

Andrei Gurau

Andrey Devyatkin

Aravind Pedapudi

Arun Pandian

Arvind Ram

Bartosz Zablocki

Brachi Packter

Byron Ellis

Chamikara Jayalath

Clement DAL PALU

Damon

Danny McCormick

Daria Bezkorovaina

Dip Patel

Evan Burrell

Hai Joey Tran

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Julien Tournay

Kenneth Knowles

Luís Bianchin

Maciej Szwaja

Melody Shen

Oleh Borysevych

Pablo Estrada

Rebecca Szper

Ritesh Ghorse

Robert Bradshaw

Sam Whittle

Sergei Lilichenko

Shahar Epstein

Shunping Huang

Svetak Sundhar

Timothy Itodo

Veronica Wasson

Vitaly Terentyev

Vlado Djerek

Yi Hu

akashorabek

bzablocki

clmccart

damccorm

dependabot[bot]

dmitryor

github-actions[bot]

liferoad

martin trieu

tvalentyn

xianhualiu