Apache Beam 2.59.0

We are happy to present the new 2.59.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.59.0, check out the detailed release notes.

Highlights

  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Initial experimental support for using Prism with the Java and Python SDKs
    • Prism currently targets local testing and other small-scale execution.
    • For Java, use ‘PrismRunner’ or ‘TestPrismRunner’ as an argument to the --runner flag.
    • For Python, use ‘PrismRunner’ as an argument to the --runner flag.
    • Go already uses Prism as the default local runner.
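
As a minimal sketch of the Python case above, the following passes ‘PrismRunner’ through the --runner flag to a tiny pipeline. The helper name prism_pipeline_args, the run function, and the sample elements are illustrative assumptions, not part of the release. It assumes apache-beam 2.59.0 or later is installed; since Prism support is experimental, treat this as a starting point for local testing rather than a reference setup.

```python
def prism_pipeline_args(extra_args=None):
    """Return command-line style pipeline arguments that select Prism.

    `extra_args` is a hypothetical convenience for appending further flags.
    """
    args = ["--runner=PrismRunner"]
    args.extend(extra_args or [])
    return args


def run(extra_args=None):
    # Imported here so the helper above can be used without apache-beam
    # installed; running the pipeline itself requires apache-beam.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(prism_pipeline_args(extra_args))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["alpha", "beta", "gamma"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )
```

Calling run() executes the pipeline locally on Prism. The same flag can instead be supplied on the command line of an existing pipeline script as --runner=PrismRunner.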

I/Os

  • Improvements to the performance of BigQueryIO when using the withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
  • [Managed Iceberg] Added support for writing to partitioned tables (#32102)
  • Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
  • Add ClickHouseIO dedicated User-Agent (#32252).

New Features / Improvements

  • The BigQuery endpoint can be overridden via PipelineOptions, which enables the use of BigQuery emulators (Java) (#28149).
  • Go SDK Minimum Go Version updated to 1.21 (#32092).
  • [BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
  • Updated Go protobuf package to new version (Go) (#21515).
  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Adds OrderedListState support for Java SDK via FnApi.
  • Initial support for using Prism from the Python and Java SDKs.

Bugfixes

  • Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
  • Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
  • (Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
  • (Go) Fixed corruption on State API writes (#32245).

Known Issues

  • Prism is under active development and does not yet support all pipelines. See #29650 for progress.
    • In the 2.59.0 release, Prism passes most runner validation tests, with the exception of pipelines that use the following features: OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder-related issues, such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real-time support.
    • If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

For the most up-to-date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Andrew Crites, atask-g, Axel Magnuson, Ayush Pandey, Bartosz Zablocki, Chamikara Jayalath, cutiepie-10, Damon, Danny McCormick, dependabot[bot], Eddie Phillips, Francis O’Hara, Hyeonho Kim, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, jonathan-lemos, jrmccluskey, Kirill Berezin, Kiruphasankaran Nataraj, lahariguduru, liferoad, lostluck, Maciej Szwaja, Manit Gupta, Mark Zitnik, martin trieu, Naireen Hussain, Prerit Chandok, Radosław Stankiewicz, Rebecca Szper, Robert Bradshaw, Robert Burke, ron-gal, Sam Whittle, Sergei Lilichenko, Shunping Huang, Svetak Sundhar, Thiago Nunes, Timothy Itodo, tvalentyn, twosom, Vatsal, Vitaly Terentyev, Vlado Djerek, Yifan Ye, Yi Hu