Apache Beam 2.73.0

We are happy to present the new 2.73.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.73.0, check out the detailed release notes.

Highlights

I/Os

  • DebeziumIO (Java): added OffsetRetainer interface and FileSystemOffsetRetainer implementation to persist and restore CDC offsets across pipeline restarts, and exposed withStartOffset / withOffsetRetainer on DebeziumIO.Read and the cross-language ReadBuilder (#28248).

New Features / Improvements

  • (Python) Added a BigQuery CDC streaming source (#37724).
  • Added ADKAgentModelHandler for running Google Agent Development Kit (ADK) agents (Python) (#37917).
  • (Python) Added exception chaining to preserve error context in CloudSQLEnrichmentHandler, processes utilities, and core transforms (#37422).
  • (Python) Added a pipeline option --experiments=pip_no_build_isolation to disable build isolation when installing dependencies in the runtime environment (#37331).
  • (Go) Added OrderedListState support to the Go SDK stateful DoFn API (#37629).
  • Added support for large pipeline options via a file (Python) (#37370).
  • Added support for inferring a schema from a dataclass (Python) (#22085). The default coder for non-frozen dataclasses that are type-hinted (or set via with_output_types) changed to RowCoder. To preserve the old behavior (fast primitives coder), explicitly register the type with FastPrimitivesCoder.
  • Updated the minimum Go version to 1.26.1 (#37897).
  • (Python) Added image embedding support in apache_beam.ml.rag package (#37628).
  • (Python) Added support for Python version 3.14 (#37247).
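
As a rough illustration of what the exception chaining item above refers to, the following standard-library sketch shows how `raise ... from ...` keeps the original error attached to a wrapper exception. It does not use Beam; `EnrichmentError`, `lookup`, and `describe_failure` are hypothetical names chosen for this example and are not part of the Beam API.

```python
class EnrichmentError(Exception):
    """Hypothetical wrapper error standing in for a handler-level exception."""


def lookup(row_id, table):
    """Look up a row, wrapping low-level failures without losing their context."""
    try:
        return table[row_id]
    except KeyError as err:
        # `from err` stores the original exception on __cause__, so the
        # traceback shows both the wrapper and the root cause.
        raise EnrichmentError(f"lookup failed for id {row_id!r}") from err


def describe_failure(row_id, table):
    """Return (wrapper type name, cause type name) for a failed lookup, else None."""
    try:
        lookup(row_id, table)
    except EnrichmentError as err:
        return type(err).__name__, type(err.__cause__).__name__
    return None
```

Without the `from err` clause the wrapper's `__cause__` would be unset, and code that inspects or logs only the wrapper could lose the root cause.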

Breaking Changes

  • The Python SDK container’s boot.go now passes pipeline options through a file instead of the PIPELINE_OPTIONS environment variable. If a user pairs a new Python SDK container with an older SDK version (which does not support the file-based approach), the pipeline options will not be recognized and the pipeline will fail. Users must ensure their SDK and container versions are synchronized (#37370).
  • Python DoFn.with_exception_handling now respects user DoFn typehints. This can break update compatibility if coders change. It can also break pipeline compilation if existing typehints are incorrect. To update safely, specify the pipeline option --update_compatibility_version=2.72.0. To fix typehints, replace any incorrect typehints that were previously ignored (#37590).
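
The version mismatch described in the first item above can be pictured with a small standard-library sketch. It is only a model of the failure mode, not Beam's implementation: the JSON format, the file name `pipeline_options.json`, and all function names are assumptions made for illustration. Only the `PIPELINE_OPTIONS` variable name comes from the notes.

```python
import json
import os
import tempfile


def launch_with_env(options):
    """Old-style handoff (assumed shape): options serialized into an env mapping."""
    return {"PIPELINE_OPTIONS": json.dumps(options)}


def launch_with_file(options, directory):
    """New-style handoff (assumed shape): options written to a file; returns its path."""
    path = os.path.join(directory, "pipeline_options.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(options, f)
    return path


def old_sdk_read(env):
    """A reader that only knows the environment variable."""
    raw = env.get("PIPELINE_OPTIONS")
    return json.loads(raw) if raw is not None else None


def new_sdk_read(path):
    """A reader that understands the file-based handoff."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def demo():
    """Pair a file-based launcher with both readers and return what each sees."""
    options = {"job_name": "example", "streaming": True}
    with tempfile.TemporaryDirectory() as tmp:
        path = launch_with_file(options, tmp)
        # The launcher set no environment variable, so the old reader finds nothing.
        return old_sdk_read({}), new_sdk_read(path)
```

In this model the old reader returns `None` when paired with the file-based launcher, which corresponds to the unrecognized options described above. Keeping the SDK and container versions in sync avoids the mismatch.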

Bugfixes

  • Fixed ProcessManager not reaping child processes, causing zombie process accumulation on long-running Flink deployments (Java) (#37930).

Security Fixes

List of Contributors

According to git shortlog, the following people contributed to the 2.73.0 release. Thank you to all contributors!

Abdelrahman Ibrahim, Ahmed Abualsaud, Alex Malao, Alexander Nieuwenhuijse, Andres Tiko, Andrew Crites, Arun Pandian, Bentsi Leviav, Bruno Volpato, Chamikara Jayalath, Chandra Kiran Bolla, Danny McCormick, Deji Ibrahim, Derrick Williams, Elia LIU, Esmelealem, Hannes Gustafsson, Jack McCluskey, Joey Tran, Kenneth Knowles, M Junaid Shaukat, Mansi Singh, Matej Aleksandrov, Mathijs Deelen, Mattie Fu, Praneet Nadella, Radek Stankiewicz, Radosław Stankiewicz, Reuven Lax, RuiLong J., S. Veyrié, Sakthivel Subramanian, Sam Whittle, Shubham Thakur, Shunping Huang, Subramanya V, Tarun Annapareddy, Tobias Kaymak, Valentyn Tymofieiev, Vitaly Terentyev, XQ Hu, Yi Hu, ZIHAN DAI, claudevdm, kishorepola, parveensania