2025/02/18
Apache Beam 2.63.0
Jack R. McCluskey [@jrmccluskey]
We are happy to present the new 2.63.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.63.0, check out the detailed release notes.
I/Os
- Support gcs-connector 3.x+ in GcsUtil (#33368)
- Introduced the --groupFilesFileLoad pipeline option to mitigate side-input related issues in BigQueryIO batch FILE_LOAD on certain runners (including Dataflow Runner V2) (Java) (#33587).
New Features / Improvements
- Add BigQuery vector/embedding ingestion and enrichment components to apache_beam.ml.rag (Python) (#33413).
- Upgraded to protobuf 4 (Java) (#33192).
- [GCSIO] Added retry logic to each batch method of the GCS IO (Python) (#33539)
- [GCSIO] Enable recursive deletion for GCSFileSystem Paths (Python) (#33611).
- External, Process based Worker Pool support added to the Go SDK container. (#33572)
  - This is used to enable sidecar containers to run SDK workers for some runners.
  - See https://beam.apache.org/documentation/runtime/sdk-harness-config/ for details.
- Support the Process Environment for execution in the Go SDK. (#33651)
- Prism
  - Prism now uses the same single port for both pipeline submission and execution on workers. Requests are differentiated by worker-id. (#33438)
    - This avoids port starvation and provides clarity on port use when running Prism in non-local environments.
  - Support for @RequiresTimeSortedInputs added. (#33513)
  - Initial support for AllowedLateness added. (#33542)
  - The Go SDK’s in-process Prism runner (also known as the Go SDK default runner) now supports non-loopback mode environment types. (#33572)
  - Support the Process Environment for execution in Prism (#33651)
  - Support the AnyOf Environment for execution in Prism (#33705)
    - This improves support for developing Xlang pipelines when using a compatible cross-language service.
- Partitions are now configurable for the DaskRunner in the Python SDK (#33805).
- [Dataflow Streaming] Enable Windmill GetWork Response Batching by default (#33847).
  - With this change, user workers request batched GetWork responses from the backend, and the backend sends multiple WorkItems in the same response proto.
  - To disable the feature, pass --windmillRequestBatchedGetWorkResponse=false
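To make the GetWork batching note above more concrete, here is a minimal conceptual sketch of how several work items can share one response instead of each getting its own. It is not Windmill's protocol or Beam's implementation: the function name `batch_work_items`, the batch size, and the string work items are all hypothetical stand-ins chosen for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_work_items(items: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of at most max_batch_size (hypothetical helper)."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Hypothetical work items; real WorkItems are protos, not strings.
work_items = ["w1", "w2", "w3", "w4", "w5"]

# A batch size of 1 models the unbatched case: one response per work item.
unbatched = list(batch_work_items(work_items, 1))

# A larger batch size models the batched case: fewer, larger responses.
batched = list(batch_work_items(work_items, 3))
```

In this sketch the same five items travel in five responses when unbatched and in two when batched, which is the general trade the release note describes.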
Breaking Changes
- AWS V1 I/Os have been removed (Java). As part of this, the x-lang Python Kinesis I/O has been updated to consume the V2 IO and no longer supports setting producer_properties (#33430).
- Upgraded to protobuf 4 (Java) (#33192), but forced Debezium IO to use protobuf 3 (#33541) because Debezium clients are not protobuf 4 compatible. This may cause conflicts when using clients which are only compatible with protobuf 4.
- Minimum Go version for Beam Go updated to 1.22.10 (#33609)
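If you want to check a local toolchain against the new Go minimum, the sketch below parses a version string of the form printed by `go version` and compares it numerically. The helper names `parse_go_version` and `meets_minimum` are hypothetical and not part of Beam. Comparing as integer tuples matters here because a plain string comparison would rank 1.22.9 above 1.22.10.

```python
import re
from typing import Tuple

# Minimum Go version stated in the release notes above.
MIN_GO_VERSION = (1, 22, 10)


def parse_go_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'go version go1.22.10 linux/amd64'."""
    match = re.search(r"go(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no Go version found in {text!r}")
    major, minor, patch = match.groups()
    # A missing patch component (for example 'go1.23') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def meets_minimum(text: str, minimum: Tuple[int, int, int] = MIN_GO_VERSION) -> bool:
    """Return True when the parsed version is at least the minimum."""
    return parse_go_version(text) >= minimum
```

A caller could pass the captured output of `go version` to `meets_minimum` before building a Beam Go pipeline.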
Bugfixes
- Fix data loss issues when reading gzipped files with TextIO (Python) (#18390, #31040).
- [BigQueryIO] Fixed an issue where Storage Write API sometimes doesn’t pick up auto-schema updates (#33231)
- [Dataflow Streaming Appliance] Fixed commits failing with KeyCommitTooLargeException when a key outputs >180MB of results (#33588).
- Fixed a Dataflow template creation issue that ignores template file creation errors (Java) (#33636)
- Correctly documented Pane Encodings in the portability protocols (#33840).
- Fixed the user mailing list address (#26013).
- [Dataflow Streaming] Fixed an issue where Dataflow Streaming workers were reporting lineage metrics as cumulative rather than delta. (#33691)
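The difference between cumulative and delta reporting in the last item can be illustrated with a small sketch. It assumes, purely for illustration, that each report is a set of strings. The function `to_delta_reports` is hypothetical and is not how Dataflow workers are implemented.

```python
from typing import Iterable, List, Set


def to_delta_reports(cumulative_reports: Iterable[Set[str]]) -> List[Set[str]]:
    """Convert cumulative reports into deltas containing only newly seen entries."""
    seen: Set[str] = set()
    deltas: List[Set[str]] = []
    for report in cumulative_reports:
        current = set(report)
        # A delta report carries only what was not reported before.
        deltas.append(current - seen)
        seen |= current
    return deltas


# Cumulative reporting repeats everything observed so far in every report.
cumulative = [{"a"}, {"a", "b"}, {"a", "b", "c"}]

# Delta reporting sends each entry once, in the report where it first appears.
deltas = to_delta_reports(cumulative)
```

A receiver that expects deltas but is handed cumulative reports would process the repeated entries again on every update, which is the kind of mismatch the fix addresses.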
List of Contributors
According to git shortlog, the following people contributed to the 2.63.0 release. Thank you to all contributors!
Ahmed Abualsaud, Alex Merose, Andrej Galad, Andrew Crites, Arun Pandian, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, Clay Johnson, Damon Douglas, Danish Amjad, Danny McCormick, Deep1998, Derrick Williams, Dmitry Labutin, Dmytro Sadovnychyi, Eduardo Ramírez, Filipe Regadas, Hai Joey Tran, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jozef Vilcek, Julien Tournay, Kenneth Knowles, Michel Davit, Miguel Trigueira, Minbo Bae, Mohamed Awnallah, Mohit Paddhariya, Nahian-Al Hasan, Naireen Hussain, Niall Pemberton, Radosław Stankiewicz, Razvan Culea, Robert Bradshaw, Robert Burke, Rohit Sinha, S. Veyrié, Sam Whittle, Sergei Lilichenko, Shingo Furuyama, Shunping Huang, Thiago Nunes, Tim Heckman, Tobias Bredow, Tom Stepp, Tony Tang, VISHESH TRIPATHI, Vitaly Terentyev, Yi Hu, XQ Hu, akashorabek, claudevdm