2024/08/06
Apache Beam 2.58.0
Jack R. McCluskey [@jrmccluskey]
We are happy to present the new 2.58.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information about changes in 2.58.0, check out the detailed release notes.
I/Os
New Features / Improvements
- Multiple RunInference instances can now share the same model instance by setting the model_identifier parameter (Python) (#31665).
- Added options to control the number of Storage API multiplexing connections (#31721)
- [BigQueryIO] Better handling for batch Storage Write API when it hits AppendRows throughput quota (#31837)
- [IcebergIO] All specified catalog properties are passed through to the connector (#31726)
- Removed a third-party LGPL dependency from the Go SDK (#31765).
- Support for MapState and SetState when using Dataflow Runner v1 with Streaming Engine (Java) (#18200)
Breaking Changes
- [IcebergIO] IcebergCatalogConfig was changed to support specifying catalog properties in a key-store fashion (#31726)
- [SpannerIO] Added validation that query and table cannot be specified at the same time for SpannerIO.read(). Previously, withQuery overrode withTable, if set (#24956).
Bug fixes
- [BigQueryIO] Fixed a bug in batch Storage Write API that frequently exhausted concurrent connections quota (#31710)
Known Issues
- Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
- [KafkaIO] Records read with ReadFromKafkaViaSDF are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka (#32196).
- BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform (#32780):
  - Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
  - Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
  - Fixed in 2.61.0.
For the most up-to-date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
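For the GCS data corruption issue listed above (#32169), it may help to check whether an environment already has a google-cloud-storage version at or above the 2.18.2 workaround. The following is a minimal sketch using only the Python standard library. The helper names (parse_version, gcs_client_is_safe, check_installed) are hypothetical and not part of Beam, and the version parsing is deliberately simple: it compares only the numeric major.minor.patch components and ignores pre-release suffixes.

```python
from importlib import metadata

# Minimum google-cloud-storage version named in the known-issue workaround.
MIN_SAFE = (2, 18, 2)


def parse_version(text):
    """Turn a dotted release string such as '2.18.2' into a tuple of three ints.

    Simplification (assumption): suffixes such as 'rc1' are ignored, so
    '2.18.2rc1' is treated the same as '2.18.2'.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def gcs_client_is_safe(installed_version):
    """Return True if the version string is at or above the workaround version."""
    return parse_version(installed_version) >= MIN_SAFE


def check_installed():
    """Check the installed google-cloud-storage package.

    Returns None if the package is not installed, otherwise True or False.
    """
    try:
        installed = metadata.version("google-cloud-storage")
    except metadata.PackageNotFoundError:
        return None
    return gcs_client_is_safe(installed)


if __name__ == "__main__":
    print("google-cloud-storage OK:", check_installed())
```

If the check returns False, upgrading with pip install --upgrade "google-cloud-storage>=2.18.2" applies the workaround described above. The same version also needs to be present on the workers that run the pipeline, not only in the launch environment.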
List of Contributors
According to git shortlog, the following people contributed to the 2.58.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexandre Moueddene
Alexey Romanenko
Andrew Crites
Bartosz Zablocki
Celeste Zeng
Chamikara Jayalath
Clay Johnson
Damon Douglass
Danny McCormick
Dilnaz Amanzholova
Florian Bernard
Francis O’Hara
George Ma
Israel Herraiz
Jack McCluskey
Jaehyeon Kim
James Roseman
Kenneth Knowles
Maciej Szwaja
Michel Davit
Minh Son Nguyen
Naireen
Niel Markwick
Oliver Cardoza
Robert Bradshaw
Robert Burke
Rohit Sinha
S. Veyrié
Sam Whittle
Shunping Huang
Svetak Sundhar
TongruiLi
Tony Tang
Valentyn Tymofieiev
Vitaly Terentyev
Yi Hu