We are happy to present the new 2.47.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.47.0, check out the detailed release notes.
Highlights
- Apache Beam adds Python 3.11 support (#23848).
- The BigQuery Storage Write API is now available in the Python SDK via cross-language transforms (#21961).
- Added HBaseIO support for writing RowMutations (ordered by row key) to HBase (Java) (#25830).
- Added fileio transforms MatchFiles, MatchAll and ReadMatches (Go) (#25779).
- Added an integration test for JmsIO and fixed an issue with multiple connections (Java) (#25887).
New Features / Improvements
- The Flink runner now supports Flink 1.16.x (#25046).
- Schema’d PTransforms can now be applied directly to Beam DataFrames, just like PCollections.
(Note that when doing multiple operations, it may be more efficient to explicitly chain them,
as in df | (Transform1 | Transform2 | ...), to avoid excessive conversions.)
- The Go SDK adds new transforms, periodic.Impulse and periodic.Sequence, that extend support for slowly updating side input patterns (#23106).
- Several Google client libraries in the Python SDK dependency chain were updated to their latest available major versions (#24599).
Breaking Changes
- If a main session fails to load, the pipeline will now fail at worker startup (#25401).
- Python pipeline options will now ignore unparsed command-line flags prefixed with a single dash (#25943).
- The SmallestPerKey combiner now requires keyword-only arguments for specifying optional parameters, such as
Deprecations
- Cloud Debugger support and its pipeline options are deprecated and will be removed in the next Beam version, in response to the Google Cloud Debugger service being shut down (Java) (#25959).
Bugfixes
- The BigQuery sink in STORAGE_WRITE_API mode in batch pipelines might cause data consistency issues while handling other, unrelated transient errors in Beam SDKs 2.35.0 - 2.46.0 (inclusive). For more details, see https://github.com/apache/beam/issues/26521.
Known Issues
- BigQueryIO Storage API writes with autoUpdateSchema may cause data corruption in Beam SDKs 2.45.0 - 2.47.0 (inclusive) (#26789).
- Long-running Python pipelines might experience a memory leak (#28246).
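The SmallestPerKey change listed above relies on Python keyword-only parameters. To show what that requirement means for callers, here is a hypothetical standard-library sketch. It is not Beam's SmallestPerKey implementation, and the function name smallest_per_key and its key parameter are assumptions chosen only for illustration. Any parameter declared after a bare * must be passed by name, so a positional call fails.

```python
import heapq
from collections import defaultdict


def smallest_per_key(pairs, n, *, key=None):
    """Hypothetical helper (not Beam's API): the n smallest values per key.

    Everything after the bare `*` is keyword-only, so `key` must be
    passed as key=..., never positionally.
    """
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    # heapq.nsmallest accepts an optional key function, like sorted().
    return {k: heapq.nsmallest(n, vs, key=key) for k, vs in grouped.items()}


pairs = [("a", 3), ("a", 1), ("a", 2), ("b", -5), ("b", 4)]

# Passing the optional parameter by name works.
print(smallest_per_key(pairs, 2, key=abs))  # {'a': [1, 2], 'b': [4, -5]}

# Passing the same parameter positionally is rejected.
try:
    smallest_per_key(pairs, 2, abs)
except TypeError as err:
    print("TypeError:", err)
```

Under a signature like this, a call that previously supplied an optional argument positionally raises a TypeError and has to be rewritten to name the argument explicitly.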
List of Contributors
According to git shortlog, the following people contributed to the 2.47.0 release. Thank you to all contributors!
Amrane Ait Zeouay
Jasper Van den Bossche
Jiangjie (Becket) Qin