We are happy to present the new 2.52.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.52.0, check out the detailed release notes.
Highlights
- Previously deprecated Avro-dependent code (deprecated in Beam 2.46.0) has finally been removed from the Java SDK “core” package. Use beam-sdks-java-extensions-avro instead. This lets you update the Avro version in user code without risking breaking changes in Beam “core”, since the Beam Avro extension already supports the latest Avro versions and should handle such updates (#25252).
- Publishing Java 21 SDK container images is now supported as part of the Apache Beam release process (#28120).
- Direct Runner and Dataflow Runner support running pipelines on Java 21 (experimental until tests are fully set up). For other runners (Flink, Spark, Samza, etc.), support status depends on the respective runner projects.
New Features / Improvements
- Added the UseDataStreamForBatch pipeline option to the Flink runner. When it is set to true, the Flink runner runs batch jobs using the DataStream API. The option defaults to false, so batch jobs are still executed using the DataSet API by default.
- Specifying upload_graph in the experiments option of DataflowRunner is no longer required when the job graph is larger than 10 MB (Java SDK) (PR#28621).
- The state and side input cache is now enabled by default, with a default size of 100 MB. Use --max_cache_memory_usage_mb=X to set the cache size for the user state API and side inputs (Python) (#28770).
- Beam YAML stable release. Beam pipelines can now be written in YAML using the Beam YAML framework, which includes a preliminary set of IOs and turnkey transforms. More information can be found in the YAML root folder and in the README.
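The cache flag above takes a size in megabytes and defaults to 100 MB. The stand-alone argparse sketch below shows how such a flag and its default behave. It mirrors the documented option name only. The helper name parse_cache_budget and the treatment of 1 MB as 2**20 bytes are illustrative assumptions, and this is not Beam's own PipelineOptions handling.

```python
import argparse

# Documented default size of the Python state and side input cache, in MB.
DEFAULT_CACHE_MB = 100


def parse_cache_budget(argv):
    """Hypothetical helper: return the cache budget in bytes for argv.

    Mirrors the --max_cache_memory_usage_mb flag name and its 100 MB default;
    it is a stand-alone sketch, not Beam's PipelineOptions parser.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--max_cache_memory_usage_mb",
        type=int,
        default=DEFAULT_CACHE_MB,
        help="Cache size for the user state API and side inputs, in MB.",
    )
    # parse_known_args leaves unrelated pipeline flags (e.g. --runner) alone.
    args, _unused = parser.parse_known_args(argv)
    # Assumption: 1 MB is treated as 2**20 bytes.
    return args.max_cache_memory_usage_mb * 2**20


if __name__ == "__main__":
    print(parse_cache_budget([]))  # 104857600 (the 100 MB default)
    print(parse_cache_budget(
        ["--runner=DirectRunner", "--max_cache_memory_usage_mb=256"]))  # 268435456
```

Using parse_known_args rather than parse_args lets the sketch coexist with other pipeline flags passed on the same command line.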
Breaking Changes
- CountingSource.CounterMark now uses a custom CounterMarkCoder as its default coder, since all Avro-dependent classes have finally moved to extensions/avro. If you still need the previous Avro-based coder for CounterMark, then, as a workaround, place a copy of the “old” CountingSource class in your project code and use it directly (#25252).
- FirestoreOptions to avoid potential conflicts between command-line arguments (Java) (#29201).
Bugfixes
- Fixed “Desired bundle size 0 bytes must be greater than 0” in the Java SDK’s BigtableIO.BigtableSource when there are more cores than bytes to read (Java) (#28793).
- The watch_file_pattern arg of RunInference had no effect prior to 2.52.0. To get the watch_file_pattern behavior on releases prior to 2.52.0, follow the documentation at https://beam.apache.org/documentation/ml/side-input-updates/ and use the WatchFilePattern PTransform as a side input (#28948).
- MLTransform doesn’t output artifacts such as min, max, and quantiles. Instead, MLTransform will add a feature to output these artifacts in a human-readable format (#29017). For now, to use the artifacts such as min and max that were produced by the earlier
MLTransform, which reads artifacts that were produced earlier in a different
- Fixed a memory leak, which affected some long-running Python pipelines: #28246.
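The BigtableSource fix above concerns a bundle size that truncates to zero when there are more cores than bytes. The minimal Python sketch below illustrates that failure mode. It assumes, and the release note does not say this, that the desired bundle size is derived by integer-dividing the estimated size by the available parallelism. The function names are hypothetical, and this is not BigtableIO's Java implementation.

```python
def naive_bundle_size(estimated_size_bytes: int, parallelism: int) -> int:
    """Hypothetical split size: truncates to 0 when parallelism exceeds the size."""
    return estimated_size_bytes // parallelism


def clamped_bundle_size(estimated_size_bytes: int, parallelism: int) -> int:
    """Same split, clamped so the requested bundle size is at least 1 byte."""
    return max(1, estimated_size_bytes // parallelism)


def check_bundle_size(size_bytes: int) -> int:
    """Precondition modeled on the error message quoted in the release note."""
    if size_bytes <= 0:
        raise ValueError(
            f"Desired bundle size {size_bytes} bytes must be greater than 0")
    return size_bytes


if __name__ == "__main__":
    # 10 bytes to read on 64 cores: the naive split fails the precondition.
    try:
        check_bundle_size(naive_bundle_size(10, 64))
    except ValueError as err:
        print(err)  # Desired bundle size 0 bytes must be greater than 0
    print(check_bundle_size(clamped_bundle_size(10, 64)))  # 1
```

Clamping to one byte is only one possible remedy. The sketch does not claim that the Beam fix was implemented this way.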
List of Contributors
According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!
Ferran Fernández Garrido
Hai Joey Tran
Steven van Rossum
pablo rodriguez defino