blog & release
We are happy to present the new 2.46.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.
For more information on changes in 2.46.0, check out the detailed release notes.
- Java SDK containers migrated to Eclipse Temurin as a base. This change migrates away from the deprecated OpenJDK container. Eclipse Temurin is currently based upon Ubuntu 22.04 while the OpenJDK container was based upon Debian 11.
- RunInference PTransform will accept model paths as SideInputs in Python SDK. (#24042)
- RunInference supports ONNX runtime in Python SDK (#22972)
- TensorFlow Model Handler for RunInference in Python SDK (#25366)
- Java SDK modules migrated to use
- Added in JmsIO a retry policy for failed publications (Java) (#24971).
- Support for LZMA compression/decompression of text files added to the Python SDK (#25316)
- Added ReadFrom/WriteTo Csv/Json as top-level transforms to the Python SDK.
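
For readers unfamiliar with the format, the LZMA item above concerns text files stored in LZMA-compressed form. The sketch below uses only Python's standard-library `lzma` module, not Beam's I/O API, to show the compress/decompress round trip that such support implies. The helper names, file name, and sample lines are hypothetical and chosen for illustration only.

```python
import lzma
import os
import tempfile


def write_lzma_text(path, lines):
    """Write lines to an LZMA-compressed text file (default .xz container)."""
    with lzma.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_lzma_text(path):
    """Read an LZMA-compressed text file back into a list of lines."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def roundtrip_demo():
    # Hypothetical sample data and file name, for illustration only.
    lines = ["alpha", "beta", "gamma"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.xz")
        write_lzma_text(path, lines)
        return read_lzma_text(path)
```

Calling `roundtrip_demo()` returns the original lines, which is the behavior one would expect from any reader and writer pair that handles this compression type transparently.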
New Features / Improvements
- Add UDF metrics support for Samza portable mode.
- Option for SparkRunner to avoid the need of SDF output to fit in memory (#23852).
This helps e.g. with ParquetIO reads. Turn the feature on by adding experiment
- Added the WatchFilePattern transform, which can be used as a side input to the RunInference PTransform to watch for model updates using a file pattern. (#24042)
- Add support for loading TorchScript models with PytorchModelHandler. The TorchScript model path can be passed to PytorchModelHandler using
- The Go SDK now requires Go 1.19 to build. (#25545)
- The Go SDK now has an initial native Go implementation of a portable Beam Runner called Prism. (#24789)
- For more details and current state see https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners/prism.
- The deprecated SparkRunner for Spark 2 (see 2.41.0) was removed (#25263).
- Python’s BatchElements performs more aggressive batching in some cases,
capping at 10-second rather than 1-second batches by default and excluding
fixed cost in this computation to better handle cases where the fixed cost
is larger than a single second. To get the old behavior, one can pass
- Avro related classes are deprecated in module beam-sdks-java-core and will eventually be removed. Please migrate to the new module beam-sdks-java-extensions-avro instead by importing the classes from the org.apache.beam.sdk.extensions.avro package. To simplify migration, the relative package path and the whole class hierarchy of Avro related classes in the new module are the same as before. For example, import the org.apache.beam.sdk.extensions.avro.coders.AvroCoder class instead of
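
The BatchElements change described in the list above can be illustrated with a simplified cost model. This is a sketch under stated assumptions, not Beam's implementation: it supposes that processing a batch of n elements takes `fixed_cost + per_element_cost * n` seconds, and the function names and sample numbers are hypothetical. It shows why a 1-second target that includes the fixed cost breaks down once the fixed cost exceeds one second, while a 10-second target that excludes it still yields a useful batch size.

```python
def max_batch_size_including_fixed_cost(target_secs, fixed_cost, per_element_cost):
    """Largest n with fixed_cost + per_element_cost * n <= target_secs (at least 1)."""
    # Assumed linear cost model; the fixed cost eats into the time budget.
    budget = target_secs - fixed_cost
    return max(1, int(budget / per_element_cost))


def max_batch_size_excluding_fixed_cost(target_secs, per_element_cost):
    """Largest n with per_element_cost * n <= target_secs (at least 1)."""
    # Only the per-element cost counts against the target duration.
    return max(1, int(target_secs / per_element_cost))
```

With a hypothetical fixed cost of 2 seconds and 0.25 seconds per element, `max_batch_size_including_fixed_cost(1.0, 2.0, 0.25)` collapses to a batch size of 1, whereas `max_batch_size_excluding_fixed_cost(10.0, 0.25)` allows 40 elements per batch.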
List of Contributors
According to git shortlog, the following people contributed to the 2.46.0 release. Thank you to all contributors!
Amrane Ait Zeouay
Egbert van der Wal
William Ross Morrow
pablo rodriguez defino