Apache Beam 2.24.0

We are happy to present the new 2.24.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.24.0, check out the detailed release notes.

Highlights

  • Apache Beam 2.24.0 is the last release with Python 2 and Python 3.5 support.

I/Os

  • New overloads for BigtableIO.Read.withKeyRange() and BigtableIO.Read.withRowFilter() methods that take ValueProvider as a parameter (Java) (BEAM-10283).
  • The WriteToBigQuery transform (Python) in Dataflow Batch no longer relies on BigQuerySink by default. It relies on a new, fully-featured transform based on file loads into BigQuery. To revert the behavior to the old implementation, you may use --experiments=use_legacy_bq_sink.
  • Add cross-language support to Java’s JdbcIO, now available in the Python module apache_beam.io.jdbc (BEAM-10135, BEAM-10136).
  • Add support for AWS SDK v2 to KinesisIO.Read (Java) (BEAM-9702).
  • Add streaming support to SnowflakeIO in the Java SDK (BEAM-9896).
  • Support reading from and writing to Google Healthcare DICOM APIs in the Python SDK (BEAM-10601).
  • Add dispositions for SnowflakeIO.write (BEAM-10343).
  • Add cross-language support to SnowflakeIO.Read, now available in the Python module apache_beam.io.external.snowflake (BEAM-9897).

New Features / Improvements

  • A Shared library that simplifies management of large shared objects has been added to the Python SDK. An example use case is sharing a large TF model object across threads (BEAM-10417).
  • Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle (BEAM-8543).
  • OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner (BEAM-9839).
  • Key should be available in @OnTimer methods (Spark Runner) (BEAM-9850).

Breaking Changes

  • WriteToBigQuery transforms now require a GCS location, provided through either custom_gcs_temp_location in the WriteToBigQuery constructor or the fallback option --temp_location. Alternatively, pass method="STREAMING_INSERTS" to WriteToBigQuery (BEAM-6928).
  • Python SDK now understands typing.FrozenSet type hints, which are not interchangeable with typing.Set. You may need to update your pipelines if type checking fails. (BEAM-10197)
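
To illustrate why typing.FrozenSet and typing.Set are not interchangeable, the sketch below uses only the standard library (Python 3.8 or later for typing.get_origin). It does not invoke Beam's own type checker; matches_hint is a hypothetical helper that only compares the runtime container type.

```python
from typing import FrozenSet, Set, get_origin


def matches_hint(value, hint):
    # Hypothetical helper: checks only the container type behind the hint,
    # not the element types.
    return isinstance(value, get_origin(hint))


frozen = frozenset({"a", "b"})
mutable = {"a", "b"}

# frozenset is not a subclass of set, and set is not a subclass of frozenset.
frozen_matches_set = matches_hint(frozen, Set[str])
mutable_matches_frozen = matches_hint(mutable, FrozenSet[str])
frozen_matches_frozen = matches_hint(frozen, FrozenSet[str])
```

A transform annotated with Set that actually receives or produces frozenset values is the kind of case that may need its hint updated to FrozenSet.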

Known Issues

  • A change to the default compressor breaks update compatibility for Dataflow Python streaming jobs. Please use Python SDK version <= 2.23.0 or > 2.25.0 if job update is critical (BEAM-11113).

List of Contributors

According to git shortlog, the following people contributed to the 2.24.0 release. Thank you to all contributors!

adesormi, Ahmet Altay, Alex Amato, Alexey Romanenko, Andrew Pilloud, Ashwin Ramaswami, Borzoo, Boyuan Zhang, Brian Hulette, Brian M, Bu Sun Kim, Chamikara Jayalath, Colm O hEigeartaigh, Corvin Deboeser, Damian Gadomski, Damon Douglas, Daniel Oliveira, Dariusz Aniszewski, davidak09, David Cavazos, David Moravek, David Yan, dhodun, Doug Roeper, Emil Hessman, Emily Ye, Etienne Chauchot, Etta Rapp, Eugene Kirpichov, fuyuwei, Gleb Kanterov, Harrison Green, Heejong Lee, Henry Suryawirawan, InigoSJ, Ismaël Mejía, Israel Herraiz, Jacob Ferriero, Jan Lukavský, Jayendra, jfarr, jhnmora000, Jiadai Xia, JIahao wu, Jie Fan, Jiyong Jung, Julius Almeida, Kamil Gałuszka, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Knowles, Kevin Puthusseri, Kyle Weaver, Łukasz Gajowy, Luke Cwik, Mark-Zeng, Maximilian Michels, Michal Walenia, Niel Markwick, Ning Kang, Pablo Estrada, pawel.urbanowicz, Piotr Szuberski, Rafi Kamal, rarokni, Rehman Murad Ali, Reuben van Ammers, Reuven Lax, Ricardo Bordon, Robert Bradshaw, Robert Burke, Robin Qiu, Rui Wang, Saavan Nanavati, sabhyankar, Sam Rohde, Scott Lukas, Siddhartha Thota, Simone Primarosa, Sławomir Andrian, Steve Niemitz, Tobiasz Kędzierski, Tomo Suzuki, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, viktorjonsson, Xinyu Liu, Yichi Zhang, Yixing Zhang, yoshiki.obata, Yueyang Qiu, zijiesong