Apache Beam 2.17.0

We are happy to present the new 2.17.0 release of Beam. This release includes both improvements and new functionality. Users of the MongoDbIO connector are encouraged to upgrade to this release to address a security vulnerability.

See the download page for this release.

For more information on changes in 2.17.0, check out the detailed release notes.

Highlights

  • BEAM-7962 - Drop support for Flink 1.5 and 1.6
  • BEAM-7635 - Migrate SnsIO to AWS SDK for Java 2
  • Improved usability for portable Flink Runner
    • BEAM-8183 - Optionally bundle multiple pipelines into a single Flink jar.
    • BEAM-8372 - Allow submission of Flink UberJar directly to a Flink cluster.
    • BEAM-8471 - Flink native job submission for portable pipelines.
    • BEAM-8312 - Flink portable pipeline jars do not need to stage artifacts remotely.

New Features / Improvements

  • BEAM-7730 - Add Flink 1.9 build target and make FlinkRunner compatible with Flink 1.9.
  • BEAM-7990 - Add ability to read parquet files into PCollection of pyarrow.Table.
  • BEAM-8355 - Make BooleanCoder a standard coder.
  • BEAM-8394 - Add withDataSourceConfiguration() method in JdbcIO.ReadRows class.
  • BEAM-5428 - Implement cross-bundle state caching.
  • BEAM-5967 - Add handling of DynamicMessage in ProtoCoder.
  • BEAM-7473 - Update RestrictionTracker within Python to not be required to be thread safe.
  • BEAM-7920 - Add AvroTableProvider to Beam SQL.
  • BEAM-8098 - Improve documentation on BigQueryIO.
  • BEAM-8100 - Add exception handling to Json transforms in Java SDK.
  • BEAM-8306 - Improve estimation of data byte size reading from source in ElasticsearchIO.
  • BEAM-8351 - Support passing in arbitrary KV pairs to sdk worker via external environment config.
  • BEAM-8396 - Default to LOOPBACK mode for local Flink (Spark, …) runner.
  • BEAM-8410 - JdbcIO should support setConnectionInitSqls in its DataSource.
  • BEAM-8609 - Add HllCount to Java transform catalog.
  • BEAM-8861 - Disallow self-signed certificates by default in ElasticsearchIO.

Dependency Changes

  • BEAM-8285 - Upgrade ZetaSQL to 2019.09.1.
  • BEAM-8392 - Upgrade pyarrow version bounds to 0.15.1 <= version < 0.16.0.
  • BEAM-5895 - Upgrade com.rabbitmq:amqp-client to 5.7.3.
  • BEAM-6896 - Upgrade PyYAML version bounds to 3.12 <= version < 6.0.0.
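
The Python version bounds listed above each have an inclusive lower limit and an exclusive upper limit. As a minimal sketch of how such a range can be checked, the snippet below compares dotted version strings against the pyarrow and PyYAML bounds from this list. The helper names (parse_version, within_bounds, is_compatible) are hypothetical and not part of Beam, and the sketch assumes purely numeric versions with no pre-release suffixes.

```python
def parse_version(text, width=3):
    """Turn a dotted numeric version such as '0.15.1' into a tuple of ints.

    Short versions are zero-padded to `width` parts so that '6.0' and
    '6.0.0' compare as equal. Assumes numeric parts only.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def within_bounds(version, lower, upper):
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Bounds as listed in the dependency changes for this release:
# (inclusive lower bound, exclusive upper bound).
BOUNDS = {
    "pyarrow": ("0.15.1", "0.16.0"),
    "PyYAML": ("3.12", "6.0.0"),
}


def is_compatible(package, version):
    """Check an installed version against the bounds listed for `package`."""
    lower, upper = BOUNDS[package]
    return within_bounds(version, lower, upper)
```

For instance, is_compatible("pyarrow", "0.15.1") holds because the lower bound is inclusive, while is_compatible("pyarrow", "0.16.0") does not because the upper bound is exclusive. For real projects, a dedicated version-parsing library is likely a better fit than this hand-rolled comparison.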

Bugfixes

  • BEAM-8819 - AvroCoder for SpecificRecords is not serialized correctly since 2.13.0.
  • Various bug fixes and performance improvements.

Known Issues

  • BEAM-8989 - Apache Nemo runner is broken due to a backwards-incompatible change since 2.16.0.

List of Contributors

According to git shortlog, the following people contributed to the 2.17.0 release. Thank you to all contributors!

Ahmet Altay, Alan Myrvold, Alexey Romanenko, Andre-Philippe Paquet, Andrew Pilloud, angulartist, Ankit Jhalaria, Ankur Goenka, Anton Kedin, Aryan Naraghi, Aurélien Geron, B M VISHWAS, Bartok Jozsef, Boyuan Zhang, Brian Hulette, Cerny Ondrej, Chad Dombrova, Chamikara Jayalath, ChethanU, cmach, Colm O hEigeartaigh, Cyrus Maden, Daniel Oliveira, Daniel Robert, Dante, David Cavazos, David Moravek, David Yan, Enrico Canzonieri, Etienne Chauchot, gxercavins, Hai Lu, Hannah Jiang, Ian Lance Taylor, Ismaël Mejía, Israel Herraiz, James Wen, Jan Lukavský, Jean-Baptiste Onofré, Jeff Klukas, jesusrv1103, Jofre, Kai Jiang, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Knowles, Kirill Kozlov, kirillkozlov, Kohki YAMAGIWA, Kyle Weaver, Leonardo Alves Miguel, lloigor, lostluck, Luis Enrique Ortíz Ramirez, Luke Cwik, Mark Liu, Maximilian Michels, Michal Walenia, Mikhail Gryzykhin, mrociorg, Nicolas Delsaux, Ning Kang, NING KANG, Pablo Estrada, pabloem, Piotr Szczepanik, rahul8383, Rakesh Kumar, Renat Nasyrov, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang, Ruslan Altynnikov, Ryan Skraba, Salman Raza, Saul Chavez, Sebastian Jambor, sunjincheng121, Tatu Saloranta, tchiarato, Thomas Weise, Tomo Suzuki, Tudor Marian, tvalentyn, Udi Meiri, Valentyn Tymofieiev, Viola Lyu, Vishwas, Yichi Zhang, Yifan Zou, Yueyang Qiu, Łukasz Gajowy