Apache Beam Blog

This is the blog for the Apache Beam project. It contains news and updates for the project.

Apache Beam 2.11.0

Mar 5, 2019 • Ahmet Altay

We are happy to present the new 2.11.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 

Apache Beam 2.10.0

Feb 15, 2019 • Kenneth Knowles [@KennKnowles]

We are happy to present the new 2.10.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 

Apache Beam 2.9.0

Dec 13, 2018 • Chamikara Jayalath

We are happy to present the new 2.9.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 

Inaugural edition of the Beam Summit Europe 2018 - aftermath

Oct 31, 2018 • Matthias Baetens [@matthiasbaetens]

Almost one month ago, we had the pleasure of welcoming the Beam community at Level39 in London for the inaugural edition of the Beam Summit Europe.

Read more 

Apache Beam 2.8.0

Oct 29, 2018 • Ahmet Altay

We are happy to present the new 2.8.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 

Apache Beam 2.7.0

Oct 3, 2018 • Charles Chen

We are happy to present the new 2.7.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 

Beam Summit Europe 2018

Aug 21, 2018 • Matthias Baetens [@matthiasbaetens]

With a growing community of contributors and users, the Apache Beam project is organising the first European Beam Summit.

We are happy to invite you to this event, which will take place in London on October 1st and 2nd of 2018.

Read more 

A review of input streaming connectors

Aug 20, 2018 • Leonid Kuligin [@lkulighin] & Julien Phalip [@julienphalip]

In this post, you’ll learn about the current state of support for input streaming connectors in Apache Beam. For more context, you’ll also learn about the corresponding state of support in Apache Spark.

Read more 

Apache Beam 2.6.0

Aug 10, 2018 • Pablo Estrada [@polecitoem] & Rafael Fernández

We are glad to present the new 2.6.0 release of Beam. This release includes multiple fixes and new functionality, such as new features in SQL and portability.

Read more 

Apache Beam 2.5.0

Jun 26, 2018 • Alexey Romanenko [@alexromdev]

We are glad to present the new 2.5.0 release of Beam. This release includes multiple fixes and new functionalities.

Read more 

Apache Beam 2.3.0

Feb 19, 2018 • Ismaël Mejía [@iemejia]

We are glad to present the new 2.3.0 release of Beam. This release includes multiple fixes and new functionalities.

Read more 

Apache Beam: A Look Back at 2017

Jan 9, 2018 • Anand Iyer & Jean-Baptiste Onofré [@jbonofre]

On January 10, 2017, Apache Beam was promoted to a Top-Level Project of the Apache Software Foundation. It was an important milestone that validated the value of the project and the legitimacy of its community, and heralded its growing adoption. In the past year, Apache Beam has been on a phenomenal growth trajectory, with significant growth in its community and feature set. Let us walk you through some of the notable achievements.

Read more 

Timely (and Stateful) Processing with Apache Beam

Aug 28, 2017 • Kenneth Knowles [@KennKnowles]

In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future.

What can you do with timers in Beam? Here are some examples:

These are just a few possibilities. State and timers together form a powerful programming paradigm for fine-grained control to express a huge variety of workflows. Stateful and timely processing in Beam is portable across data processing engines and integrated with Beam’s unified model of event time windowing in both streaming and batch processing.

Read more 

Powerful and modular IO connectors with Splittable DoFn in Apache Beam

Aug 16, 2017 • Eugene Kirpichov

One of the most important parts of the Apache Beam ecosystem is its quickly growing set of connectors that allow Beam pipelines to read data from and write data to various data storage systems (“IOs”). Currently, Beam ships over 20 IO connectors with many more in active development. As user demands for IO connectors grew, our work on improving the related Beam APIs (in particular, the Source API) produced an unexpected result: a generalization of Beam’s most basic primitive, DoFn.

Read more 

Apache Beam publishes the first stable release

May 17, 2017 • Davor Bonaci [@BonaciDavor] & Dan Halperin

The Apache Beam community is pleased to announce the availability of version 2.0.0. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.

Read more 

Python SDK released in Apache Beam 0.6.0

Mar 16, 2017 • Ahmet Altay

Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.

Read more 

Stateful processing with Apache Beam

Feb 13, 2017 • Kenneth Knowles [@KennKnowles]

Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code.

Read more 

Media recap of the Apache Beam graduation

Feb 1, 2017 • Davor Bonaci [@BonaciDavor]

One year ago today Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked additional interest in the project, from corporate endorsements, news articles, and interviews to the volume of traffic to our website and mailing lists.

Read more 

Apache Beam established as a new top-level project

Jan 10, 2017 • Davor Bonaci [@BonaciDavor]

Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.

Read more 

Release 0.4.0 adds a runner for Apache Apex

Jan 9, 2017 • Thomas Weise [@thweise]

The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.

Read more 

Testing Unbounded Pipelines in Apache Beam

Oct 20, 2016 • Thomas Groh

The Beam Programming Model unifies writing pipelines for batch and streaming processing. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.

Read more 

Strata+Hadoop World and Beam

Oct 11, 2016 • Jesse Anderson [@jessetanderson]

Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. A total of 66 people attended the session.

Read more 

Apache Beam: Six Months in Incubation

Aug 3, 2016 • Frances Perry [@francesjperry]

It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.

Read more 

The first release of Apache Beam!

Jun 15, 2016 • Davor Bonaci [@BonaciDavor]

I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.

Read more 

Jun 13, 2016 • Aljoscha Krettek [@aljoscha]

We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.

Read more 

Where’s my PCollection.map()?

May 27, 2016 • Robert Bradshaw

Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.

Read more 

Dynamic work rebalancing for Beam

May 18, 2016 • Dan Halperin

This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.

Read more 

Apache Beam Presentation Materials

Apr 3, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is eager to expand and grow. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.

Read more 

Clarifying & Formalizing Runner Capabilities

Mar 17, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?

Read more 

Dataflow Python SDK is now public!

Feb 25, 2016 • James Malone [@chimerasaurus]

When the Apache Beam project proposed entry into the Apache Incubator, the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.

Read more 

Feb 22, 2016 • James Malone [@chimerasaurus]

One of the major benefits of Apache Beam is the fact that it unifies both batch and stream processing into one powerful model. In fact, this unification is so important that the name Beam itself comes from the union of Batch + strEAM = Beam.

When the project started, we wanted a logo which was both appealing and visually represented this unification.

Read more