Apache Beam Blog

This is the blog for the Apache Beam project. This blog contains news and updates for the project.

Aug 28, 2017 • Kenneth Knowles [@KennKnowles]

In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future.

What can you do with timers in Beam? Here are some examples:

These are just a few possibilities. State and timers together form a powerful programming paradigm for fine-grained control to express a huge variety of workflows. Stateful and timely processing in Beam is portable across data processing engines and integrated with Beam’s unified model of event time windowing in both streaming and batch processing.

Read more 


Aug 16, 2017 • Eugene Kirpichov

One of the most important parts of the Apache Beam ecosystem is its quickly growing set of connectors that allow Beam pipelines to read and write data to various data storage systems (“IOs”). Currently, Beam ships over 20 IO connectors with many more in active development. As user demands for IO connectors grew, our work on improving the related Beam APIs (in particular, the Source API) produced an unexpected result: a generalization of Beam’s most basic primitive, DoFn.

Read more 


May 17, 2017 • Davor Bonaci [@BonaciDavor] & Dan Halperin

The Apache Beam community is pleased to announce the availability of version 2.0.0. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.

Read more 


Mar 16, 2017 • Ahmet Altay

Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.

Read more 


Feb 13, 2017 • Kenneth Knowles [@KennKnowles]

Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code.

Read more 


Feb 1, 2017 • Davor Bonaci [@BonaciDavor]

One year ago today Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked an additional interest in the project, from corporate endorsements, news articles, interviews, to the volume of traffic to our website and mailing lists.

Read more 


Jan 10, 2017 • Davor Bonaci [@BonaciDavor]

Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.

Read more 


Jan 9, 2017 • Thomas Weise [@thweise]

The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.

Read more 


Oct 20, 2016 • Thomas Groh

The Beam Programming Model unifies writing pipelines for Batch and Streaming pipelines. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.

Read more 


Oct 11, 2016 • Jesse Anderson [@jessetanderson]

Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. There were a total of 66 people that attended the session.

Read more 


Aug 3, 2016 • Frances Perry [@francesjperry]

It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.

Read more 


Jun 15, 2016 • Davor Bonaci [@BonaciDavor]

I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.

Read more 


Jun 13, 2016 • Aljoscha Krettek [@aljoscha]

We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.

Read more 


May 27, 2016 • Robert Bradshaw

Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.

Read more 


May 18, 2016 • Dan Halperin

This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.

Read more 


Apr 3, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is excited to expand and grow the community. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.

Read more 


Mar 17, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?

Read more 


Feb 25, 2016 • James Malone [@chimerasaurus]

When the Apache Beam project proposed entry into the Apache Incubator the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.

Read more 


Feb 22, 2016 • James Malone [@chimerasaurus]

One of the major benefits of Apache Beam is the fact that it unifies both both batch and stream processing into one powerful model. In fact, this unification is so important, the name Beam itself comes from the union of Batch + strEAM = Beam

When the project started, we wanted a logo which was both appealing and visually represented this unification.

Read more