This is the blog for the Apache Beam project. This blog contains news and updates for the project.
Aug 28, 2017 • Kenneth Knowles [@KennKnowles]
In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future.
What can you do with timers in Beam? Here are some examples:
These are just a few possibilities. State and timers together form a powerful programming paradigm for fine-grained control to express a huge variety of workflows. Stateful and timely processing in Beam is portable across data processing engines and integrated with Beam’s unified model of event time windowing in both streaming and batch processing.
Aug 16, 2017 • Eugene Kirpichov
One of the most important parts of the Apache Beam ecosystem is its quickly
growing set of connectors that allow Beam pipelines to read and write data to
various data storage systems (“IOs”). Currently, Beam ships over 20 IO
connectors with many more in
active development. As user demands for IO connectors grew, our work on
improving the related Beam APIs (in particular, the Source API) produced an
unexpected result: a generalization of Beam’s most basic primitive,
May 17, 2017 • Davor Bonaci [@BonaciDavor] & Dan Halperin
The Apache Beam community is pleased to announce the availability of version 2.0.0. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.
Mar 16, 2017 • Ahmet Altay
Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.
Feb 13, 2017 • Kenneth Knowles [@KennKnowles]
Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code.
Feb 1, 2017 • Davor Bonaci [@BonaciDavor]
One year ago today Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked an additional interest in the project, from corporate endorsements, news articles, interviews, to the volume of traffic to our website and mailing lists.
Jan 10, 2017 • Davor Bonaci [@BonaciDavor]
Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.
Jan 9, 2017 • Thomas Weise [@thweise]
The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.
Oct 20, 2016 • Thomas Groh
The Beam Programming Model unifies writing pipelines for Batch and Streaming pipelines. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.
Oct 11, 2016 • Jesse Anderson [@jessetanderson]
Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. There were a total of 66 people that attended the session.
Aug 3, 2016 • Frances Perry [@francesjperry]
It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.
Jun 15, 2016 • Davor Bonaci [@BonaciDavor]
I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.
Jun 13, 2016 • Aljoscha Krettek [@aljoscha]
We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.
May 27, 2016 • Robert Bradshaw
Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.
May 18, 2016 • Dan Halperin
This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.
Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is excited to expand and grow the community. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.
With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?
Feb 25, 2016 • James Malone [@chimerasaurus]
When the Apache Beam project proposed entry into the Apache Incubator the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.
Feb 22, 2016 • James Malone [@chimerasaurus]
One of the major benefits of Apache Beam is the fact that it unifies both both batch and stream processing into one powerful model. In fact, this unification is so important, the name Beam itself comes from the union of Batch + strEAM = Beam
When the project started, we wanted a logo which was both appealing and visually represented this unification.