Apache Beam Blog
This is the blog for the Apache Beam project. This blog contains news and updates for the project.
Mar 16, 2017 • Ahmet Altay
Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.
Feb 13, 2017 • Kenneth Knowles [@KennKnowles]
Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code.
Feb 1, 2017 • Davor Bonaci [@BonaciDavor]
One year ago today Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked an additional interest in the project, from corporate endorsements, news articles, interviews, to the volume of traffic to our website and mailing lists.
Jan 10, 2017 • Davor Bonaci [@BonaciDavor]
Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.
Jan 9, 2017 • Thomas Weise [@thweise]
The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.
Oct 20, 2016 • Thomas Groh
The Beam Programming Model unifies writing pipelines for Batch and Streaming pipelines. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.
Oct 11, 2016 • Jesse Anderson [@jessetanderson]
Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. There were a total of 66 people that attended the session.
Aug 3, 2016 • Frances Perry [@francesjperry]
It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.
Jun 15, 2016 • Davor Bonaci [@BonaciDavor]
I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.
Jun 13, 2016 • Aljoscha Krettek [@aljoscha]
We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.
May 27, 2016 • Robert Bradshaw
Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.
May 18, 2016 • Dan Halperin
This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.
Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is excited to expand and grow the community. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.
With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?
Feb 25, 2016 • James Malone [@chimerasaurus]
When the Apache Beam project proposed entry into the Apache Incubator the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.
Feb 22, 2016 • James Malone [@chimerasaurus]
One of the major benefits of Apache Beam is the fact that it unifies both both batch and stream processing into one powerful model. In fact, this unification is so important, the name Beam itself comes from the union of Batch + strEAM = Beam
When the project started, we wanted a logo which was both appealing and visually represented this unification.