Apache Beam Blog

This is the blog for the Apache Beam project, with news and updates about the project.

Apache Beam 2.14.0

Jul 31, 2019 • Anton Kedin & Ahmet Altay

We are happy to present the new 2.14.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Looping timers in Apache Beam

Jun 11, 2019 • Reza Rokni [@rarokni] & Kenneth Knowles [@KennKnowles]

Apache Beam’s primitives let you build expressive data pipelines, suitable for a variety of use cases. One specific use case is the analysis of time series data in which continuous sequences across window boundaries are important. A few fun challenges arise as you tackle this type of data. In this blog post, we explore one of them in more detail, using the Timer API (blog post) and the “looping timer” pattern.
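
To make the “looping timer” idea concrete, here is a minimal plain-Python sketch. It assumes the pattern means a timer that re-arms itself every time it fires, so that each fixed interval produces an output even when no data arrived in it. The function name, its parameters, and the integer timestamps are hypothetical, and nothing here uses the actual Beam Timer API.

```python
from collections import Counter


def looping_timer_counts(event_times, interval, stop_time):
    """Count events per fixed interval, emitting 0 for empty intervals.

    Plain-Python simulation, not Beam API: the "timer" is just a variable
    that is re-armed one interval ahead every time it fires.
    """
    if not event_times:
        return []

    # Bucket each event by the start of the interval it falls into.
    counts = Counter((t // interval) * interval for t in event_times)

    results = []
    # Arm the first timer at the end of the interval holding the first event.
    timer = (min(event_times) // interval) * interval + interval
    while timer <= stop_time:
        window_start = timer - interval
        # The timer fires: emit a value for the interval, even if it is empty.
        results.append((window_start, counts.get(window_start, 0)))
        # The "loop": re-arm the timer for the next interval.
        timer += interval
    return results


print(looping_timer_counts([3, 7, 31], interval=10, stop_time=40))
# prints [(0, 2), (10, 0), (20, 0), (30, 1)]
```

The intervals starting at 10 and 20 contain no events, yet they still appear in the output with a count of 0, which is the gap-filling behaviour this sketch assumes the pattern is meant to provide.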

Read more 


Apache Beam 2.13.0

Jun 7, 2019

We are happy to present the new 2.13.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Adding new Data Sources to Beam SQL CLI

Jun 4, 2019 • Pablo Estrada [@polecitoem]

A new, exciting feature that came to Apache Beam is the ability to use SQL in your pipelines. This is done using Beam’s SqlTransform in Java pipelines.

Beam also has a fancy new SQL command line that you can use to query your data interactively, be it batch or streaming. If you haven’t tried it, check out http://bit.ly/ExploreBeamSQL.

A nice feature of the SQL CLI is that you can use CREATE EXTERNAL TABLE commands to add data sources to be accessed in the CLI. Currently, the CLI supports creating tables from BigQuery, PubSub, Kafka, and text files. In this post, we explore how to add new data sources, so that you will be able to consume data from other Beam sources.

Read more 


Apache Beam Katas

May 30, 2019 • Henry Suryawirawan [@henry_ken]

We are happy to announce Apache Beam Katas, a set of interactive Beam coding exercises (i.e. code katas) that can help you learn Apache Beam concepts and the programming model hands-on.

Read more 


Beam community update!

May 11, 2019 • Matthias Baetens [@matthiasbaetens]

The Apache Beam community in 2019

2019 has already been a busy time for the Apache Beam community. The ASF blog featured our way of community building and we’ve had more Beam meetups around the world. Apache Beam also received the Technology of the Year Award from InfoWorld.

As these events happened, we were building up to the 20th anniversary of the Apache Software Foundation. The contributions of the Beam community were a part of Maximilian Michels’ blog post on the success of the ASF’s open source development model:

Success at Apache: What You Need to Know by Maximilian Michels https://t.co/XjtVYgPAHX #Apache #Open #Innovation #Community #people #processes #JustWorks @stadtlegende pic.twitter.com/xSibnyWAMe

— Apache - The ASF (@TheASF) 26 March 2019

In that spirit, let’s have an overview of the things that have happened, what the next few months look like, and how we can foster even more community growth.

Meetups

We’ve had a flurry of activity, with several meetups in the planning process and more popping up globally over time. As diversity of contributors is a core ASF value, this geographic spread is exciting for the community. Here’s a picture from the latest Apache Beam meetup organized at Lyft in San Francisco:

Beam Meetup Bay Area

We have more Bay Area meetups coming soon, and the community is looking into kicking off a meetup in Toronto!

London had its first meetup of 2019 at the start of April:

Beam Meetup London

and Stockholm had its second meetup at the start of May:

Big audience for the second @ApacheBeam meetup in Stockholm! Gleb, @kanterov from @SpotifyEng kicking off the first talk with Beam SQL.#ApacheBeamStockholm pic.twitter.com/fDqPPFh2gY

— Matthias Baetens 🌆 (@matthiasbaetens) 6 May 2019

Keep an eye out for a meetup in Paris.

If you are interested in starting your own meetup, feel free to reach out! Good places to start include our Slack channel, the dev and user mailing lists, or the Apache Beam Twitter.

Even if you can’t travel to these meetups, you can stay informed on the happenings of the community. The talks and sessions from previous conferences and meetups are archived on the Apache Beam YouTube channel. If you want your session added to the channel, don’t hesitate to get in touch! And in case you want to attend the next Beam event in style, you can also order your swag on the Beam swag store.

Summits

The first summit of the year will be held in Berlin:

Beam Summit Europe Banner

You can find more info on the website and read about the inaugural edition of the Beam Summit Europe here. At these summits, you have the opportunity to meet with other Apache Beam creators and users, get expert advice, learn from the speaker sessions, and participate in workshops.

We strongly encourage you to get involved again this year! You can participate in the following ways for the upcoming summit in Europe:

🎫 If you want to secure your ticket to attend the Beam Summit Europe 2019, check our event page.

💸 If you want to make the Summit even more awesome, check out our sponsor booklet!

We also launched the CfP for our Beam Summit in North America, which will be held in collaboration with ApacheCon.

🎤 If you want to give a talk, take a look at our CfP.

Stay tuned for more information on the summits in North America and Asia.

Why community engagement matters

Why we need a strong Apache Beam community:

Why are we organizing these summits:


Apache Beam + Kotlin = ❤️

Apr 25, 2019 • Harshit Dwivedi [@harshithdwivedi]

Apache Beam samples are now available in Kotlin!

Read more 


Apache Beam 2.12.0

Apr 25, 2019

We are happy to present the new 2.12.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Apache Beam is applying to Season of Docs

Apr 19, 2019 • Aizhamal Nurmamat kyzy [@iamaijamal]

The Apache Beam community is thrilled to announce its application to the first edition of Season of Docs 2019!

Read more 


Announcing Beam Summit Site

Mar 18, 2019 • Aizhamal Nurmamat kyzy [@iamaijamal]

We are thrilled to announce the launch of our new website dedicated to Beam Summits!

The beamsummit.org site provides all the information you need about the upcoming Beam Summits in Europe, Asia and North America in 2019. You will find information about the conference theme, the speakers and sessions, the abstract submission timeline and the registration process, the conference venues and hosting cities, and much more that you will find useful before and during the Beam Summits 2019.

We are working to make the website easy to use, so that anyone who is organizing a Beam event can rely on it. You can find the code for it on GitHub.

The pages will be updated on a regular basis, but we also love hearing thoughts from our community! Let us know if you have any questions, comments or suggestions, and help us improve! Also, if you are thinking of organizing a Beam event, please feel free to reach out for support, and to use the code in GitHub as well.

We sincerely hope that you like the new Beam Summit website and will find it useful for accessing information. Enjoy browsing around!

See you in Berlin!

#beamsummit2019.


Apache Beam 2.11.0

Mar 5, 2019 • Ahmet Altay

We are happy to present the new 2.11.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Apache Beam 2.10.0

Feb 15, 2019 • Kenneth Knowles [@KennKnowles]

We are happy to present the new 2.10.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Apache Beam 2.9.0

Dec 13, 2018 • Chamikara Jayalath

We are happy to present the new 2.9.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Inaugural edition of the Beam Summit Europe 2018 - aftermath

Oct 31, 2018 • Matthias Baetens [@matthiasbaetens]

Almost a month ago, we had the pleasure of welcoming the Beam community at Level39 in London for the inaugural edition of the Beam Summit Europe.

Read more 


Apache Beam 2.8.0

Oct 29, 2018 • Ahmet Altay

We are happy to present the new 2.8.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Apache Beam 2.7.0

Oct 3, 2018 • Charles Chen

We are happy to present the new 2.7.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

Read more 


Beam Summit Europe 2018

Aug 21, 2018 • Matthias Baetens [@matthiasbaetens]

With a growing community of contributors and users, the Apache Beam project is organising the first European Beam Summit.

We are happy to invite you to this event, which will take place in London on October 1st and 2nd of 2018.

Read more 


A review of input streaming connectors

Aug 20, 2018 • Leonid Kuligin [@lkulighin] & Julien Phalip [@julienphalip]

In this post, you’ll learn about the current state of support for input streaming connectors in Apache Beam. For more context, you’ll also learn about the corresponding state of support in Apache Spark.

Read more 


Apache Beam 2.6.0

Aug 10, 2018 • Pablo Estrada [@polecitoem] & Rafael Fernández

We are glad to present the new 2.6.0 release of Beam. This release includes multiple fixes and new functionality, such as new features in SQL and portability.

Read more 


Apache Beam 2.5.0

Jun 26, 2018 • Alexey Romanenko [@alexromdev]

We are glad to present the new 2.5.0 release of Beam. This release includes multiple fixes and new functionalities.

Read more 


Apache Beam 2.3.0

Feb 19, 2018 • Ismaël Mejía [@iemejia]

We are glad to present the new 2.3.0 release of Beam. This release includes multiple fixes and new functionalities.

Read more 


Apache Beam: A Look Back at 2017

Jan 9, 2018 • Anand Iyer & Jean-Baptiste Onofré [@jbonofre]

On January 10, 2017, Apache Beam was promoted to a Top-Level Apache Software Foundation project. It was an important milestone that validated the value of the project and the legitimacy of its community, and heralded its growing adoption. In the past year, Apache Beam has been on a phenomenal growth trajectory, with significant growth in its community and feature set. Let us walk you through some of the notable achievements.

Read more 


Timely (and Stateful) Processing with Apache Beam

Aug 28, 2017 • Kenneth Knowles [@KennKnowles]

In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future.

What can you do with timers in Beam? Here are some examples:

These are just a few possibilities. State and timers together form a powerful programming paradigm for fine-grained control to express a huge variety of workflows. Stateful and timely processing in Beam is portable across data processing engines and integrated with Beam’s unified model of event time windowing in both streaming and batch processing.
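
As a rough illustration of how state and timers fit together, here is a plain-Python toy that buffers elements per key and flushes them when a timer fires. The class, its method names, the integer timestamps, and the manual `advance_time` call are all hypothetical; this only mimics the idea and is not the Beam state or timer API.

```python
from collections import defaultdict


class BufferAndFlush:
    """Toy per-key buffer flushed by a timer.

    Hypothetical names throughout: this mimics the idea of state plus
    timers in plain Python and is not the Beam API.
    """

    def __init__(self, delay):
        self.delay = delay
        self.buffers = defaultdict(list)  # per-key state
        self.timers = {}                  # per-key time at which to fire

    def process(self, key, value, timestamp):
        """Store the element and, if no timer is pending, request a callback."""
        self.buffers[key].append(value)
        self.timers.setdefault(key, timestamp + self.delay)

    def advance_time(self, now):
        """Fire every timer whose time has been reached; return flushed output."""
        output = []
        due = sorted(k for k, fire_at in self.timers.items() if fire_at <= now)
        for key in due:
            del self.timers[key]
            # The callback reads and clears the state stored for this key.
            output.append((key, self.buffers.pop(key)))
        return output


fn = BufferAndFlush(delay=5)
fn.process("a", 1, timestamp=0)
fn.process("a", 2, timestamp=3)
fn.process("b", 9, timestamp=4)
print(fn.advance_time(5))   # prints [('a', [1, 2])]
print(fn.advance_time(10))  # prints [('b', [9])]
```

Per-element processing only writes to state and requests a callback, while the output is produced later by the callback itself. That separation is the part of the model this sketch tries to show.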

Read more 


Powerful and modular IO connectors with Splittable DoFn in Apache Beam

Aug 16, 2017 • Eugene Kirpichov

One of the most important parts of the Apache Beam ecosystem is its quickly growing set of connectors that allow Beam pipelines to read and write data to various data storage systems (“IOs”). Currently, Beam ships over 20 IO connectors with many more in active development. As user demands for IO connectors grew, our work on improving the related Beam APIs (in particular, the Source API) produced an unexpected result: a generalization of Beam’s most basic primitive, DoFn.

Read more 


Apache Beam publishes the first stable release

May 17, 2017 • Davor Bonaci [@BonaciDavor] & Dan Halperin

The Apache Beam community is pleased to announce the availability of version 2.0.0. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.

Read more 


Python SDK released in Apache Beam 0.6.0

Mar 16, 2017 • Ahmet Altay

Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.

Read more 


Stateful processing with Apache Beam

Feb 13, 2017 • Kenneth Knowles [@KennKnowles]

Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code.

Note: This post has been updated in May of 2019, to include Python snippets!

Read more 


Media recap of the Apache Beam graduation

Feb 1, 2017 • Davor Bonaci [@BonaciDavor]

One year ago today, Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked additional interest in the project, from corporate endorsements, news articles, and interviews to the volume of traffic to our website and mailing lists.

Read more 


Apache Beam established as a new top-level project

Jan 10, 2017 • Davor Bonaci [@BonaciDavor]

Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.

Read more 


Release 0.4.0 adds a runner for Apache Apex

Jan 9, 2017 • Thomas Weise [@thweise]

The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.

Read more 


Testing Unbounded Pipelines in Apache Beam

Oct 20, 2016 • Thomas Groh

The Beam Programming Model unifies the writing of batch and streaming pipelines. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.

Read more 


Strata+Hadoop World and Beam

Oct 11, 2016 • Jesse Anderson [@jessetanderson]

Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. A total of 66 people attended the session.

Read more 


Apache Beam: Six Months in Incubation

Aug 3, 2016 • Frances Perry [@francesjperry]

It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.

Read more 


The first release of Apache Beam!

Jun 15, 2016 • Davor Bonaci [@BonaciDavor]

I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.

Read more 


Jun 13, 2016 • Aljoscha Krettek [@aljoscha]

We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.

Read more 


Where’s my PCollection.map()?

May 27, 2016 • Robert Bradshaw

Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.

Read more 


Dynamic work rebalancing for Beam

May 18, 2016 • Dan Halperin

This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.

Read more 


Apache Beam Presentation Materials

Apr 3, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is excited to grow. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.

Read more 


Clarifying & Formalizing Runner Capabilities

Mar 17, 2016 • Frances Perry [@francesjperry] & Tyler Akidau [@takidau]

With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?

Read more 


Dataflow Python SDK is now public!

Feb 25, 2016 • James Malone [@chimerasaurus]

When the Apache Beam project proposed entry into the Apache Incubator, the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.

Read more 


Feb 22, 2016 • James Malone [@chimerasaurus]

One of the major benefits of Apache Beam is the fact that it unifies both batch and stream processing into one powerful model. In fact, this unification is so important that the name Beam itself comes from the union of Batch + strEAM = Beam.

When the project started, we wanted a logo which was both appealing and visually represented this unification.

Read more