Blogs

blog & release

2025/07/01

Apache Beam 2.66.0

Vitalii Terentev

We are happy to present the new 2.66.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2025/06/16

My Experience at Beam College 2025: 3rd Place Hackathon Winner

Marcio Sugar

Introduction: The Spark of an Idea. In 2025, I had the opportunity to participate in the Beam College Hackathon, a fantastic event that brings together students and professionals to explore the power of Apache Beam.

blog & release

2025/05/12

Apache Beam 2.65.0

Yi Hu

We are happy to present the new 2.65.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2025/03/31

Apache Beam 2.64.0

XQ Hu

We are happy to present the new 2.64.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2025/02/18

Apache Beam 2.63.0

Jack R. McCluskey

We are happy to present the new 2.63.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2025/01/21

Apache Beam 2.62.0

Kenneth Knowles

We are happy to present the new 2.62.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/11/25

Apache Beam 2.61.0

Danny McCormick

We are happy to present the new 2.61.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/10/17

Apache Beam 2.60.0

Yi Hu

We are happy to present the new 2.60.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2024/10/16

Apache Beam Summit 2024: Unlocking the power of ML for data processing

XQ Hu, Danny McCormick & Reza Rokni

At the recently concluded Beam Summit 2024, a two-day event held from September 4 to 5, numerous captivating presentations showcased the potential of Beam to address a wide range of challenges, with an emphasis on machine learning (ML).

blog

2024/09/20

Efficient Streaming Data Processing with Beam YAML and Protobuf

Ferran Fernandez

As streaming data processing grows, so do its maintenance, complexity, and costs. This post explains how to efficiently scale pipelines by using Protobuf, which ensures that pipelines are reusable and quick to deploy. The goal is to keep this process simple for engineers to implement using Beam YAML.

blog

2024/09/13

Unit Testing in Beam: An opinionated guide

Svetak Sundhar

Testing remains one of the most fundamental components of software engineering. In this blog post, we shed light on some of the constructs that Apache Beam provides for testing. We cover an opinionated set of best practices to write unit tests for your data pipeline.

blog & release

2024/09/11

Apache Beam 2.59.0

Robert Burke

We are happy to present the new 2.59.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/08/15

Apache Beam 2.58.1

Danny McCormick

We are happy to present the new 2.58.1 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/08/06

Apache Beam 2.58.0

Jack R. McCluskey

We are happy to present the new 2.58.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/06/26

Apache Beam 2.57.0

Kenneth Knowles

We are happy to present the new 2.57.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2024/06/20

Deploy Python pipelines on Kubernetes using the Flink runner

Jaehyeon Kim

The Apache Flink Kubernetes Operator acts as a control plane to manage the complete deployment lifecycle of Apache Flink applications. With the operator, we can simplify the deployment and management of Apache Beam pipelines. In this post, we develop an Apache Beam pipeline using the Python SDK and deploy it on an Apache Flink cluster by using the Apache Flink runner. We first deploy an Apache Kafka cluster on a minikube cluster, because the pipeline uses Kafka topics for its data source and sink. Then, we develop the pipeline as a Python package and add the package to a custom Docker image so that Python user code can be executed externally. For deployment, we create a Flink session cluster using the Flink Kubernetes Operator, and deploy the pipeline using a Kubernetes job. Finally, we check the output of the application by sending messages to the input Kafka topic using a Python producer application. The post covers the following sections: Resources to run a Python Beam pipeline on Flink; Set up the Kafka cluster; Deploy the Strimzi operator; Deploy the Kafka cluster; Deploy the Kafka UI; Develop a stream processing app; Beam pipeline code; Build Docker images; Deploy the stream processing app; Deploy the Flink Kubernetes Operator; Deploy the Beam pipeline; Kafka producer.

blog & release

2024/05/01

Apache Beam 2.56.0

Danny McCormick

We are happy to present the new 2.56.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2024/04/11

Introducing Beam YAML: Apache Beam's First No-code SDK

Jeff Kinard

Writing a Beam pipeline can be a daunting task. Learning the Beam model, downloading dependencies for the SDK language of choice, debugging the pipeline, and maintaining the pipeline code is a lot of overhead for users who want to write a simple to intermediate data processing pipeline. There have been strides in making the SDK’s entry points easier, but for many, it is still a long way from being a painless process. To address some of these issues and simplify the entry point to Beam, we have introduced a new way to specify Beam pipelines by using configuration files rather than code. This new SDK, known as Beam YAML, employs a declarative approach to creating data processing pipelines using YAML, a widely used data serialization language.

blog & release

2024/03/25

Apache Beam 2.55.0

Yi Hu

We are happy to present the new 2.55.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2024/02/14

Apache Beam 2.54.0

Robert Burke

We are happy to present the new 2.54.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2024/02/05

Behind the Scenes: Crafting an Autoscaler for Apache Beam in a High-Volume Streaming Environment

Talat Uyarer

Introduction to the Design of Our Autoscaler for Apache Beam Jobs. Welcome to the third and final part of our blog series on building a scalable, self-managed streaming infrastructure with Beam and Flink.

blog & release

2024/01/04

Apache Beam 2.53.0

Jack R. McCluskey

We are happy to present the new 2.53.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2024/01/03

Scaling a streaming workload on Apache Beam, 1 million events per second and beyond

Pablo Rodriguez Defino

Scaling a streaming workload is critical for ensuring that a pipeline can process large amounts of data while also minimizing latency and executing efficiently. Without proper scaling, a pipeline may experience performance issues or even fail entirely, delaying the time to insights for the business.

blog

2023/12/18

Build a scalable, self-managed streaming infrastructure with Beam and Flink: Tackling Autoscaling Challenges - Part 2

Talat Uyarer

Welcome to Part 2 of our in-depth series about building and managing a service for Apache Beam Flink on Kubernetes.

blog & release

2023/11/17

Apache Beam 2.52.0

Danny McCormick

We are happy to present the new 2.52.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2023/11/11

Contributor Spotlight: Johanna Öjeling

Ahmet Altay

Johanna Öjeling is a Senior Software Engineer at Normative. She started using Apache Beam in 2020 at her previous company, Datatonic, and began contributing in 2022 in a personal capacity.

blog

2023/11/03

Build a scalable, self-managed streaming infrastructure with Beam and Flink

Talat Uyarer

In this blog series, Talat Uyarer (Architect / Senior Principal Engineer), Rishabh Kedia (Principal Engineer), and David He (Engineering Director) describe how we built a self-managed streaming platform by using Apache Beam and Flink. In this part of the series, we describe why and how we built a large-scale, self-managed streaming infrastructure and services based on Flink by migrating from a cloud managed streaming service. We also outline the learnings for operational scalability and observability, performance, and cost effectiveness. We summarize techniques that we found useful in our journey.

blog & release

2023/10/11

Apache Beam 2.51.0

Kenneth Knowles

We are happy to present the new 2.51.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2023/10/02

DIY GenAI Content Discovery Platform with Apache Beam

Pablo Rodriguez Defino & Namita Sharma

Your digital assets, such as documents, PDFs, spreadsheets, and presentations, contain a wealth of valuable information, but sometimes it’s hard to find what you’re looking for.

blog & release

2023/08/30

Apache Beam 2.50.0

Robert Burke

We are happy to present the new 2.50.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2023/07/17

Apache Beam 2.49.0

Yi Hu

We are happy to present the new 2.49.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2023/06/23

Managing Beam dependencies in Java

Bruno Volpato

Managing your Java dependencies can be challenging, and if not done correctly, it may cause a variety of problems, as incompatibilities may arise when using specific and previously untested combinations. To make that process easier, Beam now provides Bill of Materials (BOM) artifacts that will help dependency management tools to select compatible combinations. We hope this will make it easier for you to use Apache Beam, and have a simpler transition when upgrading to newer versions.

blog

2023/06/06

Getting started with Apache Beam: An open source proficiency credential sponsored by Google Cloud

Svetak Sundhar

We’re excited to announce the release of the “Getting Started with Apache Beam” quest, a series of four online labs that venture into different Apache Beam concepts. When you complete all four labs, you’ll earn a Google Cloud badge that you can share on platforms like LinkedIn.

blog & release

2023/05/31

Apache Beam 2.48.0

Ritesh Ghorse

We are happy to present the new 2.48.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2023/05/10

Apache Beam 2.47.0

Jack R. McCluskey

We are happy to present the new 2.47.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2023/03/10

Apache Beam 2.46.0

Danny McCormick

We are happy to present the new 2.46.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2023/02/15

Apache Beam 2.45.0

John Casey

We are happy to present the new 2.45.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2023/01/17

Apache Beam 2.44.0

Kenneth Knowles

We are happy to present the new 2.44.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2022/11/30

Apache Beam Playground: An interactive environment to try transforms and examples

Alex Kosolapov

What is Apache Beam Playground? Apache Beam Playground is an interactive environment to try Apache Beam transforms and examples without having to install or set up a Beam environment.

blog & release

2022/11/17

Apache Beam 2.43.0

Chamikara Jayalath

We are happy to present the new 2.43.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & python

2022/11/09

New Resources Available for Beam ML

Danny McCormick

If you’ve been paying attention, over the past year you’ve noticed that Beam has released a number of features designed to make Machine Learning easy. Ranging from things like the introduction of the RunInference transform to the continued refining of Beam Dataframes, this has been an area where we’ve seen Beam make huge strides.

blog

2022/11/03

Beam starter projects

David Cavazos

We’re happy to announce that we’re providing new Beam starter projects! 🎉 Setting up and configuring a new project can be time consuming, and varies in different languages. We hope this will make it easier for you to get started in creating new Apache Beam projects and pipelines.

blog & release

2022/10/17

Apache Beam 2.42.0

Robert Burke

We are happy to present the new 2.42.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2022/10/15

Apache Hop web version with Cloud Dataflow

Israel Herraiz

Hop is a codeless visual development environment for Apache Beam pipelines that can run jobs in any Beam runner, such as Dataflow, Flink or Spark. In a previous post, we introduced the desktop version of Apache Hop.

blog & release

2022/08/23

Apache Beam 2.41.0

Kiley Sok

We are happy to present the new 2.41.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & go

2022/07/06

Big Improvements in Beam Go's 2.40 Release

Danny McCormick

The 2.40 release is one of Beam Go’s biggest yet, and we wanted to highlight some of the biggest changes coming with this important release! Native Streaming Support: 2.40 marks the release of one of our most anticipated feature sets yet: native streaming Go pipelines.

blog & release

2022/06/25

Apache Beam 2.40.0

Pablo Estrada

We are happy to present the new 2.40.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2022/05/25

Apache Beam 2.39.0

Yichi Zhang

We are happy to present the new 2.39.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2022/04/28

Running Beam SQL in notebooks

Ning Kang

Intro: Beam SQL allows a Beam user to query PCollections with SQL statements. Interactive Beam provides an integration between Apache Beam and Jupyter Notebooks (formerly known as IPython Notebooks) to make pipeline prototyping and data exploration much faster and easier.

blog

2022/04/22

Running Apache Hop visual pipelines with Google Cloud Dataflow

Israel Herraiz

Intro: Apache Hop (https://hop.apache.org/) is a visual development environment for creating data pipelines using Apache Beam. You can run your Hop pipelines in Spark, Flink or Google Cloud Dataflow.

blog & release

2022/04/20

Apache Beam 2.38.0

Daniel Oliviera

We are happy to present the new 2.38.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2022/03/04

Apache Beam 2.37.0

Brian Hulette

We are happy to present the new 2.37.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2022/02/28

Upcoming Events for Beam in 2022

Brittany Hermann

We are so excited to announce the upcoming Beam events for this year! We believe that events are an important mechanism to foster the community around Apache Beam as an Open Source Project. Our events focus on the developer experience, giving the community spaces to connect, collaborate, and share knowledge.

blog & release

2022/02/07

Apache Beam 2.36.0

Emily Ye

We are happy to present the new 2.36.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2021/12/29

Apache Beam 2.35.0

Valentyn Tymofieiev

We are happy to present the new 2.35.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2021/11/11

Apache Beam 2.34.0

Kyle Weaver

We are happy to present the new 2.34.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2021/11/04

Go SDK Exits Experimental in Apache Beam 2.33.0

Robert Burke

Apache Beam’s latest release, version 2.33.0, is the first official release of the long experimental Go SDK. Built with the Go Programming Language, the Go SDK joins the Java and Python SDKs as the third implementation of the Beam programming model.

blog & release

2021/10/07

Apache Beam 2.33.0

Udi Meiri

We are happy to present the new 2.33.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2021/08/25

Apache Beam 2.32.0

Ankur Goenka

We are happy to present the new 2.32.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release. For more information on changes in 2.

blog & release

2021/07/08

Apache Beam 2.31.0

Andrew Pilloud

We are happy to present the new 2.31.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2021/06/09

Apache Beam 2.30.0

Heejong Lee

We are happy to present the new 2.30.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2021/06/08

How to validate a Beam Release

Pablo Estrada

Performing new releases is a core responsibility of any software project. It is even more important in the culture of Apache projects. Releases are the main flow of new code / features among the community of a project.

blog & release

2021/04/29

Apache Beam 2.29.0

Kenneth Knowles

We are happy to present the new 2.29.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2021/02/22

Apache Beam 2.28.0

Chamikara Jayalath

We are happy to present the new 2.28.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release. For more information on changes in 2.

blog & java

2021/01/15

Example to ingest data from Apache Kafka to Google Cloud Pub/Sub

Artur Khanin, Ilya Kozyrev & Alex Kosolapov

In this blog post we present an example that creates a pipeline to read data from a single topic or multiple topics from Apache Kafka and write data into a topic in Google Pub/Sub.

blog & release

2021/01/07

Apache Beam 2.27.0

Pablo Estrada

We are happy to present the new 2.27.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release. For more information on changes in 2.

blog

2020/12/16

DataFrame API Preview now Available!

Brian Hulette & Robert Bradshaw

We’re excited to announce that a preview of the Beam Python SDK’s new DataFrame API is now available in Beam 2.26.0. Much like SqlTransform (Java, Python), the DataFrame API gives Beam users a way to express complex relational logic much more concisely than previously possible.

blog

2020/12/14

Splittable DoFn in Apache Beam is Ready to Use

Boyuan Zhang

We are pleased to announce that Splittable DoFn (SDF) is ready for use in the Beam Python, Java, and Go SDKs for versions 2.25.0 and later. In 2017, the Splittable DoFn blog post proposed building Splittable DoFn APIs as the new recommended way of building I/O connectors.

blog & release

2020/12/11

Apache Beam 2.26.0

Robert Burke

We are happy to present the new 2.26.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/10/23

Apache Beam 2.25.0

Robin Qiu

We are happy to present the new 2.25.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/09/18

Apache Beam 2.24.0

Daniel Oliviera

We are happy to present the new 2.24.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2020/08/27

Pattern Matching with Beam SQL

Mark Zeng

Introduction: SQL is becoming increasingly powerful and useful in the field of data analysis. MATCH_RECOGNIZE, a new SQL component introduced in 2016, brings extra analytical functionality. This project, as part of Google Summer of Code, aims to support basic MATCH_RECOGNIZE functionality.

blog, python & typing

2020/08/21

Performance-Driven Runtime Type Checking for the Python SDK

Saavan Nanavati

In this blog post, we’re announcing the upcoming release of a new, opt-in runtime type checking system for Beam’s Python SDK that’s optimized for performance in both development and production environments.

blog, python & typing

2020/08/21

Improved Annotation Support for the Python SDK

Saavan Nanavati

The importance of static type checking in a dynamically typed language like Python is not up for debate. Type hints allow developers to leverage a strong typing system to:

blog & release

2020/07/29

Apache Beam 2.23.0

Valentyn Tymofieiev

We are happy to present the new 2.23.0 release of Apache Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/06/08

Apache Beam 2.22.0

Brian Hulette

We are happy to present the new 2.22.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2020/06/01

Announcing Beam Katas for Kotlin

Rion Williams

Today, we are happy to announce a new addition to the Beam Katas family: Kotlin!

blog, python & typing

2020/05/28

Python SDK Typing Changes

Chad Dombrova & Udi Meiri

Beam Python has recently increased its support and integration of Python 3 type annotations for improved code clarity and type correctness checks. Read on to find out what’s new.

blog & release

2020/05/27

Apache Beam 2.21.0

Kyle Weaver

We are happy to present the new 2.21.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2020/05/08

Beam Summit Digital Is Coming - Register Now!

Pedro Galvan, Matthias Baetens & Maximilian Michels

As some of you are already aware, the 2020 edition of the Beam Summit will be completely digital and free. Beam Summit Digital will take place from August 24th to 28th. The conference will be spread across the course of one week with a couple of hours of program each day.

blog & release

2020/04/15

Apache Beam 2.20.0

Rui Wang

We are happy to present the new 2.20.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/02/04

Apache Beam 2.19.0

Boyuan Zhang

We are happy to present the new 2.19.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/01/23

Apache Beam 2.18.0

Udi Meiri & Ahmet Altay

We are happy to present the new 2.18.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2020/01/06

Apache Beam 2.17.0

Mikhail Gryzykhin

We are happy to present the new 2.17.0 release of Beam. This release includes both improvements and new functionality. Users of the MongoDbIO connector are encouraged to upgrade to this release to address a security vulnerability. See the download page for this release.

blog & release

2019/10/07

Apache Beam 2.16.0

Mark Liu

We are happy to present the new 2.16.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & gsoc

2019/09/04

Google Summer of Code '19

Tanay Tummalapalli

Google Summer of Code was an amazing learning experience for me. I contributed to open source, learned about Apache Beam’s internals and worked with the best engineers in the world.

blog & release

2019/08/22

Apache Beam 2.15.0

Yifan Zou

We are happy to present the new 2.15.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2019/07/31

Apache Beam 2.14.0

Anton Kedin & Ahmet Altay

We are happy to present the new 2.14.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2019/06/11

Looping timers in Apache Beam

Reza Rokni & Kenneth Knowles

Apache Beam’s primitives let you build expressive data pipelines, suitable for a variety of use cases. One specific use case is the analysis of time series data in which continuous sequences across window boundaries are important. A few fun challenges arise as you tackle this type of data. In this blog post, we explore one of them in more detail, using the Timer API (blog post) and the “looping timer” pattern.

blog & release

2019/06/07

Apache Beam 2.13.0

Ankur Goenka

We are happy to present the new 2.13.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2019/06/04

Adding new Data Sources to Beam SQL CLI

Pablo Estrada

A new, exciting feature that came to Apache Beam is the ability to use SQL in your pipelines. This is done using Beam’s SqlTransform in Java pipelines. Beam also has a fancy new SQL command line that you can use to query your data interactively, be it Batch or Streaming. If you haven’t tried it, check out https://bit.ly/ExploreBeamSQL. A nice feature of the SQL CLI is that you can use CREATE EXTERNAL TABLE commands to add data sources to be accessed in the CLI. Currently, the CLI supports creating tables from BigQuery, PubSub, Kafka, and text files. In this post, we explore how to add new data sources, so that you will be able to consume data from other Beam sources.

blog

2019/05/30

Apache Beam Katas

Henry Suryawirawan

We are happy to announce Apache Beam Katas, a set of interactive Beam coding exercises (i.e. code katas) that can help you learn Apache Beam concepts and the programming model hands-on.

blog

2019/05/11

Beam community update!

Matthias Baetens

The Apache Beam community in 2019

2019 has already been a busy time for the Apache Beam community. The ASF blog featured our way of community building and we’ve had more Beam meetups around the world. Apache Beam also received the Technology of the Year Award from InfoWorld.

As these events happened, we were building up to the 20th anniversary of the Apache Software Foundation. The contributions of the Beam community were a part of Maximilian Michels’ blog post on the success of the ASF’s open source development model:

Success at Apache: What You Need to Know by Maximilian Michels https://t.co/XjtVYgPAHX (shared by Apache - The ASF, @TheASF, 26 March 2019)

In that spirit, let’s have an overview of the things that have happened, what the next few months look like, and how we can foster even more community growth.

Meetups

We’ve had a flurry of activity, with several meetups in the planning process and more popping up globally over time. As diversity of contributors is a core ASF value, this geographic spread is exciting for the community. The latest Apache Beam meetup was organized at Lyft in San Francisco. We have more Bay Area meetups coming soon, and the community is looking into kicking off a meetup in Toronto! London had its first meetup of 2019 at the start of April, and Stockholm had its second meetup at the start of May:

“Big audience for the second @ApacheBeam meetup in Stockholm! Gleb, @kanterov from @SpotifyEng kicking off the first talk with Beam SQL. #ApacheBeamStockholm” (Matthias Baetens, @matthiasbaetens, 6 May 2019)

Keep an eye out for a meetup in Paris. If you are interested in starting your own meetup, feel free to reach out! Good places to start include our Slack channel, the dev and user mailing lists, or the Apache Beam Twitter.

Even if you can’t travel to these meetups, you can stay informed on the happenings of the community. The talks and sessions from previous conferences and meetups are archived on the Apache Beam YouTube channel. If you want your session added to the channel, don’t hesitate to get in touch!

Summits

The first summit of the year will be held in Berlin. You can find more info on the website and read about the inaugural edition of the Beam Summit Europe here. At these summits, you have the opportunity to meet with other Apache Beam creators and users, get expert advice, learn from the speaker sessions, and participate in workshops. We strongly encourage you to get involved again this year! You can participate in the following ways for the upcoming summit in Europe:

🎫 If you want to secure your ticket to attend the Beam Summit Europe 2019, check our event page.
💸 If you want to make the Summit even more awesome, check out our sponsor booklet!

We also launched the CfP for our Beam Summit in North America, which will be held in collaboration with ApacheCon.

🎤 If you want to give a talk, take a look at our CfP.

Stay tuned for more information on the summit in North America and Asia.

Why community engagement matters

Why we need a strong Apache Beam community:
We’re receiving lots of code contributions and need committers to review those and help onboard new contributors to the project.
We want people to feel a sense of ownership of the project. By fostering this level of engagement, the work becomes even more exciting.
A healthy community has a further reach and leads to more growth. More hours can be contributed to the project as we can spread the work and ownership.

Why are we organizing these summits:
We’d like to give folks a place to meet, congregate, and share ideas.
We know that offline interactions often change the nature of the online ones in a positive manner.
Building an active and diverse community is part of the Apache Way. These summits provide an opportunity for us to engage people from different locations, companies, and backgrounds.

blog & release

2019/04/25

Apache Beam 2.12.0

Andrew Pilloud

We are happy to present the new 2.12.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2019/04/25

Apache Beam + Kotlin = ❤️

Harshit Dwivedi

Apache Beam samples are now available in Kotlin!

blog

2019/04/19

Apache Beam is applying to Season of Docs

Aizhamal Nurmamat kyzy

The Apache Beam community is thrilled to announce its application to the first edition of Season of Docs 2019!

blog

2019/03/18

Announcing Beam Summit Site

Aizhamal Nurmamat kyzy

We are thrilled to announce the launch of our new website dedicated to Beam Summits! The beamsummit.org site provides all the information you need for the upcoming Beam Summits in Europe, Asia and North America in 2019. You will find information about the conference theme, the speakers and sessions, the abstract submission timeline and the registration process, the conference venues and hosting cities, and much more that you will find useful before and during the Beam Summits 2019. We are working to make the website easy to use, so that anyone who is organizing a Beam event can rely on it. You can find the code for it on GitHub. The pages will be updated on a regular basis, but we also love hearing thoughts from our community! Let us know if you have any questions, comments or suggestions, and help us improve! Also, if you are thinking of organizing a Beam event, please feel free to reach out for support, and to use the code on GitHub as well. We sincerely hope that you like the new Beam Summit website and will find it useful for accessing information. Enjoy browsing around! See you in Berlin! #beamsummit2019.

blog & release

2019/03/05

Apache Beam 2.11.0

Ahmet Altay

We are happy to present the new 2.11.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2019/02/15

Apache Beam 2.10.0

Kenneth Knowles

We are happy to present the new 2.10.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2018/12/13

Apache Beam 2.9.0

Chamikara Jayalath

We are happy to present the new 2.9.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2018/10/31

Inaugural edition of the Beam Summit Europe 2018 - aftermath

Matthias Baetens

Almost one month ago, we had the pleasure of welcoming the Beam community at Level39 in London for the inaugural edition of the Beam Summit Europe.

blog & release

2018/10/29

Apache Beam 2.8.0

Ahmet Altay

We are happy to present the new 2.8.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog & release

2018/10/03

Apache Beam 2.7.0

Charles Chen

We are happy to present the new 2.7.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

blog

2018/08/21

Beam Summit Europe 2018

Matthias Baetens

With a growing community of contributors and users, the Apache Beam project is organising the first European Beam Summit. We are happy to invite you to this event, which will take place in London on October 1st and 2nd of 2018.

blog

2018/08/20

A review of input streaming connectors

Leonid Kuligin & Julien Phalip

In this post, you’ll learn about the current state of support for input streaming connectors in Apache Beam. For more context, you’ll also learn about the corresponding state of support in Apache Spark.

blog & release

2018/08/10

Apache Beam 2.6.0

Pablo Estrada & Rafael Fernández

We are glad to present the new 2.6.0 release of Beam. This release includes multiple fixes and new functionality, such as new features in SQL and portability.

blog & release

2018/06/26

Apache Beam 2.5.0

Alexey Romanenko

We are glad to present the new 2.5.0 release of Beam. This release includes multiple fixes and new functionality.

blog & release

2018/02/19

Apache Beam 2.3.0

Ismaël Mejía

We are glad to present the new 2.3.0 release of Beam. This release includes multiple fixes and new functionality.

blog

2018/01/09

Apache Beam: A Look Back at 2017

Anand Iyer & Jean-Baptiste Onofré

On January 10, 2017, Apache Beam was promoted to a Top-Level Apache Software Foundation project. It was an important milestone that validated the value of the project and the legitimacy of its community, and heralded its growing adoption. In the past year, Apache Beam has been on a phenomenal growth trajectory, with significant growth in its community and feature set. Let us walk you through some of the notable achievements.

blog

2017/08/28

Timely (and Stateful) Processing with Apache Beam

Kenneth Knowles

In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future. What can you do with timers in Beam? Here are some examples: You can output data buffered in state after some amount of processing time. You can take special action when the watermark estimates that you have received all data up to a specified point in event time. You can author workflows with timeouts that alter state and emit output in response to the absence of additional input for some period of time. These are just a few possibilities. State and timers together form a powerful programming paradigm for fine-grained control to express a huge variety of workflows. Stateful and timely processing in Beam is portable across data processing engines and integrated with Beam’s unified model of event time windowing in both streaming and batch processing.
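The first timer example above (output data buffered in state after some amount of processing time) can be sketched without any Beam dependency. This is a toy simulation in plain Python, not the Beam API: the class name `BufferingFn`, its `flush_after` parameter, and the explicit `now` argument are all assumptions made for illustration, and a real runner would manage state, timers, and the clock for you.

```python
from collections import defaultdict


class BufferingFn:
    """Toy per-key buffer flushed by a processing-time timer (hypothetical, not the Beam API)."""

    def __init__(self, flush_after):
        self.flush_after = flush_after   # delay between first buffered element and flush
        self.state = defaultdict(list)   # per-key buffered elements (the "state")
        self.timers = {}                 # per-key firing time (the "timer")

    def process(self, key, value, now):
        """Buffer one element; set a timer if this key's buffer was empty."""
        if not self.state[key]:
            self.timers[key] = now + self.flush_after
        self.state[key].append(value)

    def advance_time(self, now):
        """Fire every timer that is due and return the buffered elements per key."""
        output = []
        due_keys = sorted(k for k, fire_at in self.timers.items() if fire_at <= now)
        for key in due_keys:
            output.append((key, self.state.pop(key)))  # emit and clear the state
            del self.timers[key]                        # the timer fires only once
        return output


fn = BufferingFn(flush_after=10)
fn.process("a", 1, now=0)   # sets a timer for key "a" at time 10
fn.process("a", 2, now=5)   # buffer is not empty, so the timer is unchanged
fn.process("b", 3, now=8)   # sets a timer for key "b" at time 18
print(fn.advance_time(9))   # nothing is due yet
print(fn.advance_time(10))  # key "a" is flushed
```

The sketch keeps state and timers per key, mirroring the idea that a timer requests a later callback that can read and clear what was buffered.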

blog

2017/08/16

Powerful and modular IO connectors with Splittable DoFn in Apache Beam

Eugene Kirpichov

One of the most important parts of the Apache Beam ecosystem is its quickly growing set of connectors that allow Beam pipelines to read and write data to various data storage systems (“IOs”). Currently, Beam ships over 20 IO connectors with many more in active development. As user demands for IO connectors grew, our work on improving the related Beam APIs (in particular, the Source API) produced an unexpected result: a generalization of Beam’s most basic primitive, DoFn.

blog

2017/05/17

Apache Beam publishes the first stable release

Davor Bonaci & Dan Halperin

The Apache Beam community is pleased to announce the availability of version 2.0.0. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.

blog

2017/03/16

Python SDK released in Apache Beam 0.6.0

Ahmet Altay

Apache Beam’s latest release, version 0.6.0, introduces a new SDK – this time, for the Python programming language. The Python SDK joins the Java SDK as the second implementation of the Beam programming model.

blog

2017/02/13

Stateful processing with Apache Beam

Kenneth Knowles

Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use it for, and what it looks like in code. Note: This post was updated in May 2019 to include Python snippets!

blog

2017/02/01

Media recap of the Apache Beam graduation

Davor Bonaci

One year ago today, Apache Beam was accepted into incubation at the Apache Software Foundation. The community’s work over the past year culminated, just over three weeks ago, with an announcement that Apache Beam has successfully graduated as a new Top-Level Project at the foundation. Graduation sparked additional interest in the project, from corporate endorsements, news articles, and interviews to the volume of traffic to our website and mailing lists.

blog

2017/01/10

Apache Beam established as a new top-level project

Davor Bonaci

Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a new Top-Level Project at the foundation and signifying that its “community and products have been well-governed under the foundation’s meritocratic process and principles”.

blog

2017/01/09

Release 0.4.0 adds a runner for Apache Apex

Thomas Weise

The latest release 0.4.0 of Apache Beam adds a new runner for Apache Apex. We are excited to reach this initial milestone and are looking forward to continued collaboration between the Beam and Apex communities to advance the runner.

blog

2016/10/20

Testing Unbounded Pipelines in Apache Beam

Thomas Groh

The Beam Programming Model unifies writing pipelines for batch and streaming. We’ve recently introduced a new PTransform to write tests for pipelines that will be run over unbounded datasets and must handle out-of-order and delayed data.

beam & update

2016/10/11

Strata+Hadoop World and Beam

Jesse Anderson

Tyler Akidau and I gave a three-hour tutorial on Apache Beam at Strata+Hadoop World 2016. We had a plethora of help from our TAs: Kenn Knowles, Reuven Lax, Felipe Hoffa, Slava Chernyak, and Jamie Grier. A total of 66 people attended the session.

blog

2016/08/03

Apache Beam: Six Months in Incubation

Frances Perry

It’s been just over six months since Apache Beam was formally accepted into incubation with the Apache Software Foundation. As a community, we’ve been hard at work getting Beam off the ground.

beam & release

2016/06/15

The first release of Apache Beam!

Davor Bonaci

I’m happy to announce that Apache Beam has officially released its first version – 0.1.0-incubating. This is an exciting milestone for the project, which joined the Apache Software Foundation and the Apache Incubator earlier this year.

blog

2016/06/13

How We Added Windowing to the Apache Flink Batch Runner

Aljoscha Krettek

We recently achieved a major milestone by adding support for windowing to the Apache Flink Batch runner. In this post we would like to explain what this means for users of Apache Beam and highlight some of the implementation details.

blog

2016/05/27

Where's my PCollection.map()?

Robert Bradshaw

Have you ever wondered why Beam has PTransforms for everything instead of having methods on PCollection? Take a look at the history that led to this (and other) design decisions.

blog

2016/05/18

Dynamic work rebalancing for Beam

Dan Halperin

This morning, Eugene and Malo from the Google Cloud Dataflow team posted No shard left behind: dynamic work rebalancing in Google Cloud Dataflow. This article discusses Cloud Dataflow’s solution to the well-known straggler problem.

beam & capability

2016/04/03

Apache Beam Presentation Materials

Frances Perry & Tyler Akidau

Are you interested in giving a presentation about Apache Beam? Perhaps you want to talk about Apache Beam at a local Meetup or a convention. Excellent! The Apache Beam community is excited to expand and grow. To help kickstart this process, we are excited to announce an initial set of Apache Beam presentation materials which anyone can use to give a presentation about Apache Beam.

beam & capability

2016/03/17

Clarifying & Formalizing Runner Capabilities

Frances Perry & Tyler Akidau

With initial code drops complete (Dataflow SDK and Runner, Flink Runner, Spark Runner) and expressed interest in runner implementations for Storm, Hadoop, and Gearpump (amongst others), we wanted to start addressing a big question in the Apache Beam (incubating) community: what capabilities will each runner be able to support?

beam, python & sdk

2016/02/25

Dataflow Python SDK is now public!

James Malone

When the Apache Beam project proposed entry into the Apache Incubator, the proposal included the Dataflow Java SDK. In the long term, however, Apache Beam aims to support SDKs implemented in multiple languages, such as Python.

beam, update & website

2016/02/22

Apache Beam has a logo!

James Malone

One of the major benefits of Apache Beam is the fact that it unifies both batch and stream processing into one powerful model. In fact, this unification is so important that the name Beam itself comes from the union of Batch + strEAM = Beam. When the project started, we wanted a logo which was both appealing and visually represented this unification.