Apache Beam is applying to Season of Docs

The Apache Beam community is thrilled to announce its application to Season of Docs 2019, the first edition of the program!

Season of Docs 2019 flyer

Season of Docs is a unique program that pairs technical writers with open source mentors to contribute to open source. This creates an opportunity to introduce the technical writer to an open source community and provide guidance while the writer works on a real-world open source project. We in the Apache Beam community would love to take this chance to invite technical writers to collaborate with us and help us improve our documentation in many ways.

Apache Beam does have help from excellent technical writers, but the documentation needs of the project often exceed their bandwidth. This is why we are excited about this program.

After discussing ideas in the community, we have found mentors and framed two ideas that we think would be a great fit for an incoming tech writer to tackle. We hope you will find this opportunity interesting. If you do, please get in touch by emailing the Apache Beam mailing list at dev@beam.apache.org (you will need to subscribe first by emailing dev-subscribe@beam.apache.org).

The project ideas available in Apache Beam are described below. Please take a look and ask any questions that you may have. We will be very happy to help you get onboarded with the project.

Project ideas

Deployment of Flink and Spark Clusters for use with Portable Beam

The Apache Beam vision has been to provide a framework for users to write pipelines in the programming language of their choice and execute them on the runner of their choice. As the reality of Beam has evolved towards this vision, the way in which Beam is run on top of runners such as Apache Spark and Apache Flink has changed.

These changes are documented in the wiki and in design documents, and are accessible to Beam contributors, but they are not available in the user-facing documentation. This has been a barrier to adoption for Beam users.

This project involves improving the Flink Runner page to include strategies for deploying Beam in a few different environments: a Kubernetes cluster, a Google Cloud Dataproc cluster, and an AWS EMR cluster. Other places in the documentation should be updated in this regard as well, such as the Python streaming section and the set of supported features.

After the work on the Flink Runner page, similar updates should be made to the Spark Runner page and the getting started documentation.

The runner comparison page / capability matrix update

Beam maintains a capability matrix to track which Beam features are supported by which combinations of language SDKs and runners. This project involves a number of corrections and improvements to the capability matrix, followed by a larger set of changes, including:

  • Plain English summaries of each runner’s support for the Beam model.
  • A paragraph-length description of the production readiness of each runner.
  • Comparisons of non-model differences between runners.
  • A comparison of each runner’s support for the portability framework.

Thank you, and we are looking forward to hearing from you!