Overview (Apache Beam 2.15.0)

Packages
Package	Description
org.apache.beam.runners.apex	Implementation of the Beam runner for Apache Apex.
org.apache.beam.runners.dataflow	Provides a Beam runner that executes pipelines on the Google Cloud Dataflow service.
org.apache.beam.runners.dataflow.options	Provides `PipelineOptions` specific to Google Cloud Dataflow.
org.apache.beam.runners.dataflow.util	Provides miscellaneous internal utilities used by the Google Cloud Dataflow runner.
org.apache.beam.runners.direct	Defines the `PipelineOptions.DirectRunner` which executes both Bounded and Unbounded `Pipelines` on the local machine.
org.apache.beam.runners.flink	Internal implementation of the Beam runner for Apache Flink.
org.apache.beam.runners.flink.metrics	Internal metrics implementation of the Beam runner for Apache Flink.
org.apache.beam.runners.fnexecution	Utilities used by runners to interact with the fn execution components of the Beam Portability Framework.
org.apache.beam.runners.fnexecution.artifact	Pipeline execution-time artifact-management services, including abstract implementations of the Artifact Retrieval Service.
org.apache.beam.runners.fnexecution.control	Utilities for a Beam runner to interact with the Fn API `Control Service` via java abstractions.
org.apache.beam.runners.fnexecution.data	Utilities for a Beam runner to interact with the Fn API `Data Service` via java abstractions.
org.apache.beam.runners.fnexecution.environment	Classes used to instantiate and manage SDK harness environments.
org.apache.beam.runners.fnexecution.environment.testing	Test utilities for the environment management package.
org.apache.beam.runners.fnexecution.jobsubmission	Job management services for use in beam runners.
org.apache.beam.runners.fnexecution.logging	Classes used to log informational messages over the `Beam Fn Logging Service`.
org.apache.beam.runners.fnexecution.provisioning	Provision api services.
org.apache.beam.runners.fnexecution.splittabledofn	Utilities for a Beam runner to interact with a remotely running splittable DoFn.
org.apache.beam.runners.fnexecution.state	State API services.
org.apache.beam.runners.fnexecution.translation	Shared utilities for a Beam runner to translate portable pipelines.
org.apache.beam.runners.fnexecution.wire	Wire coders for communications between runner and SDK harness.
org.apache.beam.runners.gearpump	Internal implementation of the Beam runner for Apache Gearpump.
org.apache.beam.runners.gearpump.translators	Gearpump specific translators.
org.apache.beam.runners.gearpump.translators.functions	Gearpump specific wrappers for Beam DoFn.
org.apache.beam.runners.gearpump.translators.io	Gearpump specific wrappers for Beam I/O.
org.apache.beam.runners.gearpump.translators.utils	Utilities for translators.
org.apache.beam.runners.jet	Implementation of the Beam runner for Hazelcast Jet.
org.apache.beam.runners.jet.metrics	Helper classes for implementing metrics in the Hazelcast Jet based runner.
org.apache.beam.runners.jet.processors	Individual DAG node processors used by the Beam runner for Hazelcast Jet.
org.apache.beam.runners.local	Utilities useful when executing a pipeline on a single machine.
org.apache.beam.runners.reference	Support for executing a pipeline locally over the Beam fn API.
org.apache.beam.runners.reference.testing	Testing utilities for the reference runner.
org.apache.beam.runners.spark	Internal implementation of the Beam runner for Apache Spark.
org.apache.beam.runners.spark.aggregators	Provides internal utilities for implementing Beam aggregators using Spark accumulators.
org.apache.beam.runners.spark.coders	Beam coders and coder-related utilities for running on Apache Spark.
org.apache.beam.runners.spark.io	Spark-specific transforms for I/O.
org.apache.beam.runners.spark.metrics	Provides internal utilities for implementing Beam metrics using Spark accumulators.
org.apache.beam.runners.spark.metrics.sink	Spark sinks that supports beam metrics and aggregators.
org.apache.beam.runners.spark.stateful	Spark-specific stateful operators.
org.apache.beam.runners.spark.util	Internal utilities to translate Beam pipelines to Spark.
org.apache.beam.sdk	Provides a simple, powerful model for building both batch and streaming parallel data processing `Pipeline`s.
org.apache.beam.sdk.annotations	Defines annotations used across the SDK.
org.apache.beam.sdk.coders	Defines `Coders` to specify how data is encoded to and decoded from byte strings.
org.apache.beam.sdk.expansion	Contains classes needed to expose transforms to other SDKs.
org.apache.beam.sdk.extensions.gcp.auth	Defines classes related to interacting with `Credentials` for pipeline creation and execution containing Google Cloud Platform components.
org.apache.beam.sdk.extensions.gcp.options	Defines `PipelineOptions` for configuring pipeline execution for Google Cloud Platform components.
org.apache.beam.sdk.extensions.gcp.storage	Defines IO connectors for Google Cloud Storage.
org.apache.beam.sdk.extensions.gcp.util	Defines Google Cloud Platform component utilities that can be used by Beam runners.
org.apache.beam.sdk.extensions.gcp.util.gcsfs	Defines utilities used to interact with Google Cloud Storage.
org.apache.beam.sdk.extensions.jackson	Utilities for parsing and creating JSON serialized objects.
org.apache.beam.sdk.extensions.joinlibrary	Utilities for performing SQL-style joins of keyed `PCollections`.
org.apache.beam.sdk.extensions.protobuf	Defines a `Coder` for Protocol Buffers messages, `ProtoCoder`.
org.apache.beam.sdk.extensions.sketching	Utilities for computing statistical indicators using probabilistic sketches.
org.apache.beam.sdk.extensions.sorter	Utility for performing local sort of potentially large sets of values.
org.apache.beam.sdk.extensions.sql	BeamSQL provides a new interface to run a SQL statement with Beam.
org.apache.beam.sdk.extensions.sql.example	Example how to use Data Catalog table provider.
org.apache.beam.sdk.extensions.sql.example.model	Java classes used to for modeling the examples.
org.apache.beam.sdk.extensions.sql.impl	Implementation classes of BeamSql.
org.apache.beam.sdk.extensions.sql.impl.parser	Beam SQL parsing additions to Calcite SQL.
org.apache.beam.sdk.extensions.sql.impl.planner	`BeamQueryPlanner` is the main interface.
org.apache.beam.sdk.extensions.sql.impl.rel	BeamSQL specified nodes, to replace `RelNode`.
org.apache.beam.sdk.extensions.sql.impl.rule	`RelOptRule` to generate `BeamRelNode`.
org.apache.beam.sdk.extensions.sql.impl.schema	define table schema, to map with Beam IO components.
org.apache.beam.sdk.extensions.sql.impl.transform	`PTransform` used in a BeamSql pipeline.
org.apache.beam.sdk.extensions.sql.impl.transform.agg	Implementation of standard SQL aggregation functions, e.g.
org.apache.beam.sdk.extensions.sql.impl.udf	UDF classes.
org.apache.beam.sdk.extensions.sql.impl.utils	Utility classes.
org.apache.beam.sdk.extensions.sql.meta	Metadata related classes.
org.apache.beam.sdk.extensions.sql.meta.provider	Table providers.
org.apache.beam.sdk.extensions.sql.meta.provider.bigquery	Table schema for BigQuery.
org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog	Table schema for Google Cloud Data Catalog.
org.apache.beam.sdk.extensions.sql.meta.provider.hcatalog	Table schema for HCatalog.
org.apache.beam.sdk.extensions.sql.meta.provider.kafka	Table schema for KafkaIO.
org.apache.beam.sdk.extensions.sql.meta.provider.parquet	Table schema for ParquetIO.
org.apache.beam.sdk.extensions.sql.meta.provider.pubsub	Table schema for `PubsubIO`.
org.apache.beam.sdk.extensions.sql.meta.provider.seqgen	Table schema for streaming sequence generator.
org.apache.beam.sdk.extensions.sql.meta.provider.test	Table schema for in-memory test data.
org.apache.beam.sdk.extensions.sql.meta.provider.text	Table schema for text files.
org.apache.beam.sdk.extensions.sql.meta.store	Meta stores.
org.apache.beam.sdk.fn	The top level package for the Fn Execution Java libraries.
org.apache.beam.sdk.fn.channel	gRPC channel management.
org.apache.beam.sdk.fn.data	Classes to interact with the portability framework data plane.
org.apache.beam.sdk.fn.splittabledofn	Defines utilities related to executing splittable `DoFn`.
org.apache.beam.sdk.fn.stream	gRPC stream management.
org.apache.beam.sdk.fn.test	Utilities for testing use of this package.
org.apache.beam.sdk.fn.windowing	Common utilities related to windowing during execution of a pipeline.
org.apache.beam.sdk.function	Java 8 functional interface extensions.
org.apache.beam.sdk.harness	Utilities for configuring worker environment.
org.apache.beam.sdk.io	Defines transforms for reading and writing common storage formats, including `AvroIO`, and `TextIO`.
org.apache.beam.sdk.io.amqp	Transforms for reading and writing using AMQP 1.0 protocol.
org.apache.beam.sdk.io.aws.dynamodb	Defines IO connectors for Amazon Web Services DynamoDB.
org.apache.beam.sdk.io.aws.options	Defines `PipelineOptions` for configuring pipeline execution for Amazon Web Services components.
org.apache.beam.sdk.io.aws.s3	Defines IO connectors for Amazon Web Services S3.
org.apache.beam.sdk.io.aws.sns	Defines IO connectors for Amazon Web Services SNS.
org.apache.beam.sdk.io.aws.sqs	Defines IO connectors for Amazon Web Services SQS.
org.apache.beam.sdk.io.aws2.dynamodb	Defines IO connectors for Amazon Web Services DynamoDB.
org.apache.beam.sdk.io.aws2.options	Defines `PipelineOptions` for configuring pipeline execution for Amazon Web Services components.
org.apache.beam.sdk.io.cassandra	Transforms for reading and writing from/to Apache Cassandra.
org.apache.beam.sdk.io.clickhouse	Transform for writing to ClickHouse.
org.apache.beam.sdk.io.elasticsearch	Transforms for reading and writing from Elasticsearch.
org.apache.beam.sdk.io.fs	Apache Beam FileSystem interfaces and their default implementations.
org.apache.beam.sdk.io.gcp.bigquery	Defines transforms for reading and writing from Google BigQuery.
org.apache.beam.sdk.io.gcp.bigtable	Defines transforms for reading and writing from Google Cloud Bigtable.
org.apache.beam.sdk.io.gcp.common	Defines common Google Cloud Platform IO support classes.
org.apache.beam.sdk.io.gcp.datastore	Provides an API for reading from and writing to Google Cloud Datastore over different versions of the Cloud Datastore Client libraries.
org.apache.beam.sdk.io.gcp.pubsub	Defines transforms for reading and writing from Google Cloud Pub/Sub.
org.apache.beam.sdk.io.gcp.spanner	Provides an API for reading from and writing to Google Cloud Spanner.
org.apache.beam.sdk.io.gcp.testing	Defines utilities for unit testing Google Cloud Platform components of Apache Beam pipelines.
org.apache.beam.sdk.io.hadoop	Classes shared by Hadoop based IOs.
org.apache.beam.sdk.io.hadoop.format	Defines transforms for writing to Data sinks that implement `HadoopFormatIO` .
org.apache.beam.sdk.io.hbase	Transforms for reading and writing from/to Apache HBase.
org.apache.beam.sdk.io.hcatalog	Transforms for reading and writing using HCatalog.
org.apache.beam.sdk.io.hcatalog.test	Test utilities for HCatalog IO.
org.apache.beam.sdk.io.hdfs	`FileSystem` implementation for any Hadoop `FileSystem`.
org.apache.beam.sdk.io.jdbc	Transforms for reading and writing from JDBC.
org.apache.beam.sdk.io.jms	Transforms for reading and writing from JMS (Java Messaging Service).
org.apache.beam.sdk.io.kafka	Transforms for reading and writing from Apache Kafka.
org.apache.beam.sdk.io.kafka.serialization	Kafka serializers and deserializers.
org.apache.beam.sdk.io.kinesis	Transforms for reading and writing from Amazon Kinesis.
org.apache.beam.sdk.io.mongodb	Transforms for reading and writing from MongoDB.
org.apache.beam.sdk.io.mqtt	Transforms for reading and writing from MQTT.
org.apache.beam.sdk.io.parquet	Transforms for reading and writing from Parquet.
org.apache.beam.sdk.io.range	Provides thread-safe helpers for implementing dynamic work rebalancing in position-based bounded sources.
org.apache.beam.sdk.io.redis	Transforms for reading and writing from Redis.
org.apache.beam.sdk.io.solr	Transforms for reading and writing from/to Solr.
org.apache.beam.sdk.io.tika	Transform for reading and parsing files with Apache Tika.
org.apache.beam.sdk.io.xml	Transforms for reading and writing Xml files.
org.apache.beam.sdk.metrics	Metrics allow exporting information about the execution of a pipeline.
org.apache.beam.sdk.options	Defines `PipelineOptions` for configuring pipeline execution.
org.apache.beam.sdk.schemas	Defines `Schema` and other classes for representing schema'd data in a `Pipeline`.
org.apache.beam.sdk.schemas.annotations	Defines `Schema` and other classes for representing schema'd data in a `Pipeline`.
org.apache.beam.sdk.schemas.parser	Defines utilities for deailing with schemas.
org.apache.beam.sdk.schemas.parser.generated	Defines utilities for deailing with schemas.
org.apache.beam.sdk.schemas.transforms	Defines transforms that work on PCollections with schemas..
org.apache.beam.sdk.schemas.utils	Defines utilities for deailing with schemas.
org.apache.beam.sdk.state	Classes and interfaces for interacting with state.
org.apache.beam.sdk.testing	Defines utilities for unit testing Apache Beam pipelines.
org.apache.beam.sdk.transforms	Defines `PTransform`s for transforming data in a pipeline.
org.apache.beam.sdk.transforms.display	Defines `HasDisplayData` for annotating components which provide `display data` used within UIs and diagnostic tools.
org.apache.beam.sdk.transforms.join	Defines the `CoGroupByKey` transform for joining multiple PCollections.
org.apache.beam.sdk.transforms.splittabledofn	Defines utilities related to splittable `DoFn`.
org.apache.beam.sdk.transforms.windowing	Defines the `Window` transform for dividing the elements in a PCollection into windows, and the `Trigger` for controlling when those elements are output.
org.apache.beam.sdk.values	Defines `PCollection` and other classes for representing data in a `Pipeline`.