Managed I/O Connectors

Beam’s new Managed API streamlines how you use existing I/Os, offering both simplicity and powerful enhancements. I/Os are now configured through a lightweight, consistent interface: a simple configuration map with a unified API that spans multiple connectors.

With Managed I/O, runners gain deeper insight into each I/O’s structure and intent. This allows the runner to optimize performance, adjust behavior dynamically, or even replace the I/O with a more efficient or updated implementation behind the scenes.

For example, the DataflowRunner can seamlessly upgrade a Managed transform to its latest SDK version, automatically applying bug fixes and new features (no manual updates or user intervention required!)

Supported SDKs

The Managed API is directly accessible through the Java and Python SDKs.

Additionally, some SDKs use the Managed API internally. For example, the Iceberg connector used in Beam YAML and Beam SQL is invoked via the Managed API under the hood.

Available Configurations

Note: required configuration fields are bolded.

Connector Name	Read Configuration	Write Configuration
KAFKA	bootstrap_servers (`str`) topic (`str`) allow_duplicates (`boolean`) confluent_schema_registry_subject (`str`) confluent_schema_registry_url (`str`) consumer_config_updates (`map[str, str]`) file_descriptor_path (`str`) format (`str`) message_name (`str`) offset_deduplication (`boolean`) redistribute_by_record_key (`boolean`) redistribute_num_keys (`int32`) redistributed (`boolean`) schema (`str`)	bootstrap_servers (`str`) format (`str`) topic (`str`) file_descriptor_path (`str`) message_name (`str`) producer_config_updates (`map[str, str]`) schema (`str`)
ICEBERG	table (`str`) catalog_name (`str`) catalog_properties (`map[str, str]`) config_properties (`map[str, str]`) drop (`list[str]`) filter (`str`) keep (`list[str]`)	table (`str`) catalog_name (`str`) catalog_properties (`map[str, str]`) config_properties (`map[str, str]`) direct_write_byte_limit (`int32`) drop (`list[str]`) keep (`list[str]`) only (`str`) partition_fields (`list[str]`) table_properties (`map[str, str]`) triggering_frequency_seconds (`int32`)
ICEBERG_CDC	table (`str`) catalog_name (`str`) catalog_properties (`map[str, str]`) config_properties (`map[str, str]`) drop (`list[str]`) filter (`str`) from_snapshot (`int64`) from_timestamp (`int64`) keep (`list[str]`) poll_interval_seconds (`int32`) starting_strategy (`str`) streaming (`boolean`) to_snapshot (`int64`) to_timestamp (`int64`)	Unavailable
BIGQUERY	kms_key (`str`) query (`str`) row_restriction (`str`) fields (`list[str]`) table (`str`)	table (`str`) drop (`list[str]`) keep (`list[str]`) kms_key (`str`) only (`str`) triggering_frequency_seconds (`int64`)
POSTGRES	jdbc_url (`str`) connection_init_sql (`list[str]`) connection_properties (`str`) disable_auto_commit (`boolean`) driver_class_name (`str`) driver_jars (`str`) fetch_size (`int32`) jdbc_type (`str`) location (`str`) num_partitions (`int32`) output_parallelization (`boolean`) partition_column (`str`) password (`str`) read_query (`str`) username (`str`)	jdbc_url (`str`) autosharding (`boolean`) batch_size (`int64`) connection_init_sql (`list[str]`) connection_properties (`str`) driver_class_name (`str`) driver_jars (`str`) jdbc_type (`str`) location (`str`) password (`str`) username (`str`) write_statement (`str`)
SQLSERVER	jdbc_url (`str`) connection_init_sql (`list[str]`) connection_properties (`str`) disable_auto_commit (`boolean`) driver_class_name (`str`) driver_jars (`str`) fetch_size (`int32`) jdbc_type (`str`) location (`str`) num_partitions (`int32`) output_parallelization (`boolean`) partition_column (`str`) password (`str`) read_query (`str`) username (`str`)	jdbc_url (`str`) autosharding (`boolean`) batch_size (`int64`) connection_init_sql (`list[str]`) connection_properties (`str`) driver_class_name (`str`) driver_jars (`str`) jdbc_type (`str`) location (`str`) password (`str`) username (`str`) write_statement (`str`)
MYSQL	jdbc_url (`str`) connection_init_sql (`list[str]`) connection_properties (`str`) disable_auto_commit (`boolean`) driver_class_name (`str`) driver_jars (`str`) fetch_size (`int32`) jdbc_type (`str`) location (`str`) num_partitions (`int32`) output_parallelization (`boolean`) partition_column (`str`) password (`str`) read_query (`str`) username (`str`)	jdbc_url (`str`) autosharding (`boolean`) batch_size (`int64`) connection_init_sql (`list[str]`) connection_properties (`str`) driver_class_name (`str`) driver_jars (`str`) jdbc_type (`str`) location (`str`) password (`str`) username (`str`) write_statement (`str`)

Configuration Details

`KAFKA` Write

Configuration	Type	Description
bootstrap_servers	`str`	A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. \| Format: host1:port1,host2:port2,...
format	`str`	The encoding format for the data stored in Kafka. Valid options are: RAW,JSON,AVRO,PROTO
topic	`str`	n/a
file_descriptor_path	`str`	The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
message_name	`str`	The name of the Protocol Buffer message to be used for schema extraction and data conversion.
producer_config_updates	`map[str, str]`	A list of key-value pairs that act as configuration parameters for Kafka producers. Most of these configurations will not be needed, but if you need to customize your Kafka producer, you may use this. See a detailed list: https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html
schema	`str`	n/a

`KAFKA` Read

Configuration	Type	Description
bootstrap_servers	`str`	A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`
topic	`str`	n/a
allow_duplicates	`boolean`	If the Kafka read allows duplicates.
confluent_schema_registry_subject	`str`	n/a
confluent_schema_registry_url	`str`	n/a
consumer_config_updates	`map[str, str]`	A list of key-value pairs that act as configuration parameters for Kafka consumers. Most of these configurations will not be needed, but if you need to customize your Kafka consumer, you may use this. See a detailed list: https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
file_descriptor_path	`str`	The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
format	`str`	The encoding format for the data stored in Kafka. Valid options are: RAW,STRING,AVRO,JSON,PROTO
message_name	`str`	The name of the Protocol Buffer message to be used for schema extraction and data conversion.
offset_deduplication	`boolean`	If the redistribute is using offset deduplication mode.
redistribute_by_record_key	`boolean`	If the redistribute keys by the Kafka record key.
redistribute_num_keys	`int32`	The number of keys for redistributing Kafka inputs.
redistributed	`boolean`	If the Kafka read should be redistributed.
schema	`str`	The schema in which the data is encoded in the Kafka topic. For AVRO data, this is a schema defined with AVRO schema syntax (https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL to Confluent Schema Registry is provided, then this field is ignored, and the schema is fetched from Confluent Schema Registry.

`ICEBERG` Read

Configuration	Type	Description
table	`str`	Identifier of the Iceberg table.
catalog_name	`str`	Name of the catalog containing the table.
catalog_properties	`map[str, str]`	Properties used to set up the Iceberg catalog.
config_properties	`map[str, str]`	Properties passed to the Hadoop Configuration.
drop	`list[str]`	A subset of column names to exclude from reading. If null or empty, all columns will be read.
filter	`str`	SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html
keep	`list[str]`	A subset of column names to read exclusively. If null or empty, all columns will be read.

`ICEBERG` Write

Configuration	Type	Description
table	`str`	A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`.
catalog_name	`str`	Name of the catalog containing the table.
catalog_properties	`map[str, str]`	Properties used to set up the Iceberg catalog.
config_properties	`map[str, str]`	Properties passed to the Hadoop Configuration.
direct_write_byte_limit	`int32`	For a streaming pipeline, sets the limit for lifting bundles into the direct write path.
drop	`list[str]`	A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'.
keep	`list[str]`	A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'.
only	`str`	The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'.
partition_fields	`list[str]`	Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are: `foo` `truncate(foo, N)` `bucket(foo, N)` `hour(foo)` `day(foo)` `month(foo)` `year(foo)` `void(foo)` For more information on partition transforms, please visit https://iceberg.apache.org/spec/#partition-transforms.
table_properties	`map[str, str]`	Iceberg table properties to be set on the table when it is created. For more information on table properties, please visit https://iceberg.apache.org/docs/latest/configuration/#table-properties.
triggering_frequency_seconds	`int32`	For a streaming pipeline, sets the frequency at which snapshots are produced.

`ICEBERG_CDC` Read

Configuration	Type	Description
table	`str`	Identifier of the Iceberg table.
catalog_name	`str`	Name of the catalog containing the table.
catalog_properties	`map[str, str]`	Properties used to set up the Iceberg catalog.
config_properties	`map[str, str]`	Properties passed to the Hadoop Configuration.
drop	`list[str]`	A subset of column names to exclude from reading. If null or empty, all columns will be read.
filter	`str`	SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html
from_snapshot	`int64`	Starts reading from this snapshot ID (inclusive).
from_timestamp	`int64`	Starts reading from the first snapshot (inclusive) that was created after this timestamp (in milliseconds).
keep	`list[str]`	A subset of column names to read exclusively. If null or empty, all columns will be read.
poll_interval_seconds	`int32`	The interval at which to poll for new snapshots. Defaults to 60 seconds.
starting_strategy	`str`	The source's starting strategy. Valid options are: "earliest" or "latest". Can be overriden by setting a starting snapshot or timestamp. Defaults to earliest for batch, and latest for streaming.
streaming	`boolean`	Enables streaming reads, where source continuously polls for snapshots forever.
to_snapshot	`int64`	Reads up to this snapshot ID (inclusive).
to_timestamp	`int64`	Reads up to the latest snapshot (inclusive) created before this timestamp (in milliseconds).

`BIGQUERY` Read

Configuration	Type	Description
kms_key	`str`	Use this Cloud KMS key to encrypt your data
query	`str`	The SQL query to be executed to read from the BigQuery table.
row_restriction	`str`	Read only rows that match this filter, which must be compatible with Google standard SQL. This is not supported when reading via query.
fields	`list[str]`	Read only the specified fields (columns) from a BigQuery table. Fields may not be returned in the order specified. If no value is specified, then all fields are returned. Example: "col1, col2, col3"
table	`str`	The fully-qualified name of the BigQuery table to read from. Format: [${PROJECT}:]${DATASET}.${TABLE}

`BIGQUERY` Write

Configuration	Type	Description
table	`str`	The bigquery table to write to. Format: [${PROJECT}:]${DATASET}.${TABLE}
drop	`list[str]`	A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'.
keep	`list[str]`	A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'.
kms_key	`str`	Use this Cloud KMS key to encrypt your data
only	`str`	The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'.
triggering_frequency_seconds	`int64`	Determines how often to 'commit' progress into BigQuery. Default is every 5 seconds.

`POSTGRES` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

`POSTGRES` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit	`boolean`	Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`SQLSERVER` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit	`boolean`	Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`SQLSERVER` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

`MYSQL` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit	`boolean`	Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`MYSQL` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_init_sql	`list[str]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
driver_class_name	`str`	Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver".
driver_jars	`str`	Comma separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path.
jdbc_type	`str`	Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql.
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

Last updated on 2025/12/31

Have you found everything you were looking for?

Was it all useful and clear? Is there anything that you would like to change? Let us know!

Managed I/O Connectors

Supported SDKs

Available Configurations

Configuration Details

KAFKA Write

KAFKA Read

ICEBERG Read

ICEBERG Write

ICEBERG_CDC Read

BIGQUERY Read

BIGQUERY Write

POSTGRES Write

POSTGRES Read

SQLSERVER Read

SQLSERVER Write

MYSQL Read

MYSQL Write