apache_beam.io.gcp.bigtableio module

BigTable connector

This module implements reading from and writing to Bigtable tables. In the default write mode, each input element carries the row data to be written to a Bigtable table. The supported syntax is described here: https://cloud.google.com/bigtable/docs/quickstart-cbt

The Bigtable connector can be used as a main output. A main output (the common case) is expected to be massive and is split into manageable chunks that are processed in parallel. The example below creates a list of rows whose cells have already been set, then applies WriteToBigTable to insert those generated rows into the table.

main_table = (p
              | beam.Create(self._generate())
              | WriteToBigTable(project_id, instance_id, table_id))
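
The example above does not show what `self._generate()` returns. A minimal sketch of one possible generator is given here, assuming the rows are `DirectRow` objects from the `google-cloud-bigtable` client with one cell set on each. The helper names `make_row_key` and `generate_rows`, the row-key scheme, the column family `cf1`, and the column `field1` are all hypothetical and chosen only for illustration.

```python
import datetime


def make_row_key(index):
    """Build a zero-padded row key (hypothetical key scheme)."""
    return f"key-{index:05d}".encode("utf-8")


def generate_rows(count, column_family_id="cf1"):
    """Return a list of DirectRow objects holding un-committed mutations."""
    # Imported lazily so this module can be loaded without the GCP extras.
    from google.cloud.bigtable import row

    timestamp = datetime.datetime.now(datetime.timezone.utc)
    rows = []
    for index in range(count):
        direct_row = row.DirectRow(row_key=make_row_key(index))
        # "cf1" and b"field1" are assumed names; the column family must
        # already exist in the target table.
        direct_row.set_cell(
            column_family_id,
            b"field1",
            f"value-{index}".encode("utf-8"),
            timestamp=timestamp,
        )
        rows.append(direct_row)
    return rows
```

The resulting list could be passed to `beam.Create(...)` in place of `self._generate()`. Zero-padding the keys keeps them in lexicographic order, which is a convenience of this sketch rather than a requirement of the connector.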
class apache_beam.io.gcp.bigtableio.WriteToBigTable(project_id, instance_id, table_id, use_cross_language=False, expansion_service=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A transform that writes rows to a Bigtable table.

Takes an input PCollection of DirectRow objects containing un-committed mutations. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowdirectrowrowkey-tablenone

If the use_cross_language flag is set to True, this transform will use the multi-language transforms framework to inject the Java native write transform into the pipeline.

Initialize a WriteToBigTable transform.

Parameters:
  • table_id – The ID of the table to write to.
  • instance_id – The ID of the instance where the table resides.
  • project_id – The GCP project ID.
  • use_cross_language – If set to True, will use the Java native transform via cross-language.
  • expansion_service – The address of the expansion service in the case of using cross-language. If no expansion service is provided, will attempt to run the default GCP expansion service.
URN = 'beam:schematransform:org.apache.beam:bigtable_write:v1'
expand(input)[source]
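
As a rough illustration of the cross-language option, the sketch below collects the constructor arguments documented above and applies the transform to a pipeline. The helpers `cross_language_write_kwargs` and `write_rows` are hypothetical, not part of the module. Because the default path attempts to run an expansion service, it is assumed here that a Java runtime is likely needed when `use_cross_language=True`.

```python
def cross_language_write_kwargs(project_id, instance_id, table_id,
                                expansion_service=None):
    """Collect WriteToBigTable arguments for the cross-language path."""
    kwargs = {
        "project_id": project_id,
        "instance_id": instance_id,
        "table_id": table_id,
        "use_cross_language": True,
    }
    # Only pass an address when one is given; otherwise the transform
    # attempts to run the default GCP expansion service.
    if expansion_service is not None:
        kwargs["expansion_service"] = expansion_service
    return kwargs


def write_rows(pipeline, rows, **write_kwargs):
    """Apply WriteToBigTable to an in-memory list of DirectRow objects."""
    # Imported lazily so this module can be loaded without Beam installed.
    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import WriteToBigTable

    return (pipeline
            | "CreateRows" >> beam.Create(rows)
            | "WriteRows" >> WriteToBigTable(**write_kwargs))
```

A call such as `write_rows(p, rows, **cross_language_write_kwargs(project_id, instance_id, table_id))` would then use the Java native write transform; the address format accepted by `expansion_service` is not specified here, so any value supplied is the caller's assumption.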
class apache_beam.io.gcp.bigtableio.ReadFromBigtable(project_id, instance_id, table_id, expansion_service=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Reads rows from Bigtable.

Returns a PCollection of PartialRowData objects, each representing a Bigtable row. For more information about this row object, visit https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey

Initialize a ReadFromBigtable transform.

Parameters:
  • table_id – The ID of the table to read from.
  • instance_id – The ID of the instance where the table resides.
  • project_id – The GCP project ID.
  • expansion_service – The address of the expansion service. If no expansion service is provided, will attempt to run the default GCP expansion service.
URN = 'beam:schematransform:org.apache.beam:bigtable_read:v1'
expand(input)[source]
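
A minimal sketch of reading with ReadFromBigtable follows, assuming each element is a `PartialRowData` whose `cells` attribute maps column family to column qualifier to a list of cells. The helpers `first_cell_values` and `read_rows` are hypothetical names introduced for illustration, and taking the first cell of each column is a simplification of this sketch.

```python
def first_cell_values(partial_row):
    """Return (row_key, {(family, qualifier): value}) for one row.

    Only the first cell stored for each column is kept.
    """
    values = {}
    for family, columns in partial_row.cells.items():
        for qualifier, cells in columns.items():
            if cells:
                values[(family, qualifier)] = cells[0].value
    return partial_row.row_key, values


def read_rows(pipeline, project_id, instance_id, table_id):
    """Read a table and flatten each row into a plain tuple."""
    # Imported lazily so this module can be loaded without Beam installed.
    import apache_beam as beam
    from apache_beam.io.gcp.bigtableio import ReadFromBigtable

    return (pipeline
            | "ReadRows" >> ReadFromBigtable(
                project_id=project_id,
                instance_id=instance_id,
                table_id=table_id)
            | "FirstValues" >> beam.Map(first_cell_values))
```

Converting rows to plain tuples early keeps downstream steps independent of the Bigtable client types, which is a design choice of this sketch rather than something the transform requires.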