apache_beam.io.gcp.bigquery_tools module¶
Tools used by BigQuery sources and sinks.
Classes, constants and functions in this file are experimental and have no backwards compatibility guarantees.
These tools include wrappers and clients to interact with BigQuery APIs.
apache_beam.io.gcp.bigquery_tools.parse_table_schema_from_json(schema_string)[source]¶
Parse the Table Schema provided as a string.
Parameters: schema_string – A string-serialized table schema; must be valid JSON.
Returns: A TableSchema of the BigQuery export from either the Query or the Table.
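For illustration, here is a schema string in the expected JSON shape (the field names and types below are invented for the example, not taken from the library):

```python
import json

# A BigQuery table schema serialized as JSON: a top-level "fields" list,
# each entry describing one column by name, type, and mode.
schema_string = json.dumps({
    "fields": [
        {"name": "name", "type": "STRING", "mode": "REQUIRED"},
        {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
    ]
})

# With apache_beam[gcp] installed, such a string parses into a TableSchema:
#   from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json
#   table_schema = parse_table_schema_from_json(schema_string)

field_names = [f["name"] for f in json.loads(schema_string)["fields"]]
```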
apache_beam.io.gcp.bigquery_tools.parse_table_reference(table, dataset=None, project=None)[source]¶
Parses a table reference into a (project, dataset, table) tuple.
Parameters:
- table – The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). If the dataset argument is None, the table argument must contain the entire table reference: ‘DATASET.TABLE’ or ‘PROJECT:DATASET.TABLE’. This argument can be a bigquery.TableReference instance, in which case dataset and project are ignored and the reference is returned as the result. Additionally, for date-partitioned tables, appending ‘$YYYYmmdd’ to the table name is supported, e.g. ‘DATASET.TABLE$YYYYmmdd’.
- dataset – The ID of the dataset containing this table, or None if the table reference is specified entirely by the table argument.
- project – The ID of the project containing this table, or None if the table reference is specified entirely by the table (and possibly dataset) argument.
Returns: A bigquery.TableReference object.
Raises: ValueError – if the table reference as a string does not match the expected format.
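A minimal sketch of the string formats described above, assuming only the documented ‘DATASET.TABLE’ and ‘PROJECT:DATASET.TABLE’ shapes; note the real function returns a bigquery.TableReference rather than a plain tuple:

```python
import re

def parse_table_reference_sketch(table, dataset=None, project=None):
    # Illustrative re-implementation of the accepted string formats only;
    # not the library's code, and it skips the TableReference-instance case.
    if dataset is None:
        match = re.match(
            r'^((?P<project>.+):)?(?P<dataset>\w+)\.(?P<table>[\w$]+)$', table)
        if not match:
            raise ValueError(
                'Expected a table reference (PROJECT:DATASET.TABLE or '
                'DATASET.TABLE) instead of %s' % table)
        project = match.group('project')
        dataset = match.group('dataset')
        table = match.group('table')
    return (project, dataset, table)

ref = parse_table_reference_sketch('myproject:mydataset.mytable')
# → ('myproject', 'mydataset', 'mytable')
```

The ‘$YYYYmmdd’ partition decorator is allowed through by the `[\w$]+` table pattern, matching the behavior described above.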
class apache_beam.io.gcp.bigquery_tools.BigQueryWrapper(client=None)[source]¶
Bases: future.types.newobject.newobject
BigQuery client wrapper with utilities for querying.
The wrapper is used to organize all the BigQuery integration points and to offer a common place where retry logic for failures can be controlled. In addition, it offers various functions used in both sources and sinks (e.g., finding and creating tables, querying a table, etc.).
TEMP_TABLE = 'temp_table_'¶
TEMP_DATASET = 'temp_dataset_'¶
unique_row_id¶
Returns a unique row ID (str) used to avoid multiple insertions.
If the row ID is provided, BigQuery will make a best effort not to insert the same row multiple times in fail-and-retry scenarios in which the insert request may be issued several times. This comes into play for sinks executed in a local runner.
Returns: a unique row ID string.
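One possible scheme for such IDs (a hypothetical sketch, not the library's actual implementation): a per-writer random prefix plus an incrementing counter, so that a retried insert request carrying the same ID can be deduplicated on the BigQuery side.

```python
import uuid

# Hypothetical sketch: a per-writer UUID prefix plus a counter yields IDs
# that are unique across writers, while a retry of the same logical row
# can reuse the same ID to avoid a duplicate insert.
class RowIdGenerator:
    def __init__(self):
        self._prefix = uuid.uuid4().hex
        self._count = 0

    def next_id(self):
        self._count += 1
        return '%s_%d' % (self._prefix, self._count)

gen = RowIdGenerator()
ids = [gen.next_id() for _ in range(3)]
```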
get_query_location(*args, **kwargs)¶
get_table(*args, **kwargs)¶
get_or_create_dataset(*args, **kwargs)¶
get_table_location(*args, **kwargs)¶
create_temporary_dataset(*args, **kwargs)¶
clean_up_temporary_dataset(*args, **kwargs)¶
get_or_create_table(*args, **kwargs)¶
insert_rows(project_id, dataset_id, table_id, rows)[source]¶
Inserts rows into the specified table.
Parameters:
- project_id – The ID of the project owning the table.
- dataset_id – The ID of the dataset owning the table.
- table_id – The table ID.
- rows – A list of plain Python dictionaries. Each dictionary is a row, and each key in it is the name of a field.
Returns: A tuple (bool, errors). If the first element is False, the second element will be a bigquery.InsertErrorsValueListEntry instance containing specific errors.
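A hedged usage sketch of the row format and the (bool, errors) return value; the project, dataset, and table names below are placeholders, and the call itself is shown commented out because it requires a reachable BigQuery backend:

```python
# Each row is a plain dict keyed by field name, matching the table schema.
rows = [
    {'name': 'Ada', 'age': 36},
    {'name': 'Grace', 'age': 45},
]

# With credentials configured (identifiers below are placeholders):
#   from apache_beam.io.gcp.bigquery_tools import BigQueryWrapper
#   wrapper = BigQueryWrapper()
#   passed, errors = wrapper.insert_rows(
#       'my-project', 'my_dataset', 'my_table', rows)
#   if not passed:
#       ...  # inspect the returned insert errors

rows_are_dicts = all(isinstance(r, dict) for r in rows)
```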
class apache_beam.io.gcp.bigquery_tools.BigQueryReader(source, test_bigquery_client=None, use_legacy_sql=True, flatten_results=True, kms_key=None)[source]¶
Bases: apache_beam.runners.dataflow.native_io.iobase.NativeSourceReader
A reader for a BigQuery source.
class apache_beam.io.gcp.bigquery_tools.BigQueryWriter(sink, test_bigquery_client=None, buffer_size=None)[source]¶
Bases: apache_beam.runners.dataflow.native_io.iobase.NativeSinkWriter
The sink writer for a BigQuerySink.
class apache_beam.io.gcp.bigquery_tools.RowAsDictJsonCoder[source]¶
Bases: apache_beam.coders.coders.Coder
A coder for a table row (represented as a dict) to/from a JSON string.
This is the default coder for sources and sinks if the coder argument is not specified.
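A self-contained sketch of what such a coder does, round-tripping a row dict through a JSON byte string; the real class subclasses apache_beam.coders.coders.Coder, and the exact encoding details here are illustrative:

```python
import json

# Sketch of a row-as-dict JSON coder: encode a row dict to UTF-8 JSON
# bytes and decode it back. Not the library's implementation.
class RowAsDictJsonCoderSketch:
    def encode(self, table_row):
        return json.dumps(table_row).encode('utf-8')

    def decode(self, encoded_table_row):
        return json.loads(encoded_table_row.decode('utf-8'))

coder = RowAsDictJsonCoderSketch()
row = {'name': 'Ada', 'age': 36}
round_tripped = coder.decode(coder.encode(row))
```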