apache_beam.io.gcp.datastore.v1new.query_splitter module

Implements a Cloud Datastore query splitter.

For internal use only. No backwards compatibility guarantees.

exception apache_beam.io.gcp.datastore.v1new.query_splitter.QuerySplitterError

Bases: Exception

Top-level error type.

exception apache_beam.io.gcp.datastore.v1new.query_splitter.SplitNotPossibleError

Bases: QuerySplitterError

Raised when some parameter of the query does not allow splitting.
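Because SplitNotPossibleError subclasses QuerySplitterError, a caller can catch the base type once and fall back to reading the query unsplit. The sketch below mirrors the documented hierarchy with local stand-in classes; split_or_fallback and fake_splitter are hypothetical helpers, and the "no sort order" rule in fake_splitter is an invented example condition, not necessarily one the real module enforces.

```python
class QuerySplitterError(Exception):
    """Local stand-in mirroring the documented top-level error type."""


class SplitNotPossibleError(QuerySplitterError):
    """Local stand-in mirroring the documented subclass."""


def split_or_fallback(split_fn, query, num_splits):
    """Try to split `query`; on any splitter error, return it as one shard."""
    try:
        return split_fn(query, num_splits)
    except QuerySplitterError:
        # SplitNotPossibleError lands here too, since it subclasses
        # QuerySplitterError.
        return [query]


def fake_splitter(query, num_splits):
    # Hypothetical rule for illustration only: refuse queries with a sort
    # order. The real module's validation rules may differ.
    if query.get("order"):
        raise SplitNotPossibleError("query has a sort order")
    return [dict(query, shard=i) for i in range(num_splits)]


print(split_or_fallback(fake_splitter, {"kind": "Task"}, 2))
print(split_or_fallback(fake_splitter, {"kind": "Task", "order": ["created"]}, 2))
```

Catching the base class keeps the caller robust if further QuerySplitterError subtypes are added later.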

apache_beam.io.gcp.datastore.v1new.query_splitter.get_splits(client, query, num_splits)

Returns a list of sharded queries for the given Cloud Datastore query.

This creates up to the desired number of splits, but it may return fewer if that many are unavailable. This happens when the underlying Datastore provides fewer split points than the desired number requires, which occurs when the query returns too few results.
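To see why the result can be shorter than num_splits, note that k boundary keys produce k + 1 splits. The sketch below picks up to num_splits - 1 evenly spaced boundaries from a sorted sample of keys and simply uses every sampled key when there are too few. It is a minimal illustration under stated assumptions: choose_split_points is a hypothetical helper, and plain strings stand in for Datastore keys; the real module's selection logic may differ in detail.

```python
def choose_split_points(sorted_keys, num_splits):
    """Pick up to num_splits - 1 evenly spaced boundary keys.

    `sorted_keys` is a hypothetical stand-in for a sorted sample of keys.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    wanted = num_splits - 1
    if wanted == 0:
        return []  # one split needs no boundaries
    if len(sorted_keys) <= wanted:
        # Too few sampled keys: use them all, yielding fewer splits than asked.
        return list(sorted_keys)
    # Float step so any remainder is spread across splits, not piled at the end.
    step = len(sorted_keys) / wanted
    return [sorted_keys[int(round(i * step)) - 1] for i in range(1, wanted + 1)]


points = choose_split_points(["a", "b"], 5)
print(points, "->", len(points) + 1, "splits instead of 5")
```

With only two sampled keys, a request for five splits yields three.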

This implementation of the QuerySplitter uses the __scatter__ property to gather random split points for a query.
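Once boundary keys are chosen, each shard can be described as a contiguous key range. The sketch below, assuming string stand-ins for Datastore keys and a tuple-based filter representation invented for illustration, turns k sorted boundaries into k + 1 non-overlapping half-open ranges and renders each as hypothetical `__key__` comparisons. key_ranges and describe_filters are hypothetical helpers; the sketch does not call the Datastore client.

```python
def key_ranges(split_points):
    """Turn sorted boundary keys into contiguous [lower, upper) ranges.

    None marks an open bound: the first range has no lower limit and the
    last range has no upper limit.
    """
    bounds = [None] + list(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def describe_filters(lower, upper):
    """Render one range as hypothetical (property, operator, value) filters."""
    filters = []
    if lower is not None:
        filters.append(("__key__", ">=", lower))
    if upper is not None:
        filters.append(("__key__", "<", upper))
    return filters


for lo, hi in key_ranges(["b", "d"]):
    print(describe_filters(lo, hi))
```

Half-open ranges guarantee that every key falls into exactly one shard, so the split queries together return the same results as the original query.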

Note: This implementation is derived from the Java query splitter in https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/java/datastore/src/main/java/com/google/datastore/v1/client/QuerySplitterImpl.java

Parameters:
  • client – the Cloud Datastore client.

  • query – the query to split.

  • num_splits – the desired number of splits.

Returns:

A list of split queries, containing at most num_splits queries.

Raises:

QuerySplitterError – if the split cannot be performed because of the query or split parameters.