apache_beam.io.gcp.datastore.v1.query_splitter module

Implements a Cloud Datastore query splitter.

apache_beam.io.gcp.datastore.v1.query_splitter.get_splits(datastore, query, num_splits, partition=None)[source]

Returns a list of sharded queries for the given Cloud Datastore query.

This creates up to the desired number of splits, but it may return fewer if that many are not available. This happens when the underlying Datastore provides fewer split points than requested, which occurs when the query returns too few results.

This implementation of the QuerySplitter uses the __scatter__ property to gather random split points for a query.
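The idea of turning sampled split points into sub-ranges can be sketched in plain Python. This is a simplified illustration, not the module's actual code: the helper names pick_split_points and key_ranges are hypothetical, integers stand in for Datastore entity keys, and the even-spacing rule is an assumption chosen for clarity. It also shows why fewer ranges than requested come back when few sample keys are available.

```python
from typing import List, Optional, Sequence, Tuple

# A half-open key range [start, end); None means "unbounded" on that side.
KeyRange = Tuple[Optional[int], Optional[int]]


def pick_split_points(sorted_keys: Sequence[int], num_splits: int) -> List[int]:
    """Pick up to num_splits - 1 evenly spaced keys from a sorted sample.

    Hypothetical helper: the sample stands in for keys gathered by
    ordering on the __scatter__ property.
    """
    if num_splits <= 1 or not sorted_keys:
        return []
    n = len(sorted_keys)
    # Cannot produce more split points than there are sampled keys.
    wanted = min(num_splits - 1, n)
    # Spread the chosen indices evenly across the sample.
    return [sorted_keys[(i * n) // (wanted + 1)] for i in range(1, wanted + 1)]


def key_ranges(split_points: Sequence[int]) -> List[KeyRange]:
    """Turn split points into consecutive ranges covering the whole key space."""
    ranges: List[KeyRange] = []
    start: Optional[int] = None
    for point in split_points:
        ranges.append((start, point))
        start = point
    ranges.append((start, None))  # final range is open-ended
    return ranges


# Enough sampled keys: 4 requested splits yield 4 ranges.
sample = [5, 12, 20, 33, 47, 58, 61, 79]
enough = key_ranges(pick_split_points(sample, 4))
print(enough)  # [(None, 20), (20, 47), (47, 61), (61, None)]

# Too few sampled keys: 5 requested splits yield only 3 ranges.
few = key_ranges(pick_split_points([10, 30], 5))
print(few)  # [(None, 10), (10, 30), (30, None)]
```

In the real splitter each range would correspond to a copy of the original query restricted to that key range; here the ranges are left as tuples to keep the sketch self-contained.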

Note: This implementation is derived from the Java query splitter in https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/java/datastore/src/main/java/com/google/datastore/v1/client/QuerySplitterImpl.java

Parameters:
  • datastore – the datastore client.
  • query – the query to split.
  • num_splits – the desired number of splits.
  • partition – the partition the query is running in.
Returns:

A list of split queries containing at most num_splits entries.