apache_beam.ml.gcp.visionml module

A connector for sending API requests to the GCP Vision API.

class apache_beam.ml.gcp.visionml.AnnotateImage(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, context_side_input=None, metadata=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch to the GCP Vision API. Each element is a Union[str, bytes]: either a URI (e.g. a GCS URI) or base64-encoded image data as bytes. Optionally accepts an AsDict side input that maps each image to an image context.
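The per-element handling described above can be illustrated without Beam or GCP installed. This is a minimal sketch, not the connector's implementation: plain dicts stand in for the Vision request types, and build_request, contexts, and the language_hints value are hypothetical names and data introduced only for illustration.

```python
from typing import Dict, List, Optional, Union

Element = Union[str, bytes]


def build_request(
    element: Element,
    features: List[str],
    contexts: Optional[Dict[Element, dict]] = None,
) -> dict:
    """Build a dict-shaped stand-in for one image annotation request."""
    # A str element is treated as a URI; bytes are treated as image content.
    if isinstance(element, str):
        image = {"source": {"image_uri": element}}
    elif isinstance(element, bytes):
        image = {"content": element}
    else:
        raise TypeError("element must be str or bytes")
    # The mapping plays the role of the AsDict side input: it is keyed by the
    # element itself, and images without an entry get no context.
    image_context = contexts.get(element) if contexts else None
    return {"image": image, "features": features, "image_context": image_context}


SIGN_URI = "gs://cloud-samples-data/vision/ocr/sign.jpg"
contexts = {SIGN_URI: {"language_hints": ["en"]}}

uri_request = build_request(SIGN_URI, ["TEXT_DETECTION"], contexts)
bytes_request = build_request(b"raw image data", ["TEXT_DETECTION"], contexts)
```

Because the mapping is keyed by the element, an image that has no entry is simply annotated without a context.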

Parameters:
  • features – (List[vision.Feature]) Required. The Vision API features to detect.
  • retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
  • timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
  • max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. If None (default), MAX_BATCH_SIZE (5, which is also the Vision API maximum) is used. This parameter is primarily intended for testing.
  • min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. If None (default), MIN_BATCH_SIZE (1) is used. This parameter is primarily intended for testing.
  • client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. The API endpoint should be set through client_options.
  • context_side_input – (beam.pvalue.AsDict) Optional. An AsDict of a PCollection to be passed to the _ImageAnnotateFn as the image context mapping containing additional image context and/or feature-specific parameters. Example usage:

    import apache_beam as beam
    from apache_beam.ml.gcp import visionml
    from google.cloud import vision

    # Each entry maps an image to its context; the value may be
    # either a dict or a vision.ImageContext.
    image_contexts = [
        ('gs://cloud-samples-data/vision/ocr/sign.jpg',
         vision.ImageContext()),
    ]

    # p is an existing beam.Pipeline.
    context_side_input = (
        p
        | "Image contexts" >> beam.Create(image_contexts)
    )

    visionml.AnnotateImage(
        features,
        context_side_input=beam.pvalue.AsDict(context_side_input))
    
  • metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
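As a hedged illustration of how these parameters fit together, the sketch below wires AnnotateImage into a pipeline. It assumes a google-cloud-vision release that exposes vision.Feature with a type_ keyword (2.x or later); annotate_labels, IMAGE_URIS, the retry values, and the regional endpoint are illustrative choices, not values taken from this module.

```python
IMAGE_URIS = ["gs://cloud-samples-data/vision/ocr/sign.jpg"]
TIMEOUT_SECONDS = 120
# Example endpoint only; omit client_options to use the default endpoint.
CLIENT_OPTIONS = {"api_endpoint": "eu-vision.googleapis.com"}


def annotate_labels(pipeline, image_uris):
    """Apply AnnotateImage to a collection of image URIs (illustrative helper)."""
    # Imported lazily so this module can be loaded without the GCP extras.
    import apache_beam as beam
    from apache_beam.ml.gcp import visionml
    from google.api_core.retry import Retry
    from google.cloud import vision

    # Assumption: google-cloud-vision 2.x style feature construction.
    features = [vision.Feature(type_=vision.Feature.Type.LABEL_DETECTION)]

    return (
        pipeline
        | "Image URIs" >> beam.Create(image_uris)
        | "Annotate" >> visionml.AnnotateImage(
            features,
            # Without a Retry object, failed requests are not retried.
            retry=Retry(initial=1.0, maximum=10.0, multiplier=2.0),
            timeout=TIMEOUT_SECONDS,
            client_options=CLIENT_OPTIONS,
        )
    )
```

Calling annotate_labels(p, IMAGE_URIS) inside a `with beam.Pipeline() as p:` block would run the transform; doing so requires GCP credentials and the Vision API enabled for the project.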
MAX_BATCH_SIZE = 5
MIN_BATCH_SIZE = 1
expand(pvalue)[source]
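The relationship between the batch-size parameters and the MAX_BATCH_SIZE and MIN_BATCH_SIZE constants can be sketched in plain Python. This is an assumption-laden illustration rather than the transform's code: resolve_batch_sizes and fixed_size_batches are hypothetical helpers, the up-front rejection of oversized values is assumed, and fixed-size chunking is a simplification of util.BatchElements, which picks batch sizes between the minimum and maximum on its own.

```python
from typing import List, Optional, Sequence, Tuple

MAX_BATCH_SIZE = 5
MIN_BATCH_SIZE = 1


def resolve_batch_sizes(
    max_batch_size: Optional[int] = None,
    min_batch_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (min, max), falling back to the constants when a size is None."""
    resolved_max = max_batch_size or MAX_BATCH_SIZE
    if resolved_max > MAX_BATCH_SIZE:
        # Assumption: a value above the Vision API maximum is rejected early.
        raise ValueError(
            "max_batch_size must not exceed {}".format(MAX_BATCH_SIZE))
    resolved_min = min_batch_size or MIN_BATCH_SIZE
    return resolved_min, resolved_max


def fixed_size_batches(elements: Sequence, max_batch_size: int) -> List[list]:
    """Chunk elements into consecutive batches of at most max_batch_size."""
    return [
        list(elements[start:start + max_batch_size])
        for start in range(0, len(elements), max_batch_size)
    ]


min_size, max_size = resolve_batch_sizes()
batches = fixed_size_batches(list(range(12)), max_size)
```

With the defaults, twelve images are sent as three requests, none of which exceeds the five-image limit.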
class apache_beam.ml.gcp.visionml.AnnotateImageWithContext(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, metadata=None)[source]

Bases: apache_beam.ml.gcp.visionml.AnnotateImage

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using util.BatchElements PTransform and sends each batch of elements to the GCP Vision API.

Element is a tuple of:

(Union[str, bytes], Optional[vision.ImageContext])

where the former is either a URI (e.g. a GCS URI) or base64-encoded image data as bytes.
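A short sketch of preparing such tuples and passing them to the transform follows. It assumes a google-cloud-vision release with the type_ keyword on vision.Feature; pair_with_context, annotate_with_context, and the gs://example-bucket URI are hypothetical, and the dict context is a placeholder standing in for a vision.ImageContext instance.

```python
from typing import Dict, List, Optional, Tuple


def pair_with_context(
    image_uris: List[str],
    contexts: Dict[str, object],
) -> List[Tuple[str, Optional[object]]]:
    """Turn URIs plus a URI-to-context mapping into (image, context) tuples."""
    # Images without an entry are paired with None, i.e. no image context.
    return [(uri, contexts.get(uri)) for uri in image_uris]


def annotate_with_context(pipeline, elements):
    """Apply AnnotateImageWithContext to (image, context) tuples."""
    # Imported lazily so this module can be loaded without the GCP extras.
    import apache_beam as beam
    from apache_beam.ml.gcp import visionml
    from google.cloud import vision

    # Assumption: google-cloud-vision 2.x style feature construction.
    features = [vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION)]
    return (
        pipeline
        | "Image and context pairs" >> beam.Create(elements)
        | "Annotate" >> visionml.AnnotateImageWithContext(features)
    )


SIGN_URI = "gs://cloud-samples-data/vision/ocr/sign.jpg"
PLAIN_URI = "gs://example-bucket/plain.jpg"  # hypothetical URI

# Placeholder context; a real pipeline would pass vision.ImageContext objects.
pairs = pair_with_context(
    [SIGN_URI, PLAIN_URI],
    {SIGN_URI: {"language_hints": ["en"]}},
)
```

Unlike AnnotateImage, no side input is involved here: each element carries its own context, so the pairing is done before the elements enter the transform.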

Parameters:
  • features – (List[vision.Feature]) Required. The Vision API features to detect.
  • retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
  • timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
  • max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. If None (default), MAX_BATCH_SIZE (5, which is also the Vision API maximum) is used. This parameter is primarily intended for testing.
  • min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. If None (default), MIN_BATCH_SIZE (1) is used. This parameter is primarily intended for testing.
  • client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. The API endpoint should be set through client_options.
  • metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
expand(pvalue)[source]