apache_beam.ml.gcp.naturallanguageml module
class apache_beam.ml.gcp.naturallanguageml.Document(content, type='PLAIN_TEXT', language_hint=None, encoding='UTF8', from_gcs=False)
Bases: object
Represents the input to the AnnotateText transform.
Parameters:
- content (str) – The content of the input or the Google Cloud Storage URI where the file is stored.
- type (Union[str, google.cloud.language.enums.Document.Type]) – Text type. Possible values are HTML, PLAIN_TEXT. The default value is PLAIN_TEXT.
- language_hint (Optional[str]) – The language of the text. If not specified, language will be automatically detected. Values should conform to ISO-639-1 standard.
- encoding (Optional[str]) – Text encoding. Possible values are: NONE, UTF8, UTF16, UTF32. The default value is UTF8.
- from_gcs (bool) – Whether the content should be interpreted as a Google Cloud Storage URI. The default value is False.
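As a minimal sketch of how the parameters above fit together, the following builds Document inputs after checking the arguments against the documented values. The helper names checked_document_kwargs and make_document, and the bucket URI used in the note below, are hypothetical and not part of the module. The helper handles only the string forms of type and encoding, not the enum form. The Beam import is placed inside the function so the validation helper can be used without the apache-beam[gcp] extras installed.

```python
# Values the parameter list above documents for `type` and `encoding`.
DOCUMENT_TYPES = ('HTML', 'PLAIN_TEXT')
ENCODINGS = ('NONE', 'UTF8', 'UTF16', 'UTF32')


def checked_document_kwargs(content, type='PLAIN_TEXT', language_hint=None,
                            encoding='UTF8', from_gcs=False):
    """Hypothetical helper: validate arguments and return them as a dict.

    The parameter names mirror the Document signature, which is why `type`
    shadows the builtin here.
    """
    if type not in DOCUMENT_TYPES:
        raise ValueError('type must be one of %s' % (DOCUMENT_TYPES,))
    if encoding is not None and encoding not in ENCODINGS:
        raise ValueError('encoding must be one of %s' % (ENCODINGS,))
    # Assumption: a Cloud Storage URI always starts with the gs:// scheme.
    if from_gcs and not content.startswith('gs://'):
        raise ValueError('from_gcs=True expects a gs:// URI')
    return {
        'content': content,
        'type': type,
        'language_hint': language_hint,
        'encoding': encoding,
        'from_gcs': from_gcs,
    }


def make_document(**kwargs):
    """Build a Document from validated keyword arguments."""
    # Imported lazily; requires the apache-beam[gcp] extras at call time.
    from apache_beam.ml.gcp.naturallanguageml import Document
    return Document(**checked_document_kwargs(**kwargs))
```

For instance, make_document(content='Beam unifies batch and streaming.', language_hint='en') would describe inline text, while make_document(content='gs://my-bucket/review.html', type='HTML', from_gcs=True) would point at a file in a hypothetical bucket.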
apache_beam.ml.gcp.naturallanguageml.AnnotateText(pcoll, features, timeout=None, metadata=None)
A PTransform for annotating text using the Google Cloud Natural Language API: https://cloud.google.com/natural-language/docs.
Parameters:
- pcoll (PCollection) – An input PCollection of Document objects.
- features (Union[Mapping[str, bool], types.AnnotateTextRequest.Features]) – A dictionary of natural language operations to be performed on the given text, in the following format:
{'extract_syntax': True, 'extract_entities': True}
- timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. The timeout applies to each individual retry attempt.
- metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.
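The parameters above can be combined into a small pipeline. This is a sketch under stated assumptions, not a definitive usage pattern: make_features and run_annotation are hypothetical names, and KNOWN_FEATURES lists the feature flags assumed here from the v1 AnnotateTextRequest.Features message, which may differ in other client library versions. Running the pipeline needs the apache-beam[gcp] extras and Google Cloud credentials, so the Beam imports are kept inside the function.

```python
# Assumption: feature flags of the v1 AnnotateTextRequest.Features message.
KNOWN_FEATURES = frozenset({
    'extract_syntax',
    'extract_entities',
    'extract_document_sentiment',
    'extract_entity_sentiment',
    'classify_text',
})


def make_features(*names):
    """Hypothetical helper: build the features mapping, rejecting typos."""
    unknown = sorted(set(names) - KNOWN_FEATURES)
    if unknown:
        raise ValueError('unknown feature(s): %s' % ', '.join(unknown))
    return {name: True for name in names}


def run_annotation(texts, features, timeout=None):
    """Annotate plain-text strings and print each result."""
    # Imported lazily; requires apache-beam[gcp] and GCP credentials.
    import apache_beam as beam
    from apache_beam.ml.gcp.naturallanguageml import AnnotateText, Document

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            # Wrap each string in a Document, the transform's input type.
            | 'CreateDocuments' >> beam.Create([Document(t) for t in texts])
            # The input PCollection is supplied by the pipe operator.
            | 'Annotate' >> AnnotateText(features, timeout=timeout)
            # Elements are assumed to be the API's response messages.
            | 'Print' >> beam.Map(print))
```

A call such as run_annotation(['Unified programming model.'], make_features('extract_syntax', 'extract_entities')) would send one request per document; a misspelled flag such as 'extact_syntax' is rejected by make_features before the pipeline starts.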