apache_beam.ml.gcp.naturallanguageml module

class apache_beam.ml.gcp.naturallanguageml.Document(content, type='PLAIN_TEXT', language_hint=None, encoding='UTF8', from_gcs=False)
Bases: object
Represents the input to AnnotateText transform.
Parameters:
 - content (str) – The content of the input or the Google Cloud Storage URI where the file is stored.
 - type (Union[str, google.cloud.language.enums.Document.Type]) – Text type. Possible values are HTML, PLAIN_TEXT. The default value is PLAIN_TEXT.
 - language_hint (Optional[str]) – The language of the text. If not specified, language will be automatically detected. Values should conform to ISO-639-1 standard.
 - encoding (Optional[str]) – Text encoding. Possible values are: NONE, UTF8, UTF16, UTF32. The default value is UTF8.
 - from_gcs (bool) – Whether the content should be interpreted as a Google Cloud Storage URI. The default value is False.
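The parameters above can be illustrated with a minimal sketch that builds one inline document and one Cloud Storage-backed document. The keyword arguments mirror the documented signature; the sample text and the `gs://` URI are hypothetical, and the import is deferred into the function so the snippet still loads when the `apache-beam[gcp]` extras are not installed.

```python
# Keyword arguments for an inline plain-text document (sample text is illustrative).
INLINE_DOC = dict(
    content="Beam pipelines can call the Natural Language API.",
    type="PLAIN_TEXT",
    language_hint="en",  # ISO-639-1 code; omit to let the API detect the language
    encoding="UTF8",
)

# Keyword arguments for a document stored in Cloud Storage.
# The bucket and object names are hypothetical placeholders.
GCS_DOC = dict(
    content="gs://my-bucket/reviews/review-001.txt",
    from_gcs=True,  # tells Document to treat `content` as a URI, not as text
)


def build_documents():
    """Return Document objects built from the argument sets above."""
    # Imported lazily: requires the apache-beam[gcp] extras to be installed.
    from apache_beam.ml.gcp import naturallanguageml as nlp

    return [nlp.Document(**INLINE_DOC), nlp.Document(**GCS_DOC)]
```

Calling `build_documents()` in an environment with the GCP extras installed should yield two `Document` instances ready to be fed into a pipeline.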

apache_beam.ml.gcp.naturallanguageml.AnnotateText(pcoll, features, timeout=None, metadata=None)
A PTransform for annotating text using the Google Cloud Natural Language API: https://cloud.google.com/natural-language/docs.
Parameters:
 - pcoll (PCollection) – An input PCollection of Document objects.
 - features (Union[Mapping[str, bool], types.AnnotateTextRequest.Features]) –
A dictionary of natural language operations to be performed on the given text, in the following format:
{'extract_syntax': True, 'extract_entities': True}
 - timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. The timeout applies to each individual retry attempt.
 - metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.
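As a rough sketch of how these parameters fit together, the pipeline below wraps raw strings in Document objects and applies AnnotateText with a features dictionary and a per-attempt timeout. It assumes that, as is usual for Beam function-style transforms, `pcoll` is supplied by the `|` operator rather than passed explicitly; the 30-second timeout, the step labels, and the `run` helper are illustrative choices, not part of the API. Running it requires `apache-beam[gcp]` and valid Google Cloud credentials.

```python
# Operations to request from the Natural Language API, in the dict form
# described by the `features` parameter.
FEATURES = {"extract_syntax": True, "extract_entities": True}


def run(texts):
    """Annotate each string in `texts` and print the API responses."""
    # Imported lazily: requires the apache-beam[gcp] extras to be installed.
    import apache_beam as beam
    from apache_beam.ml.gcp import naturallanguageml as nlp

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreateTexts" >> beam.Create(texts)
            # AnnotateText expects a PCollection of Document objects.
            | "ToDocument" >> beam.Map(
                lambda text: nlp.Document(text, type="PLAIN_TEXT"))
            # Assumption: the input PCollection is bound by the pipe operator;
            # the timeout (in seconds) applies to each retry attempt.
            | "Annotate" >> nlp.AnnotateText(FEATURES, timeout=30.0)
            | "Print" >> beam.Map(print)
        )
```

A call such as `run(["Apache Beam unifies batch and streaming."])` would send one annotation request per element.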
 