apache_beam.ml.gcp.naturallanguageml module
- class apache_beam.ml.gcp.naturallanguageml.Document(content: str, type: str | Type = 'PLAIN_TEXT', language_hint: str | None = None, encoding: str | None = 'UTF8', from_gcs: bool = False)[source]
- Bases: object
- Represents the input to the AnnotateText transform.
- Parameters:
- content (str) – The content of the input or the Google Cloud Storage URI where the file is stored. 
- type (Union[str, google.cloud.language_v1.Document.Type]) – Text type. Possible values are HTML, PLAIN_TEXT. The default value is PLAIN_TEXT. 
- language_hint (Optional[str]) – The language of the text. If not specified, the language is detected automatically. Values should conform to the ISO-639-1 standard. 
- encoding (Optional[str]) – Text encoding used by the API to calculate character offsets in the response. Possible values are NONE, UTF8, UTF16, and UTF32. The default value is UTF8. 
- from_gcs (bool) – Whether the content should be interpreted as a Google Cloud Storage URI. The default value is False.
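How content, type, language_hint and from_gcs fit together can be sketched in plain Python. The helper document_payload below is hypothetical and is not part of Beam; it assumes the Natural Language API Document field names content, gcs_content_uri, type and language, and the gs:// prefix check is an illustrative addition rather than documented Beam behavior.

```python
from typing import Dict, Optional


def document_payload(
    content: str,
    type: str = "PLAIN_TEXT",
    language_hint: Optional[str] = None,
    from_gcs: bool = False,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: build the document part of an annotateText request.

    Field names are assumed from the Natural Language API Document resource;
    this is a sketch, not Beam's own implementation.
    """
    if type not in ("PLAIN_TEXT", "HTML"):
        raise ValueError(f"unsupported document type: {type!r}")
    if from_gcs:
        # With from_gcs=True, `content` holds a URI, not the text itself.
        # Illustrative check only: assumes URIs of the form gs://bucket/object.
        if not content.startswith("gs://"):
            raise ValueError("from_gcs=True expects a gs://bucket/object URI")
        payload: Dict[str, Optional[str]] = {"gcs_content_uri": content}
    else:
        payload = {"content": content}
    payload["type"] = type
    # None leaves language detection to the API.
    payload["language"] = language_hint
    return payload


print(document_payload("Hello, world!"))
# {'content': 'Hello, world!', 'type': 'PLAIN_TEXT', 'language': None}
print(document_payload("gs://my-bucket/review.txt", from_gcs=True, language_hint="en"))
# {'gcs_content_uri': 'gs://my-bucket/review.txt', 'type': 'PLAIN_TEXT', 'language': 'en'}
```

The bucket and object name above are placeholders. The point of the sketch is that from_gcs does not change the type of content (it is always a str); it only changes which field the string is sent in.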
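As we understand the Natural Language API, the encoding setting chiefly determines the unit in which begin offsets (of tokens, entity mentions and so on) are reported: UTF8 counts bytes, UTF16 counts 16-bit code units and UTF32 counts code points. The hypothetical helper offset_in below converts a Python string index into each of those units to show why the choice matters for non-ASCII text; NONE is deliberately not modeled.

```python
def offset_in(text: str, index: int, encoding: str) -> int:
    """Hypothetical helper: express a Python str index in `encoding` units.

    UTF8 -> bytes, UTF16 -> 16-bit code units, UTF32 -> code points.
    """
    prefix = text[:index]
    if encoding == "UTF8":
        return len(prefix.encode("utf-8"))
    if encoding == "UTF16":
        # utf-16-le avoids the 2-byte BOM that plain "utf-16" prepends.
        return len(prefix.encode("utf-16-le")) // 2
    if encoding == "UTF32":
        return len(prefix.encode("utf-32-le")) // 4
    raise ValueError(f"no offsets modeled for encoding {encoding!r}")


text = "😀 Beam"
start = text.index("Beam")  # 2: the emoji is a single code point in Python
for enc in ("UTF8", "UTF16", "UTF32"):
    print(enc, offset_in(text, start, enc))
# UTF8 5
# UTF16 3
# UTF32 2
```

Choosing the encoding that matches how downstream code indexes strings (for example UTF32 for Python str indices) avoids off-by-some errors when slicing the original text with offsets from the response.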
 
 
- apache_beam.ml.gcp.naturallanguageml.AnnotateText(features: Mapping[str, bool] | Features, timeout: float | None = None, metadata: Sequence[Tuple[str, str]] | None = None)[source]
- A PTransform for annotating text using the Google Cloud Natural Language API: https://cloud.google.com/natural-language/docs.
- Parameters:
- pcoll (PCollection) – An input PCollection of Document objects.
- features (Union[Mapping[str, bool], types.AnnotateTextRequest.Features]) – A mapping of natural language operations to perform on the given text, in the following format: {'extract_syntax': True, 'extract_entities': True} 
- timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. The timeout applies to each individual retry attempt. 
- metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the underlying API method call.
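Because features is a plain mapping of flag names, a misspelled key is easy to introduce. A small pre-flight check, run before building the pipeline, can catch this. The helper check_features and the KNOWN_FEATURES set below are hypothetical; the flag names are assumed from the v1 AnnotateTextRequest.Features message, and newer client versions may accept additional flags.

```python
from typing import Dict, Mapping

# Assumed boolean flags of the v1 AnnotateTextRequest.Features message.
# This set may be incomplete for newer google-cloud-language versions.
KNOWN_FEATURES = frozenset({
    "extract_syntax",
    "extract_entities",
    "extract_document_sentiment",
    "extract_entity_sentiment",
    "classify_text",
})


def check_features(features: Mapping[str, bool]) -> Dict[str, bool]:
    """Hypothetical pre-flight check for the `features` mapping."""
    unknown = sorted(set(features) - KNOWN_FEATURES)
    if unknown:
        raise ValueError(f"unknown feature flag(s): {unknown}")
    return dict(features)


features = check_features({"extract_syntax": True, "extract_entities": True})
print(features)  # {'extract_syntax': True, 'extract_entities': True}

try:
    check_features({"extact_syntax": True})  # misspelled key
except ValueError as err:
    print(err)  # unknown feature flag(s): ['extact_syntax']
```

The checked mapping can then be passed as the features argument of AnnotateText, applied to a PCollection of Document objects.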