apache_beam.ml.gcp.naturallanguageml module
-
class apache_beam.ml.gcp.naturallanguageml.Document(content, type='PLAIN_TEXT', language_hint=None, encoding='UTF8', from_gcs=False)
Bases: object
Represents the input to the AnnotateText transform.
Parameters:
- content (str) – The content of the input or the Google Cloud Storage URI where the file is stored.
- type (Union[str, google.cloud.language_v1.Document.Type]) – Text type. Possible values are HTML, PLAIN_TEXT. The default value is PLAIN_TEXT.
- language_hint (Optional[str]) – The language of the text. If not specified, the language is detected automatically. Values should conform to the ISO-639-1 standard.
- encoding (Optional[str]) – Text encoding. Possible values are: NONE, UTF8, UTF16, UTF32. The default value is UTF8.
- from_gcs (bool) – Whether the content should be interpreted as a Google Cloud Storage URI. The default value is False.
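As a rough illustration of how these parameters fit together, the sketch below builds the keyword arguments for Document from a piece of content. The helpers document_kwargs and make_document and the bucket path are hypothetical and not part of the module. Treating any content that starts with gs:// as a Cloud Storage URI is an assumption made for this example only; Document itself relies on the from_gcs flag.

```python
def document_kwargs(content, html=False, language_hint=None):
    """Build keyword arguments for Document (hypothetical helper).

    Assumption for this sketch: content that starts with "gs://" is a
    Google Cloud Storage URI, so from_gcs is set to True for it.
    """
    return {
        "content": content,
        "type": "HTML" if html else "PLAIN_TEXT",
        "language_hint": language_hint,  # None lets the API detect the language
        "encoding": "UTF8",              # the documented default
        "from_gcs": content.startswith("gs://"),
    }


def make_document(content, **options):
    """Create a Document from the arguments built above (hypothetical helper)."""
    # Imported lazily so that document_kwargs can be used without the
    # apache-beam GCP extras installed.
    from apache_beam.ml.gcp.naturallanguageml import Document

    return Document(**document_kwargs(content, **options))


if __name__ == "__main__":
    # Inline text versus a file stored in Cloud Storage (illustrative path).
    print(document_kwargs("Beam makes batch and streaming pipelines portable."))
    print(document_kwargs("gs://example-bucket/reviews/review.txt"))
```

Calling make_document requires the Beam GCP dependencies to be installed; document_kwargs alone does not.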
-
apache_beam.ml.gcp.naturallanguageml.AnnotateText(pcoll, features, timeout=None, metadata=None)
A PTransform for annotating text using the Google Cloud Natural Language API: https://cloud.google.com/natural-language/docs.
Parameters:
- pcoll (PCollection) – An input PCollection of Document objects.
- features (Union[Mapping[str, bool], types.AnnotateTextRequest.Features]) – A dictionary of natural language operations to be performed on the given text, in the following format: {'extract_syntax': True, 'extract_entities': True}
- timeout (Optional[float]) – The amount of time, in seconds, to wait for the request to complete. The timeout applies to each individual retry attempt.
- metadata (Optional[Sequence[Tuple[str, str]]]) – Additional metadata that is provided to the method.
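A minimal sketch of how the transform might be applied in a pipeline is shown below. It assumes that AnnotateText is applied like any other PTransform, with pcoll supplied by the pipeline rather than passed explicitly. The build_features helper, the set of allowed feature names (taken from the Cloud Natural Language API's AnnotateTextRequest.Features message as an assumption), the step labels, and the 30-second timeout are illustrative and not part of the module. Running the pipeline needs the Beam GCP dependencies and valid Google Cloud credentials.

```python
# Feature flags assumed to be accepted by AnnotateTextRequest.Features.
ALLOWED_FEATURES = frozenset({
    "extract_syntax",
    "extract_entities",
    "extract_document_sentiment",
    "extract_entity_sentiment",
    "classify_text",
})


def build_features(*names):
    """Return a features mapping, rejecting misspelled names early (hypothetical helper)."""
    unknown = sorted(set(names) - ALLOWED_FEATURES)
    if unknown:
        raise ValueError(f"Unknown feature name(s): {unknown}")
    return {name: True for name in names}


def run(texts):
    """Annotate plain-text strings and print each API response."""
    # Imported lazily so that build_features works without Beam installed.
    import apache_beam as beam
    from apache_beam.ml.gcp.naturallanguageml import AnnotateText, Document

    features = build_features("extract_syntax", "extract_entities")
    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "CreateTexts" >> beam.Create(texts)
            | "ToDocument" >> beam.Map(lambda text: Document(text, type="PLAIN_TEXT"))
            # The timeout applies to each individual retry attempt.
            | "Annotate" >> AnnotateText(features, timeout=30.0)
            | "PrintResponse" >> beam.Map(print)
        )


if __name__ == "__main__":
    run(["Apache Beam was open-sourced by Google in 2016."])
```

Validating the feature names before the pipeline starts is a design choice for this sketch: a misspelled key would otherwise only surface when the request is built.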
- pcoll (