apache_beam.ml.inference.utils module

Utility and helper functions used in apache_beam.ml.inference.

class apache_beam.ml.inference.utils.WatchFilePattern(file_pattern, interval=360, stop_timestamp=Timestamp(9223372036854.775000))

Bases: PTransform

Watches a directory for updates to files matching a given file pattern.

Parameters:
  • file_pattern – The file path to read from, as a local file path or a GCS gs:// path. The path can contain glob characters (*, ?, and [...] sets).

  • interval – Interval, in seconds, at which to check for files matching file_pattern. The default of 360 means one check every six minutes.

  • stop_timestamp – Timestamp after which no more files will be checked. The default lies far in the future, so checking effectively never stops.
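Because file_pattern accepts glob wildcards, it can be worth checking which files a pattern actually matches before launching a pipeline. The sketch below is a hypothetical pre-flight helper (preflight_matches is not part of Beam) and works only for local paths: it uses Python's glob module, which understands the same *, ? and [...] wildcards but cannot read gs:// URLs. It fails fast when nothing matches, since WatchFilePattern expects at least one matching file at startup.

```python
import glob
import os
from typing import List


def preflight_matches(file_pattern: str) -> List[str]:
    """Returns the local files matching file_pattern, newest first.

    Hypothetical helper, local paths only: glob.glob supports the
    *, ? and [...] wildcards but does not understand gs:// URLs.
    """
    matches = glob.glob(file_pattern)
    if not matches:
        # WatchFilePattern expects at least one matching file at startup.
        raise FileNotFoundError(f"No files match {file_pattern!r}")
    # Sort by last-modified time so the most recent candidate comes first.
    return sorted(matches, key=os.path.getmtime, reverse=True)
```

For example, preflight_matches("/tmp/models/model_*.pkl") (a hypothetical directory) lists the candidate model files, most recently modified first, or raises if the directory holds none.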

Note:

  1. Previously used filenames cannot be reused. If a file is added or updated under a previously used filename, this transform ignores that update. To trigger a model update, always upload a file with a unique name.

  2. Before the pipeline starts, WatchFilePattern expects at least one file matching file_pattern to be present.

  3. This transform is supported in streaming mode, because MatchContinuously produces an unbounded source. Running it in batch mode can lead to undesired results or leave the pipeline stuck.
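The filename rule in the first note can be hard to picture in a running pipeline. The following pure-Python simulation is a hypothetical model of that behavior, not Beam's implementation. It assumes that each poll returns every matching file with its last-modified time, that the newest file is the candidate model, and that a candidate is emitted only if its path has never been emitted before. The function name new_model_paths and the gs://my-bucket paths are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

# One matched file as seen by a poll: (file path, last-modified time in seconds).
FileInfo = Tuple[str, float]


def new_model_paths(polls: Iterable[List[FileInfo]]) -> List[str]:
    """Simulates which model paths a WatchFilePattern-like watcher emits.

    Assumption: on each poll the newest file is the candidate, and it is
    emitted only if its path has never been emitted before.
    """
    seen: Set[str] = set()
    emitted: List[str] = []
    for files in polls:
        if not files:
            continue  # nothing matched on this poll
        latest_path, _ = max(files, key=lambda f: f[1])
        if latest_path in seen:
            continue  # previously used filename: the update is ignored
        seen.add(latest_path)
        emitted.append(latest_path)
    return emitted


polls = [
    [("gs://my-bucket/model_v1.h5", 100.0)],
    [("gs://my-bucket/model_v1.h5", 100.0), ("gs://my-bucket/model_v2.h5", 200.0)],
    # model_v1.h5 is overwritten in place, so it is now the newest file.
    [("gs://my-bucket/model_v1.h5", 300.0), ("gs://my-bucket/model_v2.h5", 200.0)],
]
print(new_model_paths(polls))
# ['gs://my-bucket/model_v1.h5', 'gs://my-bucket/model_v2.h5']
```

The overwrite in the third poll produces no new output, which is why each new model should be uploaded under a unique name.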

expand(pcoll) → PCollection[ModelMetadata]