apache_beam.ml.inference.utils module

Util/helper functions used in apache_beam.ml.inference.

class apache_beam.ml.inference.utils.WatchFilePattern(file_pattern, interval=360, stop_timestamp=Timestamp(9223372036854.775000))[source]

Bases: apache_beam.transforms.ptransform.PTransform

Watches a directory for updates to files matching a given file pattern.

Parameters:
  • file_pattern – The file path to read from as a local file path or a GCS gs:// path. The path can contain glob characters (*, ?, and [...] sets). interval: Interval at which to check for files matching file_pattern in seconds.
  • stop_timestamp – Timestamp after which no more files will be checked.

Note:

  1. Any previously used filenames cannot be reused. If a file is added
    or updated to a previously used filename, this transform will ignore that update. To trigger a model update, always upload a file with unique name.
  2. Initially, before the pipeline startup time, WatchFilePattern expects
    at least one file present that matches the file_pattern.
  3. This transform is supported in streaming mode since
    MatchContinuously produces an unbounded source. Running in batch mode can lead to undesired results or result in pipeline being stuck.
expand(pcoll) → apache_beam.pvalue.PCollection[apache_beam.ml.inference.base.ModelMetadata][apache_beam.ml.inference.base.ModelMetadata][source]