Class FileIO.MatchConfiguration

java.lang.Object
org.apache.beam.sdk.io.FileIO.MatchConfiguration
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
FileIO

public abstract static class FileIO.MatchConfiguration extends Object implements HasDisplayData, Serializable
Describes configuration for matching filepatterns, such as EmptyMatchTreatment and continuous watching for matching files.
See Also:
  • Constructor Details

    • MatchConfiguration

      public MatchConfiguration()
  • Method Details

    • create

      public static FileIO.MatchConfiguration create(EmptyMatchTreatment emptyMatchTreatment)
    • getEmptyMatchTreatment

      public abstract EmptyMatchTreatment getEmptyMatchTreatment()
    • getMatchUpdatedFiles

      public abstract boolean getMatchUpdatedFiles()
    • getWatchInterval

      public abstract @Nullable Duration getWatchInterval()
    • withEmptyMatchTreatment

      public FileIO.MatchConfiguration withEmptyMatchTreatment(EmptyMatchTreatment treatment)
    • continuously

      public FileIO.MatchConfiguration continuously(Duration interval, Watch.Growth.TerminationCondition<String,?> condition, boolean matchUpdatedFiles)
      Continuously watches for new files at the given interval until the given termination condition is reached, where the input to the condition is the filepattern.

      If matchUpdatedFiles is set, also watches for files with timestamp change, with the watching frequency given by the interval. The pipeline will throw a RuntimeError if timestamp extraction for the matched file has failed, suggesting the timestamp metadata is not available with the IO connector.

      Matching continuously scales poorly, as it is stateful, and requires storing file ids in memory. In addition, because it is memory-only, if a pipeline is restarted, already processed files will be reprocessed. Consider an alternate technique, such as Pub/Sub Notifications when using GCS if possible.

    • continuously

      public FileIO.MatchConfiguration continuously(Duration interval, Watch.Growth.TerminationCondition<String,?> condition)
      Continuously watches for new files at the given interval until the given termination condition is reached, where the input to the condition is the filepattern. To watch also for updated files, please set matchUpdatedFiles as true.
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Description copied from interface: HasDisplayData
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Parameters:
      builder - The builder to populate with display data.
      See Also: