Class FileReadSchemaTransformConfiguration

java.lang.Object
org.apache.beam.sdk.io.fileschematransform.FileReadSchemaTransformConfiguration

@DefaultSchema(AutoValueSchema.class) public abstract class FileReadSchemaTransformConfiguration extends Object
  • Field Details

    • VALID_PROVIDERS

      public static final Set<String> VALID_PROVIDERS
  • Constructor Details

    • FileReadSchemaTransformConfiguration

      public FileReadSchemaTransformConfiguration()
  • Method Details

    • builder

    • getFormat

      @SchemaFieldDescription("The format of the file(s) to read. Possible values are \"lines\", \"avro\", \"parquet\", \"json\".") public abstract String getFormat()
      The format of the file(s) to read.

      Possible values are: `"lines"`, `"avro"`, `"parquet"`, `"json"`
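
      The check below is a minimal sketch of validating a format string against the four documented values before building a configuration. FormatCheck, DOCUMENTED_FORMATS, and requireDocumentedFormat are hypothetical names and not part of Beam. The sketch assumes matching is case-sensitive and does not claim to mirror the contents of VALID_PROVIDERS.

```java
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Beam API. */
final class FormatCheck {
  // The four values listed in the getFormat() description.
  static final Set<String> DOCUMENTED_FORMATS = Set.of("lines", "avro", "parquet", "json");

  /** Returns the format unchanged if it is a documented value; otherwise throws. */
  static String requireDocumentedFormat(String format) {
    // Check null first: immutable sets created by Set.of reject null lookups.
    if (format == null || !DOCUMENTED_FORMATS.contains(format)) {
      throw new IllegalArgumentException(
          "Unsupported format: " + format + "; expected one of " + DOCUMENTED_FORMATS);
    }
    return format;
  }
}
```

      Calling FormatCheck.requireDocumentedFormat("avro") returns the argument, while a value outside the documented list, such as "csv", raises an IllegalArgumentException.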

    • getFilepattern

      @SchemaFieldDescription("The filepattern used to match and read files. May instead use an input PCollection<Row> of filepatterns. To do so, each Row must have a \"filepattern\" String field containing the filepattern.") @Nullable public abstract String getFilepattern()
      The filepattern used to match and read files.

      Alternatively, filepatterns may be supplied through an input PCollection<Row>. In that case, each Row must have a "filepattern" String field containing the filepattern.
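
      The two ways of supplying filepatterns can be sketched in plain Java. This is an illustration only: FilepatternSource and resolve are hypothetical names, each row is modeled as a Map rather than a Beam Row, and giving the configured filepattern precedence when both are present is an assumption, since the description does not say which one wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; rows are modeled as Maps, not Beam Rows. */
final class FilepatternSource {
  /** Returns the configured filepattern if set; otherwise reads one from each input row. */
  static List<String> resolve(String configuredFilepattern, List<Map<String, Object>> inputRows) {
    // Assumption: a filepattern set on the configuration takes precedence over input rows.
    if (configuredFilepattern != null) {
      return List.of(configuredFilepattern);
    }
    List<String> patterns = new ArrayList<>();
    for (Map<String, Object> row : inputRows) {
      Object value = row.get("filepattern");
      // Each row must carry a String field named "filepattern".
      if (!(value instanceof String)) {
        throw new IllegalArgumentException(
            "Each row needs a String \"filepattern\" field: " + row);
      }
      patterns.add((String) value);
    }
    return patterns;
  }
}
```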

    • getSafeFilepattern

      public String getSafeFilepattern()
    • getSchema

      @SchemaFieldDescription("The schema used by sources to deserialize data and create Beam Rows. May provide either a String representation of the schema or a single path to a file that contains the schema.") @Nullable public abstract String getSchema()
      The schema used by sources to deserialize data and create Beam Rows.

      May provide either a String representation of the schema or a single path to a file that contains the schema.
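
      One way to accept either an inline schema or a path to a schema file is sketched below. The description does not say how the two cases are told apart, so the rule used here is an assumption: if the value names an existing regular file on the local filesystem, the file contents are returned; otherwise the value is treated as the schema itself. SchemaSource and resolve are hypothetical names, and remote filesystems are not handled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; local files only, not Beam's own logic. */
final class SchemaSource {
  /** Returns the file contents if the value is an existing local file, else the value itself. */
  static String resolve(String schemaOrPath) {
    try {
      Path candidate = Paths.get(schemaOrPath);
      if (Files.isRegularFile(candidate)) {
        return new String(Files.readAllBytes(candidate), StandardCharsets.UTF_8);
      }
    } catch (InvalidPathException e) {
      // Not a legal path on this platform, so fall through and treat it as an inline schema.
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return schemaOrPath;
  }
}
```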

    • getSafeSchema

      public String getSafeSchema()
    • getPollIntervalMillis

      @SchemaFieldDescription("The time, in milliseconds, to wait before polling for new files. This will set the pipeline to be a streaming pipeline that continuously watches for new files.Note: This only polls for new files. New updates to an existing file will not be watched for.") @Nullable public abstract Long getPollIntervalMillis()
      The time, in milliseconds, to wait before polling for new files.

      Setting this value makes the pipeline a streaming pipeline that continuously watches for new files.

      Note: Only new files are detected. Updates to an existing file are not watched for.
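
      The "new files only" behavior can be pictured with a small tracker that remembers file names it has already reported. This is a conceptual sketch, not Beam's implementation: NewFileTracker and poll are hypothetical names, and identifying a file by its name alone is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical tracker for illustration; identifies files by name only. */
final class NewFileTracker {
  private final Set<String> seen = new HashSet<>();

  /** Returns the names in this listing that no earlier poll reported, in listing order. */
  List<String> poll(List<String> currentListing) {
    List<String> fresh = new ArrayList<>();
    for (String name : currentListing) {
      // Set.add returns false for a known name, so a modified but already
      // reported file is skipped: only its name is compared, never its contents.
      if (seen.add(name)) {
        fresh.add(name);
      }
    }
    return fresh;
  }
}
```

      A file reported by one poll is never reported again, even if its contents change between polls.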

    • getTerminateAfterSecondsSinceNewOutput

      @SchemaFieldDescription("If no new files are found after this many seconds, this transform will cease to watch for new files. The default is to never terminate. To set this parameter, a poll interval must also be provided.") @Nullable public abstract Long getTerminateAfterSecondsSinceNewOutput()
      If no new files are found after this many seconds, this transform will cease to watch for new files.

      The default is to never terminate. To set this parameter, a poll interval must also be provided.
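
      The dependency between the two watch settings can be sketched as a pair of checks. WatchSettings, validate, and shouldStopWatching are hypothetical names rather than Beam methods. Stopping exactly when the elapsed time reaches the limit (>= instead of >) is an assumption.

```java
/** Hypothetical helper for illustration; not part of the Beam API. */
final class WatchSettings {
  /** Rejects a termination timeout that is set without a poll interval. */
  static void validate(Long pollIntervalMillis, Long terminateAfterSeconds) {
    if (terminateAfterSeconds != null && pollIntervalMillis == null) {
      throw new IllegalArgumentException(
          "terminateAfterSecondsSinceNewOutput requires pollIntervalMillis to be set");
    }
  }

  /** Returns true once the quiet period has reached the configured limit. */
  static boolean shouldStopWatching(Long terminateAfterSeconds, long secondsSinceNewOutput) {
    // A null limit models the documented default: never terminate.
    if (terminateAfterSeconds == null) {
      return false;
    }
    return secondsSinceNewOutput >= terminateAfterSeconds;
  }
}
```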