Class AvroIO.Read<T>

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<T>>
org.apache.beam.sdk.extensions.avro.io.AvroIO.Read<T>
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
AvroIO

public abstract static class AvroIO.Read<T> extends PTransform<PBegin,PCollection<T>>
  • Constructor Details

    • Read

      public Read()
  • Method Details

    • from

      public AvroIO.Read<T> from(ValueProvider<String> filepattern)
      Reads from the given filename or filepattern.

      If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
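
A minimal sketch of combining from(String) with withHintMatchesManyFiles(), assuming the Beam Avro extension and Apache Avro are on the classpath. The class name, the User schema, and the filepattern in the usage note are hypothetical and only serve the illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;

public class ReadManyAvroFiles {
  // Hypothetical record schema, used only for illustration.
  static final String USER_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\","
          + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";

  // Builds a read over the given filepattern and hints that it matches many files.
  static AvroIO.Read<GenericRecord> buildRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(USER_SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .withHintMatchesManyFiles();
  }
}
```

Applying the result, for example pipeline.apply(ReadManyAvroFiles.buildRead("/data/users/*.avro")), would yield a PCollection<GenericRecord>. The hint is only worthwhile when the pattern really does match a very large number of files.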

    • from

      public AvroIO.Read<T> from(String filepattern)
    • withMatchConfiguration

      public AvroIO.Read<T> withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration)
    • withEmptyMatchTreatment

      public AvroIO.Read<T> withEmptyMatchTreatment(EmptyMatchTreatment treatment)
      Configures whether or not a filepattern matching no files is allowed.
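
As a hedged illustration, the following sketch allows a filepattern that matches no files instead of failing. The class name and the Event schema are hypothetical; EmptyMatchTreatment.ALLOW is one of the values of the org.apache.beam.sdk.io.fs.EmptyMatchTreatment enum.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;
import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;

public class ReadAllowingEmptyMatch {
  // Hypothetical record schema, used only for illustration.
  static final String EVENT_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\","
          + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

  // Builds a read that tolerates a filepattern matching zero files.
  static AvroIO.Read<GenericRecord> buildRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(EVENT_SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW);
  }
}
```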
    • watchForNewFiles

      public AvroIO.Read<T> watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String,?> terminationCondition, boolean matchUpdatedFiles)
      Continuously watches for new files matching the filepattern, polling it at the given interval, until the given termination condition is reached. The returned PCollection is unbounded. If matchUpdatedFiles is set, also watches for existing files whose timestamp has changed.

      This works only in runners supporting splittable DoFn.
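
A sketch of the three-argument overload, assuming a hypothetical Event schema and class name. The 30-second poll interval and the one-hour termination condition are arbitrary example values; Watch.Growth.afterTimeSinceNewOutput stops watching once no new file has appeared for the given duration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;
import org.apache.beam.sdk.transforms.Watch;
import org.joda.time.Duration;

public class WatchAvroFiles {
  // Hypothetical record schema, used only for illustration.
  static final String EVENT_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\","
          + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

  // Builds an unbounded read that polls the filepattern for new and updated files.
  static AvroIO.Read<GenericRecord> buildRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(EVENT_SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .watchForNewFiles(
            Duration.standardSeconds(30), // poll interval (example value)
            Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1)),
            true); // matchUpdatedFiles: also pick up files whose timestamp changed
  }
}
```

Because the resulting PCollection is unbounded, downstream aggregations would typically need windowing.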

    • watchForNewFiles

      public AvroIO.Read<T> watchForNewFiles(Duration pollInterval, Watch.Growth.TerminationCondition<String,?> terminationCondition)
      Same as watchForNewFiles(Duration, TerminationCondition, boolean) with matchUpdatedFiles=false.
    • withHintMatchesManyFiles

      public AvroIO.Read<T> withHintMatchesManyFiles()
      Hints that the filepattern specified in from(String) matches a very large number of files.

      This hint may cause a runner to execute the transform differently, in a way that improves performance for this case. It may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).

    • withBeamSchemas

      public AvroIO.Read<T> withBeamSchemas(boolean withBeamSchemas)
      If set to true, a Beam schema will be inferred from the Avro schema. This allows the output to be used by SQL and by the schema-transform library.
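
A brief sketch of enabling schema inference, under the assumption of a hypothetical User schema and class name. With the flag set, the output PCollection should carry a Beam schema derived from the Avro schema.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.extensions.avro.io.AvroIO;

public class ReadWithBeamSchema {
  // Hypothetical record schema, used only for illustration.
  static final String USER_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\","
          + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

  // Builds a read whose output has a Beam schema inferred from the Avro schema.
  static AvroIO.Read<GenericRecord> buildRead(String filepattern) {
    Schema schema = new Schema.Parser().parse(USER_SCHEMA_JSON);
    return AvroIO.readGenericRecords(schema)
        .from(filepattern)
        .withBeamSchemas(true);
  }
}
```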
    • withCoder

      public AvroIO.Read<T> withCoder(Coder<T> coder)
      Sets a coder for the result of the read function.
    • withDatumReaderFactory

      public AvroIO.Read<T> withDatumReaderFactory(AvroSource.DatumReaderFactory<T> readerFactory)
      Sets a custom AvroSource.DatumReaderFactory for reading. Pass an AvroDatumFactory to also use the factory for the default output AvroCoder.
    • expand

      public PCollection<T> expand(PBegin input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PBegin,PCollection<T>>
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Description copied from class: PTransform
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      By default, does not register any display data. Implementors may override this method to provide their own display data.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Overrides:
      populateDisplayData in class PTransform<PBegin,PCollection<T>>
      Parameters:
      builder - The builder to populate with display data.