Class ContextualTextIO.Read

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<Row>>
org.apache.beam.sdk.io.contextualtextio.ContextualTextIO.Read
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
ContextualTextIO

public abstract static class ContextualTextIO.Read extends PTransform<PBegin,PCollection<Row>>
Implementation of ContextualTextIO.read().
  • Constructor Details

    • Read

      public Read()
  • Method Details

    • from

      public ContextualTextIO.Read from(String filepattern)
      Reads text from the file(s) with the given filename or filename pattern.

      This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service).

      Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.

      If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
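      A minimal sketch of reading a file pattern and keeping only the text of each record. It assumes the Beam Java SDK is on the classpath, plus the GCP filesystem extension for gs:// paths. The class name and bucket path are hypothetical. The transform is applied with Pipeline.apply rather than by calling expand() directly, and the line text is read from the Row field named by RecordWithMetadata.VALUE.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.io.contextualtextio.RecordWithMetadata;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

class ReadFromPatternExample {
  // Builds the read transform; no files are matched until the pipeline runs.
  static ContextualTextIO.Read buildRead(String filepattern) {
    return ContextualTextIO.read().from(filepattern);
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Hypothetical bucket and path; replace with a real location.
        .apply("ReadLines", buildRead("gs://my-bucket/logs/*.txt"))
        // Each output Row holds the record text in the RecordWithMetadata.VALUE field.
        .apply(
            "ExtractText",
            MapElements.into(TypeDescriptors.strings())
                .via((Row row) -> row.getString(RecordWithMetadata.VALUE)));
    pipeline.run().waitUntilFinish();
  }
}
```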

    • from

      public ContextualTextIO.Read from(ValueProvider<String> filepattern)
      Same as from(filepattern), but accepting a ValueProvider.
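      A sketch of supplying the pattern through a ValueProvider so that it can be resolved when the pipeline runs (for example, as a template parameter). The options interface InputOptions and its inputPattern property are hypothetical names chosen for illustration; StaticValueProvider wraps a value that is already known when the pipeline is constructed.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

class ValueProviderReadExample {
  // Hypothetical options interface exposing the pattern as a ValueProvider.
  public interface InputOptions extends PipelineOptions {
    @Description("File pattern to read")
    ValueProvider<String> getInputPattern();

    void setInputPattern(ValueProvider<String> value);
  }

  // Works with any ValueProvider, static or resolved at run time.
  static ContextualTextIO.Read readFrom(ValueProvider<String> pattern) {
    return ContextualTextIO.read().from(pattern);
  }

  // The pattern value is not needed until the pipeline actually runs.
  static ContextualTextIO.Read readFromOptions(InputOptions options) {
    return readFrom(options.getInputPattern());
  }
}
```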
    • withMatchConfiguration

      public ContextualTextIO.Read withMatchConfiguration(FileIO.MatchConfiguration matchConfiguration)
    • withHasMultilineCSVRecords

      public ContextualTextIO.Read withHasMultilineCSVRecords(Boolean hasMultilineCSVRecords)
      When reading RFC 4180 CSV files that have values spanning multiple lines, set this to true. Note: this reduces read performance (see ContextualTextIO).
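      A sketch of enabling multi-line CSV handling. The class name and path are hypothetical; because the flag slows the read, it is only worth setting when quoted values really do contain line breaks.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;

class MultilineCsvReadExample {
  // With the flag set, a quoted CSV value containing a line break
  // is kept inside a single record instead of being split.
  static ContextualTextIO.Read multilineCsvRead(String filepattern) {
    return ContextualTextIO.read()
        .from(filepattern)
        .withHasMultilineCSVRecords(true);
  }
}
```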
    • withCompression

      public ContextualTextIO.Read withCompression(Compression compression)
      Reads from input sources using the specified compression type.

      If no compression type is specified, the default is Compression.AUTO.
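      A sketch of forcing a compression type. It assumes gzip-compressed input whose file names do not end in .gz, so Compression.AUTO (which picks the codec from the file extension) would not detect it. The class name and path are hypothetical.

```java
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;

class CompressedReadExample {
  // Overrides the AUTO default so every matched file is decoded as gzip,
  // regardless of its extension.
  static ContextualTextIO.Read gzipRead(String filepattern) {
    return ContextualTextIO.read()
        .from(filepattern)
        .withCompression(Compression.GZIP);
  }
}
```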

    • withHintMatchesManyFiles

      public ContextualTextIO.Read withHintMatchesManyFiles()
      Hints that the filepattern specified in from(String) matches a very large number of files.

      This hint may cause a runner to execute the transform differently, in a way that improves performance for this case, but it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, it will happen less efficiently within individual files).
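      A sketch of applying the hint, assuming a pattern that expands to a very large number of files. The class name and path are hypothetical. Because the hint can hurt performance when only a few files match, it is kept as an explicit opt-in rather than set unconditionally.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;

class ManyFilesReadExample {
  // Intended for patterns matching tens of thousands of files or more;
  // for small matches, leave the hint off.
  static ContextualTextIO.Read manyFilesRead(String filepattern) {
    return ContextualTextIO.read()
        .from(filepattern)
        .withHintMatchesManyFiles();
  }
}
```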

    • withRecordNumMetadata

      public ContextualTextIO.Read withRecordNumMetadata()
      Allows the user to opt into receiving the record number (recordNum) associated with each record. This option is only supported with default triggers.

      When enabled, it introduces a grouping step to assemble the recordNums for each record, which increases the resources used by the pipeline.

      Use this when you need metadata such as file names together with position/order information for each record.
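      A sketch of opting into record numbers and formatting each output Row as "<recordNum>: <text>". It assumes the RecordWithMetadata.RECORD_NUM and RecordWithMetadata.VALUE field-name constants from the same package; the class and method names are hypothetical.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.io.contextualtextio.RecordWithMetadata;
import org.apache.beam.sdk.values.Row;

class RecordNumExample {
  // Opting in adds a grouping step that assigns each record its number.
  static ContextualTextIO.Read numberedRead(String filepattern) {
    return ContextualTextIO.read().from(filepattern).withRecordNumMetadata();
  }

  // Formats one output Row as "<recordNum>: <text>".
  static String format(Row row) {
    long recordNum = row.getInt64(RecordWithMetadata.RECORD_NUM);
    String value = row.getString(RecordWithMetadata.VALUE);
    return recordNum + ": " + value;
  }
}
```

      In a pipeline, format could be applied with MapElements after the read, in the same way as the text-extraction step shown for from(String).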

    • withEmptyMatchTreatment

      public ContextualTextIO.Read withEmptyMatchTreatment(EmptyMatchTreatment treatment)
    • withDelimiter

      public ContextualTextIO.Read withDelimiter(byte[] delimiter)
      Sets a custom delimiter to be used in place of the default ones ('\r', '\n', or '\r\n').
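      A sketch of reading pipe-separated records instead of newline-separated lines. The '|' delimiter, class name, and path are hypothetical. The method takes raw bytes, so the delimiter string is encoded explicitly as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;

class DelimiterReadExample {
  // Hypothetical single-byte record separator.
  static final byte[] DELIMITER = "|".getBytes(StandardCharsets.UTF_8);

  // Records are split on '|' only; '\r' and '\n' no longer end a record.
  static ContextualTextIO.Read pipeDelimitedRead(String filepattern) {
    return ContextualTextIO.read().from(filepattern).withDelimiter(DELIMITER);
  }
}
```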
    • expand

      public PCollection<Row> expand(PBegin input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PBegin,PCollection<Row>>
    • getSource

      protected FileBasedSource<Row> getSource()
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Description copied from class: PTransform
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      By default, does not register any display data. Implementors may override this method to provide their own display data.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Overrides:
      populateDisplayData in class PTransform<PBegin,PCollection<Row>>
      Parameters:
      builder - The builder to populate with display data.
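      A sketch of inspecting the display data that a configured ContextualTextIO.Read registers, using DisplayData.from(HasDisplayData) in the same way runners do. The keys that appear may depend on the Beam version, so the example prints every item rather than looking up specific keys. The class name and path are hypothetical.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.transforms.display.DisplayData;

class DisplayDataExample {
  // Collects display data via DisplayData.from, which calls populateDisplayData.
  static DisplayData describe(ContextualTextIO.Read read) {
    return DisplayData.from(read);
  }

  public static void main(String[] args) {
    ContextualTextIO.Read read = ContextualTextIO.read().from("/tmp/input-*.txt");
    // Print each registered key/value pair.
    for (DisplayData.Item item : describe(read).items()) {
      System.out.println(item.getKey() + " = " + item.getValue());
    }
  }
}
```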