Class ElasticsearchIO.Read

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PBegin,PCollection<String>>
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Read
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
ElasticsearchIO

public abstract static class ElasticsearchIO.Read extends PTransform<PBegin,PCollection<String>>
A PTransform reading data from Elasticsearch.
See Also:
  • Constructor Details

    • Read

      public Read()
  • Method Details

    • withConnectionConfiguration

      public ElasticsearchIO.Read withConnectionConfiguration(ElasticsearchIO.ConnectionConfiguration connectionConfiguration)
      Provide the Elasticsearch connection configuration object.
      Parameters:
      connectionConfiguration - a ElasticsearchIO.ConnectionConfiguration describes a connection configuration to Elasticsearch.
      Returns:
      a PTransform reading data from Elasticsearch.
    • withQuery

      public ElasticsearchIO.Read withQuery(String query)
      Provide a query used while reading from Elasticsearch.
      Parameters:
      query - the query. See Query DSL
      Returns:
      a PTransform reading data from Elasticsearch.
    • withQuery

      public ElasticsearchIO.Read withQuery(ValueProvider<String> query)
      Provide a ValueProvider that provides the query used while reading from Elasticsearch. This is useful for cases when the query must be dynamic.
      Parameters:
      query - the query. See Query DSL
      Returns:
      a PTransform reading data from Elasticsearch.
    • withMetadata

      public ElasticsearchIO.Read withMetadata()
      Include metadata in result json documents. Document source will be under json node _source.
      Returns:
      a PTransform reading data from Elasticsearch.
    • withScrollKeepalive

      public ElasticsearchIO.Read withScrollKeepalive(String scrollKeepalive)
      Provide a scroll keepalive. See scroll API Default is "5m". Change this only if you get "No search context found" errors. When configuring the read to use Point In Time (PIT) search this configuration is used to set the PIT keep alive.
      Parameters:
      scrollKeepalive - keepalive duration of the scroll
      Returns:
      a PTransform reading data from Elasticsearch.
    • withBatchSize

      public ElasticsearchIO.Read withBatchSize(long batchSize)
      Provide a size for the scroll read. See scroll API Default is 100. Maximum is 10 000. If documents are small, increasing batch size might improve read performance. If documents are big, you might need to decrease batchSize
      Parameters:
      batchSize - number of documents read in each scroll read
      Returns:
      a PTransform reading data from Elasticsearch.
    • withPointInTimeSearch

      public ElasticsearchIO.Read withPointInTimeSearch()
      Configures the source to user Point In Time search iteration while reading data from Elasticsearch. See Point in time search, using default settings. This iteration mode for searches does not have the same size constrains the Scroll API have (slice counts, batch size or how deep the iteration is). By default this iteration mode will use a @timestamp named property on the indexed documents to consistently retrieve the data when failures occur on an specific read work.
      Returns:
      a PTransform reading data from Elasticsearch.
    • withPointInTimeSearchAndTimestampSortProperty

      public ElasticsearchIO.Read withPointInTimeSearchAndTimestampSortProperty(String timestampSortProperty)
      Similar to the default PIT search but setting an existing timestamp based property name which Elasticsearch will use to sort for the results.
      Parameters:
      timestampSortProperty - a property name found in the read documents containing a timestamp-like value.
      Returns:
      a PTransform reading data from Elasticsearch.
    • withPointInTimeSearchAndSortConfiguration

      public ElasticsearchIO.Read withPointInTimeSearchAndSortConfiguration(String sortConfiguration)
      Similar to the default PIT search but setting a specific sorting configuration which Elasticsearch will use to sort for the results.
      Parameters:
      sortConfiguration - the full sorting configuration to be sent to Elasticsearch while iterating on the results.
      Returns:
      a PTransform reading data from Elasticsearch.
    • expand

      public PCollection<String> expand(PBegin input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead apply the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PBegin,PCollection<String>>
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Description copied from class: PTransform
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      By default, does not register any display data. Implementors may override this method to provide their own display data.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Overrides:
      populateDisplayData in class PTransform<PBegin,PCollection<String>>
      Parameters:
      builder - The builder to populate with display data.
      See Also: