Class ElasticsearchIO.DocToBulk

All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
ElasticsearchIO

public abstract static class ElasticsearchIO.DocToBulk extends PTransform<PCollection<String>,PCollection<ElasticsearchIO.Document>>
A PTransform converting docs to their Bulk API counterparts.
See Also:
  • Constructor Details

    • DocToBulk

      public DocToBulk()
  • Method Details

    • withConnectionConfiguration

      public ElasticsearchIO.DocToBulk withConnectionConfiguration(ElasticsearchIO.ConnectionConfiguration connectionConfiguration)
      Provide the Elasticsearch connection configuration object. Only required if withBackendVersion was not used i.e. getBackendVersion() returns null.
      Parameters:
      connectionConfiguration - the Elasticsearch ElasticsearchIO.ConnectionConfiguration object
      Returns:
      the ElasticsearchIO.DocToBulk with connection configuration set
    • withIdFn

      Provide a function to extract the id from the document. This id will be used as the document id in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated.
      Parameters:
      idFn - to extract the document ID
      Returns:
      the ElasticsearchIO.DocToBulk with the function set
    • withIndexFn

      Provide a function to extract the target index from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated.
      Parameters:
      indexFn - to extract the destination index from
      Returns:
      the ElasticsearchIO.DocToBulk with the function set
    • withRoutingFn

      Provide a function to extract the target routing from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated.
      Parameters:
      routingFn - to extract the destination index from
      Returns:
      the ElasticsearchIO.DocToBulk with the function set
    • withTypeFn

      Provide a function to extract the target type from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated. Users are encouraged to consider carefully if multipe types are a sensible model as discussed in this blog.
      Parameters:
      typeFn - to extract the destination index from
      Returns:
      the ElasticsearchIO.DocToBulk with the function set
    • withUsePartialUpdate

      public ElasticsearchIO.DocToBulk withUsePartialUpdate(boolean usePartialUpdate)
      Provide an instruction to control whether partial updates or inserts (default) are issued to Elasticsearch.
      Parameters:
      usePartialUpdate - set to true to issue partial updates
      Returns:
      the ElasticsearchIO.DocToBulk with the partial update control set
    • withAppendOnly

      public ElasticsearchIO.DocToBulk withAppendOnly(boolean appendOnly)
      Provide an instruction to control whether the target index should be considered append-only. For append-only indexes and/or data streams, only create operations will be issued, instead of index, which is the default.

      create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary. If no ID is provided, both operations are equivalent, unless you are writing to a data stream. Data streams only support the create operation. For more information see the Elasticsearch documentation

      Updates and deletions are not allowed, so related options will be ignored.

      When the documents contain

      Parameters:
      appendOnly - set to true to allow only document appending
      Returns:
      the ElasticsearchIO.DocToBulk with the-append only control set
    • withUpsertScript

      public ElasticsearchIO.DocToBulk withUpsertScript(String source)
      Whether to use scripted updates and what script to use.
      Parameters:
      source - set to the value of the script source, painless lang
      Returns:
      the ElasticsearchIO.DocToBulk with the scripted updates set
    • withDocVersionFn

      Provide a function to extract the doc version from the document. This version number will be used as the document version in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated. Incompatible with update operations and should only be used with withUsePartialUpdate(false)
      Parameters:
      docVersionFn - to extract the document version
      Returns:
      the ElasticsearchIO.DocToBulk with the function set
    • withIsDeleteFn

      Provide a function to extract the target operation either upsert or delete from the document fields allowing dynamic bulk operation decision. While using withIsDeleteFn, it should be taken care that the document's id extraction is defined using the withIdFn function or else IllegalArgumentException is thrown. Should the function throw an Exception then the batch will fail and the exception propagated.
      Parameters:
      isDeleteFn - set to true for deleting the specific document
      Returns:
      the ElasticsearchIO.Write with the function set
    • withDocVersionType

      public ElasticsearchIO.DocToBulk withDocVersionType(String docVersionType)
      Provide a function to extract the doc version from the document. This version number will be used as the document version in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated. Incompatible with update operations and should only be used with withUsePartialUpdate(false)
      Parameters:
      docVersionType - the version type to use, one of ElasticsearchIO.VERSION_TYPES
      Returns:
      the ElasticsearchIO.DocToBulk with the doc version type set
    • withBackendVersion

      public ElasticsearchIO.DocToBulk withBackendVersion(int backendVersion)
      Use to set explicitly which version of Elasticsearch the destination cluster is running. Providing this hint means there is no need for setting withConnectionConfiguration(org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration). This can also be very useful for testing purposes.

      Note: if the value of @param backendVersion differs from the version the destination cluster is running, behavior is undefined and likely to yield errors.

      Parameters:
      backendVersion - the major version number of the version of Elasticsearch being run in the cluster where documents will be indexed.
      Returns:
      the ElasticsearchIO.DocToBulk with the Elasticsearch major version number set
    • expand

      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead apply the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PCollection<String>,PCollection<ElasticsearchIO.Document>>