Class ElasticsearchIO.DocToBulk
- All Implemented Interfaces:
Serializable
,HasDisplayData
- Enclosing class:
ElasticsearchIO
PTransform
converting docs to their Bulk API counterparts.- See Also:
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionexpand
(PCollection<String> docs) Override this method to specify how thisPTransform
should be expanded on the givenInputT
.withAppendOnly
(boolean appendOnly) Provide an instruction to control whether the target index should be considered append-only.withBackendVersion
(int backendVersion) Use to set explicitly which version of Elasticsearch the destination cluster is running.withConnectionConfiguration
(ElasticsearchIO.ConnectionConfiguration connectionConfiguration) Provide the Elasticsearch connection configuration object.withDocVersionFn
(ElasticsearchIO.Write.FieldValueExtractFn docVersionFn) Provide a function to extract the doc version from the document.withDocVersionType
(String docVersionType) Provide a function to extract the doc version from the document.Provide a function to extract the id from the document.Provide a function to extract the target index from the document allowing for dynamic document routing.Provide a function to extract the target operation either upsert or delete from the document fields allowing dynamic bulk operation decision.Provide a function to extract the target routing from the document allowing for dynamic document routing.Provide a function to extract the target type from the document allowing for dynamic document routing.withUpsertScript
(String source) Whether to use scripted updates and what script to use.withUsePartialUpdate
(boolean usePartialUpdate) Provide an instruction to control whether partial updates or inserts (default) are issued to Elasticsearch.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
DocToBulk
public DocToBulk()
-
-
Method Details
-
withConnectionConfiguration
public ElasticsearchIO.DocToBulk withConnectionConfiguration(ElasticsearchIO.ConnectionConfiguration connectionConfiguration) Provide the Elasticsearch connection configuration object. Only required if withBackendVersion was not used i.e. getBackendVersion() returns null.- Parameters:
connectionConfiguration
- the ElasticsearchElasticsearchIO.ConnectionConfiguration
object- Returns:
- the
ElasticsearchIO.DocToBulk
with connection configuration set
-
withIdFn
Provide a function to extract the id from the document. This id will be used as the document id in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated.- Parameters:
idFn
- to extract the document ID- Returns:
- the
ElasticsearchIO.DocToBulk
with the function set
-
withIndexFn
Provide a function to extract the target index from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated.- Parameters:
indexFn
- to extract the destination index from- Returns:
- the
ElasticsearchIO.DocToBulk
with the function set
-
withRoutingFn
Provide a function to extract the target routing from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated.- Parameters:
routingFn
- to extract the destination index from- Returns:
- the
ElasticsearchIO.DocToBulk
with the function set
-
withTypeFn
Provide a function to extract the target type from the document allowing for dynamic document routing. Should the function throw an Exception then the batch will fail and the exception propagated. Users are encouraged to consider carefully if multipe types are a sensible model as discussed in this blog.- Parameters:
typeFn
- to extract the destination index from- Returns:
- the
ElasticsearchIO.DocToBulk
with the function set
-
withUsePartialUpdate
Provide an instruction to control whether partial updates or inserts (default) are issued to Elasticsearch.- Parameters:
usePartialUpdate
- set to true to issue partial updates- Returns:
- the
ElasticsearchIO.DocToBulk
with the partial update control set
-
withAppendOnly
Provide an instruction to control whether the target index should be considered append-only. For append-only indexes and/or data streams, onlycreate
operations will be issued, instead ofindex
, which is the default.create
fails if a document with the same ID already exists in the target,index
adds or replaces a document as necessary. If no ID is provided, both operations are equivalent, unless you are writing to a data stream. Data streams only support thecreate
operation. For more information see the Elasticsearch documentationUpdates and deletions are not allowed, so related options will be ignored.
When the documents contain
- Parameters:
appendOnly
- set to true to allow only document appending- Returns:
- the
ElasticsearchIO.DocToBulk
with the-append only control set
-
withUpsertScript
Whether to use scripted updates and what script to use.- Parameters:
source
- set to the value of the script source, painless lang- Returns:
- the
ElasticsearchIO.DocToBulk
with the scripted updates set
-
withDocVersionFn
public ElasticsearchIO.DocToBulk withDocVersionFn(ElasticsearchIO.Write.FieldValueExtractFn docVersionFn) Provide a function to extract the doc version from the document. This version number will be used as the document version in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated. Incompatible with update operations and should only be used with withUsePartialUpdate(false)- Parameters:
docVersionFn
- to extract the document version- Returns:
- the
ElasticsearchIO.DocToBulk
with the function set
-
withIsDeleteFn
public ElasticsearchIO.DocToBulk withIsDeleteFn(ElasticsearchIO.Write.BooleanFieldValueExtractFn isDeleteFn) Provide a function to extract the target operation either upsert or delete from the document fields allowing dynamic bulk operation decision. While using withIsDeleteFn, it should be taken care that the document's id extraction is defined using the withIdFn function or else IllegalArgumentException is thrown. Should the function throw an Exception then the batch will fail and the exception propagated.- Parameters:
isDeleteFn
- set to true for deleting the specific document- Returns:
- the
ElasticsearchIO.Write
with the function set
-
withDocVersionType
Provide a function to extract the doc version from the document. This version number will be used as the document version in Elasticsearch. Should the function throw an Exception then the batch will fail and the exception propagated. Incompatible with update operations and should only be used with withUsePartialUpdate(false)- Parameters:
docVersionType
- the version type to use, one ofElasticsearchIO.VERSION_TYPES
- Returns:
- the
ElasticsearchIO.DocToBulk
with the doc version type set
-
withBackendVersion
Use to set explicitly which version of Elasticsearch the destination cluster is running. Providing this hint means there is no need for settingwithConnectionConfiguration(org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration)
. This can also be very useful for testing purposes.Note: if the value of @param backendVersion differs from the version the destination cluster is running, behavior is undefined and likely to yield errors.
- Parameters:
backendVersion
- the major version number of the version of Elasticsearch being run in the cluster where documents will be indexed.- Returns:
- the
ElasticsearchIO.DocToBulk
with the Elasticsearch major version number set
-
expand
Description copied from class:PTransform
Override this method to specify how thisPTransform
should be expanded on the givenInputT
.NOTE: This method should not be called directly. Instead apply the
PTransform
should be applied to theInputT
using theapply
method.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand
in classPTransform<PCollection<String>,
PCollection<ElasticsearchIO.Document>>
-