Class ElasticsearchIO.Read
- All Implemented Interfaces: Serializable, HasDisplayData
- Enclosing class: ElasticsearchIO
A PTransform reading data from Elasticsearch.
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors: Read()
-
Method Summary
Modifier and Type, Method, Description
- PCollection<String> expand(PBegin input): Override this method to specify how this PTransform should be expanded on the given InputT.
- void populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
- ElasticsearchIO.Read withBatchSize(long batchSize): Provide a size for the scroll read.
- ElasticsearchIO.Read withConnectionConfiguration(ElasticsearchIO.ConnectionConfiguration connectionConfiguration): Provide the Elasticsearch connection configuration object.
- ElasticsearchIO.Read withMetadata(): Include metadata in result JSON documents.
- ElasticsearchIO.Read withPointInTimeSearch(): Configures the source to use Point In Time search iteration while reading data from Elasticsearch.
- ElasticsearchIO.Read withPointInTimeSearchAndSortConfiguration(String sortConfiguration): Similar to the default PIT search, but sets a specific sorting configuration which Elasticsearch will use to sort the results.
- ElasticsearchIO.Read withPointInTimeSearchAndTimestampSortProperty(String timestampSortProperty): Similar to the default PIT search, but sets an existing timestamp-based property name which Elasticsearch will use to sort the results.
- ElasticsearchIO.Read withQuery(String query): Provide a query used while reading from Elasticsearch.
- ElasticsearchIO.Read withQuery(ValueProvider<String> query): Provide a ValueProvider that provides the query used while reading from Elasticsearch.
- ElasticsearchIO.Read withScrollKeepalive(String scrollKeepalive): Provide a scroll keepalive.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
Read
public Read()
-
-
Method Details
-
withConnectionConfiguration
public ElasticsearchIO.Read withConnectionConfiguration(ElasticsearchIO.ConnectionConfiguration connectionConfiguration)
Provide the Elasticsearch connection configuration object.
- Parameters:
connectionConfiguration - an ElasticsearchIO.ConnectionConfiguration describing a connection configuration to Elasticsearch.
- Returns:
a PTransform reading data from Elasticsearch.
-
withQuery
public ElasticsearchIO.Read withQuery(String query)
Provide a query used while reading from Elasticsearch.
- Parameters:
query - the query. See Query DSL.
- Returns:
a PTransform reading data from Elasticsearch.
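The query is passed as a JSON string. The following is a minimal sketch of building such a string in plain Java. MatchQueryExample, escapeJson, and matchQuery are hypothetical names used only for illustration and are not part of Beam. The sketch also assumes the query is a search request body with a top-level "query" key, as in the Elasticsearch Query DSL.

```java
/** Hypothetical helper (not part of Beam) that builds a Query DSL "match" query as JSON text. */
final class MatchQueryExample {

  // Escapes quotes, backslashes, and the newline, carriage-return, and tab characters.
  // Other control characters are left as-is in this sketch.
  static String escapeJson(String raw) {
    StringBuilder out = new StringBuilder();
    for (char c : raw.toCharArray()) {
      switch (c) {
        case '"':
          out.append("\\\"");
          break;
        case '\\':
          out.append("\\\\");
          break;
        case '\n':
          out.append("\\n");
          break;
        case '\r':
          out.append("\\r");
          break;
        case '\t':
          out.append("\\t");
          break;
        default:
          out.append(c);
      }
    }
    return out.toString();
  }

  // Builds a request body of the assumed shape {"query":{"match":{"<field>":"<text>"}}}.
  static String matchQuery(String field, String text) {
    return "{\"query\":{\"match\":{\""
        + escapeJson(field)
        + "\":\""
        + escapeJson(text)
        + "\"}}}";
  }

  public static void main(String[] args) {
    String query = matchQuery("title", "apache beam");
    System.out.println(query);
    // The resulting string would then be handed to the transform, for example:
    //   ElasticsearchIO.read().withConnectionConfiguration(config).withQuery(query)
    // where config is an ElasticsearchIO.ConnectionConfiguration created elsewhere.
  }
}
```

In a real pipeline a JSON library is usually a safer way to produce the string, since it handles every escaping case.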
-
withQuery
public ElasticsearchIO.Read withQuery(ValueProvider<String> query)
Provide a ValueProvider that provides the query used while reading from Elasticsearch. This is useful when the query must be dynamic.
- Parameters:
query - the query. See Query DSL.
- Returns:
a PTransform reading data from Elasticsearch.
-
withMetadata
public ElasticsearchIO.Read withMetadata()
Include metadata in result JSON documents. The document source will be under the JSON node _source.
- Returns:
a PTransform reading data from Elasticsearch.
-
withScrollKeepalive
public ElasticsearchIO.Read withScrollKeepalive(String scrollKeepalive)
Provide a scroll keepalive. See scroll API. Default is "5m". Change this only if you get "No search context found" errors. When the read is configured to use Point In Time (PIT) search, this value is used as the PIT keep alive.
- Parameters:
scrollKeepalive - keepalive duration of the scroll
- Returns:
a PTransform reading data from Elasticsearch.
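The keepalive is a string such as "5m", in the Elasticsearch time-value format. The following is a minimal sketch of checking such a value before handing it to withScrollKeepalive. KeepaliveExample and parseKeepalive are hypothetical names, not part of Beam, and the sketch only covers the d, h, m, s, and ms units.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Beam) that parses a keepalive value such as "5m". */
final class KeepaliveExample {

  // A whole number followed by one of the units ms, d, h, m, or s.
  private static final Pattern TIME_VALUE = Pattern.compile("(\\d+)(ms|d|h|m|s)");

  // Converts the time value into a Duration, or throws if the text is not in the expected form.
  static Duration parseKeepalive(String value) {
    Matcher matcher = TIME_VALUE.matcher(value);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Not a supported time value: " + value);
    }
    long amount = Long.parseLong(matcher.group(1));
    switch (matcher.group(2)) {
      case "d":
        return Duration.ofDays(amount);
      case "h":
        return Duration.ofHours(amount);
      case "m":
        return Duration.ofMinutes(amount);
      case "s":
        return Duration.ofSeconds(amount);
      default:
        // The only remaining unit accepted by the pattern is "ms".
        return Duration.ofMillis(amount);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseKeepalive("5m"));
    // A validated value would then be passed on unchanged, for example:
    //   ElasticsearchIO.read().withScrollKeepalive("5m")
  }
}
```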
-
withBatchSize
public ElasticsearchIO.Read withBatchSize(long batchSize)
Provide a size for the scroll read. See scroll API. Default is 100. Maximum is 10,000. If documents are small, increasing the batch size might improve read performance. If documents are large, you might need to decrease batchSize.
- Parameters:
batchSize - number of documents read in each scroll read
- Returns:
a PTransform reading data from Elasticsearch.
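One way to act on the sizing advice above is to derive the batch size from an estimated average document size. The following is a minimal sketch of such a heuristic. BatchSizeExample, suggestBatchSize, and the byte-budget idea are assumptions made for illustration and are not part of Beam. Only the 10,000 maximum comes from the documentation above.

```java
/** Hypothetical sizing heuristic (not part of Beam) for choosing a scroll batch size. */
final class BatchSizeExample {

  // Documented upper bound for the batch size.
  static final long MAX_BATCH_SIZE = 10_000L;

  // Picks how many documents fit into the byte budget, clamped to the range [1, MAX_BATCH_SIZE].
  static long suggestBatchSize(long averageDocumentBytes, long targetBatchBytes) {
    if (averageDocumentBytes <= 0 || targetBatchBytes <= 0) {
      throw new IllegalArgumentException("Sizes must be positive");
    }
    long candidate = targetBatchBytes / averageDocumentBytes;
    return Math.max(1L, Math.min(MAX_BATCH_SIZE, candidate));
  }

  public static void main(String[] args) {
    // Roughly 1 kB documents with a 5 MB budget per scroll request.
    long batchSize = suggestBatchSize(1_000L, 5_000_000L);
    System.out.println(batchSize);
    // The value would then be passed to the transform, for example:
    //   ElasticsearchIO.read().withBatchSize(batchSize)
  }
}
```

Small documents hit the upper clamp and very large documents fall back to a batch of one, which matches the direction of the advice even though the byte budget itself is an arbitrary choice.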
-
withPointInTimeSearch
public ElasticsearchIO.Read withPointInTimeSearch()
Configures the source to use Point In Time search iteration while reading data from Elasticsearch, using default settings. See Point in time search. This iteration mode does not have the size constraints that the Scroll API has (slice counts, batch size, or how deep the iteration is). By default this iteration mode uses a property named @timestamp on the indexed documents to consistently retrieve the data when failures occur during a specific piece of read work.
- Returns:
a PTransform reading data from Elasticsearch.
-
withPointInTimeSearchAndTimestampSortProperty
public ElasticsearchIO.Read withPointInTimeSearchAndTimestampSortProperty(String timestampSortProperty)
Similar to the default PIT search, but sets an existing timestamp-based property name which Elasticsearch will use to sort the results.
- Parameters:
timestampSortProperty - a property name found in the read documents containing a timestamp-like value.
- Returns:
a PTransform reading data from Elasticsearch.
-
withPointInTimeSearchAndSortConfiguration
public ElasticsearchIO.Read withPointInTimeSearchAndSortConfiguration(String sortConfiguration)
Similar to the default PIT search, but sets a specific sorting configuration which Elasticsearch will use to sort the results.
- Parameters:
sortConfiguration - the full sorting configuration to be sent to Elasticsearch while iterating on the results.
- Returns:
a PTransform reading data from Elasticsearch.
-
expand
public PCollection<String> expand(PBegin input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PBegin, PCollection<String>>
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PBegin, PCollection<String>>
- Parameters:
builder - The builder to populate with display data.