@Experimental(value=SOURCE_SINK) public class ElasticsearchIO extends java.lang.Object
ElasticsearchIO.read()
returns a bounded PCollection<String>
representing JSON documents.
To configure the read()
, you have to provide a connection configuration
containing the HTTP address of the instances, an index name and a type. The following example
illustrates options for configuring the source:
pipeline.apply(ElasticsearchIO.read().withConnectionConfiguration(
ElasticsearchIO.ConnectionConfiguration.create("http://host:9200", "my-index", "my-type")
)
The connection configuration also accepts optional configuration: withUsername()
and
withPassword()
.
You can also specify a query on the read()
using withQuery()
.
To write documents to Elasticsearch, use ElasticsearchIO.write()
, which writes JSON documents from a PCollection<String>
(which can be bounded or unbounded).
To configure ElasticsearchIO.write()
, similar to the read, you
have to provide a connection configuration. For instance:
pipeline
.apply(...)
.apply(ElasticsearchIO.write().withConnectionConfiguration(
ElasticsearchIO.ConnectionConfiguration.create("http://host:9200", "my-index", "my-type")
)
Optionally, you can provide withBatchSize()
and withBatchSizeBytes()
to
specify the size of the write batch in number of documents or in bytes.
Optionally, you can provide an ElasticsearchIO.Write.FieldValueExtractFn
using withIdFn()
that will be run to extract the id value out of the provided document rather than
using the document id auto-generated by Elasticsearch.
Optionally, you can provide ElasticsearchIO.Write.FieldValueExtractFn
using withIndexFn()
or withTypeFn()
to enable per-document routing to the target Elasticsearch
index (all versions) and type (version > 6). Support for type routing was removed in
Elasticsearch 6 (see
https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)
When {withUsePartialUpdate()} is enabled, the input document must contain an id field and
withIdFn()
must be used to allow its extraction by the ElasticsearchIO.
Optionally, withSocketAndRetryTimeout()
can be used to override the default retry
timeout and socket timeout of 30000ms. withConnectTimeout()
can be used to override the
default connect timeout of 1000ms.
Modifier and Type | Class and Description |
---|---|
static class |
ElasticsearchIO.BoundedElasticsearchSource
A
BoundedSource reading from Elasticsearch. |
static class |
ElasticsearchIO.ConnectionConfiguration
A POJO describing a connection configuration to Elasticsearch.
|
static class |
ElasticsearchIO.Read
A
PTransform reading data from Elasticsearch. |
static class |
ElasticsearchIO.RetryConfiguration
A POJO encapsulating a configuration for retry behavior when issuing requests to ES.
|
static class |
ElasticsearchIO.Write
A
PTransform writing data to Elasticsearch. |
Modifier and Type | Method and Description |
---|---|
static ElasticsearchIO.Read |
read() |
static ElasticsearchIO.Write |
write() |
public static ElasticsearchIO.Read read()
public static ElasticsearchIO.Write write()