Class ClickHouseIO

java.lang.Object
org.apache.beam.sdk.io.clickhouse.ClickHouseIO

public class ClickHouseIO extends Object
An IO to write to ClickHouse.

Writing to ClickHouse

To write to ClickHouse, use write(String, String), which writes elements from input PCollection. It's required that your ClickHouse cluster already has table you are going to insert into.


 pipeline
   .apply(...)
   .apply(
     ClickHouseIO.<POJO>write("jdbc:clickhouse:localhost:8123/default", "my_table"));
 

Optionally, you can provide connection settings, for instance, specify insert block size with ClickHouseIO.Write.withMaxInsertBlockSize(long), or configure number of retries with ClickHouseIO.Write.withMaxRetries(int).

Deduplication

Deduplication is performed by ClickHouse if inserting to ReplicatedMergeTree or Distributed table on top of ReplicatedMergeTree. Without replication, inserting into regular MergeTree can produce duplicates, if insert fails, and then successfully retries. However, each block is inserted atomically, and you can configure block size with ClickHouseIO.Write.withMaxInsertBlockSize(long).

Deduplication is performed using checksums of inserted blocks.

Mapping between Beam and ClickHouse types

ClickHouse Beam
TableSchema.TypeName.FLOAT32 Schema.TypeName.FLOAT
TableSchema.TypeName.FLOAT64 Schema.TypeName.DOUBLE
TableSchema.TypeName.FIXEDSTRING FixedBytes
TableSchema.TypeName.INT8 Schema.TypeName.BYTE
TableSchema.TypeName.INT16 Schema.TypeName.INT16
TableSchema.TypeName.INT32 Schema.TypeName.INT32
TableSchema.TypeName.INT64 Schema.TypeName.INT64
TableSchema.TypeName.STRING Schema.TypeName.STRING
TableSchema.TypeName.UINT8 Schema.TypeName.INT16
TableSchema.TypeName.UINT16 Schema.TypeName.INT32
TableSchema.TypeName.UINT32 Schema.TypeName.INT64
TableSchema.TypeName.UINT64 Schema.TypeName.INT64
TableSchema.TypeName.DATE Schema.TypeName.DATETIME
TableSchema.TypeName.DATETIME Schema.TypeName.DATETIME
TableSchema.TypeName.ARRAY Schema.TypeName.ARRAY
TableSchema.TypeName.ENUM8 Schema.TypeName.STRING
TableSchema.TypeName.ENUM16 Schema.TypeName.STRING
TableSchema.TypeName.BOOL Schema.TypeName.BOOLEAN
TableSchema.TypeName.TUPLE Schema.TypeName.ROW
Nullable row columns are supported through Nullable type in ClickHouse. Low cardinality hint is supported through LowCardinality DataType in ClickHouse.

Nested rows should be unnested using Select.flattenedSchema(). Type casting should be done using Cast before ClickHouseIO.

  • Field Details

    • DEFAULT_MAX_INSERT_BLOCK_SIZE

      public static final long DEFAULT_MAX_INSERT_BLOCK_SIZE
      See Also:
    • DEFAULT_MAX_RETRIES

      public static final int DEFAULT_MAX_RETRIES
      See Also:
    • DEFAULT_MAX_CUMULATIVE_BACKOFF

      public static final Duration DEFAULT_MAX_CUMULATIVE_BACKOFF
    • DEFAULT_INITIAL_BACKOFF

      public static final Duration DEFAULT_INITIAL_BACKOFF
  • Constructor Details

    • ClickHouseIO

      public ClickHouseIO()
  • Method Details