@Experimental(value=SOURCE_SINK) public class HCatalogIO extends java.lang.Object
The HCatalog source supports reading `HCatRecord`s from an HCatalog-managed source, for example Hive.
To configure an HCatalog source, you must specify a metastore URI and a table name. The other, optional, parameters are the database and a filter. For instance:
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;

Map<String, String> configProperties = new HashMap<>();
configProperties.put("hive.metastore.uris", "thrift://metastore-host:port");

pipeline
    .apply(HCatalogIO.read()
        .withConfigProperties(configProperties)
        .withDatabase("default") // optional, assumes "default" if none specified
        .withTable("employee")
        .withFilter(filterString)); // optional, may be specified if the table is partitioned
```
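`withFilter` takes a string that selects which partitions to read. The sketch below builds such a string from partition-column/value pairs. It assumes the Hive metastore partition-filter syntax (`column="value"` clauses joined with `and`); `PartitionFilter` and `equalityFilter` are hypothetical names, not part of HCatalogIO.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of HCatalogIO) that builds a partition filter
 * for withFilter(...). Assumes the Hive metastore filter syntax:
 * column="value" clauses joined with "and".
 */
public class PartitionFilter {

  /** Builds an equality filter such as: year="2017" and month="06". */
  static String equalityFilter(Map<String, String> partitionValues) {
    if (partitionValues.isEmpty()) {
      throw new IllegalArgumentException("at least one partition value is required");
    }
    StringJoiner clauses = new StringJoiner(" and ");
    for (Map.Entry<String, String> e : partitionValues.entrySet()) {
      // Values are double-quoted; reject embedded quotes rather than guess an escaping rule.
      if (e.getValue().contains("\"")) {
        throw new IllegalArgumentException("unsupported quote in value for " + e.getKey());
      }
      clauses.add(e.getKey() + "=\"" + e.getValue() + "\"");
    }
    return clauses.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the clause order is predictable.
    Map<String, String> values = new LinkedHashMap<>();
    values.put("year", "2017");
    values.put("month", "06");
    String filterString = equalityFilter(values);
    System.out.println(filterString); // year="2017" and month="06"
  }
}
```

The resulting string is the kind of value that would be passed as `filterString` in the read example.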
The HCatalog sink supports writing `HCatRecord`s to an HCatalog-managed source, for example Hive.
To configure an HCatalog sink, you must specify a metastore URI and a table name. The other, optional, parameters are the database, the partition, and the batch size. The destination table must already exist; the transform does not create a table that is missing. For instance:
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;

Map<String, String> configProperties = new HashMap<>();
configProperties.put("hive.metastore.uris", "thrift://metastore-host:port");

pipeline
    .apply(...)
    .apply(HCatalogIO.write()
        .withConfigProperties(configProperties)
        .withDatabase("default") // optional, assumes "default" if none specified
        .withTable("employee")
        .withPartition(partitionValues) // optional, may be specified if the table is partitioned
        .withBatchSize(1024L)); // optional, assumes a default batch size of 1024 if none specified
```
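The batch size is the number of records the sink groups together per write. The following dependency-free sketch models that behaviour, assuming records are buffered, written out whenever `batchSize` of them have accumulated, and any remainder written when the bundle finishes. `BatchingWriter` is a hypothetical illustration, not the transform's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical model of batched writing: records are buffered and handed to a
 * writer every batchSize records; finish() flushes whatever remains.
 */
public class BatchingWriter<T> {
  private final long batchSize;
  private final Consumer<List<T>> writer;
  private final List<T> buffer = new ArrayList<>();

  BatchingWriter(long batchSize, Consumer<List<T>> writer) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
    this.writer = writer;
  }

  /** Buffers one record and flushes once the batch is full. */
  void add(T record) {
    buffer.add(record);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  /** Called at the end of a bundle; writes out any partially filled batch. */
  void finish() {
    if (!buffer.isEmpty()) {
      flush();
    }
  }

  private void flush() {
    writer.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
    buffer.clear();
  }

  public static void main(String[] args) {
    List<Integer> batchSizes = new ArrayList<>();
    BatchingWriter<String> w = new BatchingWriter<>(1024L, batch -> batchSizes.add(batch.size()));
    for (int i = 0; i < 2500; i++) {
      w.add("record-" + i);
    }
    w.finish();
    System.out.println(batchSizes); // [1024, 1024, 452]
  }
}
```

Under this model, a larger batch size means fewer, larger writes at the cost of holding more records in memory before each write.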
Modifier and Type | Class and Description |
---|---|
static class | `HCatalogIO.Read`: a `PTransform` to read data using HCatalog. |
static class | `HCatalogIO.Write`: a `PTransform` to write to an HCatalog-managed source. |
Modifier and Type | Method and Description |
---|---|
static HCatalogIO.Read | `read()`: Read data from Hive. |
static HCatalogIO.Write | `write()`: Write data to Hive. |
public static HCatalogIO.Write write()
public static HCatalogIO.Read read()
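`read()` and `write()` are static factory methods, and each `with...` call configures the returned transform. The sketch below is a hypothetical, dependency-free model of the write-side configuration. It assumes an immutable-builder style in which every `with...` call returns a modified copy, and a `validate()` step that enforces the required parameters described above (metastore URI and table). The defaults (database `default`, batch size 1024) come from the comments in the write example; `WriteSpec` and `validate()` are illustrative names, not HCatalogIO API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the configuration carried by HCatalogIO.write():
 * an immutable spec whose with* methods return copies.
 * Defaults mirror the text: database "default", batch size 1024.
 */
public final class WriteSpec {
  private final Map<String, String> configProperties;
  private final String database;
  private final String table;
  private final long batchSize;

  private WriteSpec(Map<String, String> configProperties, String database, String table, long batchSize) {
    this.configProperties = configProperties;
    this.database = database;
    this.table = table;
    this.batchSize = batchSize;
  }

  /** Counterpart of write(): only the optional parameters get defaults. */
  public static WriteSpec write() {
    return new WriteSpec(null, "default", null, 1024L);
  }

  public WriteSpec withConfigProperties(Map<String, String> props) {
    return new WriteSpec(new HashMap<>(props), database, table, batchSize);
  }

  public WriteSpec withDatabase(String db) {
    return new WriteSpec(configProperties, db, table, batchSize);
  }

  public WriteSpec withTable(String t) {
    return new WriteSpec(configProperties, database, t, batchSize);
  }

  public WriteSpec withBatchSize(long n) {
    return new WriteSpec(configProperties, database, table, n);
  }

  /** Fails fast when a required parameter (metastore URI, table) is missing. */
  public void validate() {
    if (configProperties == null || !configProperties.containsKey("hive.metastore.uris")) {
      throw new IllegalStateException("hive.metastore.uris must be set via withConfigProperties");
    }
    if (table == null) {
      throw new IllegalStateException("table must be set via withTable");
    }
    if (batchSize <= 0) {
      throw new IllegalStateException("batchSize must be positive");
    }
  }

  public String database() { return database; }
  public String table() { return table; }
  public long batchSize() { return batchSize; }

  public static void main(String[] args) {
    WriteSpec base = WriteSpec.write();
    WriteSpec spec = base
        .withConfigProperties(Map.of("hive.metastore.uris", "thrift://metastore-host:port"))
        .withTable("employee");
    spec.validate(); // passes: URI and table are present
    System.out.println(spec.database() + "." + spec.table() + " batch=" + spec.batchSize());
    // prints: default.employee batch=1024
    try {
      base.validate(); // base was never modified, so the URI is still missing
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because `base` is never mutated, a partially configured spec can serve as a template for several writes without the calls interfering with each other.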