SingleStoreDB I/O
Pipeline options and general information about using and running SingleStoreDB I/O.
Before you start
To use SingleStoreDB I/O, add the Maven artifact dependency to your pom.xml file.
Authentication
A DataSourceConfiguration object is required to configure the SingleStoreIO connection properties.
It is built with the following methods:
.create(endpoint)
- Hostname or IP address of SingleStoreDB in the form host[:port] (the port is optional).
- Required parameter.
- Example: .create("myHost:3306").
.withUsername(username)
- SingleStoreDB username.
- Default: root.
- Example: .withUsername("USERNAME").
.withPassword(password)
- Password of the SingleStoreDB user.
- Default: empty string.
- Example: .withPassword("PASSWORD").
.withDatabase(database)
- Name of the SingleStoreDB database to use.
- Example: .withDatabase("MY_DATABASE").
.withConnectionProperties(connectionProperties)
- List of properties passed to the JDBC driver.
- The format is "key1=value1;key2=value2;…".
- The full list of supported properties is given in the SingleStore JDBC driver documentation.
- Example: .withConnectionProperties("connectTimeout=30000;useServerPrepStmts=FALSE").
Note: .withDatabase(...) is required for .readWithPartitions().
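As a minimal sketch of the connection-properties format described above, the following self-contained Java snippet joins key-value pairs into the "key1=value1;key2=value2" form. The class ConnectionPropertiesSketch and the helper toConnectionProperties are hypothetical names used only for illustration and are not part of SingleStoreIO. The resulting string is the kind of value that .withConnectionProperties(...) expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ConnectionPropertiesSketch {

    /**
     * Hypothetical helper (not part of SingleStoreIO): joins the entries of a
     * map as "key1=value1;key2=value2", preserving the map's iteration order.
     */
    public static String toConnectionProperties(Map<String, String> properties) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("connectTimeout", "30000");
        properties.put("useServerPrepStmts", "FALSE");

        // The result can then be passed to .withConnectionProperties(...).
        System.out.println(toConnectionProperties(properties));
    }
}
```

Building the string from a map keeps each property on its own line in the source, which tends to be easier to review than one long literal.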
Reading from SingleStoreDB
SingleStoreIO can read from SingleStoreDB tables in two ways:
- Sequential data reading (.read())
- Parallel data reading (.readWithPartitions())
In many cases, parallel data reading is preferred over sequential data reading for performance reasons.
Sequential data reading
The .read() operation accepts the following parameters:
.withDataSourceConfiguration(dataSourceConfiguration)
- DataSourceConfiguration object with all the information needed to establish a connection to the database. See the Authentication section for more information.
- Required parameter.
.withTable(table)
- Table to read data from.
- Example: .withTable("MY_TABLE").
.withQuery(query)
- SQL query to execute.
- Example: .withQuery("SELECT * FROM MY_TABLE").
.withStatementPreparator(statementPreparator)
- StatementPreparator object.
.withRowMapper(rowMapper)
- RowMapper object.
- Required parameter.
.withOutputParallelization(outputParallelization)
- Boolean value that indicates whether to reshuffle the result.
- Default: true.
- Example: .withOutputParallelization(true).
Note: either .withTable(...) or .withQuery(...) is required.
Parallel data reading
The .readWithPartitions() operation accepts the following parameters:
.withDataSourceConfiguration(dataSourceConfiguration)
- DataSourceConfiguration object with all the information needed to establish a connection to the database. See the Authentication section for more information.
- Required parameter.
.withTable(table)
- Table to read data from.
- Example: .withTable("MY_TABLE").
.withQuery(query)
- SQL query to execute.
- Example: .withQuery("SELECT * FROM MY_TABLE").
.withRowMapper(rowMapper)
- RowMapper object.
- Required parameter.
Note: either .withTable(...) or .withQuery(...) is required.
StatementPreparator
The StatementPreparator is used by read() to set the parameters of the PreparedStatement.
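A minimal sketch of the parameter-binding logic that such a preparator might contain, assuming a query with a single ? placeholder such as SELECT * FROM MY_TABLE WHERE id > ? (the query and the value 10 are illustrative). To keep the snippet runnable without a database or Beam on the classpath, the logic is written as a plain static method and exercised against a recording stand-in for PreparedStatement. In a pipeline, the same body would sit inside an implementation of the StatementPreparator interface. All class and method names below are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatementPreparatorSketch {

    /**
     * Binds the first (and only) "?" placeholder of the assumed query
     * to the integer value 10. JDBC parameter indexes start at 1.
     */
    public static void setParameters(PreparedStatement preparedStatement) throws SQLException {
        preparedStatement.setInt(1, 10);
    }

    /**
     * Demo-only stand-in: a PreparedStatement proxy that records every call
     * as "methodName[args]" instead of talking to a database.
     */
    public static PreparedStatement recordingStatement(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
            StatementPreparatorSketch.class.getClassLoader(),
            new Class<?>[] {PreparedStatement.class},
            (proxy, method, methodArgs) -> {
                calls.add(method.getName() + Arrays.toString(methodArgs));
                return null; // only void setter calls are expected in this demo
            });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        setParameters(recordingStatement(calls));
        System.out.println(calls); // shows which setter was invoked and with what
    }
}
```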
RowMapper
The RowMapper is used by read() and readWithPartitions() for converting each row of the ResultSet into an element of the resulting PCollection.
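A minimal sketch of row-mapping logic, assuming a table whose first column is an integer id and whose second column is a string name. MyRow, mapRow as a static method, and the fakeRow stand-in are hypothetical and exist only so the snippet runs without a database or Beam on the classpath. In a pipeline, the mapping body would sit inside an implementation of the RowMapper interface.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RowMapperSketch {

    /** Hypothetical element type: one table row with an id and a name. */
    public static final class MyRow {
        public final int id;
        public final String name;

        public MyRow(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Converts the current row of the ResultSet into a MyRow (columns are 1-based). */
    public static MyRow mapRow(ResultSet resultSet) throws SQLException {
        return new MyRow(resultSet.getInt(1), resultSet.getString(2));
    }

    /** Demo-only stand-in for a ResultSet positioned on a single row. */
    public static ResultSet fakeRow(int id, String name) {
        return (ResultSet) Proxy.newProxyInstance(
            RowMapperSketch.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, methodArgs) -> {
                switch (method.getName()) {
                    case "getInt":
                        return id;
                    case "getString":
                        return name;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        MyRow row = mapRow(fakeRow(7, "alice"));
        System.out.println(row.id + " " + row.name);
    }
}
```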
Writing to SingleStoreDB tables
SingleStoreIO can also write to SingleStoreDB tables. The write transform sends the elements of a PCollection to your SingleStoreDB database and returns the number of rows written by each batch of elements.
The .write() operation accepts the following parameters:
.withDataSourceConfiguration(dataSourceConfiguration)
- DataSourceConfiguration object with all the information needed to establish a connection to the database. See the Authentication section for more information.
- Required parameter.
.withTable(table)
- Table in which data should be saved.
- Required parameter.
- Example: .withTable("MY_TABLE").
.withBatchSize(batchSize)
- Number of rows loaded by one LOAD DATA query.
- Default: 100000.
- Example: .withBatchSize(100000).
.withUserDataMapper(userDataMapper)
- UserDataMapper object.
- Required parameter.
UserDataMapper
The UserDataMapper is required to map data from a PCollection to an array of String values before the write() operation saves the data.
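A minimal sketch of the mapping logic, assuming an element type with an integer id and a string name. MyRow and the static mapRow method are hypothetical, and returning the values as a List<String> is an assumption about how the String values are handed to the writer. In a pipeline, this body would sit inside an implementation of the UserDataMapper interface.

```java
import java.util.ArrayList;
import java.util.List;

public class UserDataMapperSketch {

    /** Hypothetical element type held by the PCollection being written. */
    public static final class MyRow {
        public final Integer id;
        public final String name;

        public MyRow(Integer id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Turns one element into its column values, in table column order.
     * Every value is rendered as a String before the write.
     */
    public static List<String> mapRow(MyRow element) {
        List<String> values = new ArrayList<>();
        values.add(element.id.toString());
        values.add(element.name);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(mapRow(new MyRow(1, "alice")));
    }
}
```

The order of the returned values matters in this sketch: it is assumed to match the column order of the target table.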
Last updated on 2024/10/06