public abstract class TextRowCountEstimator
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class |
TextRowCountEstimator.Builder
Builder for TextRowCountEstimator. |
static class |
TextRowCountEstimator.LimitNumberOfFiles
This strategy stops sampling once enough files have been sampled.
|
static class |
TextRowCountEstimator.LimitNumberOfTotalBytes
This strategy stops sampling when the total number of sampled bytes exceeds a threshold.
|
static class |
TextRowCountEstimator.NoEstimationException
Thrown if the estimator cannot produce an estimate of the number of lines.
|
static class |
TextRowCountEstimator.SampleAllFiles
This strategy samples all the files.
|
static interface |
TextRowCountEstimator.SamplingStrategy
A sampling strategy determines when to stop reading further files.
|
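The three strategies above differ only in their stopping rule. The following is a minimal, self-contained sketch of that idea and does not use the Beam classes: `StopRule`, `shouldStop`, and `SamplingStrategySketch` are hypothetical names introduced for illustration, and the exact threshold comparisons (`>=` for files, `>` for bytes) are assumptions rather than the library's documented behavior.

```java
import java.util.List;

/** Hypothetical stand-in for a sampling strategy; NOT the Beam interface. */
interface StopRule {
    /** Returns true when no further files should be sampled. */
    boolean shouldStop(int filesSampled, long bytesSampled);
}

final class SamplingStrategySketch {
    /** Stops once maxFiles files have been sampled (assumed comparison). */
    static StopRule limitFiles(int maxFiles) {
        return (files, bytes) -> files >= maxFiles;
    }

    /** Stops once the total sampled bytes exceed maxBytes. */
    static StopRule limitTotalBytes(long maxBytes) {
        return (files, bytes) -> bytes > maxBytes;
    }

    /** Never stops early, so every file is sampled. */
    static StopRule sampleAll() {
        return (files, bytes) -> false;
    }

    /** Counts how many files get sampled when at most bytesPerFile are read from each. */
    static int filesSampled(List<Long> fileSizes, long bytesPerFile, StopRule rule) {
        int files = 0;
        long bytes = 0;
        for (long size : fileSizes) {
            if (rule.shouldStop(files, bytes)) {
                break;
            }
            bytes += Math.min(size, bytesPerFile);
            files++;
        }
        return files;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(100L, 100L, 100L, 100L, 100L);
        System.out.println(filesSampled(sizes, 50, limitFiles(2)));
        System.out.println(filesSampled(sizes, 50, limitTotalBytes(120)));
        System.out.println(filesSampled(sizes, 50, sampleAll()));
    }
}
```

In this sketch the rule is checked before each file is opened, so a byte limit can be overshot by up to one file's sample.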
Constructor and Description |
---|
TextRowCountEstimator() |
Modifier and Type | Method and Description |
---|---|
static TextRowCountEstimator.Builder |
builder() |
java.lang.Double |
estimateRowCount(PipelineOptions pipelineOptions)
Estimates the number of non-empty rows.
|
abstract Compression |
getCompression() |
abstract byte[] |
getDelimiters() |
abstract FileIO.ReadMatches.DirectoryTreatment |
getDirectoryTreatment() |
abstract EmptyMatchTreatment |
getEmptyMatchTreatment() |
abstract java.lang.String |
getFilePattern() |
abstract long |
getNumSampledBytesPerFile() |
abstract TextRowCountEstimator.SamplingStrategy |
getSamplingStrategy() |
abstract int |
getSkipHeaderLines() |
public abstract long getNumSampledBytesPerFile()
public abstract byte[] getDelimiters()
public abstract int getSkipHeaderLines()
public abstract java.lang.String getFilePattern()
public abstract Compression getCompression()
public abstract TextRowCountEstimator.SamplingStrategy getSamplingStrategy()
public abstract EmptyMatchTreatment getEmptyMatchTreatment()
public abstract FileIO.ReadMatches.DirectoryTreatment getDirectoryTreatment()
public static TextRowCountEstimator.Builder builder()
public java.lang.Double estimateRowCount(PipelineOptions pipelineOptions) throws java.io.IOException, TextRowCountEstimator.NoEstimationException
Throws:
TextRowCountEstimator.NoEstimationException - if all the sampled lines are empty and not all lines in the matched files have been read.
java.io.IOException
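To make the exception condition concrete, here is a rough, self-contained sketch of a sampling-based row estimate over a single in-memory buffer. It assumes the estimate is extrapolated from the average number of bytes per non-empty sampled line, which this page does not confirm as the library's algorithm. `RowCountSketch`, `NoEstimate`, and `estimate` are hypothetical names, and only `\n` is treated as a delimiter.

```java
import java.nio.charset.StandardCharsets;

final class RowCountSketch {
    /** Hypothetical counterpart of NoEstimationException, for illustration only. */
    static final class NoEstimate extends Exception {
        NoEstimate(String message) {
            super(message);
        }
    }

    /** Estimates the non-empty rows in content by reading at most sampleBytes bytes. */
    static double estimate(byte[] content, int sampleBytes) throws NoEstimate {
        int limit = Math.min(sampleBytes, content.length);
        int nonEmptyLines = 0;
        int consumed = 0;   // bytes covered by complete sampled lines
        int lineStart = 0;
        for (int i = 0; i < limit; i++) {
            if (content[i] == '\n') {
                if (i > lineStart) {
                    nonEmptyLines++;
                }
                lineStart = i + 1;
                consumed = i + 1;
            }
        }
        boolean readEverything = limit == content.length;
        if (readEverything && lineStart < limit) {
            // Trailing line without a final newline counts as a complete line.
            nonEmptyLines++;
            consumed = limit;
        }
        if (nonEmptyLines == 0) {
            if (readEverything) {
                return 0.0; // every line was seen and all were empty
            }
            throw new NoEstimate("all sampled lines are empty and input was not fully read");
        }
        double bytesPerRow = (double) consumed / nonEmptyLines;
        return content.length / bytesPerRow;
    }

    public static void main(String[] args) throws NoEstimate {
        byte[] data = "aa\nbb\ncc\ndd\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(estimate(data, 6));
    }
}
```

The sketch returns zero only when the whole input was read; if the sample contains nothing but empty lines and more data remains, no meaningful average exists, so it throws instead of guessing.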