@Experimental public abstract class DLPInspectText extends PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.InspectContentResponse>>>
A PTransform that connects to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and inspects text for identifying data according to the provided settings. The transform supports both delimited columnar input data (e.g. CSV) and unstructured input.
If the headerColumns property is set and a side input with table headers is added to the PTransform, the delimiter should also be set; otherwise the results will be incorrect. If headerColumns is neither set nor passed as a side input, the input is assumed to be unstructured.
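To illustrate why the delimiter matters once headers are supplied, here is a minimal plain-Java sketch that is independent of Beam and the DLP client. The class `DelimitedRowSketch` and its method `toRow` are hypothetical names, not part of this transform, and the sketch assumes a plain split with no quoting or escaping, which the transform itself may handle differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DLPInspectText: pairs header names with the
// values of one delimited line. Assumes no quoting or escaping of the delimiter.
class DelimitedRowSketch {

  static Map<String, String> toRow(List<String> headers, String line, String delimiter) {
    // Pattern.quote makes the delimiter literal, so "|" is not read as a regex.
    // The -1 limit keeps trailing empty columns.
    String[] values = line.split(Pattern.quote(delimiter), -1);
    if (values.length != headers.size()) {
      throw new IllegalArgumentException(
          "Expected " + headers.size() + " columns but found " + values.length);
    }
    Map<String, String> row = new LinkedHashMap<>();
    for (int i = 0; i < values.length; i++) {
      row.put(headers.get(i), values[i]);
    }
    return row;
  }

  public static void main(String[] args) {
    List<String> headers = Arrays.asList("name", "email");
    System.out.println(toRow(headers, "Ada,ada@example.com", ","));
    // With the wrong delimiter the line is never split, so the columns do not
    // line up with the headers.
    try {
      toRow(headers, "Ada,ada@example.com", ";");
    } catch (IllegalArgumentException e) {
      System.out.println("Mismatch: " + e.getMessage());
    }
  }
}
```

The sketch fails loudly on a column mismatch; that is a choice made for the illustration, not a statement about how the transform reacts to a wrong delimiter.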
The batch size defines the size, in bytes, of the batches sent to DLP at once.
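A byte-based batch size can be pictured with the following plain-Java sketch. It is an illustration only, not the transform's internal logic: `ByteBatchSketch` and `batchByBytes` are hypothetical names, sizes are assumed to be measured as UTF-8 bytes, and a row larger than the limit is assumed to go into a batch of its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of grouping rows into batches by byte size.
class ByteBatchSketch {

  static List<List<String>> batchByBytes(List<String> rows, int batchSizeBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : rows) {
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      // Close the current batch if adding this row would exceed the limit.
      if (!current.isEmpty() && currentBytes + rowBytes > batchSizeBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("aaaa", "bbbb", "cc", "dddddddddddd");
    // prints [[aaaa, bbbb], [cc], [dddddddddddd]]
    System.out.println(batchByBytes(rows, 8));
  }
}
```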
The transform consumes KVs of Strings (assumed to be the filename as key and the contents as value) and outputs KVs of String (e.g. the filename) and InspectContentResponse, which will contain an InspectResult holding the list of findings for the user to consume.
Either inspectTemplateName (String) or inspectConfig (InspectConfig) needs to be set.
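The rule above can be sketched as a simple precondition check. This is a hedged illustration rather than the library's own code: `InspectSettingsSketch` and `validate` are hypothetical names, the rule is read here as "at least one of the two must be non-null", and `Object` stands in for `InspectConfig` so the sketch compiles without the DLP client library. The template name in `main` is a made-up placeholder.

```java
// Hypothetical stand-in for the "one of the two must be set" rule.
class InspectSettingsSketch {

  // Object stands in for com.google.privacy.dlp.v2.InspectConfig so that the
  // sketch needs no external dependencies.
  static void validate(String inspectTemplateName, Object inspectConfig) {
    if (inspectTemplateName == null && inspectConfig == null) {
      throw new IllegalArgumentException(
          "Either inspectTemplateName or inspectConfig must be set");
    }
  }

  public static void main(String[] args) {
    // Placeholder template name, for illustration only.
    validate("projects/my-project/inspectTemplates/my-template", null);
    try {
      validate(null, null);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```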
Modifier and Type | Class and Description |
---|---|
static class | DLPInspectText.Builder |
Modifier and Type | Field and Description |
---|---|
static java.lang.Integer | DLP_PAYLOAD_LIMIT_BYTES |
name, resourceHints
Constructor and Description |
---|
DLPInspectText() |
Modifier and Type | Method and Description |
---|---|
PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.InspectContentResponse>> | expand(PCollection<KV<java.lang.String,java.lang.String>> input) The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the data inspection according to provided settings. |
abstract java.lang.Integer | getBatchSizeBytes() |
abstract @Nullable java.lang.String | getColumnDelimiter() |
abstract @Nullable PCollectionView<java.util.List<java.lang.String>> | getHeaderColumns() |
abstract @Nullable com.google.privacy.dlp.v2.InspectConfig | getInspectConfig() |
abstract @Nullable java.lang.String | getInspectTemplateName() |
abstract java.lang.String | getProjectId() |
static DLPInspectText.Builder | newBuilder() |
compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setResourceHints, toString, validate
public abstract @Nullable java.lang.String getInspectTemplateName()
public abstract @Nullable com.google.privacy.dlp.v2.InspectConfig getInspectConfig()
public abstract java.lang.Integer getBatchSizeBytes()
public abstract java.lang.String getProjectId()
public abstract @Nullable java.lang.String getColumnDelimiter()
public abstract @Nullable PCollectionView<java.util.List<java.lang.String>> getHeaderColumns()
public static DLPInspectText.Builder newBuilder()
public PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.InspectContentResponse>> expand(PCollection<KV<java.lang.String,java.lang.String>> input)
The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the data inspection according to provided settings.
Specified by: expand in class PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.InspectContentResponse>>>
Parameters: input - input PCollection