@Experimental public abstract class DLPReidentifyText extends PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.ReidentifyContentResponse>>>
PTransform connecting to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and
re-identifying previously de-identified text according to the provided settings.
The transform supports both delimited columnar input data and unstructured input.
If the headerColumns property is set and a side input with headers is added to the PTransform, the column delimiter should also be set; otherwise the results will be incorrect. If headerColumns is neither set nor passed as a side input, the input is assumed to be unstructured.
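To make the header/delimiter rule concrete, here is a minimal, hypothetical sketch of how delimited content could be split into header-keyed rows, with the whole content kept as a single value when no headers are given. It is not Beam's implementation: the class `DelimitedRows`, the method `toRows`, and the `"value"` key used for unstructured input are illustrative assumptions. It only mirrors the documented rule that headers without a delimiter cannot be mapped correctly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch, not Beam code: maps delimited lines onto header names. */
final class DelimitedRows {

  static List<Map<String, String>> toRows(String content, List<String> headers, String delimiter) {
    List<Map<String, String>> rows = new ArrayList<>();
    if (headers == null) {
      // No headers: treat the whole content as one unstructured value
      // (the "value" key is an assumption made for this sketch).
      Map<String, String> row = new LinkedHashMap<>();
      row.put("value", content);
      rows.add(row);
      return rows;
    }
    if (delimiter == null) {
      // Headers without a delimiter cannot be aligned with cells.
      throw new IllegalArgumentException("headers require a column delimiter");
    }
    for (String line : content.split("\\R")) {
      if (line.isEmpty()) {
        continue;
      }
      // Pattern.quote keeps delimiters such as "|" from being read as regex syntax.
      String[] cells = line.split(Pattern.quote(delimiter), -1);
      if (cells.length != headers.size()) {
        throw new IllegalArgumentException(
            "row has " + cells.length + " cells, expected " + headers.size());
      }
      Map<String, String> row = new LinkedHashMap<>();
      for (int i = 0; i < cells.length; i++) {
        row.put(headers.get(i), cells[i]);
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Map<String, String>> rows =
        toRows("alice,555-0100\nbob,555-0199", List.of("name", "phone"), ",");
    // prints [{name=alice, phone=555-0100}, {name=bob, phone=555-0199}]
    System.out.println(rows);
  }
}
```

Rejecting a header/cell count mismatch outright, rather than padding or truncating, is one way to surface the "incorrect results" case early. The real transform may behave differently.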
The batch size defines how large, in bytes, the batches sent to DLP in a single request are.
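The following minimal sketch shows one way byte-size batching can work: chunks are accumulated in order and a batch is flushed whenever adding the next chunk would exceed the limit. The class `ByteBatcher` and method `batchBySize` are hypothetical and do not reproduce Beam's internal batching. Measuring size as UTF-8 bytes is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Beam code: groups strings into byte-bounded batches. */
final class ByteBatcher {

  // Returns batches in input order. A single chunk larger than the limit
  // is rejected rather than split.
  static List<List<String>> batchBySize(List<String> chunks, int batchSizeBytes) {
    if (batchSizeBytes <= 0) {
      throw new IllegalArgumentException("batchSizeBytes must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String chunk : chunks) {
      int size = chunk.getBytes(StandardCharsets.UTF_8).length;
      if (size > batchSizeBytes) {
        throw new IllegalArgumentException("chunk of " + size + " bytes exceeds batch size");
      }
      // Flush the current batch if adding this chunk would overflow it.
      if (currentBytes + size > batchSizeBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(chunk);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    // prints [[aaaa], [bbb, cc, d]]
    System.out.println(batchBySize(List.of("aaaa", "bbb", "cc", "d"), 6));
  }
}
```

Larger batches mean fewer requests to DLP. Each request must still stay within the service's payload limit.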
The transform consumes a KV of Strings (assumed to be the filename as key and
the contents as value) and outputs a KV of String (e.g. the filename) and ReidentifyContentResponse, which contains a Table of results for the user to consume.
Either reidentifyTemplateName (String) or reidentifyConfig (DeidentifyConfig) needs
to be set. inspectConfig (InspectConfig) and inspectTemplateName (String) are
optional.
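The settings rule above can be expressed as a simple validation check. This is a hypothetical sketch: the class `ReidentifySettings` is illustrative, `Object` stands in for the DLP `DeidentifyConfig` and `InspectConfig` protos so the example runs without the DLP client library, and the project and template values are placeholders. It does not show how DLPReidentifyText.Builder actually validates its fields.

```java
import java.util.Objects;

/** Hypothetical sketch, not Beam code: enforces the documented settings combination. */
final class ReidentifySettings {
  final String projectId;
  final String reidentifyTemplateName; // may be null if reidentifyConfig is set
  final Object reidentifyConfig;       // stand-in for DeidentifyConfig
  final String inspectTemplateName;    // optional
  final Object inspectConfig;          // stand-in for InspectConfig; optional

  ReidentifySettings(
      String projectId,
      String reidentifyTemplateName,
      Object reidentifyConfig,
      String inspectTemplateName,
      Object inspectConfig) {
    this.projectId = Objects.requireNonNull(projectId, "projectId");
    // At least one source of re-identification settings is required.
    if (reidentifyTemplateName == null && reidentifyConfig == null) {
      throw new IllegalStateException(
          "Either reidentifyTemplateName or reidentifyConfig must be set");
    }
    this.reidentifyTemplateName = reidentifyTemplateName;
    this.reidentifyConfig = reidentifyConfig;
    this.inspectTemplateName = inspectTemplateName;
    this.inspectConfig = inspectConfig;
  }

  public static void main(String[] args) {
    // Valid: only a placeholder template name, no inspect settings.
    new ReidentifySettings("my-project", "hypothetical-template", null, null, null);
    try {
      // Invalid: neither re-identification setting is present.
      new ReidentifySettings("my-project", null, null, null, null);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```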
| Modifier and Type | Class and Description |
|---|---|
| static class | DLPReidentifyText.Builder |
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.Integer | DLP_PAYLOAD_LIMIT_BYTES |
| Constructor and Description |
|---|
| DLPReidentifyText() |
| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.ReidentifyContentResponse>> | expand(PCollection<KV<java.lang.String,java.lang.String>> input): The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the reidentification according to the provided settings. |
| abstract java.lang.Integer | getBatchSizeBytes() |
| abstract java.lang.String | getColumnDelimiter() |
| abstract PCollectionView<java.util.List<java.lang.String>> | getHeaderColumns() |
| abstract com.google.privacy.dlp.v2.InspectConfig | getInspectConfig() |
| abstract java.lang.String | getInspectTemplateName() |
| abstract java.lang.String | getProjectId() |
| abstract com.google.privacy.dlp.v2.DeidentifyConfig | getReidentifyConfig() |
| abstract java.lang.String | getReidentifyTemplateName() |
| static DLPReidentifyText.Builder | newBuilder() |
Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate

@Nullable public abstract java.lang.String getInspectTemplateName()
@Nullable public abstract java.lang.String getReidentifyTemplateName()
@Nullable public abstract com.google.privacy.dlp.v2.InspectConfig getInspectConfig()
@Nullable public abstract com.google.privacy.dlp.v2.DeidentifyConfig getReidentifyConfig()
@Nullable public abstract java.lang.String getColumnDelimiter()
@Nullable public abstract PCollectionView<java.util.List<java.lang.String>> getHeaderColumns()
public abstract java.lang.Integer getBatchSizeBytes()
public abstract java.lang.String getProjectId()
public static DLPReidentifyText.Builder newBuilder()
public PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.ReidentifyContentResponse>> expand(PCollection<KV<java.lang.String,java.lang.String>> input)
The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the reidentification according to the provided settings.
Specified by: expand in class PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.ReidentifyContentResponse>>>
Parameters: input - input PCollection