public abstract class DLPDeidentifyText extends PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.DeidentifyContentResponse>>>
PTransform connecting to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and
deidentifying text according to provided settings. The transform supports both columnar delimited
input data (e.g. CSV) and unstructured input.
If the headerColumns property is set, that is, a side input with the table headers is passed to the PTransform, the delimiter must also be set; otherwise the results will be incorrect. If headerColumns is not set and no side input is passed, the input is assumed to be unstructured.
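The difference between the columnar and unstructured cases can be sketched in plain Java. This is an illustration under stated assumptions, not the transform's actual code: the class `DelimitedRowSketch`, the method `toRow`, and the fallback column name `value` are hypothetical. The sketch also fails fast when headers are given without a delimiter, whereas the description above only says that results will be incorrect in that case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Beam.
class DelimitedRowSketch {

    /** Maps one input line to named column values, or to a single column if no headers exist. */
    static Map<String, String> toRow(String line, List<String> headers, String delimiter) {
        if (headers == null || headers.isEmpty()) {
            // No headers: treat the whole line as one unstructured value.
            return Map.of("value", line);
        }
        if (delimiter == null) {
            // Headers without a delimiter cannot be matched to columns.
            throw new IllegalArgumentException("delimiter must be set when headers are provided");
        }
        // Quote the delimiter so characters such as '|' are not read as regex syntax;
        // the -1 limit keeps trailing empty columns.
        String[] values = line.split(Pattern.quote(delimiter), -1);
        if (values.length != headers.size()) {
            throw new IllegalArgumentException("column count does not match header count");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            row.put(headers.get(i), values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("alice;555-0100", List.of("name", "phone"), ";"));
        System.out.println(toRow("Call alice at 555-0100", null, null));
    }
}
```

With headers and a delimiter, each value is paired with its column name; without headers, the line is passed through as a single value.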
Either deidentifyTemplateName (String) or deidentifyConfig (DeidentifyConfig) must be
set. inspectTemplateName and inspectConfig (InspectConfig) are optional.
The batch size defines, in bytes, how large each batch sent to DLP at once is.
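One way to picture byte-based batching is the sketch below. It assumes that sizes are measured in UTF-8 bytes and that a batch is closed as soon as the next row would push it over the limit; the class `ByteSizeBatcher` and the method `batch` are hypothetical, and the transform's real batching logic may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache Beam.
class ByteSizeBatcher {

    /** Groups rows into batches whose combined UTF-8 size stays within batchSizeBytes. */
    static List<List<String>> batch(List<String> rows, int batchSizeBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String row : rows) {
            int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this row would exceed the limit.
            if (!current.isEmpty() && currentBytes + rowBytes > batchSizeBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single row larger than the limit still gets a batch of its own.
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("alice,555-0100", "bob,555-0101", "carol,555-0102");
        System.out.println(batch(rows, 30));
    }
}
```

In the usage above, the first two rows total 26 bytes and share a batch, while the third row would bring the total to 40 bytes and therefore starts a new batch.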
The transform consumes KVs of Strings (assumed to be the filename as key and the
contents as value) and outputs KVs of String (e.g. filename) and DeidentifyContentResponse, which contains a Table of results for the user to consume.

| Modifier and Type | Class and Description |
|---|---|
| static class | DLPDeidentifyText.Builder |

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.Integer | DLP_PAYLOAD_LIMIT_BYTES |

Fields inherited from class PTransform: name, resourceHints

| Constructor and Description |
|---|
| DLPDeidentifyText() |

| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.DeidentifyContentResponse>> | expand(PCollection<KV<java.lang.String,java.lang.String>> input) The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the deidentification according to the provided settings. |
| abstract java.lang.Integer | getBatchSizeBytes() |
| abstract @Nullable java.lang.String | getColumnDelimiter() |
| abstract @Nullable com.google.privacy.dlp.v2.DeidentifyConfig | getDeidentifyConfig() |
| abstract @Nullable java.lang.String | getDeidentifyTemplateName() |
| abstract @Nullable PCollectionView<java.util.List<java.lang.String>> | getHeaderColumns() |
| abstract @Nullable com.google.privacy.dlp.v2.InspectConfig | getInspectConfig() |
| abstract @Nullable java.lang.String | getInspectTemplateName() |
| abstract java.lang.String | getProjectId() |
| static DLPDeidentifyText.Builder | newBuilder() |

Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setResourceHints, toString, validate, validate

public abstract @Nullable java.lang.String getInspectTemplateName()
public abstract @Nullable java.lang.String getDeidentifyTemplateName()
public abstract @Nullable com.google.privacy.dlp.v2.InspectConfig getInspectConfig()
public abstract @Nullable com.google.privacy.dlp.v2.DeidentifyConfig getDeidentifyConfig()
public abstract @Nullable PCollectionView<java.util.List<java.lang.String>> getHeaderColumns()
public abstract @Nullable java.lang.String getColumnDelimiter()
public abstract java.lang.Integer getBatchSizeBytes()
public abstract java.lang.String getProjectId()
public static DLPDeidentifyText.Builder newBuilder()
public PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.DeidentifyContentResponse>> expand(PCollection<KV<java.lang.String,java.lang.String>> input)
The transform converts the contents of the input PCollection into Table.Rows and then calls the
Cloud DLP service to perform the deidentification according to the provided settings.
Specified by: expand in class PTransform<PCollection<KV<java.lang.String,java.lang.String>>,PCollection<KV<java.lang.String,com.google.privacy.dlp.v2.DeidentifyContentResponse>>>
Parameters: input - input PCollection