Class DLPDeidentifyText
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.DeidentifyContentResponse>>>
org.apache.beam.sdk.extensions.ml.DLPDeidentifyText
- All Implemented Interfaces:
Serializable, HasDisplayData
public abstract class DLPDeidentifyText
extends PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.DeidentifyContentResponse>>>
A PTransform connecting to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and
deidentifying text according to provided settings. The transform supports both columnar delimited
input data (e.g. CSV) and unstructured input.
If the headerColumns property is set and a side input with table headers is added to the PTransform, the delimiter must also be set; otherwise the results will be incorrect. If headerColumns is neither set nor passed as a side input, the input is assumed to be unstructured.
Either deidentifyTemplateName (String) or deidentifyConfig (DeidentifyConfig) needs to be
set. The inspectTemplateName and inspectConfig (InspectConfig) settings are optional.
Batch size defines how large, in bytes, the batches sent to DLP at once are.
The transform consumes KVs of Strings (assumed to be the filename as key and the
contents as value) and outputs KVs of String (e.g. the filename) and DeidentifyContentResponse, which will contain a Table of results for the user to consume.
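To illustrate the columnar case described above, here is a minimal standalone sketch of how one delimited value could be split on the delimiter and paired with header names. It does not use Beam or the DLP client. The class and method names (`DelimitedRowSketch`, `toColumns`) are hypothetical, and the transform's actual internals may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Beam: pairs header names with the cells of one delimited row. */
class DelimitedRowSketch {

  static Map<String, String> toColumns(List<String> headers, String delimiter, String row) {
    // Pattern.quote treats the delimiter literally (e.g. "|"); limit -1 keeps trailing empty cells.
    String[] cells = row.split(Pattern.quote(delimiter), -1);
    if (cells.length != headers.size()) {
      // A header/cell mismatch is the kind of situation that leads to incorrect results.
      throw new IllegalArgumentException(
          "Expected " + headers.size() + " cells but found " + cells.length);
    }
    // LinkedHashMap preserves the column order given by the headers.
    Map<String, String> columns = new LinkedHashMap<>();
    for (int i = 0; i < cells.length; i++) {
      columns.put(headers.get(i), cells[i]);
    }
    return columns;
  }
}
```

For example, calling `toColumns` with headers `name` and `email`, the delimiter `,` and the row `Jane Doe,jane@example.com` maps `name` to `Jane Doe` and `email` to `jane@example.com`. Without headers there is nothing to pair the cells with, which is why such input is treated as unstructured.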
Nested Class Summary
Nested Classes
Field Summary
Fields
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method, Description
- PCollection<KV<String, com.google.privacy.dlp.v2.DeidentifyContentResponse>> expand(PCollection<KV<String, String>> input): The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the deidentification according to the provided settings.
- abstract int getBatchSizeBytes(): Returns the size of the batch of input elements to be sent to the Cloud DLP service in one request.
- getColumnDelimiter(): Returns delimiter to be used when splitting values from input strings into columns.
- abstract @Nullable com.google.privacy.dlp.v2.DeidentifyConfig getDeidentifyConfig(): Returns configuration object for deidentification.
- getDeidentifyTemplateName(): Returns template name for data deidentification.
- abstract @Nullable PCollectionView<List<String>> getHeaderColumns(): Returns list of column names if the input KV value is a delimited row.
- abstract @Nullable com.google.privacy.dlp.v2.InspectConfig getInspectConfig(): Returns configuration object for data inspection.
- getInspectTemplateName(): Returns template name for data inspection.
- abstract String getProjectId(): Returns ID of Google Cloud project to be used when deidentifying data.
- static DLPDeidentifyText.Builder newBuilder()
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
Field Details
DLP_PAYLOAD_LIMIT_BYTES
Constructor Details
DLPDeidentifyText
public DLPDeidentifyText()
Method Details
getInspectTemplateName
Returns template name for data inspection.
getDeidentifyTemplateName
Returns template name for data deidentification.
getInspectConfig
Returns configuration object for data inspection. If present, supersedes the template settings.
getDeidentifyConfig
Returns configuration object for deidentification. If present, supersedes the template.
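The precedence rule stated above (a configuration object, if present, supersedes the template) together with the requirement that at least one of the two is set can be sketched as follows. This is an illustrative stand-in only: `DeidentifySettingsSketch`, `Source` and `resolve` are hypothetical names, and a plain `Object` stands in for `DeidentifyConfig` so the example has no external dependencies.

```java
/** Hypothetical illustration, not part of Beam: decides which deidentification settings apply. */
class DeidentifySettingsSketch {

  /** Where the effective deidentification settings come from. */
  enum Source { CONFIG, TEMPLATE }

  // deidentifyConfig is typed as Object here; in the real API it is a DeidentifyConfig.
  static Source resolve(String deidentifyTemplateName, Object deidentifyConfig) {
    if (deidentifyConfig != null) {
      // An explicit configuration object supersedes the template.
      return Source.CONFIG;
    }
    if (deidentifyTemplateName != null) {
      return Source.TEMPLATE;
    }
    throw new IllegalArgumentException(
        "Either deidentifyTemplateName or deidentifyConfig must be set");
  }
}
```

With both arguments supplied, `resolve` returns `Source.CONFIG`. With only a template name it returns `Source.TEMPLATE`, and with neither it throws.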
getHeaderColumns
Returns list of column names if the input KV value is a delimited row.
getColumnDelimiter
Returns delimiter to be used when splitting values from input strings into columns.
getBatchSizeBytes
public abstract int getBatchSizeBytes()
Returns the size of the batch of input elements to be sent to the Cloud DLP service in one request.
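As a rough illustration of what a byte-based batch size means, the sketch below groups input strings into batches whose total UTF-8 size stays within a limit. It is a simplified, in-memory stand-in under stated assumptions: `ByteBatchingSketch` and `batchByBytes` are hypothetical names, and measuring size as UTF-8 byte length and rejecting oversized rows are choices made for this example, not confirmed behavior of the transform.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not part of Beam: groups rows into batches bounded by a byte budget. */
class ByteBatchingSketch {

  static List<List<String>> batchByBytes(List<String> rows, int batchSizeBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : rows) {
      // Assumption: size is measured as the UTF-8 encoded length of each row.
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      if (rowBytes > batchSizeBytes) {
        throw new IllegalArgumentException("Row of " + rowBytes + " bytes exceeds the batch size");
      }
      if (currentBytes + rowBytes > batchSizeBytes) {
        // Adding this row would exceed the budget, so close the current batch first.
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

With a limit of 8 bytes, the rows `aaaa`, `bbbb`, `cc`, `dddd` end up in two batches: `[aaaa, bbbb]` and `[cc, dddd]`. A smaller batch size means more requests, each carrying less data.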
getProjectId
Returns ID of Google Cloud project to be used when deidentifying data.
newBuilder
expand
public PCollection<KV<String,com.google.privacy.dlp.v2.DeidentifyContentResponse>> expand(PCollection<KV<String,String>> input)
The transform converts the contents of the input PCollection into Table.Rows and then calls the Cloud DLP service to perform the deidentification according to the provided settings.
- Specified by:
expand in class PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.DeidentifyContentResponse>>>
- Parameters:
input - input PCollection
- Returns:
PCollection after transformations