Class DLPInspectText
java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.InspectContentResponse>>>
org.apache.beam.sdk.extensions.ml.DLPInspectText
- All Implemented Interfaces:
Serializable, HasDisplayData
public abstract class DLPInspectText
extends PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.InspectContentResponse>>>
A PTransform connecting to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and inspecting text for identifying data according to provided settings. The transform supports both delimited columnar input data (e.g. CSV) and unstructured input.
If the headerColumns property is set and a side input with table headers is added to the PTransform, the delimiter must also be set; otherwise the results will be incorrect. If headerColumns is neither set nor passed as a side input, the input is assumed to be unstructured.
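The difference between the structured and unstructured cases can be sketched in plain Java. This is a minimal illustration, not the transform's actual implementation: the class DelimitedRowSketch, the method toColumns, and the placeholder key "value" are all hypothetical names, and throwing on a missing delimiter is a choice made for this sketch only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Beam: structured vs. unstructured input. */
class DelimitedRowSketch {

  /**
   * Maps one input line onto the header columns. When headers is null the
   * whole line is kept under a single placeholder key, mirroring the
   * unstructured case.
   */
  static Map<String, String> toColumns(String line, List<String> headers, String delimiter) {
    Map<String, String> row = new LinkedHashMap<>();
    if (headers == null) {
      row.put("value", line); // "value" is an illustrative key, not a DLP name
      return row;
    }
    if (delimiter == null) {
      // Assumption of this sketch: fail fast instead of producing wrong columns.
      throw new IllegalArgumentException("delimiter is required when headers are provided");
    }
    // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty cells.
    String[] cells = line.split(Pattern.quote(delimiter), -1);
    if (cells.length != headers.size()) {
      throw new IllegalArgumentException(
          "expected " + headers.size() + " cells but found " + cells.length);
    }
    for (int i = 0; i < cells.length; i++) {
      row.put(headers.get(i), cells[i]);
    }
    return row;
  }

  public static void main(String[] args) {
    System.out.println(toColumns("Ada,ada@example.com", Arrays.asList("name", "email"), ","));
    System.out.println(toColumns("free text mentioning ada@example.com", null, null));
  }
}
```

With headers and a delimiter, each cell is addressable by column name; without headers, the line is passed along as a single value.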
Batch size defines the size, in bytes, of each batch sent to DLP in one request.
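As a rough illustration of byte-based batching, the sketch below groups rows greedily so that each batch stays within a byte budget. It is an assumption-laden stand-in: the class ByteBatchingSketch and the method batchBySize are hypothetical, sizes are measured as UTF-8 bytes, and the transform's real batching logic may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Beam: greedy batching by byte size. */
class ByteBatchingSketch {

  /**
   * Groups rows in order so that the UTF-8 size of each batch does not
   * exceed batchSizeBytes. A single row larger than the budget is placed
   * in a batch of its own rather than being dropped or split.
   */
  static List<List<String>> batchBySize(List<String> rows, int batchSizeBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : rows) {
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      // Flush the current batch if adding this row would exceed the budget.
      if (!current.isEmpty() && currentBytes + rowBytes > batchSizeBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    System.out.println(batchBySize(Arrays.asList("aaaa", "bbbb", "cc"), 8));
  }
}
```

Measuring bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.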
The transform consumes KVs of Strings (assumed to be the filename as key and the contents as value) and outputs KVs of String (e.g. filename) and InspectContentResponse, which will contain a list of InspectResult for the user to consume.
Either inspectTemplateName (String) or inspectConfig (InspectConfig) needs to be set.
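The requirement can be expressed as a simple precondition check. The sketch below is illustrative only: InspectSettingsCheckSketch and requireTemplateOrConfig are hypothetical names, Object stands in for InspectConfig so the example needs no external dependencies, and the template name in main is a placeholder.

```java
/** Hypothetical helper, not part of Beam: "at least one of the two" check. */
class InspectSettingsCheckSketch {

  /**
   * Throws if neither setting is present. Object is a stand-in for
   * com.google.privacy.dlp.v2.InspectConfig to keep the sketch dependency-free.
   */
  static void requireTemplateOrConfig(String inspectTemplateName, Object inspectConfig) {
    if (inspectTemplateName == null && inspectConfig == null) {
      throw new IllegalArgumentException(
          "Either inspectTemplateName or inspectConfig must be set");
    }
  }

  public static void main(String[] args) {
    // Placeholder template name, used only to show a passing call.
    requireTemplateOrConfig("placeholder-template-name", null);
    System.out.println("settings accepted");
  }
}
```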
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
PCollection<KV<String, com.google.privacy.dlp.v2.InspectContentResponse>> | expand(PCollection<KV<String, String>> input) | The transform converts the contents of the input PCollection into Table.Row objects and then calls the Cloud DLP service to perform the data inspection according to the provided settings.
abstract Integer | getBatchSizeBytes() |
abstract @Nullable PCollectionView<List<String>> | getHeaderColumns() |
abstract @Nullable com.google.privacy.dlp.v2.InspectConfig | getInspectConfig() |
abstract String | getProjectId() |
static DLPInspectText.Builder | newBuilder() |
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Field Details
-
DLP_PAYLOAD_LIMIT_BYTES
-
-
Constructor Details
-
DLPInspectText
public DLPInspectText()
-
-
Method Details
-
getInspectTemplateName
- Returns:
- Template name for data inspection.
-
getInspectConfig
- Returns:
- Configuration object for data inspection. If present, supersedes the template settings.
-
getBatchSizeBytes
- Returns:
- Size of input elements batch to be sent to Cloud DLP service in one request.
-
getProjectId
- Returns:
- ID of the Google Cloud project to be used when inspecting data.
-
getColumnDelimiter
- Returns:
- Delimiter to be used when splitting values from input strings into columns.
-
getHeaderColumns
- Returns:
- List of column names if the input KV value is a delimited row.
-
newBuilder
-
expand
public PCollection<KV<String,com.google.privacy.dlp.v2.InspectContentResponse>> expand(PCollection<KV<String,String>> input)
The transform converts the contents of the input PCollection into Table.Row objects and then calls the Cloud DLP service to perform the data inspection according to the provided settings.
- Specified by:
expand in class PTransform<PCollection<KV<String,String>>,PCollection<KV<String,com.google.privacy.dlp.v2.InspectContentResponse>>>
- Parameters:
input - input PCollection
- Returns:
- PCollection after transformations
-