apache_beam.ml.transforms.handlers module

class apache_beam.ml.transforms.handlers.TFTProcessHandler(*, artifact_location: str, transforms: Sequence[TFTOperation] | None = None, artifact_mode: str = 'produce')[source]

A handler class for processing data with TensorFlow Transform (TFT) operations.

append_transform(transform)[source]

get_raw_data_feature_spec(input_types: dict[str, type]) → dict[str, tensorflow.io.VarLenFeature][source]

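Return the raw data feature spec, a dictionary mapping each column name to a tensorflow.io.VarLenFeature. The DatasetMetadata used with tft_beam.AnalyzeAndTransformDataset is built from this spec.

Parameters:
    input_types: A dictionary of column names and types.

Returns: A dictionary of column names and their feature specs.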
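As a rough illustration of how a dictionary of column types can be turned into a per-column feature spec, here is a minimal pure-Python sketch. It does not import TensorFlow: the tuple ("VarLenFeature", dtype_name) is a stand-in for a real feature object, and both the function feature_spec_sketch and the type-to-dtype mapping are assumptions made for this example, not the mapping the handler actually uses.

```python
from typing import get_args, get_origin

# Assumed mapping from Python element types to dtype names (illustrative only).
_ASSUMED_DTYPES = {int: "int64", float: "float32", str: "string", bytes: "string"}


def feature_spec_sketch(input_types: dict) -> dict:
    """Build a toy feature spec: one variable-length feature per column."""
    spec = {}
    for column, typ in input_types.items():
        # Unwrap container annotations such as list[float] to the element type.
        element = get_args(typ)[0] if get_origin(typ) is not None else typ
        if element not in _ASSUMED_DTYPES:
            raise TypeError(f"unsupported type for column {column!r}: {typ}")
        spec[column] = ("VarLenFeature", _ASSUMED_DTYPES[element])
    return spec


spec = feature_spec_sketch({"age": int, "scores": list[float], "name": str})
```

Every column gets a variable-length feature regardless of whether its type is a scalar or a list, which mirrors the VarLenFeature values in the return annotation above.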

convert_raw_data_feature_spec_to_dataset_metadata(raw_data_feature_spec) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]

get_raw_data_metadata(input_types: dict[str, type]) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]

write_transform_artifacts(transform_fn, location)[source]

Write transform artifacts to the given location.

Parameters:
    transform_fn: A transform_fn object.
    location: A location to write the artifacts.

Returns: A PCollection of WriteTransformFn writing a TF transform graph.

process_data_fn(inputs: dict[str, tensorflow_transform.common_types.ConsistentTensorType]) → dict[str, tensorflow_transform.common_types.ConsistentTensorType][source]

This method is used in the AnalyzeAndTransformDataset step. It applies the transforms to the inputs in sequential order on the columns provided for a given transform.

Parameters:
    inputs: A dictionary of column names and data.

Returns: A dictionary of column names and transformed data.
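The sequential, per-column application described for process_data_fn can be illustrated with a minimal pure-Python sketch. It does not use tensorflow_transform: the ColumnOp class and the process_data_sketch function are hypothetical stand-ins, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ColumnOp:
    """Hypothetical stand-in for a TFT operation bound to some columns."""
    columns: Sequence[str]
    fn: Callable[[list], list]


def process_data_sketch(inputs: dict, transforms: Sequence[ColumnOp]) -> dict:
    """Apply each transform, in order, to the columns it names."""
    outputs = dict(inputs)  # shallow copy so the caller's dict is not mutated
    for op in transforms:
        for col in op.columns:
            # Later transforms see the output of earlier ones on the same column.
            outputs[col] = op.fn(outputs[col])
    return outputs


def scale_0_1(values: list) -> list:
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


transforms = [
    ColumnOp(columns=["x"], fn=scale_0_1),
    ColumnOp(columns=["x", "y"], fn=lambda vs: [round(v * 10) for v in vs]),
]
result = process_data_sketch({"x": [2, 4, 6], "y": [1, 2, 3]}, transforms)
# result == {"x": [0, 5, 10], "y": [10, 20, 30]}
```

Column x passes through both operations in order, while y is touched only by the second, so the order of the transform list determines the result.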

This method also computes the required dataset metadata for the tft AnalyzeDataset/TransformDataset step.

This method uses tensorflow_transform’s Analyze step to produce the artifacts and Transform step to apply the transforms on the data. Artifacts are only produced if the artifact_mode is set to produce. If artifact_mode is set to consume, then the artifacts are read from the artifact_location, which was previously used to store the produced artifacts.
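The difference between the produce and consume modes can be sketched with a toy pure-Python analogue. This is not the Beam or tensorflow_transform implementation: the function run_sketch, the artifacts.json file name, and the min/max statistics are all assumptions chosen to keep the example self-contained.

```python
import json
import os
import tempfile


def run_sketch(values, artifact_location, artifact_mode="produce"):
    """Toy analogue of the produce/consume flow (not the Beam implementation)."""
    path = os.path.join(artifact_location, "artifacts.json")  # hypothetical name
    if artifact_mode == "produce":
        # "Analyze": compute statistics over the dataset and persist them.
        artifacts = {"min": min(values), "max": max(values)}
        os.makedirs(artifact_location, exist_ok=True)
        with open(path, "w") as f:
            json.dump(artifacts, f)
    elif artifact_mode == "consume":
        # Reuse the statistics written by an earlier "produce" run.
        with open(path) as f:
            artifacts = json.load(f)
    else:
        raise ValueError(f"unknown artifact_mode: {artifact_mode!r}")
    # "Transform": apply the same scaling using the artifacts.
    span = (artifacts["max"] - artifacts["min"]) or 1
    return [(v - artifacts["min"]) / span for v in values]


with tempfile.TemporaryDirectory() as location:
    train = run_sketch([0, 5, 10], location, artifact_mode="produce")
    serve = run_sketch([2, 20], location, artifact_mode="consume")
```

In this sketch the consume run scales new data with the statistics computed during the produce run, so the same transformation is applied at both stages even though the second dataset has a different range.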

with_exception_handling()[source]