apache_beam.ml.transforms.handlers module

class apache_beam.ml.transforms.handlers.TFTProcessHandler(*, artifact_location: str, transforms: Sequence[TFTOperation] | None = None, artifact_mode: str = 'produce')[source]

Bases: ProcessHandler[NamedTuple | Row | Dict[str, str | float | int | bytes | ndarray], Row | Dict[str, ndarray]]

A handler class for processing data with TensorFlow Transform (TFT) operations.

append_transform(transform)[source]
get_raw_data_feature_spec(input_types: Dict[str, type]) → Dict[str, tensorflow.io.VarLenFeature][source]

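Return the raw-data feature spec for the input columns: a dictionary mapping each column name to a tensorflow.io.VarLenFeature. To obtain the DatasetMetadata object used with tft_beam.AnalyzeAndTransformDataset, see get_raw_data_metadata and convert_raw_data_feature_spec_to_dataset_metadata.

Parameters:

input_types – A dictionary of column names and types.

Returns:

A dictionary mapping column names to tensorflow.io.VarLenFeature specs.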
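The step from column types to a feature spec can be pictured with the stdlib-only sketch below. It is a conceptual model, not Beam's implementation: it assumes scalar Python column types and a hypothetical mapping (int → int64, float → float32, str and bytes → string). VarLenFeatureSpec, _TYPE_TO_DTYPE, and raw_data_feature_spec are illustrative names; VarLenFeatureSpec stands in for tensorflow.io.VarLenFeature so the example runs without TensorFlow, and the type table Beam uses internally may cover more types.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical stand-in for tensorflow.io.VarLenFeature; only the dtype is modeled.
@dataclass(frozen=True)
class VarLenFeatureSpec:
    dtype: str


# Assumed mapping from Python types to TensorFlow dtype names (illustrative only).
_TYPE_TO_DTYPE = {
    int: "int64",
    float: "float32",
    str: "string",
    bytes: "string",
}


def raw_data_feature_spec(input_types: Dict[str, type]) -> Dict[str, VarLenFeatureSpec]:
    """Build one variable-length feature spec per input column."""
    spec = {}
    for column, typ in input_types.items():
        if typ not in _TYPE_TO_DTYPE:
            raise TypeError(f"Unsupported type {typ!r} for column {column!r}")
        spec[column] = VarLenFeatureSpec(_TYPE_TO_DTYPE[typ])
    return spec


# Usage: one spec per column, in the order the columns were given.
print(raw_data_feature_spec({"x": float, "label": str}))
# {'x': VarLenFeatureSpec(dtype='float32'), 'label': VarLenFeatureSpec(dtype='string')}
```

Using a variable-length spec for every column keeps the sketch agnostic to how many values each row holds for a column.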

convert_raw_data_feature_spec_to_dataset_metadata(raw_data_feature_spec) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
get_raw_data_metadata(input_types: Dict[str, type]) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
write_transform_artifacts(transform_fn, location)[source]

Write transform artifacts to the given location.

Parameters:

transform_fn – A transform_fn object, such as the one produced by the TFT Analyze step.

location – The location to write the artifacts to.

Returns:

A PCollection produced by WriteTransformFn, which writes the TF Transform graph to location.

process_data_fn(inputs: Dict[str, tensorflow_transform.common_types.ConsistentTensorType]) → Dict[str, tensorflow_transform.common_types.ConsistentTensorType][source]

This method is used in the AnalyzeAndTransformDataset step. It applies the transforms sequentially, in the order they appear in the handler's transform list, and each transform operates only on the columns it was configured with.

Parameters:

inputs – A dictionary of column names and data.

Returns:

A dictionary of column names and transformed data.
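The ordering rule can be illustrated with a plain-Python sketch. ColumnOp and process_data are hypothetical names standing in for TFT operations and this method; the sketch writes each result back under the same column name, whereas real TFT operations work on tensors and may emit additional output keys.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


# Hypothetical stand-in for a TFT operation: a function plus the columns it applies to.
@dataclass
class ColumnOp:
    columns: Sequence[str]
    fn: Callable[[List[float]], List[float]]


def process_data(inputs: Dict[str, List[float]],
                 ops: Sequence[ColumnOp]) -> Dict[str, List[float]]:
    """Apply each op in order, only to the columns it names."""
    outputs = dict(inputs)  # shallow copy so the caller's dict is not modified
    for op in ops:
        for column in op.columns:
            # Later ops see the results of earlier ops on the same column.
            outputs[column] = op.fn(outputs[column])
    return outputs


double = ColumnOp(columns=["x"], fn=lambda v: [2 * a for a in v])
shift = ColumnOp(columns=["x", "y"], fn=lambda v: [a + 1 for a in v])

# Usage: "x" is doubled and then shifted; "y" is only shifted.
print(process_data({"x": [1.0, 2.0], "y": [10.0]}, [double, shift]))
# {'x': [3.0, 5.0], 'y': [11.0]}
```

Swapping the order to [shift, double] yields x == [4.0, 6.0], which is why the order of the transform list matters.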

expand(raw_data: PCollection[NamedTuple | Row | Dict[str, str | float | int | bytes | ndarray]]) → PCollection[Row | Dict[str, ndarray]][source]

This method uses tensorflow_transform's Analyze step to produce the artifacts and its Transform step to apply the transforms to the data. It also computes the dataset metadata required by the TFT AnalyzeDataset/TransformDataset steps.

Artifacts are produced only when artifact_mode is set to 'produce'. When artifact_mode is set to 'consume', the artifacts are instead read from artifact_location, where a previous run in 'produce' mode stored them.
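The produce/consume contract can be modeled without Beam or TFT. In the stdlib-only sketch below, the "artifact" is a hypothetical JSON file of per-column means (TFT instead writes a transform graph with WriteTransformFn), and analyze_and_transform and ARTIFACT_FILE are illustrative names. The point it shows is that consume mode skips analysis and reuses whatever statistics produce mode saved.

```python
import json
import os
import tempfile
from typing import Dict, List

# Hypothetical artifact file name; TFT's real artifacts have a different layout.
ARTIFACT_FILE = "means.json"


def analyze_and_transform(rows: List[Dict[str, float]], artifact_location: str,
                          artifact_mode: str = "produce") -> List[Dict[str, float]]:
    path = os.path.join(artifact_location, ARTIFACT_FILE)
    if artifact_mode == "produce":
        if not rows:
            raise ValueError("produce mode needs data to analyze")
        # Analyze: compute statistics over the full dataset and persist them.
        means = {c: sum(r[c] for r in rows) / len(rows) for c in rows[0]}
        with open(path, "w") as f:
            json.dump(means, f)
    elif artifact_mode == "consume":
        # Reuse statistics written by an earlier produce run; no analysis here.
        with open(path) as f:
            means = json.load(f)
    else:
        raise ValueError(f"Unknown artifact_mode: {artifact_mode!r}")
    # Transform: center every row with the same statistics.
    return [{c: r[c] - means[c] for c in r} for r in rows]


# Usage: produce on training data, then consume the saved mean on new data.
with tempfile.TemporaryDirectory() as tmp:
    print(analyze_and_transform([{"x": 1.0}, {"x": 3.0}], tmp, "produce"))
    # [{'x': -1.0}, {'x': 1.0}]
    print(analyze_and_transform([{"x": 10.0}], tmp, "consume"))
    # [{'x': 8.0}]
```

Consuming in a directory with no prior produce run fails with FileNotFoundError, mirroring the requirement that artifact_location already hold produced artifacts.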

with_exception_handling()[source]