apache_beam.ml.transforms.handlers module
- class apache_beam.ml.transforms.handlers.TFTProcessHandler(*, artifact_location: str, transforms: Sequence[TFTOperation] | None = None, artifact_mode: str = 'produce')[source]
Bases: ProcessHandler[NamedTuple | Row | dict[str, str | float | int | bytes | ndarray], Row | dict[str, ndarray]]
A handler class for processing data with TensorFlow Transform (TFT) operations.
- get_raw_data_feature_spec(input_types: dict[str, type]) dict[str, tensorflow.io.VarLenFeature][source]
Return the raw-data feature spec used to build the DatasetMetadata object for tft_beam.AnalyzeAndTransformDataset.
- Parameters:
input_types: A dictionary of column names and types.
- Returns:
A dictionary mapping column names to tensorflow.io.VarLenFeature.
- convert_raw_data_feature_spec_to_dataset_metadata(raw_data_feature_spec) tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
- get_raw_data_metadata(input_types: dict[str, type]) tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
- write_transform_artifacts(transform_fn, location)[source]
Write transform artifacts to the given location.
- Parameters:
transform_fn: A transform_fn object.
location: A location to write the artifacts.
- Returns:
A PCollection of WriteTransformFn writing a TF transform graph.
- process_data_fn(inputs: dict[str, tensorflow_transform.common_types.ConsistentTensorType]) dict[str, tensorflow_transform.common_types.ConsistentTensorType][source]
This method is used in the AnalyzeAndTransformDataset step. It applies each transform, in sequential order, to the columns specified for that transform.
- Parameters:
inputs: A dictionary of column names and data.
- Returns:
A dictionary of column names and transformed data.
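The sequential, per-column application described for process_data_fn can be illustrated with a plain-Python sketch. This is not Beam's implementation and it does not use TensorFlow Transform: ColumnTransform, apply_in_order, double, and scale_to_0_1 are hypothetical names introduced here, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ColumnTransform:
    """Hypothetical stand-in for a TFT operation: a function plus its target columns."""
    columns: List[str]
    fn: Callable[[List[float]], List[float]]


def apply_in_order(
    inputs: Dict[str, List[float]],
    transforms: Sequence[ColumnTransform],
) -> Dict[str, List[float]]:
    """Apply each transform, in list order, to the columns it names."""
    outputs = dict(inputs)  # shallow copy so the caller's dict is not mutated
    for transform in transforms:
        for column in transform.columns:
            # A later transform sees the output of an earlier one on the same column.
            outputs[column] = transform.fn(outputs[column])
    return outputs


def double(values: List[float]) -> List[float]:
    return [2 * v for v in values]


def scale_to_0_1(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


result = apply_in_order(
    {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0]},
    [ColumnTransform(["x"], double), ColumnTransform(["x"], scale_to_0_1)],
)
print(result)
```

Column "y" is not named by any transform, so it passes through unchanged in this sketch; only "x" is doubled and then scaled.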
- expand(raw_data: PCollection[NamedTuple | Row | dict[str, str | float | int | bytes | ndarray]]) PCollection[Row | dict[str, ndarray]][source]
This method uses tensorflow_transform’s Analyze step to produce the artifacts and its Transform step to apply the transforms to the data. It also computes the dataset metadata required by the tft AnalyzeDataset/TransformDataset step.
Artifacts are produced only if artifact_mode is set to 'produce'. If artifact_mode is set to 'consume', the artifacts are read from artifact_location, where an earlier run in 'produce' mode stored them.
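The difference between the two artifact modes can be sketched without Beam or TensorFlow. The example below is an assumption-laden illustration, not the handler's code: run_min_max_scaling and the min_max.json file name are hypothetical, and a JSON file of min/max statistics stands in for the TF transform graph that the real handler writes.

```python
import json
import os
import tempfile
from typing import List


def run_min_max_scaling(
    values: List[float],
    artifact_location: str,
    artifact_mode: str = "produce",
) -> List[float]:
    """Scale values to [0, 1] using statistics that are either computed or reloaded."""
    path = os.path.join(artifact_location, "min_max.json")  # hypothetical artifact file
    if artifact_mode == "produce":
        # Analyze: compute statistics from this data and persist them.
        stats = {"min": min(values), "max": max(values)}
        os.makedirs(artifact_location, exist_ok=True)
        with open(path, "w") as f:
            json.dump(stats, f)
    elif artifact_mode == "consume":
        # Skip analysis: reuse the statistics written by an earlier 'produce' run.
        with open(path) as f:
            stats = json.load(f)
    else:
        raise ValueError(f"Unknown artifact_mode: {artifact_mode!r}")
    # Transform: apply the (computed or reloaded) statistics to the data.
    span = stats["max"] - stats["min"]
    return [(v - stats["min"]) / span if span else 0.0 for v in values]


with tempfile.TemporaryDirectory() as location:
    train = run_min_max_scaling([0.0, 5.0, 10.0], location, "produce")
    serve = run_min_max_scaling([2.5, 20.0], location, "consume")

print(train, serve)
```

In the 'consume' call the value 20.0 scales to 2.0 rather than 1.0, because the statistics come from the earlier 'produce' run and are not recomputed from the new data.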