apache_beam.ml.transforms.handlers module

class apache_beam.ml.transforms.handlers.TFTProcessHandler(*, artifact_location: str, transforms: Optional[Sequence[apache_beam.ml.transforms.tft.TFTOperation]] = None, artifact_mode: str = 'produce')[source]

Bases: apache_beam.ml.transforms.base.ProcessHandler

A handler class for processing data with TensorFlow Transform (TFT) operations.

append_transform(transform)[source]
get_raw_data_feature_spec(input_types: Dict[str, type]) → Dict[str, tf.io.VarLenFeature][source]

Return the raw data feature spec, from which the DatasetMetadata object used with tft_beam.AnalyzeAndTransformDataset is built.

Parameters:
    input_types – A dictionary of column names and types.

Returns: A dictionary mapping column names to feature specs.
convert_raw_data_feature_spec_to_dataset_metadata(raw_data_feature_spec) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
get_raw_data_metadata(input_types: Dict[str, type]) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata[source]
write_transform_artifacts(transform_fn, location)[source]

Write transform artifacts to the given location.

Parameters:
    transform_fn – A transform_fn object.
    location – A location to write the artifacts.

Returns: A PCollection of WriteTransformFn writing a TF transform graph.
process_data_fn(inputs: Dict[str, tensorflow_transform.common_types.ConsistentTensorType]) → Dict[str, tensorflow_transform.common_types.ConsistentTensorType][source]

This method is used in the AnalyzeAndTransformDataset step. It applies the transforms to the inputs in sequential order, each transform operating on the columns provided for it.

Parameters:
    inputs – A dictionary of column names and data.

Returns: A dictionary of column names and transformed data.
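The sequential, per-column behaviour described for process_data_fn can be illustrated with a minimal pure-Python sketch. It does not use TensorFlow Transform or Beam: the ColumnOp class, its columns and fn fields, and the apply_in_order, scale_by_max, and add_one functions are hypothetical names invented for this illustration, and plain lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ColumnOp:
    """Hypothetical stand-in for a transform bound to specific columns."""
    columns: Sequence[str]
    fn: Callable[[List[float]], List[float]]


def apply_in_order(inputs: Dict[str, List[float]],
                   ops: Sequence[ColumnOp]) -> Dict[str, List[float]]:
    """Apply each op, in list order, to the columns that op names."""
    outputs = dict(inputs)  # shallow copy so the caller's dict is not mutated
    for op in ops:
        for col in op.columns:
            # Later ops see the output of earlier ops on the same column.
            outputs[col] = op.fn(outputs[col])
    return outputs


def scale_by_max(values: List[float]) -> List[float]:
    hi = max(values)
    return [v / hi for v in values]


def add_one(values: List[float]) -> List[float]:
    return [v + 1 for v in values]


ops = [
    ColumnOp(columns=["x"], fn=scale_by_max),
    ColumnOp(columns=["x", "y"], fn=add_one),
]
result = apply_in_order({"x": [2.0, 4.0], "y": [10.0]}, ops)
print(result)
```

Because the ops run in order, column x is scaled first and then incremented, while column y is only incremented.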
process_data(raw_data: apache_beam.pvalue.PCollection[Union[NamedTuple, apache_beam.pvalue.Row, Dict[str, Union[str, float, int, bytes, numpy.ndarray]]]]) → apache_beam.pvalue.PCollection[Union[apache_beam.pvalue.Row, Dict[str, numpy.ndarray]]][source]

This method uses tensorflow_transform's Analyze step to produce the artifacts and its Transform step to apply the transforms to the data. It also computes the dataset metadata required by the tft AnalyzeDataset/TransformDataset steps.

Artifacts are produced only if artifact_mode is set to 'produce'. If artifact_mode is set to 'consume', the artifacts are instead read from artifact_location, the same location that was previously used to store the produced artifacts.
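The difference between the two artifact modes can be sketched in plain Python. This is only an illustration of the produce/consume idea, not the handler's implementation: the run_with_artifacts function, the artifacts.json file name, and the use of a single maximum value as the "artifact" are all assumptions made for the example.

```python
import json
import os
import tempfile
from typing import List


def run_with_artifacts(values: List[float],
                       artifact_location: str,
                       artifact_mode: str = "produce") -> List[float]:
    """Scale values by a maximum that is either computed or read back."""
    # Hypothetical artifact file; the real handler stores a TF transform graph.
    path = os.path.join(artifact_location, "artifacts.json")
    if artifact_mode == "produce":
        # Analyze step: compute statistics from the data and persist them.
        artifacts = {"max": max(values)}
        with open(path, "w") as f:
            json.dump(artifacts, f)
    elif artifact_mode == "consume":
        # Skip analysis: reuse the statistics written by an earlier run.
        with open(path) as f:
            artifacts = json.load(f)
    else:
        raise ValueError(f"Unknown artifact_mode: {artifact_mode!r}")
    # Transform step: apply the transform using the artifacts.
    return [v / artifacts["max"] for v in values]


with tempfile.TemporaryDirectory() as location:
    produced = run_with_artifacts([2.0, 4.0], location, "produce")
    consumed = run_with_artifacts([1.0, 8.0], location, "consume")

print(produced)
print(consumed)
```

In the consume run the second dataset is scaled by the maximum learned from the first dataset, which is why a value larger than that maximum maps above 1.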