apache_beam.ml.transforms.handlers module
-
class
apache_beam.ml.transforms.handlers.
TFTProcessHandler
(*, artifact_location: str, transforms: Optional[Sequence[apache_beam.ml.transforms.tft.TFTOperation]] = None, artifact_mode: str = 'produce')
Bases: apache_beam.ml.transforms.base.ProcessHandler
A handler class for processing data with TensorFlow Transform (TFT) operations.
-
get_raw_data_feature_spec
(input_types: Dict[str, type]) → Dict[str, <mocked type>]
Return the feature spec for the raw data, keyed by column name, from which the DatasetMetadata object used with tft_beam.AnalyzeAndTransformDataset is built.
Parameters:
input_types: A dictionary of column names and types.
Returns: A dictionary mapping column names to feature specs.
-
convert_raw_data_feature_spec_to_dataset_metadata
(raw_data_feature_spec) → <mocked type>
-
get_raw_data_metadata
(input_types: Dict[str, type]) → <mocked type>
-
write_transform_artifacts
(transform_fn, location)
Write transform artifacts to the given location.
Parameters:
transform_fn: A transform_fn object.
location: A location to write the artifacts.
Returns: A PCollection of WriteTransformFn writing a TF transform graph.
-
process_data_fn
(inputs: Dict[str, <mocked type>]) → Dict[str, <mocked type>]
This method is used in the AnalyzeAndTransformDataset step. It applies the transforms in sequential order, each one to the columns provided for that transform.
Parameters:
inputs: A dictionary of column names and data.
Returns: A dictionary of column names and transformed data.
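The sequential, per-column application described for process_data_fn can be sketched in plain Python. This is an illustration only, not the handler's implementation: `ColumnTransform`, `apply_in_order`, and `scale_to_unit_range` are hypothetical names, and plain lists stand in for the tensors that TFT would pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ColumnTransform:
    """Hypothetical stand-in for an operation bound to a set of columns."""
    columns: Sequence[str]
    fn: Callable[[List[float]], List[float]]


def apply_in_order(
    inputs: Dict[str, List[float]],
    transforms: Sequence[ColumnTransform],
) -> Dict[str, List[float]]:
    """Apply each transform, in list order, to the columns it names."""
    outputs = dict(inputs)  # columns no transform names pass through unchanged
    for transform in transforms:
        for column in transform.columns:
            # Later transforms see the output of earlier ones on the same column.
            outputs[column] = transform.fn(outputs[column])
    return outputs


def scale_to_unit_range(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


transforms = [
    ColumnTransform(columns=["x"], fn=lambda vs: [v + 1.0 for v in vs]),
    ColumnTransform(columns=["x", "y"], fn=scale_to_unit_range),
]
result = apply_in_order(
    {"x": [1.0, 3.0, 5.0], "y": [0.0, 10.0], "z": [7.0]},
    transforms,
)
```

Column "x" is shifted and then scaled, column "y" is only scaled, and column "z" is returned as it came in because no transform names it.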
-
process_data
(raw_data: apache_beam.pvalue.PCollection[Union[NamedTuple, apache_beam.pvalue.Row, Dict[str, Union[str, float, int, bytes, numpy.ndarray]]]]) → apache_beam.pvalue.PCollection[Union[apache_beam.pvalue.Row, Dict[str, numpy.ndarray]]]
In addition to transforming the data, this method computes the dataset metadata required by the TFT AnalyzeDataset/TransformDataset step.
This method uses tensorflow_transform’s Analyze step to produce the artifacts and its Transform step to apply the transforms to the data. Artifacts are produced only if artifact_mode is set to 'produce'. If artifact_mode is set to 'consume', the artifacts are read from artifact_location, where an earlier run in 'produce' mode stored them.
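The difference between the two artifact modes can be sketched without Beam or TensorFlow. The code below is a minimal stand-in under stated assumptions, not how TFTProcessHandler stores its artifacts: `scale_column`, the `min_max.json` file name, and the JSON format are all hypothetical, chosen only to show "analyze and write" versus "read and reuse".

```python
import json
import tempfile
from pathlib import Path
from typing import List

ARTIFACT_FILE = "min_max.json"  # hypothetical artifact name, for illustration


def scale_column(
    values: List[float],
    artifact_location: str,
    artifact_mode: str = "produce",
) -> List[float]:
    """Scale values using statistics that are either computed or reloaded."""
    path = Path(artifact_location) / ARTIFACT_FILE
    if artifact_mode == "produce":
        # Analyze: compute statistics over this dataset and persist them.
        stats = {"min": min(values), "max": max(values)}
        path.write_text(json.dumps(stats))
    elif artifact_mode == "consume":
        # Skip analysis: reuse the statistics written by an earlier run.
        stats = json.loads(path.read_text())
    else:
        raise ValueError(f"unknown artifact_mode: {artifact_mode!r}")
    # Transform: apply the same scaling in both modes.
    span = stats["max"] - stats["min"]
    return [(v - stats["min"]) / span for v in values]


with tempfile.TemporaryDirectory() as location:
    training = scale_column([0.0, 5.0, 10.0], location, "produce")
    serving = scale_column([2.5, 20.0], location, "consume")
```

In this sketch the second call scales by the minimum and maximum of the first dataset, so a value outside the original range (20.0) maps outside [0, 1]. That is the point of a consume-style mode: new data is transformed consistently with the data that was analyzed.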
-