apache_beam.ml.inference.sklearn_inference module

class apache_beam.ml.inference.sklearn_inference.ModelFileType

Bases: enum.Enum

Defines how a model file is serialized. Options are pickle or joblib.

PICKLE = 1

JOBLIB = 2
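
To illustrate what the two options mean in practice, the following is a minimal sketch of how a loader might dispatch on the serialization format. It is not Beam's implementation: `FileType` is a hypothetical stand-in that mirrors the two members above, `load_serialized_model` is an invented helper name, and a plain dict stands in for a trained estimator so that the example needs only the standard library for the pickle path.

```python
import enum
import pickle
import tempfile
from pathlib import Path


class FileType(enum.Enum):
    """Hypothetical stand-in mirroring ModelFileType's two members."""
    PICKLE = 1
    JOBLIB = 2


def load_serialized_model(path, file_type=FileType.PICKLE):
    """Deserialize a model file according to its declared format."""
    if file_type is FileType.PICKLE:
        with open(path, "rb") as f:
            return pickle.load(f)
    if file_type is FileType.JOBLIB:
        # joblib is a third-party package; it is imported lazily so that
        # the pickle path works without it being installed.
        import joblib
        return joblib.load(path)
    raise ValueError(f"Unsupported file type: {file_type!r}")


# Round-trip a plain dict standing in for a trained estimator.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump({"coef": [0.5, 1.5]}, f)

restored = load_serialized_model(model_path, FileType.PICKLE)
```

The file type has to match the way the model was written: a file produced by joblib should be declared as JOBLIB, and one produced by pickle as PICKLE.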

class apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy(model_uri: str, model_file_type: apache_beam.ml.inference.sklearn_inference.ModelFileType = <ModelFileType.PICKLE: 1>)

Bases: apache_beam.ml.inference.base.ModelHandler

Implementation of the ModelHandler interface for scikit-learn using numpy arrays as input.

Example Usage:

pcoll | RunInference(SklearnModelHandlerNumpy(model_uri="my_uri"))

Parameters:
    model_uri – The URI where the model is saved.
    model_file_type – The serialization format of the model file. Defaults to ModelFileType.PICKLE.

load_model() → sklearn.base.BaseEstimator

Loads and initializes a model for processing.

run_inference(batch: Sequence[numpy.ndarray], model: sklearn.base.BaseEstimator, inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult]

Runs inferences on a batch of numpy arrays.

Parameters:
    batch – A sequence of examples as numpy arrays. Each array should hold a single example.
    model – A scikit-learn model or pipeline. It must implement predict(X), where X is a numpy array.
    inference_args – Any additional arguments for an inference.
Returns:	An Iterable of type PredictionResult.
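
The contract above (single-example arrays in, one result per example out) can be sketched roughly as follows. This is an illustration under stated assumptions, not the handler's source: `PredictionResult` here is a local stand-in with assumed `example` and `inference` fields, `MeanModel` is a toy object that only provides `predict(X)`, and `run_inference_sketch` is a hypothetical name.

```python
from typing import Any, List, NamedTuple, Sequence

import numpy as np


class PredictionResult(NamedTuple):
    """Local stand-in; the field names are an assumption."""
    example: Any
    inference: Any


class MeanModel:
    """Toy model: predicts the mean of each row of X."""

    def predict(self, X: np.ndarray) -> np.ndarray:
        return X.mean(axis=1)


def run_inference_sketch(
    batch: Sequence[np.ndarray], model: Any
) -> List[PredictionResult]:
    # Stack the single examples into one 2-D array so the model
    # sees the whole batch in a single predict call.
    stacked = np.stack(batch, axis=0)
    predictions = model.predict(stacked)
    # Pair each original example with its prediction, in order.
    return [PredictionResult(x, y) for x, y in zip(batch, predictions)]


batch = [np.array([1.0, 3.0]), np.array([2.0, 4.0])]
results = run_inference_sketch(batch, MeanModel())
```

Stacking requires every example in the batch to have the same shape, which is one reason each array should represent exactly one example.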

get_num_bytes(batch: Sequence[pandas.core.frame.DataFrame]) → int

Returns:	The number of bytes of data for a batch.
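
The reference does not say how the byte count is computed. One plausible way to estimate the size of a batch of numpy arrays, shown here purely as an assumption, is to sum each array's `nbytes`; `estimate_num_bytes` is a hypothetical helper name.

```python
from typing import Sequence

import numpy as np


def estimate_num_bytes(batch: Sequence[np.ndarray]) -> int:
    """Sum the raw data buffer sizes of every array in the batch."""
    return sum(int(arr.nbytes) for arr in batch)


# Six float64 values in total, 8 bytes each.
batch = [np.zeros(4, dtype=np.float64), np.zeros(2, dtype=np.float64)]
total = estimate_num_bytes(batch)
```

Note that `nbytes` counts only the array data, not the Python object overhead, so the handler's actual figure may differ.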

class apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerPandas(model_uri: str, model_file_type: apache_beam.ml.inference.sklearn_inference.ModelFileType = <ModelFileType.PICKLE: 1>)

Bases: apache_beam.ml.inference.base.ModelHandler

Implementation of the ModelHandler interface for scikit-learn that supports pandas dataframes.

Example Usage:

pcoll | RunInference(SklearnModelHandlerPandas(model_uri="my_uri"))

NOTE: This API and its implementation are under development and do not provide backward compatibility guarantees.

Parameters:
    model_uri – The URI where the model is saved.
    model_file_type – The serialization format of the model file. Defaults to ModelFileType.PICKLE.

load_model() → sklearn.base.BaseEstimator

Loads and initializes a model for processing.

run_inference(batch: Sequence[pandas.core.frame.DataFrame], model: sklearn.base.BaseEstimator, inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult]

Runs inferences on a batch of pandas dataframes.

Parameters:
    batch – A sequence of examples as pandas dataframes. Each dataframe should hold a single example.
    model – A scikit-learn model or pipeline that accepts dataframes. It must implement predict(X), where X is a pandas dataframe.
    inference_args – Any additional arguments for an inference.
Returns:	An Iterable of type PredictionResult.
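
As a rough illustration of the dataframe variant, the sketch below concatenates single-row dataframes, calls `predict` once, and pairs each row with its prediction. It is not the handler's source. It reads "single examples" as "exactly one row per dataframe", which is an assumption; `PredictionResult` is a local stand-in with assumed field names, and `RowSumModel` and `run_inference_sketch` are hypothetical.

```python
from typing import Any, List, NamedTuple, Sequence

import pandas as pd


class PredictionResult(NamedTuple):
    """Local stand-in; the field names are an assumption."""
    example: Any
    inference: Any


class RowSumModel:
    """Toy model: predicts the sum of each row's columns."""

    def predict(self, X: pd.DataFrame):
        return X.sum(axis=1).to_numpy()


def run_inference_sketch(
    batch: Sequence[pd.DataFrame], model: Any
) -> List[PredictionResult]:
    # Assumed reading of "single examples": one row per dataframe.
    for df in batch:
        if len(df) != 1:
            raise ValueError("Each dataframe must hold exactly one row.")
    # Combine the rows so the model sees the batch in one predict call.
    stacked = pd.concat(batch, axis=0)
    predictions = model.predict(stacked)
    # Split back into one-row dataframes to pair with the predictions.
    rows = [stacked.iloc[[i]] for i in range(len(stacked))]
    return [PredictionResult(r, p) for r, p in zip(rows, predictions)]


batch = [
    pd.DataFrame({"a": [1.0], "b": [2.0]}),
    pd.DataFrame({"a": [3.0], "b": [4.0]}),
]
results = run_inference_sketch(batch, RowSumModel())
```

All dataframes in a batch should share the same columns, matching the columns the model was trained on.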

get_num_bytes(batch: Sequence[pandas.core.frame.DataFrame]) → int

Returns:	The number of bytes of data for a batch.
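
The reference does not describe how the size is measured. A possible approach for dataframes, offered only as an assumption, is to sum `DataFrame.memory_usage(deep=True)` across the batch; `estimate_num_bytes` is a hypothetical helper name.

```python
from typing import Sequence

import pandas as pd


def estimate_num_bytes(batch: Sequence[pd.DataFrame]) -> int:
    """Sum per-column (and index) memory usage over all dataframes."""
    return sum(int(df.memory_usage(deep=True).sum()) for df in batch)


batch = [
    pd.DataFrame({"a": [1.0, 2.0]}),
    pd.DataFrame({"a": [3.0, 4.0]}),
]
total = estimate_num_bytes(batch)
```

With `deep=True`, pandas also inspects object columns (such as strings), so the estimate better reflects real memory use than the shallow default. The exact figure includes index overhead and can vary between pandas versions.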