apache_beam.ml.inference.xgboost_inference module

class apache_beam.ml.inference.xgboost_inference.XGBoostModelHandler(model_class: Union[Callable[..., xgboost.Booster], Callable[..., xgboost.XGBModel]], model_state: str, inference_fn: Callable[[Sequence[object], Union[xgboost.Booster, xgboost.XGBModel], Optional[Dict[str, Any]]], Iterable[apache_beam.ml.inference.base.PredictionResult]] = <function default_xgboost_inference_fn>, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.base.ModelHandler, abc.ABC

Implementation of the ModelHandler interface for XGBoost.

Example Usage:

pcoll | RunInference(
            XGBoostModelHandler(
                model_class="XGBoost Model Class",
                model_state="my_model_state.json"))

See https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html for details.

Parameters:
  • model_class – class of the XGBoost model that defines the model structure.
  • model_state – path to a json file that contains the model’s configuration.
  • inference_fn – the inference function to use during RunInference. default=default_xgboost_inference_fn
  • min_batch_size – optional. the minimum batch size to use when batching inputs.
  • max_batch_size – optional. the maximum batch size to use when batching inputs.
  • max_batch_duration_secs – optional. the maximum amount of time to buffer a batch before emitting; used in streaming contexts.
  • kwargs – ‘env_vars’ can be used to set environment variables before loading the model.

Supported Versions: RunInference APIs in Apache Beam have been tested with XGBoost 1.6.0 and 1.7.0.

XGBoost 1.0.0 introduced support for saving and loading XGBoost models as JSON. XGBoost 1.6.0 added support for Universal Binary JSON (UBJSON). It is recommended to use a model trained in XGBoost 1.6.0 or later. While models created in older versions should be loadable, there are no guarantees that this will work as expected.

This class is the superclass of the various XGBoostModelHandlers and should not be instantiated directly. Instead, use a concrete subclass such as XGBoostModelHandlerNumpy or XGBoostModelHandlerPandas.

load_model() → Union[xgboost.Booster, xgboost.XGBModel][source]
get_metrics_namespace() → str[source]
batch_elements_kwargs() → Mapping[str, Any][source]
class apache_beam.ml.inference.xgboost_inference.XGBoostModelHandlerNumpy(model_class: Union[Callable[..., xgboost.Booster], Callable[..., xgboost.XGBModel]], model_state: str, inference_fn: Callable[[Sequence[object], Union[xgboost.Booster, xgboost.XGBModel], Optional[Dict[str, Any]]], Iterable[apache_beam.ml.inference.base.PredictionResult]] = <function default_xgboost_inference_fn>, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.xgboost_inference.XGBoostModelHandler

Implementation of the ModelHandler interface for XGBoost using numpy arrays as input.

Example Usage:

pcoll | RunInference(
            XGBoostModelHandlerNumpy(
                model_class="XGBoost Model Class",
                model_state="my_model_state.json"))
Parameters:
  • model_class – class of the XGBoost model that defines the model structure.
  • model_state – path to a json file that contains the model’s configuration.
  • inference_fn – the inference function to use during RunInference. default=default_xgboost_inference_fn


run_inference(batch: Sequence[numpy.ndarray], model: Union[xgboost.Booster, xgboost.XGBModel], inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult][source]

Runs inferences on a batch of 2d numpy arrays.

Parameters:
  • batch – A sequence of examples as 2d numpy arrays. Each row in an array is a single example. The dimensions must match the dimensions of the data used to train the model.
  • model – XGBoost Booster or XGBModel (sklearn interface). Must implement predict(X), where X is a 2d numpy array.
  • inference_args – Any additional arguments for an inference.
Returns:

An Iterable of type PredictionResult.

get_num_bytes(batch: Sequence[numpy.ndarray]) → int[source]
Returns:The number of bytes of data for a batch.
class apache_beam.ml.inference.xgboost_inference.XGBoostModelHandlerPandas(model_class: Union[Callable[..., xgboost.Booster], Callable[..., xgboost.XGBModel]], model_state: str, inference_fn: Callable[[Sequence[object], Union[xgboost.Booster, xgboost.XGBModel], Optional[Dict[str, Any]]], Iterable[apache_beam.ml.inference.base.PredictionResult]] = <function default_xgboost_inference_fn>, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.xgboost_inference.XGBoostModelHandler

Implementation of the ModelHandler interface for XGBoost using pandas dataframes as input.

Example Usage:

pcoll | RunInference(
            XGBoostModelHandlerPandas(
                model_class="XGBoost Model Class",
                model_state="my_model_state.json"))
Parameters:
  • model_class – class of the XGBoost model that defines the model structure.
  • model_state – path to a json file that contains the model’s configuration.
  • inference_fn – the inference function to use during RunInference. default=default_xgboost_inference_fn


run_inference(batch: Sequence[pandas.core.frame.DataFrame], model: Union[xgboost.Booster, xgboost.XGBModel], inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult][source]

Runs inferences on a batch of pandas dataframes.

Parameters:
  • batch – A sequence of examples as pandas dataframes. Each row in a dataframe is a single example. The dimensions must match the dimensions of the data used to train the model.
  • model – XGBoost Booster or XGBModel (sklearn interface). Must implement predict(X), where X is a pandas dataframe.
  • inference_args – Any additional arguments for an inference.
Returns:

An Iterable of type PredictionResult.

get_num_bytes(batch: Sequence[pandas.core.frame.DataFrame]) → int[source]
Returns:The number of bytes of data for a batch of pandas DataFrames.
class apache_beam.ml.inference.xgboost_inference.XGBoostModelHandlerSciPy(model_class: Union[Callable[..., xgboost.Booster], Callable[..., xgboost.XGBModel]], model_state: str, inference_fn: Callable[[Sequence[object], Union[xgboost.Booster, xgboost.XGBModel], Optional[Dict[str, Any]]], Iterable[apache_beam.ml.inference.base.PredictionResult]] = <function default_xgboost_inference_fn>, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.xgboost_inference.XGBoostModelHandler

Implementation of the ModelHandler interface for XGBoost using scipy matrices as input.

Example Usage:

pcoll | RunInference(
            XGBoostModelHandlerSciPy(
                model_class="XGBoost Model Class",
                model_state="my_model_state.json"))
Parameters:
  • model_class – class of the XGBoost model that defines the model structure.
  • model_state – path to a json file that contains the model’s configuration.
  • inference_fn – the inference function to use during RunInference. default=default_xgboost_inference_fn


run_inference(batch: Sequence[scipy.sparse._csr.csr_matrix], model: Union[xgboost.Booster, xgboost.XGBModel], inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult][source]

Runs inferences on a batch of SciPy sparse matrices.

Parameters:
  • batch – A sequence of examples as SciPy sparse matrices. The dimensions must match the dimensions of the data used to train the model.
  • model – XGBoost Booster or XGBModel (sklearn interface). Must implement predict(X), where X is a SciPy sparse matrix.
  • inference_args – Any additional arguments for an inference.
Returns:

An Iterable of type PredictionResult.

get_num_bytes(batch: Sequence[scipy.sparse._csr.csr_matrix]) → int[source]
Returns:The number of bytes of data for a batch.
class apache_beam.ml.inference.xgboost_inference.XGBoostModelHandlerDatatable(model_class: Union[Callable[..., xgboost.Booster], Callable[..., xgboost.XGBModel]], model_state: str, inference_fn: Callable[[Sequence[object], Union[xgboost.Booster, xgboost.XGBModel], Optional[Dict[str, Any]]], Iterable[apache_beam.ml.inference.base.PredictionResult]] = <function default_xgboost_inference_fn>, *, min_batch_size: Optional[int] = None, max_batch_size: Optional[int] = None, max_batch_duration_secs: Optional[int] = None, **kwargs)[source]

Bases: apache_beam.ml.inference.xgboost_inference.XGBoostModelHandler

Implementation of the ModelHandler interface for XGBoost using datatable dataframes as input.

Example Usage:

pcoll | RunInference(
            XGBoostModelHandlerDatatable(
                model_class="XGBoost Model Class",
                model_state="my_model_state.json"))
Parameters:
  • model_class – class of the XGBoost model that defines the model structure.
  • model_state – path to a json file that contains the model’s configuration.
  • inference_fn – the inference function to use during RunInference. default=default_xgboost_inference_fn


run_inference(batch: Sequence[datatable.Frame], model: Union[xgboost.Booster, xgboost.XGBModel], inference_args: Optional[Dict[str, Any]] = None) → Iterable[apache_beam.ml.inference.base.PredictionResult][source]

Runs inferences on a batch of datatable dataframes.

Parameters:
  • batch – A sequence of examples as datatable dataframes. Each row in a dataframe is a single example. The dimensions must match the dimensions of the data used to train the model.
  • model – XGBoost Booster or XGBModel (sklearn interface). Must implement predict(X), where X is a datatable dataframe.
  • inference_args – Any additional arguments for an inference.
Returns:

An Iterable of type PredictionResult.

get_num_bytes(batch: Sequence[datatable.Frame]) → int[source]
Returns:The number of bytes of data for a batch.