apache_beam.ml.anomaly.base module

Base classes for anomaly detection

class apache_beam.ml.anomaly.base.AnomalyPrediction(model_id: str | None = None, score: float | None = None, label: int | None = None, threshold: float | None = None, info: str = '', source_predictions: Iterable[AnomalyPrediction] | None = None)[source]

Bases: object

A dataclass for anomaly detection predictions.

model_id: str | None = None

The ID of detector (model) that generates the prediction.

score: float | None = None

The outlier score resulting from applying the detector to the input data.

label: int | None = None

The outlier label (normal or outlier) derived from the outlier score.

threshold: float | None = None

The threshold used to determine the label.

info: str = ''

Additional information about the prediction.

source_predictions: Iterable[AnomalyPrediction] | None = None

If enabled, a list of AnomalyPrediction objects used to derive the aggregated prediction.

class apache_beam.ml.anomaly.base.AnomalyResult(example: Row, predictions: Iterable[AnomalyPrediction])[source]

Bases: object

A dataclass for the anomaly detection results

example: Row

The original input data.

predictions: Iterable[AnomalyPrediction]

The iterable of AnomalyPrediction objects containing the predictions. Expect length 1 if it is a result for a non-ensemble detector or an ensemble detector with an aggregation strategy applied.

class apache_beam.ml.anomaly.base.ThresholdFn(normal_label: int = 0, outlier_label: int = 1, missing_label: int = -2)[source]

Bases: ABC

An abstract base class for threshold functions.

Parameters:
  • normal_label – The integer label used to identify normal data. Defaults to 0.

  • outlier_label – The integer label used to identify outlier data. Defaults to 1.

  • missing_label – The integer label used when a score is missing because the model is not ready to score.

abstract property is_stateful: bool

Indicates whether the threshold function is stateful or not.

abstract property threshold: float | None

Retrieves the current threshold value, or None if not set.

abstract apply(score: float | None) int | None[source]

Applies the threshold function to a given score to classify it as normal or outlier.

Parameters:

score – The outlier score generated from the detector (model).

Returns:

The label assigned to the score, either self._normal_label or self._outlier_label

class apache_beam.ml.anomaly.base.AggregationFn[source]

Bases: ABC

An abstract base class for aggregation functions.

abstract apply(predictions: Iterable[AnomalyPrediction]) AnomalyPrediction[source]

Applies the aggregation function to an iterable of predictions, either on their outlier scores or labels.

Parameters:

predictions – An Iterable of AnomalyPrediction objects to aggregate.

Returns:

An AnomalyPrediction object containing the aggregated result.

class apache_beam.ml.anomaly.base.AnomalyDetector(model_id: str | None = None, features: Iterable[str] | None = None, target: str | None = None, threshold_criterion: ThresholdFn | None = None, **kwargs)[source]

Bases: ABC

An abstract base class for anomaly detectors.

Parameters:
  • model_id – The ID of detector (model). Defaults to the value of the spec_type attribute, or ‘unknown’ if not set.

  • features – An Iterable of strings representing the names of the input features in the beam.Row

  • target – The name of the target field in the beam.Row.

  • threshold_criterion – An optional ThresholdFn to apply to the outlier score and yield a label.

abstract learn_one(x: Row) None[source]

Trains the detector on a single data instance.

Parameters:

x – A beam.Row representing the data instance.

abstract score_one(x: Row) float | None[source]

Scores a single data instance for anomalies.

Parameters:

x – A beam.Row representing the data instance.

Returns:

The outlier score as a float. None if an exception occurs during scoring, and NaN if the model is not ready.

class apache_beam.ml.anomaly.base.EnsembleAnomalyDetector(sub_detectors: List[AnomalyDetector] | None = None, aggregation_strategy: AggregationFn | None = None, **kwargs)[source]

Bases: AnomalyDetector

An abstract base class for an ensemble of anomaly (sub-)detectors.

Parameters:
  • sub_detectors – A List of AnomalyDetector used in this ensemble model.

  • aggregation_strategy – An optional AggregationFn to apply to the predictions from all sub-detectors and yield an aggregated result.

  • model_id – Inherited from AnomalyDetector.

  • features – Inherited from AnomalyDetector.

  • target – Inherited from AnomalyDetector.

  • threshold_criterion – Inherited from AnomalyDetector.

learn_one(x: Row) None[source]

Inherited from AnomalyDetector.learn_one.

This method is never called during ensemble detector training. The training process is done on each sub-detector independently and in parallel.

score_one(x: Row) float[source]

Inherited from AnomalyDetector.score_one.

This method is never called during ensemble detector scoring. The scoring process is done on sub-detector independently and in parallel, and then the results are aggregated in the pipeline.