apache_beam.ml.anomaly.base module

Base classes for anomaly detection.

class apache_beam.ml.anomaly.base.AnomalyPrediction(model_id: str | None = None, score: float | None = None, label: int | None = None, threshold: float | None = None, info: str = '', source_predictions: Iterable[AnomalyPrediction] | None = None)

Bases: object

A dataclass for anomaly detection predictions.

model_id: str | None = None: The ID of detector (model) that generates the prediction.

score: float | None = None: The outlier score resulting from applying the detector to the input data.

label: int | None = None: The outlier label (normal or outlier) derived from the outlier score.

threshold: float | None = None: The threshold used to determine the label.

info: str = '': Additional information about the prediction.

source_predictions: Iterable[AnomalyPrediction] | None = None: If enabled, a list of AnomalyPrediction objects used to derive the aggregated prediction.

class apache_beam.ml.anomaly.base.AnomalyResult(example: Row, predictions: Iterable[AnomalyPrediction])

Bases: object

A dataclass for anomaly detection results.

example: Row: The original input data.

predictions: Iterable[AnomalyPrediction]: The iterable of AnomalyPrediction objects containing the predictions. Expect length 1 if it is a result for a non-ensemble detector or an ensemble detector with an aggregation strategy applied.

class apache_beam.ml.anomaly.base.ThresholdFn(normal_label: int = 0, outlier_label: int = 1, missing_label: int = -2)

Bases: ABC

An abstract base class for threshold functions.

Parameters:

normal_label – The integer label used to identify normal data. Defaults to 0.
outlier_label – The integer label used to identify outlier data. Defaults to 1.
missing_label – The integer label used when a score is missing because the model is not ready to score. Defaults to -2.

abstract property is_stateful: bool: Indicates whether the threshold function is stateful or not.

abstract property threshold: float | None: Retrieves the current threshold value, or None if not set.

abstract apply(score: float | None) → int | None

Applies the threshold function to a given score to classify it as normal or outlier.

Parameters: score – The outlier score generated from the detector (model).
Returns: The label assigned to the score, either self._normal_label or self._outlier_label.
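
As a rough illustration of the interface above, here is a minimal fixed-cutoff sketch. It uses a plain stand-in class (the hypothetical FixedCutoff) rather than subclassing ThresholdFn, so that it runs without Beam installed. How None and NaN scores are handled is an assumption based on the documented return type and on the description of missing_label, not a statement of what the library's own threshold classes do.

```python
import math
from typing import Optional


class FixedCutoff:
    """Hypothetical stand-in mirroring the documented ThresholdFn interface."""

    def __init__(self, cutoff: float, normal_label: int = 0,
                 outlier_label: int = 1, missing_label: int = -2):
        self._cutoff = cutoff
        self._normal_label = normal_label
        self._outlier_label = outlier_label
        self._missing_label = missing_label

    @property
    def is_stateful(self) -> bool:
        # The cutoff never changes, so no state is carried between calls.
        return False

    @property
    def threshold(self) -> Optional[float]:
        return self._cutoff

    def apply(self, score: Optional[float]) -> Optional[int]:
        if score is None:
            # Assumption: a None score (scoring error) yields no label.
            return None
        if math.isnan(score):
            # Assumption: NaN means the model was not ready to score.
            return self._missing_label
        if score < self._cutoff:
            return self._normal_label
        return self._outlier_label


fn = FixedCutoff(cutoff=3.0)
labels = [fn.apply(s) for s in (0.5, 3.0, 7.2, float("nan"), None)]
print(labels)  # [0, 1, 1, -2, None]
```

A threshold that adapts to the score stream (for example, a running quantile) would keep internal state and report is_stateful as True.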

class apache_beam.ml.anomaly.base.AggregationFn

Bases: ABC

An abstract base class for aggregation functions.

abstract apply(predictions: Iterable[AnomalyPrediction]) → AnomalyPrediction

Applies the aggregation function to an iterable of predictions, either on their outlier scores or labels.

Parameters: predictions – An Iterable of AnomalyPrediction objects to aggregate.
Returns: An AnomalyPrediction object containing the aggregated result.
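
One way to picture an aggregation over outlier scores is the sketch below, which averages the available scores. Prediction is a local stand-in carrying a subset of the documented AnomalyPrediction fields, and MeanScore and the "mean" model ID are hypothetical names. Skipping None scores is an assumption made for the example.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, Optional


@dataclass(frozen=True)
class Prediction:
    """Stand-in with a subset of the documented AnomalyPrediction fields."""
    model_id: Optional[str] = None
    score: Optional[float] = None
    label: Optional[int] = None
    source_predictions: Optional[Iterable["Prediction"]] = None


class MeanScore:
    """Hypothetical aggregation: average the available outlier scores."""

    def apply(self, predictions: Iterable[Prediction]) -> Prediction:
        preds = list(predictions)
        # Assumption: predictions without a score are ignored.
        scores = [p.score for p in preds if p.score is not None]
        mean = fmean(scores) if scores else None
        # Keep the inputs so the aggregated result can be traced back.
        return Prediction(model_id="mean", score=mean, source_predictions=preds)


agg = MeanScore().apply([
    Prediction(model_id="a", score=1.0),
    Prediction(model_id="b", score=3.0),
    Prediction(model_id="c", score=None),
])
print(agg.score)  # 2.0
```

An aggregation over labels rather than scores (for example, a majority vote) would follow the same shape, reading label instead of score.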

class apache_beam.ml.anomaly.base.AnomalyDetector(model_id: str | None = None, features: Iterable[str] | None = None, target: str | None = None, threshold_criterion: ThresholdFn | None = None, **kwargs)

Bases: ABC

An abstract base class for anomaly detectors.

Parameters:

model_id – The ID of the detector (model). Defaults to the value of the spec_type attribute, or 'unknown' if not set.
features – An Iterable of strings representing the names of the input features in the beam.Row.
target – The name of the target field in the beam.Row.
threshold_criterion – An optional ThresholdFn to apply to the outlier score and yield a label.

abstract learn_one(x: Row) → None

Trains the detector on a single data instance.

Parameters: x – A beam.Row representing the data instance.

abstract score_one(x: Row) → float | None

Scores a single data instance for anomalies.

Parameters: x – A beam.Row representing the data instance.
Returns: The outlier score as a float. None if an exception occurs during scoring, and NaN if the model is not ready.
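
The learn_one / score_one contract can be sketched with a small online z-score detector. RunningZScore is a hypothetical stand-in class, not a Beam detector, and types.SimpleNamespace stands in for beam.Row so that the example runs without Beam. Returning NaN before enough data has been seen follows the "model is not ready" convention described above.

```python
import math
from types import SimpleNamespace
from typing import Optional


class RunningZScore:
    """Hypothetical detector exposing learn_one and score_one."""

    def __init__(self, feature: str):
        self._feature = feature
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0  # sum of squared deviations (Welford's algorithm)

    def learn_one(self, x) -> None:
        # Update the running mean and variance with one instance.
        value = getattr(x, self._feature)
        self._n += 1
        delta = value - self._mean
        self._mean += delta / self._n
        self._m2 += delta * (value - self._mean)

    def score_one(self, x) -> Optional[float]:
        if self._n < 2 or self._m2 == 0.0:
            # Too little data or zero variance: the model is not ready.
            return float("nan")
        std = math.sqrt(self._m2 / (self._n - 1))
        return abs(getattr(x, self._feature) - self._mean) / std


detector = RunningZScore(feature="latency")
first = detector.score_one(SimpleNamespace(latency=10.0))  # NaN: nothing learned yet
for v in (10.0, 12.0, 11.0, 9.0, 8.0):
    detector.learn_one(SimpleNamespace(latency=v))
score = detector.score_one(SimpleNamespace(latency=30.0))
print(first, round(score, 3))
```

Scoring an instance before learning from it, as done here for the first row, keeps the instance from influencing its own score.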

class apache_beam.ml.anomaly.base.EnsembleAnomalyDetector(sub_detectors: list[AnomalyDetector] | None = None, aggregation_strategy: AggregationFn | None = None, **kwargs)

Bases: AnomalyDetector

An abstract base class for an ensemble of anomaly (sub-)detectors.

Parameters:

sub_detectors – A list of AnomalyDetector objects used in this ensemble model.
aggregation_strategy – An optional AggregationFn to apply to the predictions from all sub-detectors and yield an aggregated result.
model_id – Inherited from AnomalyDetector.
features – Inherited from AnomalyDetector.
target – Inherited from AnomalyDetector.
threshold_criterion – Inherited from AnomalyDetector.

learn_one(x: Row) → None

Inherited from AnomalyDetector.learn_one.

This method is never called during ensemble detector training. The training process is done on each sub-detector independently and in parallel.

score_one(x: Row) → float

Inherited from AnomalyDetector.score_one.

This method is never called during ensemble detector scoring. The scoring process is done on each sub-detector independently and in parallel, and the results are then aggregated in the pipeline.
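
The score-then-aggregate flow described above can be pictured with the plain-Python sketch below. It is only an illustration of the two steps: the sub-detectors are hypothetical lambdas, the row is a SimpleNamespace stand-in for beam.Row, the maximum is an arbitrary choice of aggregation, and the loop runs sequentially, whereas a pipeline would score the sub-detectors in parallel.

```python
from types import SimpleNamespace
from typing import Callable, Dict

# Hypothetical sub-detectors: each maps a row to an outlier score.
sub_detectors: Dict[str, Callable] = {
    "abs_value": lambda row: abs(row.x),
    "squared": lambda row: row.x ** 2,
}


def score_with_ensemble(row) -> dict:
    # Step 1: score the row with each sub-detector independently.
    per_model = {model_id: fn(row) for model_id, fn in sub_detectors.items()}
    # Step 2: aggregate afterwards (here: take the maximum score).
    return {"aggregated": max(per_model.values()), "sources": per_model}


result = score_with_ensemble(SimpleNamespace(x=-3.0))
print(result)
```

Note that the ensemble object itself never scores the row; only the sub-detectors and the aggregation step do.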