org.apache.beam.sdk.extensions.sql.impl.transform.agg

## Class VarianceFn<T extends java.lang.Number>

• All Implemented Interfaces:
java.io.Serializable, CombineFnBase.GlobalCombineFn<T,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T>, HasDisplayData

```@Internal
public class VarianceFn<T extends java.lang.Number>
extends Combine.CombineFn<T,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T>```
`Combine.CombineFn` for Variance on `Number` types.

Calculates Population Variance and Sample Variance using incremental formulas described, for example, by Chan, Golub, and LeVeque in "Algorithms for computing the sample variance: analysis and recommendations", The American Statistician, 37 (1983) pp. 242–247.

If variance is defined like this:

• Input elements: `(x[1], ... , x[n])`
• Sum of elements: `sum(x) = x[1] + ... + x[n]`
• Average of all elements in the input: `mean(x) = sum(x) / n`
• Deviation of `i`th element from the current mean: `deviation(x, i) = x[i] - mean(x)`
• Variance: `variance(x) = deviation(x, 1)^2 + ... + deviation(x, n)^2`

Then the variance of the combined input of two samples `(x[1], ... , x[n])` and `(y[1], ... , y[m])` is calculated using this formula:

• `variance(concat(x,y)) = variance(x) + variance(y) + increment`, where:
• `increment = n/(m(m+n)) * (m/n * sum(x) - sum(y))^2`

This also applies to a single-element increment, assuming that the variance of a single-element input is zero.

To implement the above formula, we keep track of the current variance, sum, and count of elements, and then apply the formula whenever a new element arrives or we need to merge the variances of two samples.
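The bookkeeping described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual `VarianceAccumulator`: the class name `VarianceSketch`, its fields and methods, and the use of `double` arithmetic are all hypothetical. The increment is written in terms of means, `n*m/(n+m) * (mean(x) - mean(y))^2`, which is the usual form of the pairwise update from Chan et al. The final division by `n` (population) or `n - 1` (sample) follows the standard definitions.

```java
/** Hypothetical stand-in for an accumulator; names and double arithmetic are assumptions. */
final class VarianceSketch {
  long count;       // n: number of elements seen so far
  double sum;       // sum(x)
  double variance;  // sum of squared deviations from the mean, as defined above

  /** Builds an accumulator by adding the values one at a time. */
  static VarianceSketch of(double... values) {
    VarianceSketch acc = new VarianceSketch();
    for (double v : values) {
      acc.add(v);
    }
    return acc;
  }

  /** Adds one element by merging a single-element accumulator whose variance is zero. */
  VarianceSketch add(double value) {
    VarianceSketch single = new VarianceSketch();
    single.count = 1;
    single.sum = value;
    return merge(single);
  }

  /** Merges {@code other} into this accumulator and returns this accumulator. */
  VarianceSketch merge(VarianceSketch other) {
    if (other.count == 0) {
      return this; // nothing to merge
    }
    if (count == 0) {
      // This side is empty, so the merged state is simply the other side.
      count = other.count;
      sum = other.sum;
      variance = other.variance;
      return this;
    }
    double n = count;
    double m = other.count;
    // Increment in mean form: n*m/(n+m) * (mean(x) - mean(y))^2
    double meanDiff = sum / n - other.sum / m;
    double increment = n * m / (n + m) * meanDiff * meanDiff;
    variance += other.variance + increment;
    sum += other.sum;
    count += other.count;
    return this;
  }

  /** Population variance: divide by n. */
  double populationVariance() {
    return variance / count;
  }

  /** Sample variance: divide by n - 1. */
  double sampleVariance() {
    return variance / (count - 1);
  }

  public static void main(String[] args) {
    // Merge the samples (1, 2) and (6); the direct computation gives
    // (1-3)^2 + (2-3)^2 + (6-3)^2 = 14 for the combined input.
    VarianceSketch merged = VarianceSketch.of(1, 2).merge(VarianceSketch.of(6));
    System.out.println(merged.variance);
    System.out.println(merged.populationVariance());
    System.out.println(merged.sampleVariance());
  }
}
```

Only the left-hand accumulator is mutated in `merge`, which mirrors the contract stated for `mergeAccumulators` on this page. A production implementation would likely prefer exact decimal arithmetic over `double`.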

• ### Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator` | `addInput(org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator currentVariance, T rawInput)` | Adds the given input value to the given accumulator, returning the new accumulator value. |
| `org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator` | `createAccumulator()` | Returns a new, mutable accumulator value, representing the accumulation of zero input values. |
| `T` | `extractOutput(org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator accumulator)` | Returns the output value that is the result of combining all the input values represented by the given accumulator. |
| `java.lang.reflect.TypeVariable<?>` | `getAccumTVariable()` | Returns the `TypeVariable` of `AccumT`. |
| `Coder<org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator>` | `getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder)` | Returns the `Coder` to use for accumulator `AccumT` values, or null if it cannot be inferred. |
| `Coder<OutputT>` | `getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder)` | Returns the `Coder` to use by default for output `OutputT` values, or null if it cannot be inferred. |
| `java.lang.String` | `getIncompatibleGlobalWindowErrorMessage()` | Returns the error message for unsupported default values in `Combine.globally()`. |
| `java.lang.reflect.TypeVariable<?>` | `getInputTVariable()` | Returns the `TypeVariable` of `InputT`. |
| `java.lang.reflect.TypeVariable<?>` | `getOutputTVariable()` | Returns the `TypeVariable` of `OutputT`. |
| `org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator` | `mergeAccumulators(java.lang.Iterable<org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator> variances)` | Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators. |
| `static <V extends java.lang.Number> VarianceFn` | `newPopulation(Schema.TypeName typeName)` | |
| `static <V extends java.lang.Number> VarianceFn` | `newPopulation(SerializableFunction<java.math.BigDecimal,V> decimalConverter)` | |
| `static <V extends java.lang.Number> VarianceFn` | `newSample(Schema.TypeName typeName)` | |
| `static <V extends java.lang.Number> VarianceFn` | `newSample(SerializableFunction<java.math.BigDecimal,V> decimalConverter)` | |
| `void` | `populateDisplayData(DisplayData.Builder builder)` | Register display data for the given transform or component. |

• ### Methods inherited from class org.apache.beam.sdk.transforms.Combine.CombineFn

`apply, compact, defaultValue, getInputType, getOutputType`
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Method Detail

• #### newPopulation

`public static <V extends java.lang.Number> VarianceFn newPopulation(Schema.TypeName typeName)`
• #### newPopulation

`public static <V extends java.lang.Number> VarianceFn newPopulation(SerializableFunction<java.math.BigDecimal,V> decimalConverter)`
• #### newSample

`public static <V extends java.lang.Number> VarianceFn newSample(Schema.TypeName typeName)`
• #### newSample

`public static <V extends java.lang.Number> VarianceFn newSample(SerializableFunction<java.math.BigDecimal,V> decimalConverter)`
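The `decimalConverter` overloads take a function from `java.math.BigDecimal` to the output type `V`. This page does not describe them further, so the following is only a sketch of what such a converter might look like. It uses `java.util.function.Function` as a stand-in for `SerializableFunction` so that it runs without Beam on the classpath, and the class and constant names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.function.Function;

/** Hypothetical converters from a BigDecimal result to a concrete Number type. */
final class DecimalConverterSketch {
  // Stand-ins for SerializableFunction<BigDecimal, V>; method references keep them short.
  static final Function<BigDecimal, Double> TO_DOUBLE = BigDecimal::doubleValue;
  static final Function<BigDecimal, Long> TO_LONG = BigDecimal::longValue;
  static final Function<BigDecimal, Integer> TO_INT = BigDecimal::intValue;

  public static void main(String[] args) {
    BigDecimal result = new BigDecimal("4.75");
    System.out.println(TO_DOUBLE.apply(result)); // prints 4.75
    System.out.println(TO_LONG.apply(result));   // prints 4 (fractional part is discarded)
    System.out.println(TO_INT.apply(result));    // prints 4 (fractional part is discarded)
  }
}
```

Note that the integer conversions discard the fractional part, so a variance computed for an integral type loses precision at this step.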
• #### createAccumulator

`public org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator createAccumulator()`
Description copied from class: `Combine.CombineFn`
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
Specified by:
`createAccumulator` in class `Combine.CombineFn<T extends java.lang.Number,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T extends java.lang.Number>`

• #### addInput

```public org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator addInput(org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator currentVariance,
T rawInput)```
Description copied from class: `Combine.CombineFn`
Adds the given input value to the given accumulator, returning the new accumulator value.
Specified by:
`addInput` in class `Combine.CombineFn<T extends java.lang.Number,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T extends java.lang.Number>`
Parameters:
`currentVariance` - may be modified and returned for efficiency
`rawInput` - should not be mutated
• #### mergeAccumulators

`public org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator mergeAccumulators(java.lang.Iterable<org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator> variances)`
Description copied from class: `Combine.CombineFn`
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
Specified by:
`mergeAccumulators` in class `Combine.CombineFn<T extends java.lang.Number,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T extends java.lang.Number>`
Parameters:
`variances` - only the first accumulator may be modified and returned for efficiency; the other accumulators should not be mutated, because they may be shared with other code and mutating them could lead to incorrect results or data corruption.
• #### getAccumulatorCoder

```public Coder<org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator> getAccumulatorCoder(CoderRegistry registry,
Coder<T> inputCoder)```
Description copied from interface: `CombineFnBase.GlobalCombineFn`
Returns the `Coder` to use for accumulator `AccumT` values, or null if it cannot be inferred.

By default, uses the knowledge of the `Coder` being used for `InputT` values and the enclosing `Pipeline`'s `CoderRegistry` to try to infer the Coder for `AccumT` values.

This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.

Specified by:
`getAccumulatorCoder` in interface `CombineFnBase.GlobalCombineFn<T extends java.lang.Number,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T extends java.lang.Number>`
• #### extractOutput

`public T extractOutput(org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator accumulator)`
Description copied from class: `Combine.CombineFn`
Returns the output value that is the result of combining all the input values represented by the given accumulator.
Specified by:
`extractOutput` in class `Combine.CombineFn<T extends java.lang.Number,org.apache.beam.sdk.extensions.sql.impl.transform.agg.VarianceAccumulator,T extends java.lang.Number>`
Parameters:
`accumulator` - can be modified for efficiency
• #### getDefaultOutputCoder

```public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry,
Coder<InputT> inputCoder)
throws CannotProvideCoderException```
Description copied from interface: `CombineFnBase.GlobalCombineFn`
Returns the `Coder` to use by default for output `OutputT` values, or null if it cannot be inferred.

By default, uses the knowledge of the `Coder` being used for input `InputT` values and the enclosing `Pipeline`'s `CoderRegistry` to try to infer the Coder for `OutputT` values.

Specified by:
`getDefaultOutputCoder` in interface `CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>`
Throws:
`CannotProvideCoderException`
• #### getIncompatibleGlobalWindowErrorMessage

`public java.lang.String getIncompatibleGlobalWindowErrorMessage()`
Description copied from interface: `CombineFnBase.GlobalCombineFn`
Returns the error message for unsupported default values in `Combine.globally()`.
Specified by:
`getIncompatibleGlobalWindowErrorMessage` in interface `CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>`
• #### getInputTVariable

`public java.lang.reflect.TypeVariable<?> getInputTVariable()`
Returns the `TypeVariable` of `InputT`.
• #### getAccumTVariable

`public java.lang.reflect.TypeVariable<?> getAccumTVariable()`
Returns the `TypeVariable` of `AccumT`.
• #### getOutputTVariable

`public java.lang.reflect.TypeVariable<?> getOutputTVariable()`
Returns the `TypeVariable` of `OutputT`.
• #### populateDisplayData

`public void populateDisplayData(DisplayData.Builder builder)`
Register display data for the given transform or component.

`populateDisplayData(DisplayData.Builder)` is invoked by Pipeline runners to collect display data via `DisplayData.from(HasDisplayData)`. Implementations may call `super.populateDisplayData(builder)` in order to register display data in the current namespace, but should otherwise use `subcomponent.populateDisplayData(builder)` to use the namespace of the subcomponent.

By default, does not register any display data. Implementors may override this method to provide their own display data.

Specified by:
`populateDisplayData` in interface `HasDisplayData`
Parameters:
`builder` - The builder to populate with display data.
See Also:
`HasDisplayData`