public class Sample
extends java.lang.Object
PTransforms for taking samples of the elements in a
 PCollection, or samples of the values associated with each
 key in a PCollection of KVs.
 combineFn(int) can also be used manually, in combination with state and with the
 Combine transform.
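
Using combineFn(int) with Combine implies that the sample can be built from partial results that are merged later. The plain-Java sketch below shows one way a fixed-size sample can be made mergeable: each element is tagged with a random priority, and only the sampleSize smallest priorities are kept. The class MergeableSample and its method names are hypothetical, chosen to echo the CombineFn life cycle. This is not Beam's code, and Beam's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical illustration of a mergeable fixed-size sample; not part of Beam. */
public class MergeableSample<T> {

    /** An input element paired with the random priority drawn for it. */
    private static final class Tagged<T> {
        final long priority;
        final T value;

        Tagged(long priority, T value) {
            this.priority = priority;
            this.value = value;
        }
    }

    private final int sampleSize;
    private final Random random;
    // Max-heap on priority: the head is the entry that is evicted first.
    private final PriorityQueue<Tagged<T>> heap;

    public MergeableSample(int sampleSize, Random random) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        this.sampleSize = sampleSize;
        this.random = random;
        this.heap = new PriorityQueue<>(
            Comparator.comparingLong((Tagged<T> t) -> t.priority).reversed());
    }

    /** Adds one element, tagging it with a fresh random priority. */
    public void addInput(T value) {
        offer(new Tagged<>(random.nextLong(), value));
    }

    /** Folds another partial sample into this one, reusing its priorities. */
    public void merge(MergeableSample<T> other) {
        for (Tagged<T> t : other.heap) {
            offer(t);
        }
    }

    // Keeps only the sampleSize entries with the smallest priorities.
    private void offer(Tagged<T> t) {
        heap.add(t);
        if (heap.size() > sampleSize) {
            heap.poll();
        }
    }

    /** Returns the sampled values, in no particular order. */
    public List<T> extractOutput() {
        List<T> out = new ArrayList<>();
        for (Tagged<T> t : heap) {
            out.add(t.value);
        }
        return out;
    }
}
```

Because every element keeps the priority it was first given, merging two partial samples gives the same result as feeding all elements through a single accumulator, which is the property a combiner needs. When fewer than sampleSize elements are seen, all of them are returned.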
| Modifier and Type | Class and Description | 
|---|---|
| static class | Sample.FixedSizedSampleFn<T>: CombineFn that computes a fixed-size sample of a collection of values. |
| Constructor and Description | 
|---|
| Sample() |
| Modifier and Type | Method and Description | 
|---|---|
| static <T> PTransform<PCollection<T>,PCollection<T>> | any(long limit): Sample#any(long) takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection. |
| static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> | combineFn(int sampleSize): Returns a Combine.CombineFn that computes a fixed-sized sample of its inputs. |
| static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> | fixedSizeGlobally(int sampleSize): Returns a PTransform that takes a PCollection<T>, selects sampleSize elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the selected elements. |
| static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> | fixedSizePerKey(int sampleSize): Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct key in the input PCollection to a sample of sampleSize values associated with that key in the input PCollection, taken uniformly at random. |
public static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> combineFn(int sampleSize)
Returns a Combine.CombineFn that computes a fixed-sized sample of its inputs.

public static <T> PTransform<PCollection<T>,PCollection<T>> any(long limit)
Sample#any(long) takes a PCollection<T> and a limit, and
 produces a new PCollection<T> containing up to limit
 elements of the input PCollection.
 If limit is greater than or equal to the size of the input
 PCollection, then all the input's elements will be selected.
 
All of the elements of the output PCollection should fit into
 main memory of a single worker machine.  This operation does not
 run in parallel.
 
Example of use:
 
 PCollection<String> input = ...;
 PCollection<String> output = input.apply(Sample.<String>any(100));
 T - the type of the elements of the input and output PCollections
 limit - the number of elements to take from the input

public static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> fixedSizeGlobally(int sampleSize)
Returns a PTransform that takes a PCollection<T>, selects sampleSize
 elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the
 selected elements. If the input PCollection has fewer than sampleSize elements,
 then the output Iterable<T> will be all the input's elements.
 All of the elements of the output PCollection should fit into
 main memory of a single worker machine.  This operation does not
 run in parallel.
 
Example of use:
 PCollection<String> pc = ...;
 PCollection<Iterable<String>> sampleOfSize10 =
     pc.apply(Sample.fixedSizeGlobally(10));
 
 T - the type of the elements
 sampleSize - the number of elements to select; must be >= 0

public static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> fixedSizePerKey(int sampleSize)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a
 PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct
 key in the input PCollection to a sample of sampleSize values associated with
 that key in the input PCollection, taken uniformly at random. If a key in the input
 PCollection has fewer than sampleSize values associated with it, then the
 output Iterable<V> associated with that key will be all the values associated with that
 key in the input PCollection.
 Example of use:
 PCollection<KV<String, Integer>> pc = ...;
 PCollection<KV<String, Iterable<Integer>>> sampleOfSize10PerKey =
     pc.apply(Sample.<String, Integer>fixedSizePerKey(10));
 
 K - the type of the keys
 V - the type of the values
 sampleSize - the number of values to select for each distinct key; must be >= 0
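
The per-key contract described above can be sketched in plain Java, outside any pipeline, to make the edge cases concrete. The class PerKeySampleSketch and the method samplePerKey are hypothetical names used only for illustration. The sketch groups values in memory and shuffles them, which is an assumption made for brevity and not how Beam processes a distributed PCollection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory illustration of the per-key contract; not part of Beam. */
public class PerKeySampleSketch {

    /**
     * Maps each distinct key to at most sampleSize of its values, chosen
     * uniformly at random. Keys with fewer values keep all of them.
     */
    public static <K, V> Map<K, List<V>> samplePerKey(
            List<Map.Entry<K, V>> input, int sampleSize, Random random) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }

        // Group the values by key, preserving first-seen key order.
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : input) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Shuffle each group and keep a prefix: a uniform random subset.
        Map<K, List<V>> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            List<V> values = e.getValue();
            Collections.shuffle(values, random);
            int n = Math.min(sampleSize, values.size());
            result.put(e.getKey(), new ArrayList<>(values.subList(0, n)));
        }
        return result;
    }
}
```

With sampleSize set to 2, a key that has five values yields two of them, while a key that has a single value yields just that value.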