apache_beam.options package¶
Submodules¶
apache_beam.options.pipeline_options module¶
Pipeline options obtained from command line parsing.
-
class
apache_beam.options.pipeline_options.
PipelineOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.transforms.display.HasDisplayData
Pipeline options class used as a container for command line options.
The class is essentially a wrapper over the standard argparse Python module (see https://docs.python.org/3/library/argparse.html). To define one option or a group of options, subclass PipelineOptions:
    class XyzOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            parser.add_argument('--abc', default='start')
            parser.add_argument('--xyz', default='end')
The arguments for the add_argument() method are exactly the ones described in the argparse public documentation.
Pipeline objects require an options object during initialization. This is obtained simply by initializing an options class as defined above:
    p = Pipeline(options=XyzOptions())
    if p.options.xyz == 'end':
        raise ValueError('Option xyz has an invalid value.')
By default the options classes will use command line arguments to initialize the options.
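Because the class wraps argparse, the defaulting behaviour of the XyzOptions arguments can be tried with the standard library alone. The sketch below does not import apache_beam; `add_xyz_args` is a hypothetical stand-in for `_add_argparse_args`, and the use of `parse_known_args` is an assumption made here so that flags belonging to other option groups are set aside rather than rejected.

```python
import argparse


def add_xyz_args(parser):
    # Same add_argument calls as in the XyzOptions example.
    parser.add_argument('--abc', default='start')
    parser.add_argument('--xyz', default='end')


parser = argparse.ArgumentParser()
add_xyz_args(parser)

# Flags this parser does not define are returned separately, not rejected.
known, unknown = parser.parse_known_args(['--xyz', 'middle', '--other', '1'])

print(known.abc)   # value taken from the default
print(known.xyz)   # value taken from the command line
print(unknown)     # leftover flags for other option groups
```

An option that is absent from the command line keeps the default passed to add_argument().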
-
classmethod
from_dictionary
(options)[source]¶ Returns a PipelineOptions from a dictionary of arguments.
Parameters: options – Dictionary of argument value pairs. Returns: A PipelineOptions object representing the given arguments.
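One way to picture building options from a dictionary is to turn each entry into a command-line style flag and hand the result to argparse. The sketch below is stdlib-only and rests on assumptions: `dict_to_flags` is a hypothetical helper, and the mapping rules (a `--key=value` flag per entry, a bare `--key` flag for a true boolean, nothing for a false one) are illustrative. The library's actual conversion may differ.

```python
import argparse


def dict_to_flags(options):
    """Hypothetical helper: turn {'name': value} into command-line style flags."""
    flags = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumed rule: True becomes a bare flag, False is omitted.
            if value:
                flags.append('--%s' % name)
        else:
            flags.append('--%s=%s' % (name, value))
    return flags


flags = dict_to_flags({'abc': 'first', 'verbose': True, 'quiet': False})

parser = argparse.ArgumentParser()
parser.add_argument('--abc', default='start')
parser.add_argument('--verbose', action='store_true')
parser.add_argument('--quiet', action='store_true')
parsed = parser.parse_args(flags)
```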
-
get_all_options
(drop_default=False)[source]¶ Returns a dictionary of all defined arguments.
Collects every defined argument (that is, every argument defined in any subclass of PipelineOptions) into a dictionary.
Parameters: drop_default – If set to true, options that are equal to their default values are not returned as part of the result dictionary. Returns: Dictionary of all args and values.
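The effect of drop_default can be illustrated with plain argparse. This is a minimal sketch, not the library's implementation: `all_options` is a hypothetical helper that compares each parsed value with the default registered on the parser.

```python
import argparse


def all_options(parser, argv, drop_default=False):
    """Hypothetical helper: parsed options as a dict, optionally without defaults."""
    known, _ = parser.parse_known_args(argv)
    result = vars(known)
    if drop_default:
        # Keep only the options whose value differs from the registered default.
        result = {name: value for name, value in result.items()
                  if value != parser.get_default(name)}
    return result


parser = argparse.ArgumentParser()
parser.add_argument('--abc', default='start')
parser.add_argument('--xyz', default='end')

everything = all_options(parser, ['--xyz', 'middle'])
changed_only = all_options(parser, ['--xyz', 'middle'], drop_default=True)
```

Here `everything` holds both options, while `changed_only` holds just the option that was overridden on the command line.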
-
class
apache_beam.options.pipeline_options.
StandardOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.options.pipeline_options.PipelineOptions
-
DEFAULT_RUNNER
= 'DirectRunner'¶
-
-
class
apache_beam.options.pipeline_options.
DirectOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.options.pipeline_options.PipelineOptions
DirectRunner-specific execution options.
-
class
apache_beam.options.pipeline_options.
GoogleCloudOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.options.pipeline_options.PipelineOptions
Google Cloud Dataflow service execution options.
-
BIGQUERY_API_SERVICE
= 'bigquery.googleapis.com'¶
-
COMPUTE_API_SERVICE
= 'compute.googleapis.com'¶
-
DATAFLOW_ENDPOINT
= 'https://dataflow.googleapis.com'¶
-
STORAGE_API_SERVICE
= 'storage.googleapis.com'¶
-
-
class
apache_beam.options.pipeline_options.
WorkerOptions
(flags=None, **kwargs)[source]¶ Bases:
apache_beam.options.pipeline_options.PipelineOptions
Worker pool configuration options.
apache_beam.options.pipeline_options_validator module¶
Pipeline options validator.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.options.pipeline_options_validator.
PipelineOptionsValidator
(options, runner)[source]¶ Bases:
object
Validates PipelineOptions.
Goes through a list of known PipelineOptions subclasses and calls:
validate(validator)
on each one that implements it, then aggregates the validation errors from all of them and returns the combined list.
-
ENDPOINT_PATTERN
= 'https://[\\S]*googleapis\\.com[/]?'¶
-
ERR_INVALID_GCS_BUCKET
= 'Invalid GCS bucket (%s), given for the option: %s. See https://developers.google.com/storage/docs/bucketnaming for more details.'¶
-
ERR_INVALID_GCS_OBJECT
= 'Invalid GCS object (%s), given for the option: %s.'¶
-
ERR_INVALID_GCS_PATH
= 'Invalid GCS path (%s), given for the option: %s.'¶
-
ERR_INVALID_JOB_NAME
= 'Invalid job_name (%s); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number'¶
-
ERR_INVALID_NOT_POSITIVE
= 'Invalid value (%s) for option: %s. Value needs to be positive.'¶
-
ERR_INVALID_PROJECT_ID
= 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project description.'¶
-
ERR_INVALID_PROJECT_NUMBER
= 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project number.'¶
-
ERR_INVALID_TEST_MATCHER_TYPE
= 'Invalid value (%s) for option: %s. Please extend your matcher object from hamcrest.core.base_matcher.BaseMatcher.'¶
-
ERR_INVALID_TEST_MATCHER_UNPICKLABLE
= 'Invalid value (%s) for option: %s. Please make sure the test matcher is unpicklable.'¶
-
ERR_MISSING_GCS_PATH
= 'Missing GCS path option: %s.'¶
-
ERR_MISSING_OPTION
= 'Missing required option: %s.'¶
-
GCS_BUCKET
= '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'¶
-
GCS_SCHEME
= 'gs'¶
-
GCS_URI
= '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'¶
-
JOB_PATTERN
= '[a-z]([-a-z0-9]*[a-z0-9])?'¶
-
OPTIONS
= [<class 'apache_beam.options.pipeline_options.DebugOptions'>, <class 'apache_beam.options.pipeline_options.GoogleCloudOptions'>, <class 'apache_beam.options.pipeline_options.SetupOptions'>, <class 'apache_beam.options.pipeline_options.StandardOptions'>, <class 'apache_beam.options.pipeline_options.TypeOptions'>, <class 'apache_beam.options.pipeline_options.WorkerOptions'>, <class 'apache_beam.options.pipeline_options.TestOptions'>]¶
-
PROJECT_ID_PATTERN
= '[a-z][-a-z0-9:.]+[a-z0-9]'¶
-
PROJECT_NUMBER_PATTERN
= '[0-9]*'¶
-
is_full_string_match
(pattern, string)[source]¶ Returns True if the pattern matches the whole string.
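The difference between a prefix match and a whole-string match matters for patterns such as PROJECT_NUMBER_PATTERN, which can match an empty string. The sketch below uses only the standard re module with the pattern values listed above; `full_match` is a hypothetical helper built on `re.fullmatch` and is not necessarily how the method itself is written.

```python
import re

# Pattern values copied from the attribute listing above.
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
PROJECT_NUMBER_PATTERN = '[0-9]*'


def full_match(pattern, string):
    """Return True only if `pattern` consumes the whole of `string`."""
    return re.fullmatch(pattern, string) is not None


# re.match anchors only at the start, so '[0-9]*' matches the empty prefix
# of any string; the whole-string check rejects the same input.
prefix_only = re.match(PROJECT_NUMBER_PATTERN, 'my-project') is not None
whole_string = full_match(PROJECT_NUMBER_PATTERN, 'my-project')

valid_job = full_match(JOB_PATTERN, 'my-job-1')
trailing_dash = full_match(JOB_PATTERN, 'job-')
```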
-
validate
()[source]¶ Calls validate on subclasses and returns a list of errors.
validate calls the validate method on each subclass, accumulates the returned lists of errors, and returns the aggregate list.
Returns: Aggregate list of errors after calling all possible validate methods.
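The aggregation described here can be sketched without the library. Everything below is hypothetical and stdlib-only: `SketchValidator`, `JobOptions`, `WorkerCount` and `Unchecked` are illustrative names, and the error strings are shortened. The point is only the shape of the loop: call validate(validator) where it exists and concatenate the returned lists.

```python
import re

JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'  # copied from the listing above


class JobOptions:
    """Hypothetical option group that checks its job name."""

    def __init__(self, job_name):
        self.job_name = job_name

    def validate(self, validator):
        if validator.is_full_string_match(JOB_PATTERN, self.job_name):
            return []
        return ['Invalid job_name (%s)' % self.job_name]


class WorkerCount:
    """Hypothetical option group that requires a positive worker count."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def validate(self, validator):
        if self.num_workers > 0:
            return []
        return ['Invalid value (%s) for option: num_workers' % self.num_workers]


class Unchecked:
    """Hypothetical option group with no validate method; it is skipped."""


class SketchValidator:
    def __init__(self, views):
        self.views = views

    def is_full_string_match(self, pattern, string):
        return re.fullmatch(pattern, string) is not None

    def validate(self):
        errors = []
        for view in self.views:
            # Only groups that implement validate() contribute errors.
            if hasattr(view, 'validate'):
                errors.extend(view.validate(self))
        return errors


errors = SketchValidator(
    [JobOptions('Bad_Name'), WorkerCount(0), Unchecked()]).validate()
```

An empty list means every group that implements validate accepted its values.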
-
validate_gcs_path
(view, arg_name)[source]¶ Validates a GCS path against gs://bucket/object URI format.
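The GCS_URI, GCS_SCHEME and GCS_BUCKET values listed above are enough to sketch a gs://bucket/object check with the standard re module. `check_gcs_path` is a hypothetical helper that takes the path string directly instead of a view and an argument name, and its messages are illustrative. It should not be read as the method's actual logic.

```python
import re

# Values copied from the attribute listing above.
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
GCS_SCHEME = 'gs'


def check_gcs_path(path):
    """Hypothetical helper: return a list of problems found in `path`."""
    match = re.fullmatch(GCS_URI, path)
    if match is None or match.group('SCHEME') != GCS_SCHEME:
        return ['not a gs:// URI']
    if re.fullmatch(GCS_BUCKET, match.group('BUCKET')) is None:
        return ['invalid bucket name']
    if not match.group('OBJECT'):
        return ['missing object']
    return []


ok = check_gcs_path('gs://my-bucket/tmp/output')
bad_bucket = check_gcs_path('gs://My_Bucket/tmp')
bad_scheme = check_gcs_path('s3://my-bucket/tmp')
no_object = check_gcs_path('gs://my-bucket')
```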
-
apache_beam.options.value_provider module¶
A ValueProvider class to implement templates with both statically and dynamically provided values.
-
class
apache_beam.options.value_provider.
RuntimeValueProvider
(option_name, value_type, default_value)[source]¶ Bases:
apache_beam.options.value_provider.ValueProvider
-
runtime_options
= None¶
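The idea of a dynamically provided value can be illustrated with a small stand-in class. This is not the Beam class: `DeferredValue`, its `is_accessible` and `get` methods, and the use of a plain dict for `runtime_options` are assumptions chosen to mirror the constructor signature and the class-level `runtime_options = None` attribute shown above.

```python
class DeferredValue:
    """Hypothetical stand-in for a runtime-provided value (not the Beam class)."""

    # Shared mapping of option name to raw value; None until values are supplied.
    runtime_options = None

    def __init__(self, option_name, value_type, default_value):
        self.option_name = option_name
        self.value_type = value_type
        self.default_value = default_value

    def is_accessible(self):
        return DeferredValue.runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise RuntimeError('%s is not available yet' % self.option_name)
        raw = DeferredValue.runtime_options.get(self.option_name)
        # Fall back to the default when the option was not supplied.
        return self.default_value if raw is None else self.value_type(raw)


num_lines = DeferredValue('num_lines', int, 10)
available_before = num_lines.is_accessible()
```

The value is declared while the pipeline is being built, but it can only be read once the runtime options have been supplied.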
-