apache_beam.options.pipeline_options_validator module
Pipeline options validator.
For internal use only; no backwards-compatibility guarantees.
- class apache_beam.options.pipeline_options_validator.PipelineOptionsValidator(options, runner)[source]
Bases: object
Validates PipelineOptions.
Goes through a list of known PipelineOptions subclasses and calls:
validate(validator)
on each one that implements it. Aggregates the validation errors from all of them and returns the combined list.
- OPTIONS = [<class 'apache_beam.options.pipeline_options.DebugOptions'>, <class 'apache_beam.options.pipeline_options.GoogleCloudOptions'>, <class 'apache_beam.options.pipeline_options.PortableOptions'>, <class 'apache_beam.options.pipeline_options.SetupOptions'>, <class 'apache_beam.options.pipeline_options.StandardOptions'>, <class 'apache_beam.options.pipeline_options.TestOptions'>, <class 'apache_beam.options.pipeline_options.TypeOptions'>, <class 'apache_beam.options.pipeline_options.WorkerOptions'>]
- REQUIRED_ENVIRONMENT_OPTIONS = {'DOCKER': [], 'EXTERNAL': ['external_service_address'], 'LOOPBACK': [], 'PROCESS': ['process_command']}
- OPTIONAL_ENVIRONMENT_OPTIONS = {'DOCKER': ['docker_container_image'], 'EXTERNAL': [], 'LOOPBACK': [], 'PROCESS': ['process_variables']}
- ERR_MISSING_OPTION = 'Missing required option: %s.'
- ERR_MISSING_GCS_PATH = 'Missing GCS path option: %s.'
- ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
- ERR_INVALID_GCS_BUCKET = 'Invalid GCS bucket (%s), given for the option: %s. See https://developers.google.com/storage/docs/bucketnaming for more details.'
- ERR_INVALID_GCS_OBJECT = 'Invalid GCS object (%s), given for the option: %s.'
- ERR_INVALID_JOB_NAME = 'Invalid job_name (%s); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number'
- ERR_INVALID_PROJECT_NUMBER = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project number.'
- ERR_INVALID_PROJECT_ID = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project description.'
- ERR_INVALID_ENDPOINT = 'Invalid url (%s) for dataflow endpoint. Please provide a valid url.'
- ERR_INVALID_NOT_POSITIVE = 'Invalid value (%s) for option: %s. Value needs to be positive.'
- ERR_INVALID_TEST_MATCHER_TYPE = 'Invalid value (%s) for option: %s. Please extend your matcher object from hamcrest.core.base_matcher.BaseMatcher.'
- ERR_INVALID_TEST_MATCHER_UNPICKLABLE = 'Invalid value (%s) for option: %s. Please make sure the test matcher is unpicklable.'
- ERR_INVALID_TRANSFORM_NAME_MAPPING = 'Invalid transform name mapping format. Please make sure the mapping is string key-value pairs. Invalid pair: (%s:%s)'
- ERR_INVALID_ENVIRONMENT = 'Option %s is not compatible with environment type %s.'
- ERR_ENVIRONMENT_CONFIG = 'Option environment_config is incompatible with option(s) %s.'
- ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = 'Option %s is required for environment type %s.'
- ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'
- ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = '(%s) is a string. Programmatically set PipelineOptions like (%s) options need to be specified as a list.'
- GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
- GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
- GCS_SCHEME = 'gs'
- JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
- PROJECT_ID_PATTERN = '[a-z][-a-z0-9:.]+[a-z0-9]'
- PROJECT_NUMBER_PATTERN = '[0-9]*'
- validate()[source]
Calls validate on subclasses and returns a list of errors.
Calls the validate method on each subclass, accumulates the returned lists of errors, and returns the aggregate list.
- Returns:
Aggregate list of errors after calling all possible validate methods.
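The aggregation described above can be sketched without Beam installed. This is a minimal illustration, not the library's implementation: `JobOptions`, `WorkerCounts`, `NoValidation`, and `collect_errors` are hypothetical stand-ins, and only the two error templates are copied from the constants listed on this page.

```python
# Error templates copied from the constants documented above.
ERR_MISSING_OPTION = 'Missing required option: %s.'
ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'


class JobOptions:
    """Hypothetical options view that implements validate(validator)."""

    def __init__(self, job_name):
        self.job_name = job_name

    def validate(self, validator):
        if not self.job_name:
            return [ERR_MISSING_OPTION % 'job_name']
        return []


class WorkerCounts:
    """Hypothetical options view with a cross-field check."""

    def __init__(self, num_workers, max_num_workers):
        self.num_workers = num_workers
        self.max_num_workers = max_num_workers

    def validate(self, validator):
        if self.num_workers > self.max_num_workers:
            return [ERR_NUM_WORKERS_TOO_HIGH %
                    (self.num_workers, self.max_num_workers)]
        return []


class NoValidation:
    """Hypothetical view without a validate method; it is skipped."""


def collect_errors(option_views, validator=None):
    """Calls validate(validator) where implemented and merges the results."""
    errors = []
    for view in option_views:
        validate = getattr(view, 'validate', None)
        if callable(validate):
            errors.extend(validate(validator))
    return errors
```

An empty list from `collect_errors` means every view passed; otherwise each entry is one human-readable message.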
- is_full_string_match(pattern, string)[source]
Returns True if the pattern matches the whole string.
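As a rough illustration of whole-string matching, the sketch below applies the pattern constants listed above with the standard library's `re.fullmatch`. The helper name `full_string_match` is hypothetical, and the method documented here may be implemented differently.

```python
import re

# Patterns copied from the constants documented above.
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
PROJECT_ID_PATTERN = '[a-z][-a-z0-9:.]+[a-z0-9]'
PROJECT_NUMBER_PATTERN = '[0-9]*'


def full_string_match(pattern, string):
    """Returns True only if the pattern consumes the entire string."""
    return re.fullmatch(pattern, string) is not None


# A prefix match is not enough: 'my-job-' starts like a valid job name,
# but the trailing '-' prevents a whole-string match.
print(full_string_match(JOB_PATTERN, 'my-job-1'))
print(full_string_match(JOB_PATTERN, 'my-job-'))
```

The distinction matters for these patterns because none of them (other than GCS_BUCKET) carries its own `^` and `$` anchors.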
- validate_gcs_path(view, arg_name)[source]
Validates a GCS path against gs://bucket/object URI format.
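A check of this kind might look like the sketch below, which uses the GCS_URI, GCS_BUCKET, and GCS_SCHEME constants and the related error templates from this page. The function `check_gcs_path` is hypothetical, and the order and exact conditions of the checks are assumptions rather than the library's behavior.

```python
import re

# Constants copied from the class attributes documented above.
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
GCS_SCHEME = 'gs'
ERR_MISSING_GCS_PATH = 'Missing GCS path option: %s.'
ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
ERR_INVALID_GCS_BUCKET = (
    'Invalid GCS bucket (%s), given for the option: %s. See '
    'https://developers.google.com/storage/docs/bucketnaming '
    'for more details.')
ERR_INVALID_GCS_OBJECT = 'Invalid GCS object (%s), given for the option: %s.'


def check_gcs_path(value, arg_name):
    """Returns a list of error strings; an empty list means the path passed."""
    if value is None:
        return [ERR_MISSING_GCS_PATH % arg_name]
    match = re.match(GCS_URI, value)
    # Assumption: anything without a 'gs' scheme is reported as a bad path.
    if match is None or match.group('SCHEME') != GCS_SCHEME:
        return [ERR_INVALID_GCS_PATH % (value, arg_name)]
    bucket = match.group('BUCKET')
    if re.match(GCS_BUCKET, bucket) is None:
        return [ERR_INVALID_GCS_BUCKET % (bucket, arg_name)]
    gcs_object = match.group('OBJECT')
    # Assumption: a path with no '/object' part at all is rejected.
    if gcs_object is None:
        return [ERR_INVALID_GCS_OBJECT % (gcs_object, arg_name)]
    return []
```

Returning a list rather than raising keeps the result compatible with the aggregate list that validate() builds.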
- validate_worker_region_zone(view)[source]
Validates that the Dataflow worker region and zone arguments are consistent.
- validate_optional_argument_positive(view, arg_name)[source]
Validates that an optional argument (if set) has a positive value.
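The "optional, but positive if set" rule can be sketched as follows. `check_optional_positive` is a hypothetical helper, `SimpleNamespace` stands in for an options view, and the integer conversion is an assumption. Only the error template comes from this page.

```python
from types import SimpleNamespace

# Error template copied from the constants documented above.
ERR_INVALID_NOT_POSITIVE = (
    'Invalid value (%s) for option: %s. Value needs to be positive.')


def check_optional_positive(view, arg_name):
    """Returns an error list if the argument is set and not positive."""
    value = getattr(view, arg_name, None)
    # An unset (None) or missing argument is fine: it is optional.
    if value is not None and int(value) <= 0:
        return [ERR_INVALID_NOT_POSITIVE % (value, arg_name)]
    return []


# Stand-in view objects; real views come from PipelineOptions.
print(check_optional_positive(SimpleNamespace(num_workers=4), 'num_workers'))
print(check_optional_positive(SimpleNamespace(num_workers=0), 'num_workers'))
```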
- validate_test_matcher(view, arg_name)[source]
Validates the on_success_matcher argument, if set.
Checks that on_success_matcher can be unpickled and is an instance of hamcrest.core.base_matcher.BaseMatcher.
- validate_repeatable_argument_passed_as_list(view, arg_name)[source]
Validates that repeatable PipelineOptions such as dataflow_service_options or experiments are specified as a list when set programmatically. This prevents users from inadvertently specifying such an option as a string, mirroring the way it is set on the command line. Repeatable options are passed as a list.
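The string-versus-list mistake this guards against can be illustrated with a small sketch. `check_repeatable_is_list` is a hypothetical helper, `SimpleNamespace` stands in for an options view, and the experiment name is only an example value. The error template is copied from the constants above.

```python
from types import SimpleNamespace

# Error template copied from the constants documented above.
ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = (
    '(%s) is a string. Programmatically set PipelineOptions like (%s) '
    'options need to be specified as a list.')


def check_repeatable_is_list(view, arg_name):
    """Flags a repeatable option that was set as a bare string."""
    value = getattr(view, arg_name, None)
    if isinstance(value, str):
        return [ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST % (value, arg_name)]
    return []


# Correct: a list, even for a single value.
good = SimpleNamespace(experiments=['use_runner_v2'])
# Easy mistake when setting options in code: a bare string.
bad = SimpleNamespace(experiments='use_runner_v2')

print(check_repeatable_is_list(good, 'experiments'))
print(check_repeatable_is_list(bad, 'experiments'))
```

A bare string is worth rejecting early because iterating over it yields single characters instead of option values.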