apache_beam.transforms.error_handling module

Utilities for gracefully handling errors and excluding bad elements.

class apache_beam.transforms.error_handling.ErrorHandler(consumer)[source]

Bases: object

ErrorHandlers are used to skip and otherwise process bad records.

Error handlers allow one to implement the “dead letter queue” pattern in a fluent manner, disaggregating the error processing specification from the main processing chain.

This is typically used as follows:

with error_handling.ErrorHandler(WriteToSomewhere(...)) as error_handler:
  result = pcoll | SomeTransform().with_error_handler(error_handler)

in which case errors encountered by SomeTransform()` in processing pcoll will be written by the PTransform WriteToSomewhere(…) and excluded from result rather than failing the pipeline.

To implement with_error_handling on a PTransform, one caches the provided error handler for use in expand. During expand() one can invoke error_handler.add_error_pcollection(…) any number of times with PCollections containing error records to be processed by the given error handler, or (if applicable) simply invoke with_error_handling(…) on any subtransforms.

The with_error_handling should accept None to indicate that error handling is not enabled (and make implementation-by-forwarding-error-handlers easier). In this case, any non-recoverable errors should fail the pipeline (e.g. propagate exceptions in process methods) rather than silently ignore errors.

close()[source]

Indicates all error-producing operations have reported any errors.

Invokes the provided error consuming PTransform on any provided error PCollections.

output()[source]

Returns result of applying the error consumer to the error pcollections.

add_error_pcollection(pcoll)[source]

Called by a class implementing error handling on the error records.

verify_closed()[source]

Called at end of pipeline construction to ensure errors are not ignored.

class apache_beam.transforms.error_handling.CollectingErrorHandler[source]

Bases: ErrorHandler

An ErrorHandler that simply collects all errors for further processing.

This ErrorHandler requires the set of errors be retrieved via output() and consumed (or explicitly discarded).

output()[source]
verify_closed()[source]