apache_beam.utils.multi_process_shared module

Implements a shared object that spans processes.

This object will be instanciated once per VM and methods will be invoked on it via rpc.

apache_beam.utils.multi_process_shared.patched_autoproxy(token, serializer, manager=None, authkey=None, exposed=None, incref=True, manager_owned=True)[source]
class apache_beam.utils.multi_process_shared.MultiProcessShared(constructor: Callable[[], T], tag: Any, *, path: str = '/home/runner/work/beam/beam/beam/sdks/python/target/.tox/docs/tmp', always_proxy: Optional[bool] = None)[source]

Bases: typing.Generic

MultiProcessShared is used to share a single object across processes.

For example, one could have the class:

class MyExpensiveObject(object):
  def __init__(self, args):
    [expensive initialization and memory allocation]

  def method(self, arg):
    ...

One could share a single instance of this class by wrapping it as:

shared_ptr = MultiProcessShared(lambda: MyExpensiveObject(...))
my_expensive_object = shared_ptr.acquire()

which could then be invoked as:

my_expensive_object.method(arg)

This can then be released with:

shared_ptr.release(my_expensive_object)

but care should be taken to avoid releasing the object too soon or expensive re-initialization may be required, defeating the point of using a shared object.

Parameters:
  • constructor – function that initialises / constructs the object if not present in the cache. This function should take no arguments. It should return an initialised object, or raise an exception if the object could not be initialised / constructed.
  • tag – an indentifier to store with the cached object. If multiple MultiProcessShared instances are created with the same tag, they will all share the same proxied object.
  • path – a temporary path in which to create the inter-process lock
  • always_proxy – whether to direct all calls through the proxy, rather than call the object directly for the process that created it
acquire()[source]
release(obj)[source]