Class HDFSSynchronization

java.lang.Object
org.apache.beam.sdk.io.hadoop.format.HDFSSynchronization
All Implemented Interfaces:
Serializable, ExternalSynchronization

public class HDFSSynchronization extends Object implements ExternalSynchronization
Implementation of ExternalSynchronization which registers locks in the HDFS.

Requires locksDir to be specified. This directory MUST be different that directory which is possibly stored under "mapreduce.output.fileoutputformat.outputdir" key. Otherwise setup of job will fail because the directory will exist before job setup.

See Also:
  • Constructor Details

    • HDFSSynchronization

      public HDFSSynchronization(String locksDir)
      Creates instance of HDFSSynchronization.
      Parameters:
      locksDir - directory where locks will be stored. This directory MUST be different that directory which is possibly stored under "mapreduce.output.fileoutputformat.outputdir" key. Otherwise setup of job will fail because the directory will exist before job setup.
  • Method Details

    • tryAcquireJobLock

      public boolean tryAcquireJobLock(org.apache.hadoop.conf.Configuration conf)
      Description copied from interface: ExternalSynchronization
      Tries to acquire lock for given job.
      Specified by:
      tryAcquireJobLock in interface ExternalSynchronization
      Parameters:
      conf - configuration bounded with given job.
      Returns:
      true if the lock was acquired, false otherwise.
    • releaseJobIdLock

      public void releaseJobIdLock(org.apache.hadoop.conf.Configuration conf)
      Description copied from interface: ExternalSynchronization
      Deletes lock ids bounded with given job if any exists.
      Specified by:
      releaseJobIdLock in interface ExternalSynchronization
      Parameters:
      conf - hadoop configuration of given job.
    • acquireTaskIdLock

      public org.apache.hadoop.mapreduce.TaskID acquireTaskIdLock(org.apache.hadoop.conf.Configuration conf)
      Description copied from interface: ExternalSynchronization
      Creates TaskID with unique id among given job.
      Specified by:
      acquireTaskIdLock in interface ExternalSynchronization
      Parameters:
      conf - hadoop configuration of given job.
      Returns:
      TaskID with unique id among given job.
    • acquireTaskAttemptIdLock

      public org.apache.hadoop.mapreduce.TaskAttemptID acquireTaskAttemptIdLock(org.apache.hadoop.conf.Configuration conf, int taskId)
      Description copied from interface: ExternalSynchronization
      Creates unique TaskAttemptID for given taskId.
      Specified by:
      acquireTaskAttemptIdLock in interface ExternalSynchronization
      Parameters:
      conf - configuration of given task and job
      taskId - id of the task
      Returns:
      Unique TaskAttemptID for given taskId.