Friday, February 17, 2012

Identify the user from a Python mapper in Hadoop

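If you are running a MapReduce job using Hadoop Streaming with Python and want to know which user the job runs as, or want to read other parameters from the OS environment on the task tracker node, look them up in os.environ. Hadoop Streaming exports job configuration properties as environment variables with the dots replaced by underscores, which is why the key here is user_name rather than user.name: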

import os
import sys

if 'user_name' in os.environ:
    user = os.environ['user_name']
    sys.stdout.write('Job runs as username: ' + user + '\n')

To check which variables are actually available on the node, print the keys of os.environ:

envdict = os.environ
keylist = sorted(envdict.keys())  # sorted() returns a plain list of names
sys.stdout.write('Variables available: ' + str(keylist) + '\n')
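
The two snippets can be combined into one small mapper. What follows is only a sketch under a few assumptions: the names job_user and run_mapper are made up for this example, the 'USER' fallback is a guess at a common shell variable and not something Hadoop promises to set, and the mapper itself just passes records through. The diagnostic line goes to stderr so that it lands in the task logs instead of being mixed into the mapper output.

```python
import os
import sys


def job_user(env=None):
    """Return the user name found in the environment, or None."""
    env = os.environ if env is None else env
    # 'user_name' is the key used in this post; 'USER' is an assumed
    # fallback for nodes where the first key is not set.
    for key in ('user_name', 'USER'):
        if key in env:
            return env[key]
    return None


def run_mapper(stdin, stdout, stderr):
    """Log the user once to stderr, then pass every record through."""
    stderr.write('Job runs as username: %s\n' % job_user())
    for line in stdin:
        stdout.write(line)  # identity mapper, replace with real logic
```

To use it as a streaming mapper, end the script with run_mapper(sys.stdin, sys.stdout, sys.stderr).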

