Friday, February 17, 2012

Identify the user from a Python mapper in Hadoop

If you are running a MapReduce job using Hadoop Streaming with Python and want to know which user the job runs as, or inspect other OS environment variables on the TaskTracker node, use:

import os, sys

# Hadoop Streaming exports the job property 'user.name' as the env variable 'user_name'
if 'user_name' in os.environ:
    user = os.environ['user_name']
    sys.stdout.write('Job runs as username: ' + user + '\n')

To see exactly which variables are available, use this:

import os, sys

# Print every environment variable visible to the mapper
keylist = os.environ.keys()
sys.stdout.write('Variables available: ' + str(keylist) + '\n')
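Putting the two together, here is a minimal sketch of a complete streaming mapper that reports the user and then passes its input through unchanged. The file name env_mapper.py is just a placeholder; note that in a streaming job stdout becomes the job output itself, so the diagnostic line goes to stderr, which ends up in the task logs instead.

#!/usr/bin/env python
import os
import sys

def main():
    # Diagnostics go to stderr so they land in the task logs,
    # not in the job output that Hadoop parses from stdout
    user = os.environ.get('user_name', 'unknown')
    sys.stderr.write('Job runs as username: ' + user + '\n')
    # Identity mapper: pass input lines through unchanged
    for line in sys.stdin:
        sys.stdout.write(line)

if __name__ == '__main__':
    main()

Launch it with the streaming jar as usual, for example: hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar -input in -output out -mapper env_mapper.py -file env_mapper.py (the jar location and the HDFS paths here are placeholders for your own setup).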


Friday, February 10, 2012

Moving HBase heap dump locations

Occasionally HBase generates a heap dump on an OutOfMemoryError and writes it to the bin folder (the default, since the JVM dumps to its working directory). This can cause quite a lot of trouble if the partition holding the HBase bin folder is not sized for multi-gigabyte files. To move the heap dumps to another folder:

In the file hbase-&lt;version&gt;/conf/hbase-env.sh, add -XX:HeapDumpPath=/path/to/dump to the HBASE_OPTS line so that it reads:

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dump $HBASE_GC"

Restart HBase.
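To confirm the daemons picked up the flag after the restart, a quick sketch like the following can grep the running JVM command lines. This assumes a Linux host with ps available; HMaster and HRegionServer are the standard daemon class names.

import subprocess

# List all processes and check the HBase daemons for the heap dump flag
ps = subprocess.check_output(['ps', 'axww']).decode()
for line in ps.splitlines():
    if 'HMaster' in line or 'HRegionServer' in line:
        status = 'set' if '-XX:HeapDumpPath=' in line else 'MISSING'
        print('HeapDumpPath ' + status + ' for PID ' + line.split()[0])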

Now you can sleep well without fear of heap dumps filling up the HBase binary partition :)