Saturday, November 03, 2012

Pig HBase integration on MapR

This procedure configures Pig on a MapR Hadoop cluster so that Pig scripts can read from and write to HBase tables.

On the client node where Pig is installed, add the following line to /opt/mapr/conf/env.sh, where /location-to-hbase-jar is the path to the HBase jar:

export PIG_CLASSPATH=$PIG_CLASSPATH:/location-to-hbase-jar

If you are launching Pig on a node where hbase-regionserver or
hbase-master is installed, simply add the location of
hbase-0.92.1.jar to the PIG_CLASSPATH variable above. For example:

export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-0.92.1/hbase-0.92.1.jar"

If HBase is not installed on the Pig client node, copy the HBase jar
from any node where HBase is installed to a location on the client
node, and use that location in the definition above. For example:

export PIG_CLASSPATH=$PIG_CLASSPATH:/opt/mapr/lib/hbase-0.92.1.jar
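If you would rather not hard-code one location, the step above can be sketched as a small shell function that appends whichever HBase jar actually exists on the node and leaves PIG_CLASSPATH alone otherwise. This is a minimal sketch, not part of MapR or Pig: add_to_pig_classpath is a hypothetical helper, and the two paths are just the example locations used in this post.

```shell
#!/bin/bash
# Sketch: append a jar to PIG_CLASSPATH only if the file exists.
# Returns 1 (and leaves PIG_CLASSPATH unchanged) when the jar is missing.
add_to_pig_classpath() {
  local jar="$1"
  [ -f "$jar" ] || return 1
  # ${VAR:+...} adds the ':' separator only when PIG_CLASSPATH is non-empty
  export PIG_CLASSPATH="${PIG_CLASSPATH:+$PIG_CLASSPATH:}$jar"
}

# Try the local HBase install first, then the copied jar (example paths from
# this post; adjust them to your HBase version and copy location).
add_to_pig_classpath /opt/mapr/hbase/hbase-0.92.1/hbase-0.92.1.jar ||
  add_to_pig_classpath /opt/mapr/lib/hbase-0.92.1.jar ||
  echo "warning: no HBase jar found; PIG_CLASSPATH unchanged" >&2
```

Using ${PIG_CLASSPATH:+...} avoids a leading empty classpath entry when PIG_CLASSPATH starts out unset.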

Then identify your ZooKeeper nodes:

maprcli node listzookeepers

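Based on that list, add the following variable to /opt/mapr/conf/env.sh,
substituting your own ZooKeeper addresses (5181 is the default
ZooKeeper client port on MapR):

export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181 -Dhbase.zookeeper.quorum=10.10.80.61,10.10.80.62,10.10.80.63"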
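To avoid typing the quorum by hand, a sketch like the following builds the PIG_OPTS value from a comma-separated host:port list. It assumes that maprcli node listzookeepers prints a header line followed by a list such as 10.10.80.61:5181,10.10.80.62:5181; check the output on your own cluster first. build_pig_opts is a hypothetical helper, not a MapR command.

```shell
#!/bin/bash
# Sketch: turn a comma-separated "host:port" ZooKeeper list into PIG_OPTS.
# Entries without ":port" fall back to 5181, MapR's default ZooKeeper port.
build_pig_opts() {
  local zk_list="$1" hosts="" port="5181" entry
  local -a entries
  IFS=',' read -r -a entries <<< "$zk_list"   # split on ',' without globbing
  for entry in "${entries[@]}"; do
    hosts="${hosts:+$hosts,}${entry%%:*}"     # keep the host, drop ":port"
    if [[ "$entry" == *:* ]]; then
      port="${entry##*:}"                     # assumes all nodes share one port
    fi
  done
  printf '%s\n' "-Dhbase.zookeeper.property.clientPort=$port -Dhbase.zookeeper.quorum=$hosts"
}

# Hypothetical usage: run once on a cluster node and paste the printed value
# into PIG_OPTS in /opt/mapr/conf/env.sh. Assumes the host:port list is on the
# last line of the maprcli output.
#   build_pig_opts "$(maprcli node listzookeepers | tail -n 1)"
```

Running it once and pasting the result keeps env.sh static, rather than calling maprcli every time the file is sourced.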

Launch the Pig job; it should now be able to access HBase.

NB: Refer to HBase tables by their table name only; do not use the
hbase:// prefix. See the sample insertion script below.

Sample env.sh

[root@nmk-centos-60-3 ~]# cat /opt/mapr/conf/env.sh
#!/bin/bash
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
# Please set all environment variable you want to be used during MapR cluster
# runtime here.
# namely MAPR_HOME, JAVA_HOME, MAPR_SUBNETS

export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181 -Dhbase.zookeeper.quorum=10.10.80.61,10.10.80.62,10.10.80.63"
export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-0.92.1/conf:/usr/java/default/lib/tools.jar:/opt/mapr/hbase/hbase-0.92.1:/opt/mapr/hbase/hbase-0.92.1/hbase-0.92.1.jar"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIG_CLASSPATH"
export CLASSPATH="$CLASSPATH:$HADOOP_CLASSPATH"
#export JAVA_HOME=
#export MAPR_SUBNETS=
#export MAPR_HOME=
#export MAPR_ULIMIT_U=
#export MAPR_ULIMIT_N=
#export MAPR_SYSCTL_SOMAXCONN=
#export PIG_CLASSPATH=:$PIG_CLASSPATH
[root@nmk-centos-60-3 ~]#
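Because the sample env.sh puts several directories and jars on PIG_CLASSPATH, a quick check that every entry actually exists on the client node can save some debugging. A minimal sketch; check_classpath is a hypothetical helper, not part of MapR or Pig.

```shell
#!/bin/bash
# Sketch: report PIG_CLASSPATH entries that do not exist on this node.
# Prints each missing entry to stderr and returns 1 if any were missing.
check_classpath() {
  local entry missing=0
  local -a entries
  IFS=':' read -r -a entries <<< "$1"   # split on ':' without globbing
  for entry in "${entries[@]}"; do
    [ -n "$entry" ] || continue          # skip empty entries (e.g. a leading ':')
    if [ ! -e "$entry" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the Pig client node:
#   source /opt/mapr/conf/env.sh && check_classpath "$PIG_CLASSPATH"
```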

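Sample HBase insertion script

The script loads the comma-separated file /user/mapr/input2.csv and
stores it in the HBase table sample_names. HBaseStorage uses the first
field of each tuple (listing_id) as the row key and writes the
remaining fields, in order, to the listed columns info:fname and
info:lname.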

[root@nmk-centos-60-3 nabeel]# cat hbase_pig.pig
raw_data = LOAD '/user/mapr/input2.csv' USING PigStorage(',') AS (
listing_id: chararray,
fname: chararray,
lname: chararray );

STORE raw_data INTO 'sample_names' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage (
'info:fname info:lname');
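HBaseStorage expects the target table and its column families to exist already, so sample_names needs an info column family before the script runs. The sketch below derives the HBase shell create statement from the same column list passed to HBaseStorage. hbase_create_stmt is a hypothetical helper, and the usage lines assume the hbase and pig launchers are on the PATH of the client node.

```shell
#!/bin/bash
# Sketch: build the HBase shell "create" statement for an HBaseStorage
# target table, using one column family per distinct prefix before ':'.
hbase_create_stmt() {
  local table="$1" col fam stmt
  local -a cols families=()
  read -r -a cols <<< "$2"              # split the column list on whitespace
  for col in "${cols[@]}"; do
    fam="${col%%:*}"                    # column family = text before the first ':'
    # add the family only if it is not already in the list
    [[ " ${families[*]} " == *" $fam "* ]] || families+=("$fam")
  done
  stmt="create '$table'"
  for fam in "${families[@]}"; do
    stmt+=", '$fam'"
  done
  printf '%s\n' "$stmt"
}

# Usage on the Pig client node (assumes hbase and pig are on PATH):
#   hbase_create_stmt sample_names 'info:fname info:lname' | hbase shell
#   pig hbase_pig.pig
```

For the sample script this prints create 'sample_names', 'info', which can also simply be typed into the HBase shell.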