Here's how to run Pig code from a Java wrapper on MapR Hadoop.
[root@nmk-centos-60-1 ~]# cat idmapreduce.java
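import java.io.IOException;
import org.apache.pig.PigServer;

public class idmapreduce {
    public static void main(String[] args) {
        try {
            // "mapreduce" selects cluster execution rather than local mode
            PigServer pigServer = new PigServer("mapreduce");
            runIdQuery(pigServer, "/test/Mapr_rpm_Files");
        } catch (Exception e) {
            // Report failures instead of swallowing them silently
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile)
            throws IOException {
        // Load the input, splitting each line on '/'
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage('/');");
        // Keep only the first field and name it id
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        // Storing B is what triggers the MapReduce job
        pigServer.store("B", "/test/idout");
    }
}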
[root@nmk-centos-60-1 ~]#
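To make the two queries concrete, here is a rough plain-Java sketch of what they compute, with no Pig involved: each input line is split on '/' and only the first field is kept. The names IdProjectionSketch, firstField and project are made up for illustration, the sample lines are invented, and Pig's own handling of empty fields may differ from this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a plain-Java analogue of the two Pig statements.
// IdProjectionSketch and its methods are hypothetical names, not Pig API.
final class IdProjectionSketch {

    // Mirrors "$0" after splitting on the delimiter: the text before the
    // first delimiter, or the whole line if the delimiter never occurs.
    static String firstField(String line, char delimiter) {
        int idx = line.indexOf(delimiter);
        return idx < 0 ? line : line.substring(0, idx);
    }

    // Mirrors "B = foreach A generate $0 as id": one output per input line.
    static List<String> project(List<String> lines, char delimiter) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            ids.add(firstField(line, delimiter));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Invented sample lines, not taken from /test/Mapr_rpm_Files
        List<String> sample = Arrays.asList("mapr-core/x86_64/1.0", "mapr-pig");
        for (String id : project(sample, '/')) {
            System.out.println(id);
        }
    }
}
```

The real wrapper does the same projection, but on the cluster and over the whole input file.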
Then compile it against the Pig jar:
[root@nmk-centos-60-1 ~]# javac -cp
/opt/mapr/pig/pig-0.10/pig-0.
The compiled class file is now in /root (my current working directory):
[root@nmk-centos-60-1 ~]# ls idmapreduce.*
idmapreduce.class idmapreduce.java
To run the Java wrapper, the classpath needs the location of the Pig jar,
[root@nmk-centos-60-1 pig-0.10]# ls
autocomplete* CHANGES.txt* contrib/ ivy.xml* lib-src/
mapr-build.properties* pig-0.10.0.jar* pigperf.jar*
README.txt* src/
bin/ conf/ doap_Pig.rdf* KEYS* license/
NOTICE.txt* pig-0.10.0-withouthadoop.jar*
pig-withouthadoop.jar* RELEASE_NOTES.txt* test/
build.xml* conf.new/ ivy/ lib/ LICENSE.txt*
pig-0.10.0-core.jar* pig.jar* readme.md*
shims/ tutorial/
and the directory that holds the compiled wrapper class,
[root@nmk-centos-60-1 pig-0.10]# cd -
/root
and everything that `hadoop classpath` expands to,
plus the location of the native I/O library as a -D option (-Djava.library.path).
So the final command is:
[root@nmk-centos-60-1 ~]# java -cp
/opt/mapr/pig/pig-0.10/pig.
-Djava.library.path=/opt/mapr/
13/04/03 09:33:48 INFO executionengine.
hadoop file system at: maprfs:///
13/04/03 09:33:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/03 09:33:48 INFO security.
JniBasedUnixGroupsMapping for Group resolution
13/04/03 09:33:48 INFO executionengine.
map-reduce job tracker at: maprfs:///
13/04/03 09:33:49 INFO pigstats.ScriptState: Pig features used in the
script: UNKNOWN
13/04/03 09:33:49 INFO mapReduceLayer.MRCompiler: File concatenation
threshold: 100 optimistic? false
13/04/03 09:33:49 INFO mapReduceLayer.
size before optimization: 1
13/04/03 09:33:49 INFO mapReduceLayer.
size after optimization: 1
13/04/03 09:33:49 INFO pigstats.ScriptState: Pig script settings are
added to the job
13/04/03 09:33:49 INFO mapReduceLayer.
mapred.job.reduce.markreset.
0.3
13/04/03 09:33:49 INFO mapReduceLayer.
file Job6414903787816249153.jar
13/04/03 09:33:56 INFO mapReduceLayer.
Job6414903787816249153.jar created
13/04/03 09:33:56 INFO mapReduceLayer.
single store job
13/04/03 09:33:56 INFO mapReduceLayer.
job(s) waiting for submission.
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=
GMT
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:host.name=nmk-
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.32-
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.name=root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Initiating client
connection, connectString=nmkc1:5181,
sessionTimeout=30000 watcher=com.mapr.fs.
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Opening socket connection
to server nmkc3/10.10.80.93:5181
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Socket connection
established to nmkc3/10.10.80.93:5181, initiating session
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Session establishment
complete on server nmkc3/10.10.80.93:5181, sessionid =
0x23db71ab8a301d9, negotiated timeout = 30000
13/04/03 09:33:56 INFO fs.JobTrackerWatcher: Current running
JobTracker is: nmk-centos-60-1/10.10.80.91:
13/04/03 09:33:56 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
13/04/03 09:33:57 INFO mapReduceLayer.
13/04/03 09:33:57 INFO input.FileInputFormat: Total input paths to process : 1
13/04/03 09:33:57 INFO util.MapRedUtil: Total input paths to process : 1
13/04/03 09:33:57 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/03 09:33:57 INFO util.MapRedUtil: Total input paths (combined)
to process : 1
13/04/03 09:33:58 INFO mapReduceLayer.
job_201304010834_0003
13/04/03 09:33:58 INFO mapReduceLayer.
information at:
http://maprfs:50030/
13/04/03 09:33:58 INFO fs.JobTrackerWatcher: Current running
JobTracker is: nmk-centos-60-1/10.10.80.91:
13/04/03 09:34:17 INFO mapReduceLayer.
13/04/03 09:34:18 INFO mapReduceLayer.
13/04/03 09:34:18 INFO pigstats.SimplePigStats: Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.3 0.10.0 root 2013-04-03 09:33:49 2013-04-03 09:34:18 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
MaxReduceTime MinReduceTime AvgReduceTime Alias Feature
Outputs
job_201304010834_0003 1 0 3 3 3 0
0 0 A,B MAP_ONLY /test/idout,
Input(s):
Successfully read 4826 records (5229 bytes) from: "/test/Mapr_rpm_Files"
Output(s):
Successfully stored 4826 records in: "/test/idout"
Counters:
Total records written : 4826
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201304010834_0003
13/04/03 09:34:18 INFO mapReduceLayer.
[root@nmk-centos-60-1 ~]#
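The shape of the run command above can be sketched in Java like this: the classpath is the Pig jar, the wrapper's directory and the `hadoop classpath` output joined with ':', and the native library directory is passed as -Djava.library.path ahead of the main class name. JavaCommandSketch is a hypothetical helper, and every path in main except /root is a placeholder, not the truncated path from the transcript.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: JavaCommandSketch is a hypothetical helper, and the
// /path/to/... values passed to it in main() are placeholders.
final class JavaCommandSketch {

    // Joins the three classpath parts with ':' (the Linux path separator).
    static String buildClasspath(String pigJar, String wrapperDir, String hadoopClasspath) {
        return String.join(":", pigJar, wrapperDir, hadoopClasspath);
    }

    // JVM options (-cp, -D...) must come before the main class name.
    static List<String> buildCommand(String classpath, String nativeLibDir, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("-Djava.library.path=" + nativeLibDir);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        String cp = buildClasspath("/path/to/pig.jar", "/root", "/path/to/hadoop/jars/*");
        System.out.println(String.join(" ", buildCommand(cp, "/path/to/native/lib", "idmapreduce")));
    }
}
```

In practice the third classpath part would come from running `hadoop classpath` on the node, and the resulting list could be handed to a ProcessBuilder instead of being printed.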
To see the output:
[root@nmk-centos-60-1 ~]# hadoop fs -ls /test*
Found 2 items
-rwxr-xr-x 3 root root 342071 2013-04-03 08:54 /test/Mapr_rpm_Files
drwxr-xr-x - root root 2 2013-04-03 09:34 /test/idout
[root@nmk-centos-60-1 ~]#