Thursday, April 24, 2014

Identifying the jar to which a particular class belongs

This can come in quite handy when working with Java stack traces. Use the command below to identify the jar that shipped a particular class.


[root@nmk-centos-65-2 ~]# javap -classpath `hadoop classpath` -sysinfo org.apache.hadoop.conf.Configuration
Classfile jar:file:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar!/org/apache/hadoop/conf/Configuration.class
  Last modified 29 Jan, 2014; size 35537 bytes
  MD5 checksum f1dc26ef7b8ee5aa93f9f1672d7229bd
  Compiled from "Configuration.java"
public class org.apache.hadoop.conf.Configuration implements java.lang.Iterable<java.util.Map$Entry<java.lang.String, java.lang.String>>, org.apache.hadoop.io.Writable {
  static final java.lang.String UNKNOWN_RESOURCE;
  public org.apache.hadoop.conf.Configuration();
  public org.apache.hadoop.conf.Configuration(boolean);
  public org.apache.hadoop.conf.Configuration(org.apache.hadoop.conf.Configuration);
  public static org.apache.hadoop.fs.Path getCachedCentralFilePath(org.apache.hadoop.fs.Path);
  public static void addRemoteResource(java.net.URI, boolean);
  public static void addDefaultResource(java.lang.String);
  public static void addDefaultResource(java.lang.String, java.lang.Object);
  public void addResource(java.lang.String);
  public void addResource(java.net.URL);
  public void addResource(org.apache.hadoop.fs.Path);
  public void addResource(java.io.InputStream);
  public synchronized void reloadConfiguration();
  public java.lang.String get(java.lang.String);
  public java.lang.String getTrimmed(java.lang.String);
  public java.lang.String getRaw(java.lang.String);
  public void set(java.lang.String, java.lang.String);
  public synchronized void unset(java.lang.String);
  public void setIfUnset(java.lang.String, java.lang.String);
  public java.lang.String get(java.lang.String, java.lang.String);
  public int getInt(java.lang.String, int);
  public void setInt(java.lang.String, int);
  public long getLong(java.lang.String, long);
  public void setLong(java.lang.String, long);
  public float getFloat(java.lang.String, float);
  public void setFloat(java.lang.String, float);
  public boolean getBoolean(java.lang.String, boolean);
  public void setBoolean(java.lang.String, boolean);
  public void setBooleanIfUnset(java.lang.String, boolean);
  public java.util.regex.Pattern getPattern(java.lang.String, java.util.regex.Pattern);
  public void setPattern(java.lang.String, java.util.regex.Pattern);
  public <T extends java.lang.Enum<T>> void setEnum(java.lang.String, T);
  public <T extends java.lang.Enum<T>> T getEnum(java.lang.String, T);
  public org.apache.hadoop.conf.Configuration$IntegerRanges getRange(java.lang.String, java.lang.String);
  public java.util.Collection getStringCollection(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String, java.lang.String...);
  public java.util.Collection getTrimmedStringCollection(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String, java.lang.String...);
  public void setStrings(java.lang.String, java.lang.String...);
  public java.lang.Class getClassByName(java.lang.String) throws java.lang.ClassNotFoundException;
  public java.lang.Class[] getClasses(java.lang.String, java.lang.Class...);
  public java.lang.Class getClass(java.lang.String, java.lang.Class);
  public java.lang.Class getClass(java.lang.String, java.lang.Class, java.lang.Class);
  public java.util.List getInstances(java.lang.String, java.lang.Class);
  public void setClass(java.lang.String, java.lang.Class, java.lang.Class);
  public org.apache.hadoop.fs.Path getLocalPath(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.io.File getFile(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.net.URL getResource(java.lang.String);
  public java.io.InputStream getConfResourceAsInputStream(java.lang.String);
  public java.io.Reader getConfResourceAsReader(java.lang.String);
  public int size();
  public void clear();
  public java.util.Iterator<java.util.Map$Entry<java.lang.String, java.lang.String>> iterator();
  public void writeXml(java.io.OutputStream) throws java.io.IOException;
  public void writeXml(java.io.Writer) throws java.io.IOException;
  public static void dumpConfiguration(org.apache.hadoop.conf.Configuration, java.io.Writer) throws java.io.IOException;
  public java.lang.ClassLoader getClassLoader();
  public void setClassLoader(java.lang.ClassLoader);
  public java.lang.String toString();
  public synchronized void setQuietMode(boolean);
  public static void main(java.lang.String[]) throws java.lang.Exception;
  public void readFields(java.io.DataInput) throws java.io.IOException;
  public void write(java.io.DataOutput) throws java.io.IOException;
  public java.util.Map getValByRegex(java.lang.String);
  public java.lang.String[] getLocalDirs();
  public int getIoFileBufferSize();
  static {};
}
[root@nmk-centos-65-2 ~]#
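If javap is not handy, the jar can also be located by scanning the classpath directly. A rough bash sketch (assuming unzip is installed) that prints every entry on the hadoop classpath containing the class:

for j in $(hadoop classpath | tr ':' ' '); do unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/conf/Configuration.class' && echo "$j"; done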

Friday, February 14, 2014

Populating HBase tables with sample data


Quite often we may need to populate HBase tables for issue recreation. Here's a simple procedure for this (from cloudavenue.com):

1) For creating a table 'testtable' with a column family 'colfam1'

create 'testtable', 'colfam1'

2) For listing the tables, to verify that 'testtable' was created

list 'testtable'

3) To insert test data into the 'testtable' table.

put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'

The HBase shell is (J)Ruby's IRB with some HBase-related commands added. Anything that can be done in IRB can also be done in the HBase shell. The command below will insert 1K rows into the 'testtable' table.

for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', "row-#{i}#{j}#{k}", \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end

4) For getting data from the 'testtable' table

get 'testtable', 'myrow-1'
scan 'testtable'

5) For deleting data from the 'testtable' table.

delete 'testtable', 'myrow-2', 'colfam1:q2'

6) For deleting the table.

disable 'testtable'
drop 'testtable'


If you want to test a sample CSV import, use this bash one-liner to generate a CSV of whatever shape you want:

for i in `seq 1 19`; do for j in `seq 1 9`; do for k in `seq 1 9`; do echo "row"$i",col"$j",value"$i"-"$j"-"$k; done; done; done

Depending on the amount of data you want to load, increase the maximum values of the loop variables.

This should produce output of the form

row1,col1,value1-1-1
row1,col1,value1-1-2
row1,col1,value1-1-3
row1,col1,value1-1-4
row1,col1,value1-1-5
row1,col1,value1-1-6
row1,col1,value1-1-7
row1,col1,value1-1-8
row1,col1,value1-1-9
row1,col2,value1-2-1
row1,col2,value1-2-2
row1,col2,value1-2-3
row1,col2,value1-2-4
row1,col2,value1-2-5
row1,col2,value1-2-6
row1,col2,value1-2-7

Create a sample table and load the CSV with importtsv. Note that importtsv expects exactly one of the columns to be HBASE_ROW_KEY, and that its separator defaults to a tab, so it has to be overridden for a comma-separated file:

create 'testtable', 'colfam1'

hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable <input directory>
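
For completeness, the generated CSV has to be available on the cluster filesystem before importtsv can read it. A minimal end-to-end sketch, assuming the CSV was written to a local file /tmp/sample.csv and using a hypothetical input directory /user/root/csvinput (adjust paths to your cluster):

hadoop fs -mkdir /user/root/csvinput
hadoop fs -put /tmp/sample.csv /user/root/csvinput/
hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable /user/root/csvinput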

Update:

In case you want to generate and print random data from a Ruby shell:

irb(main):014:0> require 'securerandom'
=> true
irb(main):015:0> for i in '1'..'10' do puts SecureRandom.hex
irb(main):016:1> end
8917ccbb7f0bea0d54d0e98e12b416cf
9cd1865fd43482174b3088c6749075de
1d009056e9fcc0b2ddf4352eb824a97d
1abeb9bb4b0993ad732335818fdc8835
d41cf0ca16be930d0aa3925651a10ec4
732dc0d79e7b7d82e4b5ac21d8b00f5c
519fc21d6d0a76a467dd2f2d14741090
27fb689fd3d9b8f4b17b17535681214b
6454ff61e5ef116688ca172ba13aa80c
83ecb50f1e9ab42d1e320119e24a9a9c
=> "1".."10"
irb(main):017:0>

This can be used in the HBase shell to insert rows with random keys into the table:

hbase(main):001:0> require 'securerandom'; for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', SecureRandom.hex , \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end


Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache//jobToken for output - File exists

If you hit this error on any task attempt:

Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists

For example:

14/02/14 06:07:09 INFO mapred.JobClient: Task Id : attempt_201402140511_0001_r_000001_0, Status : FAILED on node nmk-centos-60-3
Error initializing attempt_201402140511_0001_r_000001_0 java.io.IOException: Job initialization failed (255). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg
number of groups = 8
main : command provided 0
main : user is root
number of groups = 7
Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists
failed to copy credential file

at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1564)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1540)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1425)
at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3802)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)
at org.apache.hadoop.util.Shell.run(Shell.java:249)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

Chances are that you have one of these two issues:
1) A different UID/GID mapping for the same user across the nodes in the cluster (see the check sketched below)
2) Incorrect ownership or permissions on the /tmp/mapr-hadoop directory on the TaskTracker node where the job failed
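
A quick way to check the first cause is to compare the id output for the job user across the nodes; the uid and gid printed should be identical everywhere. A minimal sketch, assuming passwordless ssh and a hypothetical nodes.txt listing the cluster hostnames:

for node in $(cat nodes.txt); do echo -n "$node: "; ssh $node id root; done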

The simple fix for that is (a rough command sketch follows below):

1) Stop the TaskTracker
2) Remove /tmp/mapr-hadoop/
3) Start the TaskTracker again
4) Re-run the job
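
A rough command sketch for the steps above, run against the affected TaskTracker node (nmk-centos-60-3 in the log above); the rm is run on that node itself, and the maprcli syntax here is a best guess that may vary slightly between MapR versions:

maprcli node services -tasktracker stop -nodes nmk-centos-60-3
rm -rf /tmp/mapr-hadoop/
maprcli node services -tasktracker start -nodes nmk-centos-60-3

Then re-submit the job.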