Thursday, April 24, 2014

Identifying the jar to which a particular class belongs

This can come in quite handy when working with Java stack traces. Use the command below to identify the jar that shipped a particular class.


[root@nmk-centos-65-2 ~]# javap -classpath `hadoop classpath` -sysinfo org.apache.hadoop.conf.Configuration
Classfile jar:file:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar!/org/apache/hadoop/conf/Configuration.class
  Last modified 29 Jan, 2014; size 35537 bytes
  MD5 checksum f1dc26ef7b8ee5aa93f9f1672d7229bd
  Compiled from "Configuration.java"
public class org.apache.hadoop.conf.Configuration implements java.lang.Iterable<java.util.Map$Entry<java.lang.String, java.lang.String>>, org.apache.hadoop.io.Writable {
  static final java.lang.String UNKNOWN_RESOURCE;
  public org.apache.hadoop.conf.Configuration();
  public org.apache.hadoop.conf.Configuration(boolean);
  public org.apache.hadoop.conf.Configuration(org.apache.hadoop.conf.Configuration);
  public static org.apache.hadoop.fs.Path getCachedCentralFilePath(org.apache.hadoop.fs.Path);
  public static void addRemoteResource(java.net.URI, boolean);
  public static void addDefaultResource(java.lang.String);
  public static void addDefaultResource(java.lang.String, java.lang.Object);
  public void addResource(java.lang.String);
  public void addResource(java.net.URL);
  public void addResource(org.apache.hadoop.fs.Path);
  public void addResource(java.io.InputStream);
  public synchronized void reloadConfiguration();
  public java.lang.String get(java.lang.String);
  public java.lang.String getTrimmed(java.lang.String);
  public java.lang.String getRaw(java.lang.String);
  public void set(java.lang.String, java.lang.String);
  public synchronized void unset(java.lang.String);
  public void setIfUnset(java.lang.String, java.lang.String);
  public java.lang.String get(java.lang.String, java.lang.String);
  public int getInt(java.lang.String, int);
  public void setInt(java.lang.String, int);
  public long getLong(java.lang.String, long);
  public void setLong(java.lang.String, long);
  public float getFloat(java.lang.String, float);
  public void setFloat(java.lang.String, float);
  public boolean getBoolean(java.lang.String, boolean);
  public void setBoolean(java.lang.String, boolean);
  public void setBooleanIfUnset(java.lang.String, boolean);
  public java.util.regex.Pattern getPattern(java.lang.String, java.util.regex.Pattern);
  public void setPattern(java.lang.String, java.util.regex.Pattern);
  public <T extends java.lang.Enum<T>> void setEnum(java.lang.String, T);
  public <T extends java.lang.Enum<T>> T getEnum(java.lang.String, T);
  public org.apache.hadoop.conf.Configuration$IntegerRanges getRange(java.lang.String, java.lang.String);
  public java.util.Collection getStringCollection(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String, java.lang.String...);
  public java.util.Collection getTrimmedStringCollection(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String, java.lang.String...);
  public void setStrings(java.lang.String, java.lang.String...);
  public java.lang.Class getClassByName(java.lang.String) throws java.lang.ClassNotFoundException;
  public java.lang.Class[] getClasses(java.lang.String, java.lang.Class...);
  public java.lang.Class getClass(java.lang.String, java.lang.Class);
  public java.lang.Class getClass(java.lang.String, java.lang.Class, java.lang.Class);
  public java.util.List getInstances(java.lang.String, java.lang.Class);
  public void setClass(java.lang.String, java.lang.Class, java.lang.Class);
  public org.apache.hadoop.fs.Path getLocalPath(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.io.File getFile(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.net.URL getResource(java.lang.String);
  public java.io.InputStream getConfResourceAsInputStream(java.lang.String);
  public java.io.Reader getConfResourceAsReader(java.lang.String);
  public int size();
  public void clear();
  public java.util.Iterator<java.util.Map$Entry<java.lang.String, java.lang.String>> iterator();
  public void writeXml(java.io.OutputStream) throws java.io.IOException;
  public void writeXml(java.io.Writer) throws java.io.IOException;
  public static void dumpConfiguration(org.apache.hadoop.conf.Configuration, java.io.Writer) throws java.io.IOException;
  public java.lang.ClassLoader getClassLoader();
  public void setClassLoader(java.lang.ClassLoader);
  public java.lang.String toString();
  public synchronized void setQuietMode(boolean);
  public static void main(java.lang.String[]) throws java.lang.Exception;
  public void readFields(java.io.DataInput) throws java.io.IOException;
  public void write(java.io.DataOutput) throws java.io.IOException;
  public java.util.Map getValByRegex(java.lang.String);
  public java.lang.String[] getLocalDirs();
  public int getIoFileBufferSize();
  static {};
}
[root@nmk-centos-65-2 ~]#
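If javap is not handy, the jar can also be located by scanning the classpath directly. A rough bash sketch (assuming unzip is installed) that prints every entry on the hadoop classpath containing the class:

for j in $(hadoop classpath | tr ':' ' '); do unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/conf/Configuration.class' && echo "$j"; done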

Friday, February 14, 2014

Populating HBase tables with sample data


Quite often we may need to populate HBase tables for issue recreation. Here's a simple procedure for this (from cloudavenue.com):

1) For creating a table 'testtable' with a column family 'colfam1'

create 'testtable', 'colfam1'

2) For listing the tables, to verify that 'testtable' was created

list 'testtable'

3) To insert test data into the 'testtable' table.

put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'

The HBase shell is (J)Ruby's IRB with some HBase-related commands added. Anything that can be done in IRB can also be done in the HBase shell. The command below will insert 1K rows into the 'testtable' table.

for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', "row-#{i}#{j}#{k}", \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end

4) For getting data from the 'testtable' table

get 'testtable', 'myrow-1'
scan 'testtable'

5) For deleting data from the 'testtable' table.

delete 'testtable', 'myrow-2', 'colfam1:q2'

6) For deleting the table.

disable 'testtable'
drop 'testtable'


If you want to test a sample CSV import, use this bash one-liner to generate a CSV of whatever shape you want:

for i in `seq 1 19`; do for j in `seq 1 9`; do for k in `seq 1 9`; do echo "row"$i",col"$j",value"$i"-"$j"-"$k; done; done; done

Depending on the amount of data you want to load, increase the maximum values of the loop variables.

This should produce output of the form

row1,col1,value1-1-1
row1,col1,value1-1-2
row1,col1,value1-1-3
row1,col1,value1-1-4
row1,col1,value1-1-5
row1,col1,value1-1-6
row1,col1,value1-1-7
row1,col1,value1-1-8
row1,col1,value1-1-9
row1,col2,value1-2-1
row1,col2,value1-2-2
row1,col2,value1-2-3
row1,col2,value1-2-4
row1,col2,value1-2-5
row1,col2,value1-2-6
row1,col2,value1-2-7

Create a sample table and load the CSV with importtsv. Note that importtsv expects exactly one of the columns to be HBASE_ROW_KEY, and that its separator defaults to a tab, so it has to be overridden for a comma-separated file:

create 'testtable', 'colfam1'

hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable <input directory>
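
For completeness, the generated CSV has to be available on the cluster filesystem before importtsv can read it. A minimal end-to-end sketch, assuming the CSV was written to a local file /tmp/sample.csv and using a hypothetical input directory /user/root/csvinput (adjust paths to your cluster):

hadoop fs -mkdir /user/root/csvinput
hadoop fs -put /tmp/sample.csv /user/root/csvinput/
hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable /user/root/csvinput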

Update:

In case you want to generate and print random data from a Ruby shell:

irb(main):014:0> require 'securerandom'
=> true
irb(main):015:0> for i in '1'..'10' do puts SecureRandom.hex
irb(main):016:1> end
8917ccbb7f0bea0d54d0e98e12b416cf
9cd1865fd43482174b3088c6749075de
1d009056e9fcc0b2ddf4352eb824a97d
1abeb9bb4b0993ad732335818fdc8835
d41cf0ca16be930d0aa3925651a10ec4
732dc0d79e7b7d82e4b5ac21d8b00f5c
519fc21d6d0a76a467dd2f2d14741090
27fb689fd3d9b8f4b17b17535681214b
6454ff61e5ef116688ca172ba13aa80c
83ecb50f1e9ab42d1e320119e24a9a9c
=> "1".."10"
irb(main):017:0>

This can be used in the HBase shell to insert rows with random keys into the table:

hbase(main):001:0> require 'securerandom'; for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', SecureRandom.hex , \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end


Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache//jobToken for output - File exists

If you hit this error on any task attempt:

Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists

For example:

14/02/14 06:07:09 INFO mapred.JobClient: Task Id : attempt_201402140511_0001_r_000001_0, Status : FAILED on node nmk-centos-60-3
Error initializing attempt_201402140511_0001_r_000001_0 java.io.IOException: Job initialization failed (255). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg
number of groups = 8
main : command provided 0
main : user is root
number of groups = 7
Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists
failed to copy credential file

at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1564)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1540)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1425)
at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3802)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)
at org.apache.hadoop.util.Shell.run(Shell.java:249)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

Chances are that you have one of these two issues:
1) A different UID/GID mapping for the same user across the nodes in the cluster (see the check sketched below)
2) Incorrect ownership or permissions on the /tmp/mapr-hadoop directory on the TaskTracker node where the job failed
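
A quick way to check the first cause is to compare the id output for the job user across the nodes; the uid and gid printed should be identical everywhere. A minimal sketch, assuming passwordless ssh and a hypothetical nodes.txt listing the cluster hostnames:

for node in $(cat nodes.txt); do echo -n "$node: "; ssh $node id root; done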

The simple fix for that is (a rough command sketch follows below):

1) Stop the TaskTracker
2) Remove /tmp/mapr-hadoop/
3) Start the TaskTracker again
4) Re-run the job
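
A rough command sketch for the steps above, run against the affected TaskTracker node (nmk-centos-60-3 in the log above); the rm is run on that node itself, and the maprcli syntax here is a best guess that may vary slightly between MapR versions:

maprcli node services -tasktracker stop -nodes nmk-centos-60-3
rm -rf /tmp/mapr-hadoop/
maprcli node services -tasktracker start -nodes nmk-centos-60-3

Then re-submit the job.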