Thursday, January 08, 2015

Scalding error: value toTypedPipe is not a member of cascading.pipe.Pipe

If you are running Scalding code and hit this error:

[root@node code-scalding]# /root/scalding/scripts/scald.rb linecount.scala --input /mapr/my.cluster.com/wordcount/in/url+page.200.txt --output /mapr/my.cluster.com/wordcount/out/scal1
rsyncing 19.0M from scalding-core-assembly-0.13.0-SNAPSHOT.jar to my.host.here in background...
compiling linecount.scala
scalac -classpath /tmp/scald.rb/scala_home/2.10.4/scala-library-2.10.4.jar:/tmp/scald.rb/scala_home/2.10.4/scala-reflect-2.10.4.jar:/tmp/scald.rb/scala_home/2.10.4/scala-compiler-2.10.4.jar:/root/scalding/scalding-core/target/scala-2.10/scalding-core-assembly-0.13.0-SNAPSHOT.jar:/tmp/maven/hadoop-core-1.1.2.jar:/tmp/maven/commons-codec-1.8.jar:/tmp/maven/commons-configuration-1.9.jar:/tmp/maven/jackson-asl-0.9.5.jar:/tmp/maven/jackson-mapper-asl-1.9.13.jar:/tmp/maven/commons-lang-2.6.jar:/tmp/maven/slf4j-log4j12-1.6.6.jar:/tmp/maven/log4j-1.2.15.jar:/tmp/maven/commons-httpclient-3.1.jar:/tmp/maven/commons-cli-1.2.jar:/tmp/maven/commons-logging-1.1.1.jar:/tmp/maven/zookeeper-3.3.4.jar -d /tmp/script-build linecount.scala
linecount.scala:6: error: value toTypedPipe is not a member of cascading.pipe.Pipe
 def matricesFromJson = TextLine(args("input")).read.toTypedPipe[String]('line).groupAll.size.values.jsonOutput
                                                     ^
one error found
[SUGGESTION]: Try scald.rb --clean, you may have corrupt jars lying around
[root@node code-scalding]#


Fix:
Add this import to the code (TDsl lives in the com.twitter.scalding package, so if you haven't already done import com.twitter.scalding._, use the fully qualified import com.twitter.scalding.TDsl._):

import TDsl._


Thursday, April 24, 2014

Identifying the jar to which a particular class belongs

This can come in quite handy when working with Java stack traces. Use the command below to identify the jar that ships a particular class.


[root@nmk-centos-65-2 ~]# javap -classpath `hadoop classpath` -sysinfo org.apache.hadoop.conf.Configuration
Classfile jar:file:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar!/org/apache/hadoop/conf/Configuration.class
  Last modified 29 Jan, 2014; size 35537 bytes
  MD5 checksum f1dc26ef7b8ee5aa93f9f1672d7229bd
  Compiled from "Configuration.java"
public class org.apache.hadoop.conf.Configuration implements java.lang.Iterable<java.util.Map$Entry<java.lang.String, java.lang.String>>, org.apache.hadoop.io.Writable {
  static final java.lang.String UNKNOWN_RESOURCE;
  public org.apache.hadoop.conf.Configuration();
  public org.apache.hadoop.conf.Configuration(boolean);
  public org.apache.hadoop.conf.Configuration(org.apache.hadoop.conf.Configuration);
  public static org.apache.hadoop.fs.Path getCachedCentralFilePath(org.apache.hadoop.fs.Path);
  public static void addRemoteResource(java.net.URI, boolean);
  public static void addDefaultResource(java.lang.String);
  public static void addDefaultResource(java.lang.String, java.lang.Object);
  public void addResource(java.lang.String);
  public void addResource(java.net.URL);
  public void addResource(org.apache.hadoop.fs.Path);
  public void addResource(java.io.InputStream);
  public synchronized void reloadConfiguration();
  public java.lang.String get(java.lang.String);
  public java.lang.String getTrimmed(java.lang.String);
  public java.lang.String getRaw(java.lang.String);
  public void set(java.lang.String, java.lang.String);
  public synchronized void unset(java.lang.String);
  public void setIfUnset(java.lang.String, java.lang.String);
  public java.lang.String get(java.lang.String, java.lang.String);
  public int getInt(java.lang.String, int);
  public void setInt(java.lang.String, int);
  public long getLong(java.lang.String, long);
  public void setLong(java.lang.String, long);
  public float getFloat(java.lang.String, float);
  public void setFloat(java.lang.String, float);
  public boolean getBoolean(java.lang.String, boolean);
  public void setBoolean(java.lang.String, boolean);
  public void setBooleanIfUnset(java.lang.String, boolean);
  public java.util.regex.Pattern getPattern(java.lang.String, java.util.regex.Pattern);
  public void setPattern(java.lang.String, java.util.regex.Pattern);
  public <T extends java.lang.Enum<T>> void setEnum(java.lang.String, T);
  public <T extends java.lang.Enum<T>> T getEnum(java.lang.String, T);
  public org.apache.hadoop.conf.Configuration$IntegerRanges getRange(java.lang.String, java.lang.String);
  public java.util.Collection<java.lang.String> getStringCollection(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String);
  public java.lang.String[] getStrings(java.lang.String, java.lang.String...);
  public java.util.Collection<java.lang.String> getTrimmedStringCollection(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String);
  public java.lang.String[] getTrimmedStrings(java.lang.String, java.lang.String...);
  public void setStrings(java.lang.String, java.lang.String...);
  public java.lang.Class<?> getClassByName(java.lang.String) throws java.lang.ClassNotFoundException;
  public java.lang.Class<?>[] getClasses(java.lang.String, java.lang.Class<?>...);
  public java.lang.Class<?> getClass(java.lang.String, java.lang.Class<?>);
  public <U> java.lang.Class<? extends U> getClass(java.lang.String, java.lang.Class<? extends U>, java.lang.Class<U>);
  public <U> java.util.List<U> getInstances(java.lang.String, java.lang.Class<U>);
  public void setClass(java.lang.String, java.lang.Class<?>, java.lang.Class<?>);
  public org.apache.hadoop.fs.Path getLocalPath(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.io.File getFile(java.lang.String, java.lang.String) throws java.io.IOException;
  public java.net.URL getResource(java.lang.String);
  public java.io.InputStream getConfResourceAsInputStream(java.lang.String);
  public java.io.Reader getConfResourceAsReader(java.lang.String);
  public int size();
  public void clear();
  public java.util.Iterator<java.util.Map$Entry<java.lang.String, java.lang.String>> iterator();
  public void writeXml(java.io.OutputStream) throws java.io.IOException;
  public void writeXml(java.io.Writer) throws java.io.IOException;
  public static void dumpConfiguration(org.apache.hadoop.conf.Configuration, java.io.Writer) throws java.io.IOException;
  public java.lang.ClassLoader getClassLoader();
  public void setClassLoader(java.lang.ClassLoader);
  public java.lang.String toString();
  public synchronized void setQuietMode(boolean);
  public static void main(java.lang.String[]) throws java.lang.Exception;
  public void readFields(java.io.DataInput) throws java.io.IOException;
  public void write(java.io.DataOutput) throws java.io.IOException;
  public java.util.Map<java.lang.String, java.lang.String> getValByRegex(java.lang.String);
  public java.lang.String[] getLocalDirs();
  public int getIoFileBufferSize();
  static {};
}
[root@nmk-centos-65-2 ~]#
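If javap isn't handy, a rough shell alternative is to walk the classpath and look inside each jar for the class file (assuming the JDK's jar tool is on the PATH):

for j in `hadoop classpath | tr ':' '\n'`; do
  # directories and non-jar entries just fail quietly under 2>/dev/null
  jar tf "$j" 2>/dev/null | grep -q 'org/apache/hadoop/conf/Configuration.class' && echo "$j"
done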

Friday, February 14, 2014

Populating Hbase tables with sample data


Quite often we need to populate HBase tables with sample data for issue recreations. Here's a simple procedure for this (from cloudavenue.com):

1) For creating a table 'testtable' with a column family 'colfam1'

create 'testtable', 'colfam1'

2) For confirming the table exists

list 'testtable'

3) For inserting test data into the 'testtable' table.

put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'

The HBase Shell is (J)Ruby's IRB with some HBase-related commands added, so anything that can be done in IRB can also be done in the HBase Shell. The command below will insert 1K rows into the 'testtable' table.

for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', "row-#{i}#{j}#{k}", \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end

4) For getting data from the 'testtable' table

get 'testtable', 'myrow-1'
scan 'testtable'

5) For deleting data from the 'testtable' table.

delete 'testtable', 'myrow-2', 'colfam1:q2'

6) For deleting the table.

disable 'testtable'
drop 'testtable'


If you want to test a sample CSV import, use this bash one-liner to generate a CSV of whatever shape you need:

for i in `seq 1 19`; do for j in `seq 1 9`; do for k in `seq 1 9`; do echo "row"$i",col"$j",value"$i"-"$j"-"$k; done; done; done

According to the amount of data you want to load, increase the max value of the loop variables.

This should produce output of the form

row1,col1,value1-1-1
row1,col1,value1-1-2
row1,col1,value1-1-3
row1,col1,value1-1-4
row1,col1,value1-1-5
row1,col1,value1-1-6
row1,col1,value1-1-7
row1,col1,value1-1-8
row1,col1,value1-1-9
row1,col2,value1-2-1
row1,col2,value1-2-2
row1,col2,value1-2-3
row1,col2,value1-2-4
row1,col2,value1-2-5
row1,col2,value1-2-6
row1,col2,value1-2-7

Create a sample table in the HBase shell:

create 'testtable', 'colfam1'

and then load the generated file with importtsv:

hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.columns=colfam1:row,colfam1:col,colfam1:val
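As written, that command is missing the separator, table name and input path arguments; a fuller invocation, with the input location being just an example path and one of the generated columns promoted to the row key, might look roughly like this:

hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable /wordcount/in/sample.csv

(importtsv expects tab-separated input by default, hence the importtsv.separator override for a CSV, and the column list must contain exactly one HBASE_ROW_KEY entry.)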

Update :

In case you want to generate and print random data on the screen from a ruby shell,

irb(main):014:0> require 'securerandom'
=> true
irb(main):015:0> for i in '1'..'10' do puts SecureRandom.hex
irb(main):016:1> end
8917ccbb7f0bea0d54d0e98e12b416cf
9cd1865fd43482174b3088c6749075de
1d009056e9fcc0b2ddf4352eb824a97d
1abeb9bb4b0993ad732335818fdc8835
d41cf0ca16be930d0aa3925651a10ec4
732dc0d79e7b7d82e4b5ac21d8b00f5c
519fc21d6d0a76a467dd2f2d14741090
27fb689fd3d9b8f4b17b17535681214b
6454ff61e5ef116688ca172ba13aa80c
83ecb50f1e9ab42d1e320119e24a9a9c
=> "1".."10"
irb(main):017:0>

The same can be used in the HBase shell to insert rows with random keys into the table:

hbase(main):001:0> require 'securerandom'; for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', SecureRandom.hex , \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end


Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache//jobToken for output - File exists

If you hit this error on any task attempt :

Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists

Eg.

14/02/14 06:07:09 INFO mapred.JobClient: Task Id : attempt_201402140511_0001_r_000001_0, Status : FAILED on node nmk-centos-60-3
Error initializing attempt_201402140511_0001_r_000001_0 java.io.IOException: Job initialization failed (255). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg
number of groups = 8
main : command provided 0
main : user is root
number of groups = 7
Can't open /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201402140511_0001/jobToken for output - File exists
failed to copy credential file

at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1564)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1540)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1425)
at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3802)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)
at org.apache.hadoop.util.Shell.run(Shell.java:249)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)
at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

Chances are that you have one of these two issues:
1) Different UID/GID mappings for the same user across the nodes in the cluster
2) Stale or incorrectly owned job files (such as the cached jobToken) under /tmp/mapr-hadoop on the TaskTracker where the job failed.

The simple fix for that (sketched as commands below) is:

Stop the TaskTracker
Remove /tmp/mapr-hadoop/
Start the TaskTracker again
Re-run the job
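In shell terms, on the affected TaskTracker node, that might look roughly like the following; the stop/start commands here are just an example for a plain Apache-style install, so substitute whatever mechanism (the warden/maprcli on MapR) actually manages your TaskTracker:

/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh stop tasktracker   # stop the TaskTracker
rm -rf /tmp/mapr-hadoop/                                               # clear the stale local job cache
/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh start tasktracker  # start it again, then re-run the job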

Wednesday, November 20, 2013

Hive syntax errors - NoViableAltException

I was trying to load a CSV into Hive and kept hitting the errors below. Though there was a very simple fix, googling did not give many pointers, hence this post.

All the errors below were solved by simply removing the single quotes around the table name: Hive parses 'retroID' as a string literal rather than an identifier, so the statements go through once the table name is left unquoted (see the successful runs at the end of this post).

hive> LOAD DATA INPATH '/user/root/retroID.txt' INTO TABLE 'retroID' ;                                      
NoViableAltException(282@[179:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:3737)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:30629)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:7967)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:30676)
at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1429)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1228)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
FAILED: ParseException line 1:53 cannot recognize input near ''retroID'' '' '' in table name

hive> LOAD DATA LOCAL INPATH '/user/root/retroID.txt' INTO TABLE 'retroID' PARTITION (debut_date='05/13/1994');
NoViableAltException(282@[179:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:3737)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:30629)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:7967)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:30676)
at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1429)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1228)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
FAILED: ParseException line 1:59 cannot recognize input near ''retroID'' 'PARTITION' '(' in table name

hive> describe retroID;
OK
last_name           string               None              
first_name           string               None              
player_id           string               None              
debut_date           string               None              

# Partition Information
# col_name             data_type           comment          

debut_date           string               None              
Time taken: 0.099 seconds, Fetched: 9 row(s)

hive> LOAD DATA LOCAL INPATH '/user/root/retroID.txt' OVERWRITES6 INTO TABLE 'retroID' PARTITION (debut_date='05/13/1994');
NoViableAltException(282@[179:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:3737)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:30629)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:7967)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:30676)
at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1429)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1228)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
FAILED: ParseException line 1:48 extraneous input 'OVERWRITES6' expecting INTO near 'TABLE' in table name
line 1:71 cannot recognize input near ''retroID'' 'PARTITION' '(' in table name

hive> LOAD DATA LOCAL INPATH '/user/root/retroID.txt' OVERWRITE INTO TABLE 'retroID' PARTITION (debut_date='05/13/1994');
NoViableAltException(282@[179:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:3737)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:30629)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:7967)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:30676)
at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1429)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1228)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
FAILED: ParseException line 1:69 cannot recognize input near ''retroID'' 'PARTITION' '(' in table name

hive> LOAD DATA INPATH '/user/root/retroID.txt' OVERWRITE INTO TABLE 'retroID' PARTITION (debut_date='05/13/1994');    
NoViableAltException(282@[179:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:3737)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:30629)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:7967)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:30676)
at org.apache.hadoop.hive.ql.parse.HiveParser.loadStatement(HiveParser.java:1429)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1228)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

The successful runs

hive> LOAD DATA INPATH '/user/root/retroID.txt' OVERWRITE INTO TABLE retroID PARTITION (debut_date='05/13/1994');
Loading data to table default.retroid partition (debut_date=05/13/1994)
Partition default.retroid{debut_date=05/13/1994} stats: [num_files: 1, num_rows: 0, total_size: 631160, raw_data_size: 0]
Table default.retroid stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 631160, raw_data_size: 0]
OK
Time taken: 2.431 seconds
hive>

hive> LOAD DATA LOCAL INPATH '/root/retroID.txt' INTO TABLE retroid1;  
Copying data from file:/root/retroID.txt
Copying file: file:/root/retroID.txt
Loading data to table default.retroid1
Table default.retroid1 stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 1262320, raw_data_size: 0]
OK
Time taken: 2.292 seconds
hive> describe extended retroid;
OK
last_name           string               None                
first_name           string               None                
player_id           string               None                
debut_date           string               None                
 
# Partition Information  
# col_name             data_type           comment             
 
debut_date           string               None                
 
Detailed Table Information Table(tableName:retroid, dbName:default, owner:root, createTime:1384933577, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:last_name, type:string, comment:null), FieldSchema(name:first_name, type:string, comment:null), FieldSchema(name:player_id, type:string, comment:null), FieldSchema(name:debut_date, type:string, comment:null)], location:maprfs:/user/hive/warehouse/retroid, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:debut_date, type:string, comment:null)], parameters:{transient_lastDdlTime=1384933577}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.178 seconds, Fetched: 11 row(s)
hive> describe formatted retroid;
OK
# col_name             data_type           comment             
 
last_name           string               None                
first_name           string               None                
player_id           string               None                
 
# Partition Information  
# col_name             data_type           comment             
 
debut_date           string               None                
 
# Detailed Table Information  
Database:           default              
Owner:               root                  
CreateTime:         Tue Nov 19 23:46:17 PST 2013  
LastAccessTime:     UNKNOWN              
Protect Mode:       None                  
Retention:           0                    
Location:           maprfs:/user/hive/warehouse/retroid  
Table Type:         MANAGED_TABLE        
Table Parameters:  
transient_lastDdlTime 1384933577          
 
# Storage Information  
SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
InputFormat:         org.apache.hadoop.mapred.TextInputFormat  
OutputFormat:       org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  
Compressed:         No                    
Num Buckets:         -1                    
Bucket Columns:     []                    
Sort Columns:       []                    
Storage Desc Params:  
field.delim         ,                   
serialization.format ,                   
Time taken: 0.096 seconds, Fetched: 34 row(s)
hive> 

Wednesday, April 03, 2013

Running pig job inside java wrapper on mapr hadoop


Here's how to get Pig code running from a Java wrapper on MapR Hadoop.

[root@nmk-centos-60-1 ~]# cat idmapreduce.java
import java.io.IOException;
import org.apache.pig.PigServer;
public class idmapreduce{
   public static void main(String[] args) {
   try {
     PigServer pigServer = new PigServer("mapreduce");
     runIdQuery(pigServer, "/test/Mapr_rpm_Files");
   }
   catch(Exception e) {
     // at minimum, surface the failure instead of swallowing it silently
     e.printStackTrace();
   }
}
public static void runIdQuery(PigServer pigServer, String inputFile)
throws IOException {
   pigServer.registerQuery("A = load '" + inputFile + "' using
PigStorage('/');");
   pigServer.registerQuery("B = foreach A generate $0 as id;");
   pigServer.store("B", "/test/idout");
   }
}
[root@nmk-centos-60-1 ~]#


Then compile it

[root@nmk-centos-60-1 ~]# javac -cp /opt/mapr/pig/pig-0.10/pig-0.10.0.jar idmapreduce.java

The binary is now in /root (my current working directory)

[root@nmk-centos-60-1 ~]# ls idmapreduce.*
idmapreduce.class  idmapreduce.java

Then run the java wrapper program with the pig jar location,

[root@nmk-centos-60-1 pig-0.10]# ls
autocomplete*  CHANGES.txt*  contrib/       ivy.xml*  lib-src/
mapr-build.properties*  pig-0.10.0.jar*                pigperf.jar*
        README.txt*         src/
bin/           conf/         doap_Pig.rdf*  KEYS*     license/
NOTICE.txt*             pig-0.10.0-withouthadoop.jar*
pig-withouthadoop.jar*  RELEASE_NOTES.txt*  test/
build.xml*     conf.new/     ivy/           lib/      LICENSE.txt*
pig-0.10.0-core.jar*    pig.jar*                       readme.md*
        shims/              tutorial/


and the location where the wrapper binary is located,

[root@nmk-centos-60-1 pig-0.10]# cd -
/root

and the whole classpath that `hadoop classpath` expands to, and also the location of the native IO library as a -Djava.library.path option.

So the final command would be:

[root@nmk-centos-60-1 ~]# java -cp /opt/mapr/pig/pig-0.10/pig.jar:.:`hadoop classpath` -Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64 idmapreduce

13/04/03 09:33:48 INFO executionengine.HExecutionEngine: Connecting to
hadoop file system at: maprfs:///
13/04/03 09:33:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/03 09:33:48 INFO security.JniBasedUnixGroupsMapping: Using
JniBasedUnixGroupsMapping for Group resolution
13/04/03 09:33:48 INFO executionengine.HExecutionEngine: Connecting to
map-reduce job tracker at: maprfs:///
13/04/03 09:33:49 INFO pigstats.ScriptState: Pig features used in the
script: UNKNOWN
13/04/03 09:33:49 INFO mapReduceLayer.MRCompiler: File concatenation
threshold: 100 optimistic? false
13/04/03 09:33:49 INFO mapReduceLayer.MultiQueryOptimizer: MR plan
size before optimization: 1
13/04/03 09:33:49 INFO mapReduceLayer.MultiQueryOptimizer: MR plan
size after optimization: 1
13/04/03 09:33:49 INFO pigstats.ScriptState: Pig script settings are
added to the job
13/04/03 09:33:49 INFO mapReduceLayer.JobControlCompiler:
mapred.job.reduce.markreset.buffer.percent is not set, set to default
0.3
13/04/03 09:33:49 INFO mapReduceLayer.JobControlCompiler: creating jar
file Job6414903787816249153.jar
13/04/03 09:33:56 INFO mapReduceLayer.JobControlCompiler: jar file
Job6414903787816249153.jar created
13/04/03 09:33:56 INFO mapReduceLayer.JobControlCompiler: Setting up
single store job
13/04/03 09:33:56 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce
job(s) waiting for submission.
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.6-1366786, built on 07/29/2012 06:22
GMT
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:host.name=nmk-centos-60-1
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_25
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_25/jre
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/opt/mapr/pig/pig-0.10/pig.jar:.:/opt/mapr/hadoop/hadoop-0.20.2/bin/../conf:/usr/java/default/lib/tools.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/..:/opt/mapr/hadoop/hadoop-0.20.2/bin/../hadoop*core*.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/amazon-s3.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/asm-3.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aspectjrt-1.6.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aspectjtools-1.6.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aws-java-sdk-1.3.26.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-codec-1.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-configuration-1.8.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-daemon-1.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-el-1.0.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-httpclient-3.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-httpclient-3.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-lang-2.6.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-logging-1.1.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-logging-api-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-math-2.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-net-1.4.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-net-3.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/core-3.1.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/emr-metrics-1.0.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/eval-0.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/gson-1.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/hadoop-0.20.2-dev-capacity-scheduler.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/hadoop-0.20.2-dev-fairscheduler.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/hsqldb-1.8.0.10.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/httpclient-4.1.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/httpcore-4.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jackson-core-asl-1.5.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jackson-mapper-asl-1.5.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jasper-compiler-5.5.12.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jasper-runtime-5.5.12.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jersey-core-1.8.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jersey-json-1.8.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jersey-server-1.8.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jets3t-0.6.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jetty-6.1.14.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jetty-servlet-tester-6.1.14.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jetty-util-6.1.14.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/junit-4.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/kfs-0.2.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/log4j-1.2.15.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/logging-0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/maprfs-0.20.2-2.1.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/maprfs-jni-0.20.2-2.1.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/mockito-all-1.8.2.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/mockito-all-1.8.5.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/mysql-connector-java-5.0.8-bin.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/oro-2.0.8.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/protob
uf-java-2.4.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/slf4j-api-1.4.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/zookeeper-3.3.6.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-api-2.1.jar
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.32-71.el6.x86_64
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.name=root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
13/04/03 09:33:56 INFO zookeeper.ZooKeeper: Initiating client
connection, connectString=nmkc1:5181,nmkc2:5181,nmkc3:5181
sessionTimeout=30000 watcher=com.mapr.fs.JobTrackerWatcher@40bb2bc3
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Opening socket connection
to server nmkc3/10.10.80.93:5181
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Socket connection
established to nmkc3/10.10.80.93:5181, initiating session
13/04/03 09:33:56 INFO zookeeper.ClientCnxn: Session establishment
complete on server nmkc3/10.10.80.93:5181, sessionid =
0x23db71ab8a301d9, negotiated timeout = 30000
13/04/03 09:33:56 INFO fs.JobTrackerWatcher: Current running
JobTracker is: nmk-centos-60-1/10.10.80.91:9001
13/04/03 09:33:56 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
13/04/03 09:33:57 INFO mapReduceLayer.MapReduceLauncher: 0% complete
13/04/03 09:33:57 INFO input.FileInputFormat: Total input paths to process : 1
13/04/03 09:33:57 INFO util.MapRedUtil: Total input paths to process : 1
13/04/03 09:33:57 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/03 09:33:57 INFO util.MapRedUtil: Total input paths (combined)
to process : 1
13/04/03 09:33:58 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId:
job_201304010834_0003
13/04/03 09:33:58 INFO mapReduceLayer.MapReduceLauncher: More
information at:
http://maprfs:50030/jobdetails.jsp?jobid=job_201304010834_0003
13/04/03 09:33:58 INFO fs.JobTrackerWatcher: Current running
JobTracker is: nmk-centos-60-1/10.10.80.91:9001
13/04/03 09:34:17 INFO mapReduceLayer.MapReduceLauncher: 50% complete
13/04/03 09:34:18 INFO mapReduceLayer.MapReduceLauncher: 100% complete
13/04/03 09:34:18 INFO pigstats.SimplePigStats: Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.0.3   0.10.0  root    2013-04-03 09:33:49     2013-04-03 09:34:18     UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature
Outputs
job_201304010834_0003   1       0       3       3       3       0
 0       0       A,B     MAP_ONLY        /test/idout,

Input(s):
Successfully read 4826 records (5229 bytes) from: "/test/Mapr_rpm_Files"

Output(s):
Successfully stored 4826 records in: "/test/idout"

Counters:
Total records written : 4826
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201304010834_0003


13/04/03 09:34:18 INFO mapReduceLayer.MapReduceLauncher: Success!

[root@nmk-centos-60-1 ~]#

To see the output:

[root@nmk-centos-60-1 ~]# hadoop fs -ls /test*
Found 2 items
-rwxr-xr-x   3 root root     342071 2013-04-03 08:54 /test/Mapr_rpm_Files
drwxr-xr-x   - root root          2 2013-04-03 09:34 /test/idout
[root@nmk-centos-60-1 ~]#
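To look at the records themselves, cat the part files under the output directory (the exact part file names vary by job, hence the glob):

hadoop fs -cat /test/idout/part-* | head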

Thursday, March 07, 2013

Error sqoop seems to think you are running a JRE, Please install JDK


If you hit this error

13/03/07 12:31:51 ERROR orm.CompilationManager: It seems as though you are running sqoop with a JRE.
13/03/07 12:31:51 ERROR orm.CompilationManager: Sqoop requires a JDK that can compile Java code.
13/03/07 12:31:51 ERROR orm.CompilationManager: Please install a JDK and set $JAVA_HOME to use it.
13/03/07 12:31:51 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Could not start Java compiler.
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:175)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:390)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)

you need to install a package that provides javac (i.e. a JDK rather than just a JRE), such as java-1.6.0-openjdk-devel, and make sure $JAVA_HOME points at it.
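On a RHEL/CentOS style box that boils down to something like the following (the JAVA_HOME path is just an example and may differ on your system):

yum install java-1.6.0-openjdk-devel
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64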


Thursday, January 10, 2013

unable to resolve class org.apache.solr.client.solrj.impl.CommonsHttpSolrServer



[root@nmk-centos-60-3 samples]# groovy bugadd.groovy 7603.xml



org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/opt/bugs/sources/samples/bugadd.groovy: 4: unable to resolve class org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
 @ line 4, column 1.
   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
   ^

1 error

[root@nmk-centos-60-3 samples]#


The problem was that CommonsHttpSolrServer was renamed to HttpSolrServer (and StreamingUpdateSolrServer to ConcurrentUpdateSolrServer) as of Solr 4.0.


So replace the lines 
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
with
import org.apache.solr.client.solrj.impl.HttpSolrServer

and

SolrServer server = new CommonsHttpSolrServer("http://10.10.80.93:8080/solr/");    
with
SolrServer server = new HttpSolrServer("http://10.10.80.93:8080/solr/");    

and you should be ready to go.
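If you want to double-check which of the two classes your solrj jar actually ships, the javap trick from an earlier post works here as well (the jar path below is just an example):

javap -classpath /path/to/solr-solrj-4.0.0.jar org.apache.solr.client.solrj.impl.HttpSolrServer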

Saturday, November 03, 2012

Pig HBase integration on MapR

This procedure helps integrate HBase transactions using Pig on MapR Hadoop clusters.

On the client node where you have installed Pig, in /opt/mapr/conf/env.sh

Add
export PIG_CLASSPATH=$PIG_CLASSPATH:/location-to-hbase-jar

If you are launching Pig on a node where hbase-regionserver or hbase-master is installed, then simply add the location of the hbase-0.92.1.jar to the PIG_CLASSPATH variable above. Eg

export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-0.92.1/hbase-0.92.1.jar"

If you don't have HBase installed, the HBase jar can be copied over directly from any node where HBase is installed to some location on the Pig client node; include that location in the definition above. Eg
export PIG_CLASSPATH=$PIG_CLASSPATH:/opt/mapr/lib/hbase-0.92.1.jar

Then identify your zookeeper nodes,
maprcli node listzookeepers

and accordingly add this variable to /opt/mapr/conf/env.sh

export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181
-Dhbase.zookeeper.quorum=10.10.80.61,10.10.80.62,10.10.80.63"

Launch the Pig job and you should be able to access HBase.
NB: Use the HBase table name directly when accessing tables; do not use an hbase:// prefix. A sample env.sh and insertion script follow.

Sample env.sh

[root@nmk-centos-60-3 ~]# cat /opt/mapr/conf/env.sh
#!/bin/bash
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
# Please set all environment variable you want to be used during MapR cluster
# runtime here.
# namely MAPR_HOME, JAVA_HOME, MAPR_SUBNETS

export PIG_OPTS="-Dhbase.zookeeper.property.clientPort=5181
-Dhbase.zookeeper.quorum=10.10.80.61,10.10.80.62,10.10.80.63"
export
PIG_CLASSPATH="$PIG_CLASSPATH:/opt/mapr/hbase/hbase-0.92.1/conf:/usr/java/default/lib/tools.jar:/opt/mapr/hbase/hbase-0.92.1:/opt/mapr/hbase/hbase-0.92.1/hbase-0.92.1.jar"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIG_CLASSPATH"
export CLASSPATH="$CLASSPATH:$HADOOP_CLASSPATH"
#export JAVA_HOME=
#export MAPR_SUBNETS=
#export MAPR_HOME=
#export MAPR_ULIMIT_U=
#export MAPR_ULIMIT_N=
#export MAPR_SYSCTL_SOMAXCONN=
#export PIG_CLASSPATH=:$PIG_CLASSPATH
[root@nmk-centos-60-3 ~]#

Sample hbase insertion script

[root@nmk-centos-60-3 nabeel]# cat hbase_pig.pig
raw_data = LOAD '/user/mapr/input2.csv' USING PigStorage(',') AS (
listing_id: chararray,
fname: chararray,
lname: chararray );

STORE raw_data INTO 'sample_names' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage (
'info:fname info:lname');
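The target table has to exist with the matching column family before the script runs; assuming the input CSV is already in place, something along these lines should do:

echo "create 'sample_names', 'info'" | hbase shell
pig hbase_pig.pig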


Thursday, October 04, 2012

Awk display / print file from column N onwards


Any output can be piped through this to display each line from column N onwards (':' is the field separator here):
awk -F: -v nr=N '{ for (x=nr; x<=NF; x++) {printf $x " "};print " ";}'
An example usage: find the first disks re-inited by MapRFS this month.
find . -name mfs.log -exec grep -H ^2012\-10\-* {} \; | grep spinit.cc:1002 | awk -F: -v nr=2 '{ for (x=nr; x<=NF; x++) \
{printf $x " "};print " ";}' | awk -v nr=2 '{ for (x=nr; x<=NF; x++) {printf $x " "};print " ";}' | sort -n | head -n 5

UPDATE:

Just realized cut could do this in a much simpler way:
cut -d : -f N-

where N is the column from where you need to display the lines.
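For example, to print everything from the third colon-separated field onwards:

echo "a:b:c:d:e" | cut -d : -f 3-
# prints c:d:e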

Thursday, September 13, 2012

Finding the culprit in df disk utilization different from du

Sometimes df and du show different numbers because of the open-file-descriptor issue (a deleted file that some process still holds open); that's explained in detail all over Google, so I'm not touching it here. Here's a quick command to find the culprit that makes up the difference. It shows the top 5 open files of roughly 1GB and above; adjust the number after the digit regex to change that threshold. Keep in mind that temp files occasionally have long runs of digits in their names and will also show up in the grep results, but the sort is on the file-size column.


lsof -n -P | grep -E [[:digit:]]{10} | sort -k 7 | head -n 5
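A variant that goes straight for the usual suspects (files that were deleted but are still held open) and sorts them by size, largest first:

lsof -n -P +L1 | sort -rn -k 7 | head -n 5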

Monday, July 02, 2012

"java.io.IOException: Could not create FileClient"

"java.io.IOException: Could not create FileClient"

If you see this error in the JobTracker logs for a failed job, you are most likely looking at an irrelevant error. To be clearer: your job failed before this point and was subsequently killed by the JobTracker. Processes that linger in memory after the job was killed keep attempting to access the resources assigned to them, and this continues until they time out and eventually get killed. Those resources (the file, parent directory, etc.) would already have been removed as part of the JobTracker killing the job, so the stray processes cannot create any new files.


Saturday, June 23, 2012

HBase fixing region missing Region //regionname// on HDFS but not listed in .META or deployed on any server

If you see this error on HBase

Region //regionname// on HDFS but not listed in .META or deployed on any server

then use the add_regions.rb script to have this fixed. 

Normally HBase errors can be fixed by hbase hbck -fix, but that approach works mostly in cases where the region is listed in .META. but not assigned, etc.

In such cases, the script add_regions.rb comes to our rescue. It can be used as follows:

hbase org.jruby.Main add_regions.rb //full-path-region-on-HDFS/.regioninfo

This will add the region to the .META. table. Next, assign the region to a region server. To do this, launch the hbase shell and issue the command

assign 'full-region-name'

on the hbase shell.

The full region name follows the standard format tablename,startrowkey,timestamp.md5sum.

The dot at the end is important and should be included while passing to the assign command.

Now run hbase hbck again and look at status 'simple' on the shell. Your region numbers should have increased according to the regions you have added now. The errors related to not listed in .META should be gone now.

Tuesday, June 12, 2012

Python copy absolute value of array elements into another array


In the array below, some values are positive, while others are negative.

>>> x = [-538,-181,-145,552,-847,6,141,-58,-122,314,-816,245,594,-613,-287,1232,-1479,-326,-197,715,4,-677,95,308,-1224,953,-81,-189,341,-654,242,-948,1088,-533,-328,123,552,-855,49,-443,-37,-57,199,56,-459,-47,-167,13,-521,476,-161,440,-540,180,43,-57,-236,-29,-830,265,-2,-379,-9,198,12,79,-257,113]
>>>


I want the sum total of the list after converting the elements to their absolute values. (Note that range(1, len(x)) in the comprehension below starts at index 1, so the first element, -538, is skipped; use range(len(x)), or simply sum(abs(v) for v in x), if you want every element included.)

>>> values = [abs(x[n]) for n in range(1,len(x))]
>>> values
[181, 145, 552, 847, 6, 141, 58, 122, 314, 816, 245, 594, 613, 287, 1232, 1479, 326, 197, 715, 4, 677, 95, 308, 1224, 953, 81, 189, 341, 654, 242, 948, 1088, 533, 328, 123, 552, 855, 49, 443, 37, 57, 199, 56, 459, 47, 167, 13, 521, 476, 161, 440, 540, 180, 43, 57, 236, 29, 830, 265, 2, 379, 9, 198, 12, 79, 257, 113]
>>> print sum (values)
24419


Thursday, June 07, 2012

Extract each archive into its own directory


Assuming the archives are all bzip2-compressed tarballs with a .tbz2 extension:

for i in `ls`; do a=`basename $i .tbz2`; mkdir $a; cd $a;tar -xvf ../$i;cd ..; done
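A slightly more defensive version of the same loop, which copes with spaces in file names and skips the cd gymnastics by letting tar extract straight into the target directory:

for f in *.tbz2; do d="${f%.tbz2}"; mkdir -p "$d" && tar -xjf "$f" -C "$d"; done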

Friday, February 17, 2012

Identify user from python mapper in Hadoop

If you are running a MapReduce job using Hadoop streaming with Python and want to know the user the job runs as, or other parameters in the OS environment on the TaskTracker node, use

envdict = os.environ
if envdict.has_key('user_name'):
    user = envdict['user_name']
    sys.stdout.write('Job runs as username: ' + user + '\n')

To make sure what variables you have, use this

       envdict = os.environ
       keylist = envdict.keys()
       sys.stdout.write('Variables available'+str(keylist)+'\n')


Friday, February 10, 2012

Moving Hbase heap dump locations

Occasionally HBase can generate heap dumps on OOM and put them in the bin folder (which is the default). This can cause quite a lot of issues if the system partitioning is not prepared for huge, gigabyte-sized files in the HBase bin folder. To move the dump cores to another folder:

In hbase-<version>/conf/hbase-env.sh, add -XX:HeapDumpPath=/path/to/dump to the line

>> export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError $HBASE_GC"
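so that the line ends up looking something like this (the dump path is just an example):

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dump $HBASE_GC"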

Restart HBase.

Now you can sleep well without fears of heap dumps filling up the hbase binary partitions :)

Thursday, January 19, 2012

Compare list of files to identify files and directories

Say xyz is a file with a list of files and directories. (The files and directories are all on the working system itself)
You need to print only the directories amongst them.

for i in `cat xyz`; do if [ -d $i ] ; then echo $i ;fi; done

will print out only directories among them.
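If any of the paths in xyz contain spaces, a while-read loop is safer:

while read -r i; do [ -d "$i" ] && echo "$i"; done < xyz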

Sunday, May 01, 2011

Hp Openview Errors

Last week, I had a few issues bringing up the HP Openview Operations Manager. The snap-in wouldn't load; it just gave this error:

"The Console is unable to connect to the server. Please make sure that all services including the database are up and running."

Now, in services.msc on Windows, the database and related services were all up and running. I could also connect to the DB using SQL Studio (my DB is MS SQL 2005).

Then I tried to run this:
C:\Program Files\HP\HP BTO Software\support\OVMigration\Binn\osql -E openview -S DBserverName\InstanceName

That too showed that things were running fine.

Next I tried running the command on the OVO server
vpstat -8

It showed that all services except OvEpMsgActSrv were running fine.

You can also use ovc -status to check the services.

Now this turned out to be failing due to a huge message queue file at this location.
"\Program Files\HP Openview\Data\Shared\tmp\queues"

Once that was dumped, I restarted the services with
vpstat -8 -r

and things were fine again. This is how a good configuration should look.


D:\HP\OM\support\OVMigration\Binn>osql.exe -E openview -s SERVERNAME\DBNAME
1>
2>
3>
4>
5> quit

D:\HP\OM\support\OVMigration\Binn>
D:\HP\OM\support\OVMigration\Binn>ovc -status
ovcd OV Control CORE (5624) Running
opcmsga OVO Message Agent AGENT,EA (2148) Running
opcmsgi OVO Message Interceptor AGENT,EA (6480) Running
opcle OVO Logfile Encapsulator AGENT,EA (428) Running
opcacta OVO Action Agent AGENT,EA (5580) Running
agtrep OV Discovery Agent AGENT,AgtRep (5776) Running
coda OV Performance Core COREXT (6456) Running
opcmona OVO Monitor Agent AGENT,EA (6576) Running
ovconfd OV Config and Deploy COREXT (9720) Running
opcwbemi OVO WMI Interceptor AGENT,EA (5572) Running
ovbbccb OV Communication Broker CORE (4424) Running
ovcs OV Certificate Server SERVER (1524) Running
ovtomcatB OV Tomcat(B) Servlet Container WEB (6128) Running

D:\HP\OM\support\OVMigration\Binn>
D:\HP\OM\support\OVMigration\Binn>vpstat -3
============================================================
running: D:\HP\OM\bin\vpstat.exe
version: A.22.0.56.30 on SERVERNAME
now: Sun May 01 11:01:25 2011
============================================================

============================================================
==================== Services Information ==================
============================================================

Configuration file: D:\HP\OM\data\conf\vpstat\vpstat.conf
Ver: 0.8.10.074
Date: Thu Jun 19 15:40:36 2008
Remark: HP Operation Manager - Performance.

Services on : LOCAL MACHINE

(NT Services) :

OK: All services : SERVICE_ACTIVE


D:\HP\OM\support\OVMigration\Binn>vpstat -8
============================================================
running: D:\HP\OM\bin\vpstat.exe
version: A.22.0.56.30 on UP-OVO01
now: Sun May 01 11:01:28 2011
============================================================

============================================================
==================== Processes Information =================
============================================================

Configuration file: D:\HP\OM\data\conf\vpstat\vpstat.conf
Ver: 0.8.10.074
Date: Thu Jun 19 15:40:36 2008
Remark: HP Operation Manager - Performance.


OK:ALL CRITICAL PROCESSES ARE RUNNING

D:\HP\OM\support\OVMigration\Binn>


Thursday, April 14, 2011

PIX and SMTP troubles

Twice in our current environment, we've had to face issues with PIX interfering with very specific SMTP communications. The first time, one of our users could not receive a particular type of PDF attachment from one source. After a hell of a lot of troubleshooting, it finally turned out that PIX ESMTP filtering was the source of the problem.

This time, there was another similar issue. Emails could not be sent to some domains that had a big list of entries in the "ADDITIONAL SECTION" of their DNS responses. The error being:

Delivery Failure
Attempted Delivery to: Unknown 
Error: 554 5.4.4 [internal] domain lookup failed 



This error though was not correct, since the DNS lookups were successful in the nslookup utility on the messaging gateway.

Digging further pointed to the issue being related to large DNS query responses being rejected by PIX DNS restrictions: DNS UDP packets larger than 512 bytes were not allowed by PIX. Once that restriction was removed, things started working well.
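A quick way to check whether a given domain is likely to trip that limit is to look at the response size dig reports (the ';; MSG SIZE  rcvd:' line at the bottom of its output); for example:

dig mx somedomain.example | grep 'MSG SIZE'
# responses well over 512 bytes are candidates for this PIX behaviour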