Friday, February 14, 2014

Populating HBase tables with sample data


We often need to populate HBase tables with sample data when recreating issues. Here's a simple procedure for this (from cloudavenue.com):

1) For creating a table 'testtable' with a column family 'colfam1'

create 'testtable', 'colfam1'

2) To verify that the table exists.

list 'testtable'

3) To insert test data into the 'testtable' table.

put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'

The HBase Shell is (J)Ruby’s IRB with some HBase-related commands added. Anything that can be done in IRB can also be done in the HBase Shell. The command below inserts 1,000 rows into the 'testtable' table.

for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', "row-#{i}#{j}#{k}", \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end
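
Before running the loop against a real table, it can help to preview what it would write. The following is a minimal sketch in plain Ruby that collects the arguments each put would receive instead of calling HBase; `sample_puts` is a hypothetical helper name, not an HBase shell command.

```ruby
# Dry run of the nested loop above: build the argument list for each put
# without touching HBase. sample_puts is a hypothetical helper.
def sample_puts(table = 'testtable', family = 'colfam1')
  puts_args = []
  ('0'..'9').each do |i|
    ('0'..'9').each do |j|
      ('0'..'9').each do |k|
        # Same row key, column and value as the shell loop
        puts_args << [table, "row-#{i}#{j}#{k}", "#{family}:#{j}#{k}", "#{j}#{k}"]
      end
    end
  end
  puts_args
end

rows = sample_puts
puts rows.size            # number of put calls the loop would issue
puts rows.first.inspect   # arguments of the first put
puts rows.last.inspect    # arguments of the last put
```

Each row gets a single cell, and the column qualifier depends only on j and k, so the table ends up with 100 distinct qualifiers spread over 1,000 rows.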

4) For getting data from the 'testtable' table

get 'testtable', 'myrow-1'
scan 'testtable'

5) For deleting data from the 'testtable' table.

delete 'testtable', 'myrow-2', 'colfam1:q2'

6) For deleting the table.

disable 'testtable'
drop 'testtable'


To test a sample CSV import, use this bash one-liner to generate CSV data:

for i in `seq 1 19`; do for j in `seq 1 9`; do for k in `seq 1 9`; do echo "row"$i",col"$j",value"$i"-"$j"-"$k; done; done; done

To generate more data, increase the upper bounds of the loop variables.

This should produce output of the form

row1,col1,value1-1-1
row1,col1,value1-1-2
row1,col1,value1-1-3
row1,col1,value1-1-4
row1,col1,value1-1-5
row1,col1,value1-1-6
row1,col1,value1-1-7
row1,col1,value1-1-8
row1,col1,value1-1-9
row1,col2,value1-2-1
row1,col2,value1-2-2
row1,col2,value1-2-3
row1,col2,value1-2-4
row1,col2,value1-2-5
row1,col2,value1-2-6
row1,col2,value1-2-7
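
If you would rather stay in Ruby, the same data can be produced with a small function whose parameters are the loop bounds. This is only a sketch; `sample_csv_lines` is a hypothetical helper name, and its defaults mirror the bounds used in the bash one-liner.

```ruby
# sample_csv_lines is a hypothetical helper. The three limits correspond to
# the upper bounds of the i, j and k loops in the bash one-liner.
def sample_csv_lines(max_i = 19, max_j = 9, max_k = 9)
  lines = []
  (1..max_i).each do |i|
    (1..max_j).each do |j|
      (1..max_k).each do |k|
        lines << "row#{i},col#{j},value#{i}-#{j}-#{k}"
      end
    end
  end
  lines
end

lines = sample_csv_lines
puts lines.first(3)          # first three generated lines
puts "#{lines.size} lines"   # total number of lines generated
```

Passing larger limits, for example sample_csv_lines(100, 9, 9), scales the output the same way as raising the seq bounds does.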

Create a sample table

create 'testtable', 'colfam1'

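Then load the CSV with ImportTsv. The tool needs a comma separator for CSV input, exactly one column mapped to HBASE_ROW_KEY, the target table name, and the HDFS input path (shown here as the placeholder <hdfs-input-path>, which you must replace with your own path):

hadoop jar /opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:col,colfam1:val testtable <hdfs-input-path>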

Update:

To generate random data and print it to the screen from a Ruby shell (irb):

irb(main):014:0> require 'securerandom'
=> true
irb(main):015:0> for i in '1'..'10' do puts SecureRandom.hex
irb(main):016:1> end
8917ccbb7f0bea0d54d0e98e12b416cf
9cd1865fd43482174b3088c6749075de
1d009056e9fcc0b2ddf4352eb824a97d
1abeb9bb4b0993ad732335818fdc8835
d41cf0ca16be930d0aa3925651a10ec4
732dc0d79e7b7d82e4b5ac21d8b00f5c
519fc21d6d0a76a467dd2f2d14741090
27fb689fd3d9b8f4b17b17535681214b
6454ff61e5ef116688ca172ba13aa80c
83ecb50f1e9ab42d1e320119e24a9a9c
=> "1".."10"
irb(main):017:0>

The same call can be used in the HBase shell to insert rows with random row keys:

hbase(main):001:0> require 'securerandom'; for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', SecureRandom.hex , \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end
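
To see what the random row keys look like before inserting them, here is a small sketch in plain Ruby. `random_row_keys` is a hypothetical helper name; it relies only on SecureRandom.hex from Ruby's standard library.

```ruby
require 'securerandom'

# random_row_keys is a hypothetical helper. SecureRandom.hex(n) returns a
# string of 2 * n hexadecimal characters; with no argument n defaults to 16.
def random_row_keys(count, bytes = 16)
  Array.new(count) { SecureRandom.hex(bytes) }
end

keys = random_row_keys(1000)
puts keys.first                    # one 32-character hex key
puts keys.uniq.size == keys.size   # checks that no key was generated twice
```

Because the keys are random, the rows are no longer in a predictable order, which may be useful when you want writes spread across the key space rather than appended sequentially.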

