Friday, August 16, 2013

hbase scan: batch vs cache

Here's today's contribution to the Internet. tl;dr: When it comes to HBase scanner settings, you want caching, not batch size. Maybe this is totally clear to everyone else, but as someone 'newer to HBase' I can never quite remember which one does what.
Say you've got this code:
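Scan s = new Scan(startKey);
s.setCaching(foo);
s.setBatch(bar);
// ResultScanner is an interface, so ask the table for one
// (assumes an already-open HTable named table)
ResultScanner scanner = table.getScanner(s);
for (final Result r : scanner) {
  // stuff
}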
But you're clever and you don't want to do RPC calls to HBase for every row. You might even say you'd like to 'batch' the results from your scanner. 

So you read http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
public void setBatch(int batch)
Set the maximum number of values to return for each call to next()
public void setCaching(int caching)
Set the number of rows for caching that will be passed to scanners. If not set, the default setting from HTable.getScannerCaching() will apply. Higher caching values will enable faster scanners but will use more memory.
Annnd.... not sure. I mean, I only want one Result every time I call next() in my iterator, right? What would a number >1 even mean?

And I'm sure I shouldn't set 'caching'; that sounds like it will 'cache' something. I want to read the real stuff.

But you do want caching. Caching is how many rows come back from the server in each RPC your scanner makes, which is exactly the 'batching' you were after.
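To put rough numbers on that, here's a toy sketch of how the caching value changes the number of round trips. It is plain Java arithmetic, not HBase client code: the class and method names are made up for illustration, and it assumes each fetch returns up to `caching` rows while ignoring the extra calls a real scanner makes to open, close, and detect the end of the scan.

```java
// Toy model only: this does not talk to HBase.
final class ScanCachingSketch {

    /**
     * Number of data-carrying fetches needed to stream totalRows rows
     * when each fetch returns at most `caching` rows (assumed model).
     */
    static long fetchCalls(long totalRows, int caching) {
        if (totalRows < 0 || caching < 1) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching >= 1");
        }
        // Ceiling division: a partially filled last fetch still costs one call.
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + fetchCalls(rows, caching) + " fetches");
        }
    }
}
```

The trade-off is the one the Javadoc mentions: fewer, bigger fetches mean a faster scan but more rows held in client memory at once.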

Ok. Fine. Caching got named poorly. What is batch?

Batch is for when you have super wide rows. Say each row has 250 columns. A batch of 100 would give your iterator:
  • Iteration 1: Result id 0. Columns 0-99
  • Iteration 2: Result id 0. Columns 100-199
  • Iteration 3: Result id 0. Columns 200-249
  • Iteration 4: Result id 1. Columns 0-99
  • Iteration 5: Result id 1. Columns 100-199
Or at least that's what http://twitter.com/monkeyatlarge told me.
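If it helps to see that splitting spelled out, here's a small sketch that reproduces the iteration list above. Again this is a toy model in plain Java, not the HBase client: the class and method names are hypothetical, and it assumes every row has the same number of columns and that a Result never spans two rows.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this does not talk to HBase.
final class ScanBatchSketch {

    /**
     * Lists what each iteration would hand back for `rows` rows of
     * `columns` columns each, when batch caps the values per Result.
     */
    static List<String> iterations(int rows, int columns, int batch) {
        if (rows < 0 || columns < 0 || batch < 1) {
            throw new IllegalArgumentException("rows, columns must be >= 0 and batch >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int row = 0; row < rows; row++) {
            // Each row is cut into chunks of at most `batch` columns.
            for (int start = 0; start < columns; start += batch) {
                int end = Math.min(start + batch, columns) - 1;
                out.add("Result id " + row + ". Columns " + start + "-" + end);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two rows of 250 columns with a batch of 100, as in the example above.
        for (String line : iterations(2, 250, 100)) {
            System.out.println(line);
        }
    }
}
```

So with batch set, one logical row can show up as several Results, which is probably not what you want unless your rows really are too wide to handle in one piece.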


Anonymous said...

Does this apply for HBase REST as well?

The caching doesn't seem to work over REST, or I'm probably doing it wrong.

Jeff Dwyer said...

Haven't used the REST stuff myself. I presume you've seen https://issues.apache.org/jira/browse/HBASE-7803 ? Seems as though it is supposed to work.

Dan Häberlein said...

Good stuff!
