Map Reduce a Local File?
Here is what I want to do.
I have a large data file (more than 50GB), and I run scripts to get some numbers out of it.
A traditional script looks at the file in a sequential order and performs some operation.
Making it multi-threaded does not help much, as the file is accessed sequentially and the threads will block on it. Also, writing such a multi-threaded program is very painful.
Is there a way I can load the file at different offsets, basically loading it in chunks and processing the chunks individually? Or rather, perform Mapper and Reducer jobs on a local file without the hoopla of Hadoop: Hadoop without the network.
Does this make sense, or am I bat shit crazy?
Are there any tools which allow me to do this?
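The chunking idea above can be sketched with nothing but the Python standard library. This is a minimal sketch, not a finished tool: the function names (`chunk_offsets`, `map_chunk`, `map_reduce`) are made up for illustration, the mapper just counts lines as a stand-in for the real per-line work, and it assumes the file is line-oriented so that chunk boundaries can be snapped to newlines.

```python
import os
from functools import reduce
from multiprocessing import Pool


def chunk_offsets(path, n_chunks):
    """Split the file into (start, end) byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    offsets = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            f.seek(min(start + step, size))
            f.readline()  # move forward to the end of a line
            end = f.tell()
            offsets.append((start, end))
            start = end
    return offsets


def map_chunk(task):
    """Mapper (hypothetical): count the lines inside one byte range."""
    path, start, end = task
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            count += 1  # replace with the real per-line computation
    return count


def map_reduce(path, n_chunks=4):
    """Run the mapper on every chunk in a process pool, then reduce by summing."""
    tasks = [(path, s, e) for s, e in chunk_offsets(path, n_chunks)]
    with Pool() as pool:
        partials = pool.map(map_chunk, tasks)
    return reduce(lambda a, b: a + b, partials, 0)
```

Calling `map_reduce("big.log", n_chunks=16)` (from under an `if __name__ == "__main__":` guard) would give the total line count. Using processes instead of threads sidesteps the thread pain, but whether it is actually faster depends on whether the per-line work or the disk is the bottleneck.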
February 22, 2012 at 11:09am
It is better to be a human being dissatisfied than a pig satisfied; better to be a Socrates dissatisfied than a fool satisfied. And if the fool or the pig thinks otherwise, it is because they have no experience of the better part.
—JOHN STUART MILL, Utilitarianism
— http://logback.qos.ch/manual/groovy.html
Sentiment Analysis
While I was reading up on sentiment analysis, I came across this on Stack Overflow:
A linguistics professor was lecturing to her class one day. “In English,” she said, “A double negative forms a positive. In some languages, though, such as Russian, a double negative is still a negative. However, there is no language wherein a double positive can form a negative.”
A voice from the back of the room piped up, “Yeah …right.”
I think semantic analysis is a tough cookie.