Logging the client IP behind Amazon ELB with Apache

When you place your Apache Web Server behind an Amazon Elastic Load Balancer, Apache receives all requests from the ELB’s IP address.

Therefore, if you wish to do anything with the real client IP address, such as logging or whitelisting, you need to make use of the X-Forwarded-For HTTP header that Amazon ELB adds to each request, which contains the IP address of the original client.

Solution for logging the true client IP

Before: a stock combined LogFormat, where %h records the connecting address (which behind ELB is the load balancer's IP), typically looks something like this:
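    # Default combined format: %h is the connecting IP, i.e. the ELB's address
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined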

After: the same format with %h swapped for the X-Forwarded-For header, so the real client IP is logged instead:
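    # Log the client IP that ELB passes along in X-Forwarded-For instead of %h
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined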

The one downside is that clients can send an X-Forwarded-For header of their own, and depending on how ELB treats it (it generally appends to the existing value rather than replacing it), the logged address may be spoofable, so be cautious about relying on it for whitelisting.

Hopefully this helps out anyone experiencing this issue.

The Future of Presidential Debates

I recently discussed with a friend the idea of having IBM’s Watson moderate a presidential debate, or at least instantly fact-check the candidates’ claims. My argument is that you cannot just “fact check” like that, per se. The facts the candidates quote come from various studies, all of which carry their own degree of bias and/or error. Or the candidates manipulate their language so that they appear to be saying one thing when in fact they are doing something else. That’s politics.

Watson was optimized for Jeopardy’s style of game play, and it does not have the linguistic analysis abilities needed to keep up with politics: metaphors, euphemisms, sarcasm, and the like would all confuse Watson. Some day, though.

More info about IBM’s Watson from Yahoo!:

So what makes Watson’s genius possible? A whole lot of storage, sophisticated hardware, super fast processors and Apache Hadoop, the open source technology pioneered by Yahoo! and at the epicenter of big data and cloud computing.
Hadoop was used to create Watson’s “brain,” or the database of knowledge and facilitation of Watson’s processing of enormously large volumes of data in milliseconds. Watson depends on 200 million pages of content and 500 gigabytes of preprocessed information to answer Jeopardy questions. That huge catalog of documents has to be searchable in seconds. On a single computer, it would be impossible to do, but by using Hadoop and dividing the work on to many computers it can be done.
In 2005, Yahoo! created Hadoop and since then has been the most active contributor to Apache Hadoop, contributing over 70 percent of the code and running the world’s largest Hadoop implementation, with more than 40,000 servers. As a point of reference, our Hadoop implementation processes 1.5 times the amount of data in the printed collections in the Library of Congress per day, approximately 16 terabytes of data.

Using PHP/cURL to Display the HBase Version Using the REST Server

I am using a REST server on port 59999 with Hadoop/HBase.

To start a REST server:
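Something like the following works, assuming the hbase script is on your PATH (adjust the port to taste):

    hbase rest start -p 59999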

You must have PHP/cURL installed to use this code:
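A minimal sketch, assuming the REST server above is listening on localhost:59999:

    <?php
    // Fetch the HBase version from the REST server's /version resource.
    $ch = curl_init('http://localhost:59999/version');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: text/plain'));

    $response = curl_exec($ch);
    if ($response === false) {
        die('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);

    // Print the raw version string returned by the REST server.
    echo $response . "\n";
    ?>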

Starting Hadoop HBase REST Server on CDH3 Ubuntu 10.04

Here we start the REST server on port 59999:
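On a stock CDH3 package install the HBase scripts live under /usr/lib/hbase, so something along these lines should launch it as a daemon (the path is an assumption about your install):

    /usr/lib/hbase/bin/hbase-daemon.sh start rest -p 59999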

Test that REST has started

http://localhost:59999/version
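You can also hit the same resource from the shell with curl; it should print the version string the REST server reports:

    curl http://localhost:59999/version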