I recently talked with a friend about having IBM’s Watson moderate a presidential debate, or at least using it to fact-check the candidates’ claims on the spot. My argument is that you cannot just “fact check” like that per se. The facts the candidates quote come from various studies, each with its own degree of bias and/or error. Or the candidates manipulate their language so that they appear to be saying one thing when in fact they’re saying something else. That’s politics.

Watson was optimized for Jeopardy’s style of game play. It also lacks the linguistic analysis abilities needed to keep up with politics: metaphors, euphemisms, sarcasm and the like would all confuse it. Some day, though.

More info about IBM's Watson from Yahoo!:

So what makes Watson’s genius possible? A whole lot of storage, sophisticated hardware, super fast processors and Apache Hadoop, the open source technology pioneered by Yahoo! and at the epicenter of big data and cloud computing.
Hadoop was used to create Watson’s “brain,” or the database of knowledge and facilitation of Watson’s processing of enormously large volumes of data in milliseconds. Watson depends on 200 million pages of content and 500 gigabytes of preprocessed information to answer Jeopardy questions. That huge catalog of documents has to be searchable in seconds. On a single computer, it would be impossible to do, but by using Hadoop and dividing the work on to many computers it can be done.
In 2005, Yahoo! created Hadoop and since then has been the most active contributor to Apache Hadoop, contributing over 70 percent of the code and running the world’s largest Hadoop implementation, with more than 40,000 servers. As a point of reference, our Hadoop implementation processes 1.5 times the amount of data in the printed collections in the Library of Congress per day, approximately 16 terabytes of data.

The Hadoop Distributed Filesystem (HDFS) forms the basis of many large-scale storage systems at Facebook and throughout the world. Our Hadoop clusters include the largest single HDFS cluster that we know of, with more than 100 PB physical disk space in a single HDFS filesystem. Optimizing HDFS is crucial to ensuring that our systems stay efficient and reliable for users and applications on Facebook.

The One Thing in Life You Can Control: Effort

It was right around then I heard something that I would hear a lot once I bought the Mavs.

In sports, the only thing a player can truly control is effort. The same applies to business. The only thing any entrepreneur, salesperson or anyone in any position can control is their effort.

I had to kick myself in the ass and recommit to getting up early, staying up late and consuming everything I possibly could to get an edge. I had to commit to making the effort to be as productive as I possibly could. It meant making sure that every hour of the day that I could contact a customer was selling time, and when customers were sleeping, I was doing things that prepared me to make more sales and to make my company better.

And finally, I had to make sure I wasn’t lying to myself about how hard I was working. It would have been easy to judge effort by how many hours a day passed while I was at work. That’s the worst way to measure effort. Effort is measured by setting goals and getting results. What did I need to do to close this account? What did I need to do to win this segment of business? What did I need to do to understand this technology or that business better than anyone? What did I need to do to find an edge? Where does that edge come from, and how was I going to get there?

The one requirement for success in our business lives is effort. Either you make the commitment to get results or you don’t.

–Mark Cuban
“How to Win at the Sport of Business.”

5 lessons for money-seeking entrepreneurs

Here are 5 lessons for money-seeking entrepreneurs from the Shark Tank:

1. Know your numbers: I can’t tell you how many times variations on the following conversation occur on the show:

Kevin O’Leary (shark): “Let me get this right. You are offering 20% of your company in exchange for an investment of $500,000?”
Entrepreneur: “You bet, we have a great product!”
O’Leary: “But that means that you are valuing your company at $2.5 million. You only have $120,000 in sales. Your company is not worth close to that my friend.”
Entrepreneur: “Uh, uh . . .”

Investors want to know how you are valuing your business, how much money you are going to make, how much profit you have made, and why you need their money. Potential is great and all, but numbers talk, BS walks.

2. Understand that money has no feelings: Kevin O’Leary is fond of pointing this out. This is about making a profit for the investors, nothing more, nothing less. How, exactly, will you do that? How you feel about your business is fairly irrelevant.

3. Have a real business that can be scaled: The business cannot be you doing labor, unless that labor can be duplicated en masse. If you make homemade cedar toy chests that cost $300 but take 25 hours to build, it is difficult to see how that is a business that can be ramped up to sell mass quantities. A business that makes widgets for $2 that retail for $4 is a business that is scalable.

4. Have real (not false) confidence and be emotionally intelligent: Yes, it’s all about the numbers, but then again, it’s not all about the numbers. You have to be a cheerleader for your business while being able to read the room.

Says Barbara Corcoran, “Make sure you can sell your product, because if you as the head of the company can’t sell it, who will? Also be sure you’re ready to answer the two key questions too many of the entrepreneurs who come on the show can’t: 1) What will you do with my money? and 2) How will I get my investment back?”

5. Be unique in the marketplace: The products that get some love are usually those that are 1) different, and 2) clearly serve a market need. Again, Barbara Corcoran puts it well: “If your business idea clearly answers a need in the marketplace, it’s probably a good idea. If the need is already being met by well-entrenched competitors, it can still be a good idea if it’s a new, cheaper or more clever way of doing it.”

Follow these rules and hopefully you won’t be eaten by the sharks.
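
The arithmetic behind O’Leary’s reply in lesson 1 can be sketched in a few lines of Python. This is only an illustration: the function name `implied_valuation` and its parameters are made up for this example, and it assumes the simplest reading of the offer, where the investment divided by the equity share gives the value the entrepreneur is placing on the whole company.

```python
def implied_valuation(investment: float, equity_percent: float) -> float:
    """Company value implied by offering equity_percent of it for investment.

    Hypothetical helper for illustration; real deals involve more terms.
    """
    if not 0 < equity_percent <= 100:
        raise ValueError("equity_percent must be between 0 and 100")
    return investment * 100 / equity_percent


# Figures taken from the conversation above.
valuation = implied_valuation(500_000, 20)
sales = 120_000

print(f"Implied valuation: ${valuation:,.0f}")
print(f"Valuation as a multiple of sales: {valuation / sales:.1f}x")
```

With the numbers from the show, $500,000 for 20% implies $2.5 million, which is roughly 21 times the company’s $120,000 in sales. That gap is what the entrepreneur has to be ready to explain.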

Hadoop incompatible build versions

Today I came across a build version error after installing a newer version of CDH (0.20.2+320) on a new machine meant to serve as a datanode. The datanode failed to join the cluster and logged an incompatible build versions error. To fix this error, the build version of the datanode has to be exactly the same as that of the namenode.
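
To show what “exactly the same” means here, below is a minimal Python sketch. It is not Hadoop’s own check: `builds_match` is a hypothetical helper, and the namenode build string is invented for the example. It simply assumes you have copied the build identifier reported by each node into a string.

```python
def builds_match(namenode_build: str, datanode_build: str) -> bool:
    """Return True only when both build strings are identical.

    Hypothetical helper: surrounding whitespace is ignored, nothing else is.
    """
    return namenode_build.strip() == datanode_build.strip()


namenode_build = "0.20.2+228"  # invented value for the existing namenode
datanode_build = "0.20.2+320"  # the newer CDH build from this post

if builds_match(namenode_build, datanode_build):
    print("Build versions match.")
else:
    print(f"Mismatch: namenode={namenode_build}, datanode={datanode_build}")
```

The point of the strict comparison is that sharing the same base release (0.20.2) is not enough. The new node needs the same build as the namenode before it can join.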

Change the hostname in Mac OS X

When I log into the network at work my Mac’s hostname always turns to:

I have my local hostname set to:

So what I would like to do is set my Mac’s hostname to my local hostname. You can do this all from Terminal in a single line.

Run this command in Terminal:

This is also helpful if you’re in Terminal and have a really long hostname at your prompt. If you want to view your current hostname, run this command in Terminal:

Update (February 13, 2012): Some people have reported that their hostname is not updating. Please try closing your current Terminal session and starting up a new one. Then type “hostname” and you should see your changes.

This is what it looked like for me:

Apache Hadoop 0.23.0 has been released

The Apache Hadoop PMC has voted to release Apache Hadoop 0.23.0. This release is significant since it is the first major release of Hadoop in over a year, and incorporates many new features and improvements over the 0.20 release series. The biggest new features are HDFS federation, and a new MapReduce framework. There is also a new build system (Maven), Kerberos HTTP SPNEGO support, as well as some significant performance improvements which we’ll be covering in future posts. Note, however, that 0.23.0 is not a production release, so please don’t install it on your production cluster.