IMPORT YOUR OWN RSA SSH KEY INTO AMAZON EC2

Recently, I needed to import my own RSA SSH key into EC2. Here is a quick way to import it into all regions.

NB: You need the EC2 API tools set up before you can run this. You will also need to have set up an X.509 certificate pair.
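A rough sketch of the loop I mean (the key name and public key path are placeholders, and it assumes your EC2_PRIVATE_KEY and EC2_CERT variables are already exported):

    # Import the same public key into every region the account can see.
    for region in $(ec2-describe-regions | awk '{print $2}'); do
        ec2-import-keypair my-key --region "$region" --public-key-file ~/.ssh/id_rsa.pub
    done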

You can read more about the ec2-import-keypair command in the EC2 documentation.

Assign multiple security groups on Amazon EC2 Run Instances (ec2-run-instances)

Recently, I needed to deploy new servers with multiple security groups on AWS EC2 using the CLI.
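A sketch of the launch command (the AMI ID, instance type, key pair, and group names are placeholders); the point is that -g can be given more than once:

    ec2-run-instances ami-xxxxxxxx -t t1.micro -k my-key -g web -g ssh-admin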

Note: Security groups can only be assigned at instance launch unless you are using VPC.

OS X Mountain Lion and Java

My bash profile defined my JAVA_HOME, and after upgrading to Mountain Lion I saw this when logging in:

The following will prompt an install of Java.
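Any Java invocation from Terminal should trigger the install dialog; for example:

    java -version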

Logging the client IP behind Amazon ELB with Apache

When you place your Apache Web Server behind an Amazon Elastic Load Balancer, Apache receives all requests from the ELB’s IP address.

Therefore, if you wish to do anything with the real client IP address, such as logging or whitelisting, you need to make use of the X-Forwarded-For HTTP header that Amazon ELB adds to each request, which contains the IP address of the original client.

Solution for logging the true client IP

Before:
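Assuming the stock Apache combined LogFormat (your directive may differ), the relevant line looks like this:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined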

After:
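The same format, with %h (the remote host, which is now the ELB) swapped for the X-Forwarded-For header:

    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined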

The one downside is that depending on how ELB treats X-Forwarded-For, it may allow clients to spoof their source IP.

Hopefully this helps out anyone experiencing this issue.

sudo: unable to resolve host ubuntu

Sometimes after changing Elastic IP settings or stopping/starting instances on EC2, I get an irritating error like this when I execute a command with sudo:

sudo: unable to resolve host domU-12-34-ab-cd-56-78

The fix is to look up the instance’s private DNS name (via ec2-describe-instances or the AWS console UI) and update the hostname on the instance with the first segment of that DNS name (something that looks like ip-12-34-56-78 or domU-12-34-ab-cd-56-78). On Ubuntu, this is what you need to do (assuming ip-12-34-56-78 is the new hostname):
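Something along these lines:

    sudo hostname ip-12-34-56-78                      # takes effect immediately, lost on reboot
    echo "ip-12-34-56-78" | sudo tee /etc/hostname    # persists across reboots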

The first line sets the hostname until you reboot; the second line configures the hostname to use once you do reboot.

Install s3cmd/s3tools on Debian/Ubuntu

1. Register for Amazon AWS (yes, it asks for a credit card)

2. Install s3cmd (commands for Debian/Ubuntu are sketched after step 5; you can find how-tos for other Linux distributions at s3tools.org/repositories)

3. Get your key and secret key at this link

4. Configure s3cmd to work with your account

5. Make a bucket (the name must be globally unique; s3cmd will tell you if it’s already taken)
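A rough sketch of steps 2, 4, and 5 on Debian/Ubuntu (the bucket name below is a placeholder, and the distro package may be older than the one from the s3tools repositories):

    # Step 2: install s3cmd from the distro repositories
    sudo apt-get update
    sudo apt-get install s3cmd

    # Step 4: interactive setup -- paste in your access key and secret key
    s3cmd --configure

    # Step 5: create a bucket (the name must be globally unique)
    s3cmd mb s3://my-unique-bucket-name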

Auto-start Sphinx (searchd) after reboot on Linux

By default, after you install and configure Sphinx, you will find that once your OS restarts, search will not be working. That is because searchd is not set up to auto-start. The following will solve that problem.

Create the file /etc/init.d/searchd.

Copy the following into searchd.
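A minimal sketch of such an init script; the searchd binary and sphinx.conf paths below are assumptions, so adjust them to match your installation:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          searchd
    # Required-Start:    $local_fs $network
    # Required-Stop:     $local_fs $network
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Sphinx search daemon
    ### END INIT INFO

    # Assumed paths -- change these to match your Sphinx installation.
    SEARCHD=/usr/local/bin/searchd
    CONFIG=/usr/local/etc/sphinx.conf

    case "$1" in
      start)
        echo "Starting searchd"
        $SEARCHD --config $CONFIG
        ;;
      stop)
        echo "Stopping searchd"
        $SEARCHD --config $CONFIG --stop
        ;;
      restart)
        $0 stop
        $0 start
        ;;
      *)
        echo "Usage: /etc/init.d/searchd {start|stop|restart}"
        exit 1
        ;;
    esac

    exit 0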

Add execute permission to the file:
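    sudo chmod +x /etc/init.d/searchd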

Register it to start automatically:
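On Debian/Ubuntu, update-rc.d handles this (Red Hat-style systems use chkconfig instead):

    sudo update-rc.d searchd defaults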

Initramfs Prompt When Ubuntu Boots

Today I set up a new Ubuntu 10.04 Server on a Dell PowerEdge T110. The installation went smoothly. However, when booting for the first time, after waiting for some time, the system would drop me into a busybox shell with not much to see. Having seen this problem often, I thought that the /dev/sda1 device would not be there because of missing kernel modules.


However, when checking /dev, I found /dev/sda1 there after all. After some poking around, I found out that the integrated RAID controller needs some more time to warm up, and the kernel didn’t want to wait any longer for the disks to become available through the RAID controller.

So, the solution is to add the ‘rootdelay’ parameter to the kernel line in /boot/grub/grub.cfg.
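The kernel line ends up looking something like this (the kernel version and root UUID are placeholders for whatever your grub.cfg already has):

    linux   /boot/vmlinuz-<version> root=UUID=<your-root-uuid> ro quiet rootdelay=60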

This instructs the kernel to wait for 60 seconds before trying to enter the init process. This fixed the problem and I was able to boot into the newly installed system.

In order to make this change appear in all grub entries when Ubuntu does a kernel upgrade, you have to edit the file /etc/default/grub and also add that parameter to the GRUB_CMDLINE_LINUX variable.
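Something like this in /etc/default/grub (keep any options already set in that variable):

    GRUB_CMDLINE_LINUX="rootdelay=60"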

After that, run ‘update-grub’ and you can double-check that your changes appear in /boot/grub/grub.cfg.
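For example:

    sudo update-grub
    grep rootdelay /boot/grub/grub.cfg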

Simple Shell Script to Send Email on Ubuntu

We run a local SVN server that syncs to Amazon S3 and an EC2 instance. Syncing locally to EC2 can take anywhere from 10 to 20 minutes, and instead of waiting for the script to finish I decided to write a quick little email notification script. Enjoy.
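A minimal sketch of the idea, assuming the mail command (for example from the mailutils package) is installed and able to send mail; the recipient address is a placeholder:

    #!/bin/bash
    # notify.sh -- run a long command, then email a notification when it finishes.
    # Usage: ./notify.sh rsync -av /var/svn/ user@ec2-host:/backups/svn/

    TO="you@example.com"        # placeholder -- set to your address
    SUBJECT="Job finished on $(hostname)"

    "$@"                        # run whatever command was passed in
    STATUS=$?

    echo "Command '$*' finished with exit status $STATUS at $(date)" \
      | mail -s "$SUBJECT" "$TO"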

The Future of Presidential Debates

I recently discussed with a friend the idea of having IBM’s Watson moderate a presidential debate, or at least using it to instantly fact-check the candidates’ claims. My argument would be that you cannot just “fact check” like that, per se. The facts the candidates quote come from various studies, all of which have their own degree of bias and/or error. Or they manipulate the language they use so that they appear to be saying one thing when in fact they’re doing something else. That’s politics.

Watson was optimized for Jeopardy’s style of game play. Also, it does not have the linguistic analysis abilities needed to keep up with politics. For example, metaphors, euphemisms, sarcasm, and the like would all confuse Watson. Some day, though.

More info about IBM’s Watson from Yahoo!:

So what makes Watson’s genius possible? A whole lot of storage, sophisticated hardware, super fast processors and Apache Hadoop, the open source technology pioneered by Yahoo! and at the epicenter of big data and cloud computing.
Hadoop was used to create Watson’s “brain,” or the database of knowledge and facilitation of Watson’s processing of enormously large volumes of data in milliseconds. Watson depends on 200 million pages of content and 500 gigabytes of preprocessed information to answer Jeopardy questions. That huge catalog of documents has to be searchable in seconds. On a single computer, it would be impossible to do, but by using Hadoop and dividing the work on to many computers it can be done.
In 2005, Yahoo! created Hadoop and since then has been the most active contributor to Apache Hadoop, contributing over 70 percent of the code and running the world’s largest Hadoop implementation, with more than 40,000 servers. As a point of reference, our Hadoop implementation processes 1.5 times the amount of data in the printed collections in the Library of Congress per day, approximately 16 terabytes of data.