Monthly Archives: January 2011

My LinkedIn InMap

Phil Windley wrote earlier today that “LinkedIn Labs has released a visualization tool called an InMap,” and I thought I would share mine.

The very well-connected lobe on the left contains mostly Kynetx employees, as well as people from a few related companies like 7bound and MashWorx. I made most of these connections last summer, when I interned at Kynetx and did some independent consulting work.

The less well-connected lobe on the right is mostly people I know from BYU. There are several subgroups for the various places I’ve worked, such as the Office of Information Technology and Health and Human Performance Services. There are also a significant number of students I know; these tend to be the least inter-connected of all.

You can view my InMap in more detail here.

Blog traffic from CS 462 students

I had some interesting traffic patterns on my blog over the past week and a half. I have several blog posts of considerable utility for the students in CS 462. Take a look at this pageview graph:

Luckily my hosting provider, A Small Orange, didn’t have any trouble with the “spikes.”

Amazon Elastic Beanstalk

Amazon announced a new AWS product this morning: the Elastic Beanstalk. Insert your own trite fairy tale references here.

The thing that strikes me about Beanstalk is that it’s the first product in AWS that was designed to be accessible to the common end user. It’s based on Platform as a Service designs used by other industry players (some examples are Google App Engine and Red Hat’s JBoss PaaS). The developer has only to create his application, package it into a WAR file, and upload it to the PaaS provider.

The difference between Amazon’s offering and, say, Google’s is that Amazon gives you full, transparent control over the stack being used to run your application. App Engine doesn’t give you anywhere near that kind of control. With Elastic Beanstalk, you’re free to go log in to the EC2 instances running your application, inspect the contents of S3, etc.

If you’re a power user and want to tinker with those things, you can. If you’re an average developer who just wants a scalable application without worrying about the implementation details, you’re free to leave them alone.

For the time being, Beanstalk only runs Java apps on Tomcat, but I’m excited to see what else they will support in the coming months.

Why create a custom AMI?

I’ve been getting a lot of questions today about why we do this whole process of creating and registering a custom AMI and then bootstrapping it, rather than just starting from scratch each time. Their questions are motivated by various things, ranging from frustration with the process to not understanding the purpose. I haven’t been very articulate in describing why this is part of Lab 1, so I’ll try my hand at an explanation here.

The Concept of a Stem Cell Server
The Image Project requires you to create a web server and a two-part app server. Both have the same basic configuration: Apache with Python, Apache with PHP, IIS with .NET, etc. (as per your preference). It will save you a lot of trouble in the coming labs if you create a good, general AMI configuration that can serve as a base for all your real servers. You don’t want to write a configuration script every time that installs the same things; the AMI ought to have those already. That way, your configuration only needs to reflect what is different or unique about each server.

You have to have perspective beyond just Lab 1.
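To make the stem cell idea concrete, here is a minimal Python sketch. The package lists and role names are hypothetical, chosen only to illustrate the point: the custom AMI carries the shared base, and each server's configuration script only has to install the difference.

```python
# Hypothetical package sets for illustration; the base mirrors an Apache + Python stack.
BASE_AMI_PACKAGES = {"apache2", "libapache2-mod-python", "python-simplejson"}

ROLE_PACKAGES = {
    # Assumed roles and extras, not prescribed by the lab.
    "web": BASE_AMI_PACKAGES | {"python-cheetah"},
    "app": BASE_AMI_PACKAGES | {"python-imaging", "python-pycurl"},
}


def packages_to_install(role, baked_in=BASE_AMI_PACKAGES):
    """Return only the packages a server of this role still needs at boot."""
    return sorted(ROLE_PACKAGES[role] - baked_in)


if __name__ == "__main__":
    for role in ROLE_PACKAGES:
        print(role, packages_to_install(role))
```

Passing `baked_in=set()` models starting from a bare image, where every package has to be installed at boot.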

Efficiency of Scaling

One of the primary things to keep in mind when working with Amazon Web Services is that you have to think like Amazon. They have a huge system and need to be able to scale up or down dynamically as demand fluctuates. The requirements of that system led to the current architecture of services like EC2 and S3.

The Image Project is not anywhere near the scale of Amazon.com, simply because we only have a semester to build it. But the architecture of the project is designed to imitate that of a real-world massively scalable system. Keeping that in mind throughout the semester will make some project requirements easier to understand.

When Amazon needs to fire up another EC2 instance for their company infrastructure, it needs to occur rapidly. The longer it takes, the more business they may lose. The process of creating custom AMIs was designed with that in mind.

In our case, the speed at which an instance starts up is not very important. We’re not doing it dynamically, and we do it infrequently. As far as the performance of our system is concerned, it doesn’t really matter to us whether it takes two seconds or two minutes.

But consider two setups in a large, scalable production system:

  1. In the first setup, we start from a default base AMI, which has just a bare installation of Ubuntu Server. When the instance starts up, we run a script that downloads and installs Apache, Python, several support libraries, and all our website code.
  2. In the second setup, we have created a custom AMI that has everything we need already installed. All we have to do when we launch the instance is dump our code into the Apache root and go.

In this hypothetical large, scalable system, we need to fire up instances dynamically. Setup #1 will take a few minutes to download, install, and configure everything, on top of the time it takes to start the instance in the first place. During those few minutes, the rest of our instances are still struggling to handle all the requests, and a few customers have left because the pages are taking so long to load.

Setup #2 requires no extra time besides initially starting up the instance (a sunk cost). As soon as it starts up, it’s registered with the load balancer and ready to help the other instances handle the traffic. In addition, it doesn’t use any extra network bandwidth to download and install packages that could be downloaded just once and stored in the AMI.
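A back-of-the-envelope model shows the difference. This Python sketch uses made-up timings (90 seconds to boot, 180 seconds to install packages and code, 50 MB of downloads per launch); these are assumptions for illustration, not measurements.

```python
# Hypothetical numbers for illustration only; measure your own instances.
BOOT_SECONDS = 90           # starting the instance (paid by both setups)
INSTALL_SECONDS = 180       # Setup #1: installing packages and code at boot
DOWNLOAD_MB_PER_LAUNCH = 50


def seconds_until_serving(install_at_boot):
    """Time from launch until the instance can join the load balancer."""
    return BOOT_SECONDS + (INSTALL_SECONDS if install_at_boot else 0)


def bandwidth_mb(launches, install_at_boot):
    """Setup #1 repeats the downloads on every launch; Setup #2 has them in the AMI."""
    return launches * DOWNLOAD_MB_PER_LAUNCH if install_at_boot else 0


if __name__ == "__main__":
    print("Setup #1 ready after", seconds_until_serving(True), "s")   # 270 s
    print("Setup #2 ready after", seconds_until_serving(False), "s")  # 90 s
    print("20 launches:", bandwidth_mb(20, True), "MB vs", bandwidth_mb(20, False), "MB")
```

Under these assumed numbers, Setup #1 keeps each new instance out of the pool for three extra minutes, and the repeated downloads grow with every launch.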

You can see that in this type of a system, the efficiency with which we can spin instances up or down makes a difference. We’re designing The Image Project so that it is ready to scale like that as soon as our traffic starts increasing.

Summary
Keep the big picture in mind. This is designed to be a potentially huge system with loosely-joined components that can scale independently and flexibly. Think like Amazon.

Tutorial for launching an EC2 instance

This process is fairly self-explanatory. There is an in-depth screencast here that describes the whole process on a Windows machine. But to make it even simpler, here is a description of the process for Lab 1:

Click “Launch Instance” at the top of the AWS Console:

The AMI I’m using for the class is ami-a403f7cd. Select that from the list.


Nothing needs to be changed here.


Nothing needs to be changed here the first time, but when you launch your custom AMI later, you’ll paste your User Data script into that box.


Give the server a name that you will remember.


Create a keypair, which will allow you (and only you) to log in. You only need to do this once; the next time you launch an instance, you will simply select your keypair from the list.


Choose a security group. Default is fine.


Confirm all the details.


Now the server is starting. This usually takes 1-2 minutes.


Right-click on your server and choose “Connect” to get details about connecting to the instance.


Follow the instructions here to connect, making sure to change the username from “root” to “ubuntu” if you’re using an Ubuntu image.
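If you'd rather script the connection step, here is a minimal Python sketch. The key path `~/.ec2/mykey.pem` and the host name are hypothetical placeholders. It restricts the key file to owner-read-only (ssh ignores a private key that other users can read) and builds the ssh command with the `ubuntu` user.

```python
import os
import stat
import subprocess


def lock_down_key(key_path):
    """Equivalent of chmod 400: owner read-only, which ssh accepts for private keys."""
    os.chmod(key_path, stat.S_IRUSR)


def ssh_command(host, key_path, user="ubuntu"):
    """Build the ssh argv; Ubuntu images log in as 'ubuntu' rather than 'root'."""
    return ["ssh", "-i", key_path, "%s@%s" % (user, host)]


if __name__ == "__main__":
    key = os.path.expanduser("~/.ec2/mykey.pem")      # hypothetical key path
    host = "ec2-0-0-0-0.compute-1.amazonaws.com"     # placeholder host name
    lock_down_key(key)
    subprocess.call(ssh_command(host, key))
```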


I’m now logged into my Ubuntu server!

Helpful links for Lab 1

Here are some useful pages I collected while helping people with Lab 1 today:

I’ll keep this updated throughout the day, but I thought I’d post it right now.

Setting up my development environment for AWS

This is mostly for my own benefit, since I’ll more likely than not have to explain the process to others in the class.

This describes all the pieces I needed to get my dev machine working with Amazon Web Services.

The first things I needed were the security credentials. To find these, log in to the AWS Console, click on Account at the top, and then click on Security Credentials. (At the time of writing, the URL for that page is this.)

There are four security credentials you need:

  • Access Key ID (string)
  • Secret Access Key (string)
  • X.509 Certificate (.pem file)
  • Private key associated with the X.509 (.pem file)

All of those are accessible to anyone who can log in to the account, except for the private key. Amazon does not keep a copy of the private key for the X.509 certificate; you can download it when you create the certificate, but if you lose it after that you’ll have to revoke the certificate and create a new one.

Store all of these in a secure place.

Next, you’ll need a key pair to run instances on EC2. You can create this when you launch an instance. Select the option “Create a New Key Pair” when you get to that step in the launch process. This will give you a .pem file. Save that in a secure location as well; you’ll need it to SSH into your machine.

Once you have all of those things, a couple of other things are useful. I set up my .ssh/config file so I can SSH in easily. Here’s what the EC2 section looks like:

Host ec2
    HostName {host name}.amazonaws.com
    User ubuntu
    IdentityFile ~/.ec2/snay2.pem
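With that stanza in place, `ssh ec2` is enough to log in. Because a new instance gets a new public host name, it can help to regenerate the stanza after each launch. Here is a minimal Python sketch that renders it in the same layout; the host name in the example is a placeholder.

```python
def ssh_config_stanza(alias, hostname, user="ubuntu", identity_file="~/.ec2/snay2.pem"):
    """Render one Host block for ~/.ssh/config, matching the layout above."""
    return (
        "Host %s\n"
        "    HostName %s\n"
        "    User %s\n"
        "    IdentityFile %s\n" % (alias, hostname, user, identity_file)
    )


if __name__ == "__main__":
    # Placeholder public DNS name; copy the real one from the AWS Console.
    print(ssh_config_stanza("ec2", "ec2-0-0-0-0.compute-1.amazonaws.com"), end="")
```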

The last thing that’s useful to have, especially when you’re creating AMIs, is the set of EC2 API tools for the command line. I installed these into my home directory (since the BYU CS labs don’t give me write access to the more sensible location of /usr/share) and set up my .bashrc to initialize the environment variables the tools need:

export EC2_HOME=~/.ec2/tools/
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=~/.ec2/PrivateKey.pem
export EC2_CERT=~/.ec2/X509Cert.pem
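A quick sanity check can catch a typo in .bashrc before the tools fail with a confusing error. This Python sketch, assuming the variable names above, verifies that each one is set and points at something that exists. It also checks JAVA_HOME, because the EC2 API tools are Java programs and need it; that variable isn't in the snippet above, so where it points on your machine is up to you.

```python
import os

# Variables from the .bashrc snippet, plus JAVA_HOME for the Java-based tools.
REQUIRED_PATHS = ["EC2_HOME", "EC2_PRIVATE_KEY", "EC2_CERT", "JAVA_HOME"]


def check_ec2_env(environ=os.environ):
    """Return a list of problems; an empty list means the environment looks usable."""
    problems = []
    for name in REQUIRED_PATHS:
        value = environ.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif not os.path.exists(os.path.expanduser(value)):
            problems.append("%s points to missing path %s" % (name, value))
    return problems


if __name__ == "__main__":
    for problem in check_ec2_env():
        print(problem)
```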

I think that’s about it.

Kynetx app ideas

This is largely a response to Mike Grace’s post this morning about some ideas he has for Kynetx apps. The first few are my take on his ideas, and the rest are some of my own ideas.

  • Goodreads integration with LDS.org (from Mike): I really like this idea. I’ve been wanting to do some Kynetx apps with the Goodreads API myself, such as an Amazon.com annotator to allow you to add books directly from there to your Goodreads shelves. However, due to some current limitations in KNS, it doesn’t seem like I can write an app that uses the necessary OAuth to talk to the API. Hopefully that will be implemented someday.
  • Google Reader suggestions (from Mike): I like this idea too. I have my own mental process for pruning out the articles that seem least relevant to me. But there’s definitely an advantage to automating that. Louis Gray and my6sense have been doing some great work in that area. The problem is that I rarely read my RSS feeds on my phone, so I can’t use any of their technology. I’ve also tried using PostRank, a Chrome plugin for Google Reader, but that isn’t helpful in determining what’s relevant to me. A Kynetx app would be the perfect solution.
  • Google Calendar/Twilio/email integration: I wrote a bit about this one on Twitter a few months ago. My old boss wanted me to clock in within 15 minutes of my scheduled time every day. If there was ever a change to my schedule, he wanted an email informing him of such. I keep all that data up to date in a Google Calendar anyway, so I found it tedious to have to send him email as well. I’d like to build a Kynetx app that lets me watch my Google Calendar and send out notifications of various kinds (Twitter, SMS (Twilio), email, etc.) when something changes.
  • Twitter bot: I have a Kindle, which has the ability to post my notes and annotations to Twitter and Facebook when I want. It has one problem, however. If I just highlight a piece of text that I want to share (quote, as it were), all it puts in the tweet is a link to view the quote on Amazon’s site. I want to write a Kynetx app that watches my Twitter stream (or perhaps that of another account I own dedicated to my Kindle) that can then read the quote I posted and repost it in a more reader-friendly format. This would also be a nice way to send my Kindle annotations as quotes to Goodreads, something I currently cannot do.

There are some ideas. What do you think?

Overreliance on cloud services

I woke up this morning bright and early (something to the tune of 4 AM) to do some homework for my CS 312 class. By about 7:30, after a few hours of reading and writing LaTeX on my laptop, I had it all done and stored safely away in my Dropbox.

I left my laptop at home and took my netbook (my usual M.O.). I went up to campus to print the three pages of proofs and discovered to my horror that Dropbox was down. My netbook couldn’t connect via the desktop app, and I couldn’t get on the website with the campus computers either. I had about 10 minutes before class was to start. In a word: I was stuck. Resigned to my fate, I left the library and walked to class.

Naturally, as soon as I sat down in class and logged in, Dropbox was back up. But I was already too far from a printer to turn it in on time. I emailed it to my professor; we’ll see if he accepts it. It was on time, just not printed.

The moral of the story: No matter how reliable anyone says their service is going to be, I can’t trust them to deliver when their services are mission-critical for me.

That worries me somewhat at times, because most of my work is online: Google Calendar, Gmail, Toodledo, Google Docs, Dropbox. Even this blog, which is a self-hosted installation, is running on some virtual server at A Small Orange. My digital life (and, currently, most of my digital identity) is dependent on a whole lot of Internet players. If any one of them goes down, I don’t have a lot of redundancy to help me cope.

For the time being, I’ll just have to put my assignments on a flash drive (shudder) when I go to print them. Or at least bring the same computer that had the original file….

Building up an AMI from Ubuntu 10.04 LTS

Now that I’ve gotten the hang of Amazon EC2, I’m ready to start building my actual stem cell server. I’m starting from the Ubuntu 10.04 LTS image provided by Canonical (ami-a403f7cd). It’s a bare-bones server instance that doesn’t even have Apache on it yet, although it does have Python.

Setting up the Server

I’m going to need the EC2 AMI tools, which requires enabling the Multiverse repository, so I first added these two lines to /etc/apt/sources.list.d/multiverse.list (lucid is the codename for Ubuntu 10.04):

deb http://us.ec2.archive.ubuntu.com/ubuntu/ lucid multiverse
deb-src http://us.ec2.archive.ubuntu.com/ubuntu/ lucid multiverse

Here are the packages I installed (after running apt-get update):

ec2-ami-tools
ec2-api-tools
apache2
libapache2-mod-python

These packages may also be useful but aren’t strictly necessary at this point:

python-cheetah
python-dev
python-setuptools
python-simplejson
python-pycurl
python-imaging

I ran apt-get upgrade as well. This presented me with a couple of prompts (one from GRUB), which I had to get through. Since it included a kernel update, I restarted the machine afterward.

Next, I added the necessary Python directives to my /etc/apache2/sites-available/default file, as I documented earlier.

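I added a simple index.py page in /var/www to handle requests. One thing that confused me for a long while is that index.py can’t just be any old Python script; mod_python expects it to define request handlers. With mod_python’s Publisher handler, a function that takes the request object is called for a matching URL, and its return value becomes the page. Here’s a simple example: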

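# mod_python's Publisher calls index() for a request to /index.py;
# the returned string becomes the response body.
def index(req):
    return "Hello, world!"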
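To see why a plain script doesn't work, it helps to picture how mod_python's Publisher handler (which the `def index(req)` pattern is written for) finds your code. The following is a simplified, hypothetical model in plain Python; it is not mod_python's implementation and runs without Apache. The path segment after index.py names the function to call (defaulting to `index`), form fields become keyword arguments, and names starting with an underscore are not published. The `hello` function is an illustrative addition.

```python
# Simplified, hypothetical model of Publisher-style dispatch (not mod_python's code).

def index(req):
    return "Hello, world!"


def hello(req, name="world"):
    return "Hello, %s!" % name


def publish(namespace, path, req=None, **form):
    """Map '/index.py/<func>' to a function in namespace and call it with form fields."""
    parts = [p for p in path.split("/") if p]
    func_name = parts[1] if len(parts) > 1 else "index"   # no function named -> index()
    func = namespace.get(func_name)
    if func_name.startswith("_") or not callable(func):
        raise LookupError("404 Not Found: %s" % path)
    return func(req, **form)


if __name__ == "__main__":
    page = {"index": index, "hello": hello}
    print(publish(page, "/index.py"))                    # Hello, world!
    print(publish(page, "/index.py/hello", name="BYU"))  # Hello, BYU!
```

In the real module you don't write a dispatcher yourself; Publisher does this mapping once the Apache directives point at it.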

Creating and Registering the AMI

The next step is creating an AMI from this image and saving that to S3. I’m using this tutorial from Amazon as my guide.

There are three basic steps:

  1. Bundle the volume

    First, I copied the X.509 certificate and my private key onto /mnt. I also had my account number handy (it’s available from the same place in the AWS Console as the public/private keys). Then I ran these commands:

    sudo mkdir /mnt/image
    sudo ec2-bundle-vol -k /mnt/PrivateKey.pem -c /mnt/X509Cert.pem -u 000000000000 -d /mnt/image

    Replace the 000000000000 with your 12-digit account ID, without dashes.

    After several minutes, that created the AMI bundle in /mnt/image.

  2. Upload the bundle to S3

    I used this command:

    ec2-upload-bundle -b cs462-machines/snay2-test1 -m /mnt/image/image.manifest.xml -a <access key> -s <secret key>

    Replace the snay2-test1 with your own folder name inside the bucket.

  3. Register the AMI

    I have my dev machine set up such that I don’t have to pass the keys along on the command line (see this post for how I did that). But if you don’t have it set up that way or if you want to run the command from your EC2 instance directly, that’s possible too. Here is the command:

    ec2-register cs462-machines/snay2-test1/image.manifest.xml -K PrivateKey.pem -C X509Cert.pem

    Replace the snay2-test1 with your own folder name inside the bucket.

    After a second, that came back with the new AMI ID. We’re done!
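The three steps can also be scripted. Here is a minimal Python sketch that builds the same three commands as argument lists and, by default, only prints them (pass `dry_run=False` to actually run them). It assumes all three steps run on the instance with the keys in /mnt; the account ID, bucket folder, and keys in the example are placeholders. It strips dashes from the account ID, since ec2-bundle-vol expects the 12 digits alone, and masks the secret key when printing.

```python
import subprocess


def normalize_account_id(account_id):
    """ec2-bundle-vol wants the 12-digit account number without dashes."""
    digits = account_id.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError("expected a 12-digit AWS account ID, got %r" % account_id)
    return digits


def ami_commands(account_id, bucket_path, access_key, secret_key,
                 private_key="/mnt/PrivateKey.pem", cert="/mnt/X509Cert.pem",
                 image_dir="/mnt/image"):
    """Return the bundle, upload, and register commands as argv lists."""
    manifest = image_dir + "/image.manifest.xml"
    return [
        ["sudo", "ec2-bundle-vol", "-k", private_key, "-c", cert,
         "-u", normalize_account_id(account_id), "-d", image_dir],
        ["ec2-upload-bundle", "-b", bucket_path, "-m", manifest,
         "-a", access_key, "-s", secret_key],
        ["ec2-register", bucket_path + "/image.manifest.xml",
         "-K", private_key, "-C", cert],
    ]


def run_all(commands, dry_run=True, hide=()):
    """Print each command (masking values in hide) and run it unless dry_run."""
    for argv in commands:
        print(" ".join("****" if arg in hide else arg for arg in argv))
        if not dry_run:
            subprocess.check_call(argv)  # raises on the first failing step


if __name__ == "__main__":
    # Placeholders only: substitute your own account ID, bucket folder, and keys.
    secret = "SECRET_KEY"
    cmds = ami_commands("0000-0000-0000", "cs462-machines/snay2-test1",
                        "ACCESS_KEY", secret)
    run_all(cmds, dry_run=True, hide={secret})
```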

EDIT: Here is a detailed tutorial from the Ubuntu community on the AMI tools and other necessary EC2 things.

EDIT: Revised the last portion of the post to reflect the three basic steps of persisting an AMI.