Running R on Amazon’s EC2

This is a note for those who use R, but haven’t yet used Amazon’s Elastic Compute Cloud (EC2) services.

Amazon’s EC2 is a type of cloud that provides on-demand computing infrastructure: you launch virtual machines, called instances, from pre-configured images called Amazon Machine Images, or AMIs. In general, these types of cloud provide several benefits:

  • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it; you don’t need to configure it. This covers not only applications like R but also any third-party data that you require.
  • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider, and you don’t need to order or set up any hardware.
  • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
  • Pay per use. Running one instance for 100 hours costs the same as running 100 instances for 1 hour. With pay-per-use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.
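As a rough sketch of the pay-per-use arithmetic, the shell function below counts billed instance-hours. It assumes the hourly billing rule quoted in the comments below (each partial instance-hour is billed as a full hour, separately for each instance). Amazon’s billing rules have changed over time, so check the current pricing page before relying on this; the function names and any hourly rate you pass in are illustrative only.

```shell
#!/bin/sh
# Sketch: count billed instance-hours under an hourly billing rule.
# ASSUMPTION: each instance's running time is rounded up to a whole
# hour separately, as in the pricing note quoted in the comments below.
# Check Amazon's current pricing page; the rules have changed over time.

# billed_hours INSTANCES MINUTES_PER_INSTANCE
billed_hours() {
  instances=$1
  minutes=$2
  # Integer ceiling of minutes / 60: a partial hour counts as a full hour.
  hours_each=$(( (minutes + 59) / 60 ))
  echo $(( instances * hours_each ))
}

# cost_cents INSTANCES MINUTES RATE_CENTS_PER_HOUR
# The hourly rate is a parameter you supply; no real price is assumed.
cost_cents() {
  echo $(( $(billed_hours "$1" "$2") * $3 ))
}

billed_hours 1 6000   # 1 instance for 100 hours: prints 100
billed_hours 100 60   # 100 instances for 1 hour: prints 100
billed_hours 100 5    # 100 instances for 5 minutes each: prints 100
```

Under that rule, 1 instance for 100 hours and 100 instances for 1 hour both come to 100 billed hours. Note, however, that 100 instances running for only 5 minutes each are also billed as 100 hours, because each instance’s partial hour is rounded up on its own.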

Here are the main steps to use R on a pre-configured AMI.

Set up.
The setup needs to be done just once.

  1. Set up an Amazon Web Services (AWS) account by going to:

    aws.amazon.com.

    If you already have an Amazon account for buying books and other items from Amazon, you can also use this account for AWS.

  2. Log in to the AWS console.
  3. Create a “key pair” by clicking on the link “Key Pairs” in the Configuration section of the Navigation Menu on the left-hand side of the AWS console page.
  4. Click on the “Create Key Pair” button, about a quarter of the way down the page.
  5. Name the key pair and save the downloaded private key file (for example, testkp.pem) to your working directory, say /home/rlg/work.
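One practical detail: ssh will refuse to use a private key file that other users can read. Here is a minimal sketch for locking down the downloaded key, assuming the file name testkp.pem used in this post; secure_key is just an illustrative name.

```shell
#!/bin/sh
# Sketch: restrict a downloaded EC2 private key so ssh will accept it.
# ASSUMPTION: the key file name (e.g. testkp.pem) is whatever you chose
# when creating the key pair; secure_key is a hypothetical helper name.

# secure_key KEYFILE
secure_key() {
  key=$1
  if [ ! -f "$key" ]; then
    echo "key file not found: $key" >&2
    return 1
  fi
  # OpenSSH ignores private keys that group or others can access, so
  # make the file readable by its owner only.
  chmod 400 "$key"
}
```

For example, after cd /home/rlg/work, run secure_key testkp.pem once.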

Launching the AMI. These steps are done whenever you want to launch a new AMI.

  1. Log in to the AWS console. Click on the Amazon EC2 tab.
  2. Click the “AMIs” button under the “Images and Instances” section of the left navigation menu of the AWS console.
  3. Enter “opendatagroup” in the search box and select the AMI labeled
    “opendatagroup/r-timeseries.manifest.xml”, which
    has the AMI ID “ami-ea846283”.
  4. Enter the number of instances to launch (1) and the name of the key pair that you previously created, and select “web server” for the security group. Click the Launch button to launch the AMI. Be sure to terminate the instance when you are done.
  5. Wait until the status of the instance is “running.” This usually takes about 5 minutes.

Accessing the AMI.

  1. Get the public DNS name (or IP address) of the new instance. The easiest way to do this is to select the instance by checking its box. This displays additional information about the instance at the bottom of the window, where you can copy the address.
  2. Open a terminal window and cd to your working directory, which contains the key pair that you previously downloaded.
  3. Type the command:
    ssh -i testkp.pem -X root@ec2-67-202-44-197.compute-1.amazonaws.com

    Here we assume that the key-pair file you created is named “testkp.pem”; replace the host name with the public DNS name of your own instance. The flag “-X” starts a session with X11 forwarding. If you don’t have X11 on your machine, you can still log in and use R, but the graphics in the example below won’t be displayed on your computer.
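If you log in often, it can help to keep the host name and key file in variables. This is a sketch that assumes EC2_HOST holds your instance’s public DNS name (the default below is only the example host from this post) and KEY holds your key file; ec2_ssh and DRY_RUN are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# Sketch: log in to the instance with the ssh command from the step above.
# ASSUMPTIONS: EC2_HOST is your instance's public DNS name (the default
# is only the example host from this post) and KEY is your key file.
# ec2_ssh and DRY_RUN are hypothetical names used for illustration.
EC2_HOST=${EC2_HOST:-ec2-67-202-44-197.compute-1.amazonaws.com}
KEY=${KEY:-testkp.pem}

# ec2_ssh [COMMAND...]: ssh in as root with X11 forwarding (-X).
# With DRY_RUN=1, print the command instead of running it.
ec2_ssh() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo ssh -i "$KEY" -X "root@$EC2_HOST" "$@"
  else
    ssh -i "$KEY" -X "root@$EC2_HOST" "$@"
  fi
}
```

Run DRY_RUN=1 ec2_ssh to print and check the command, then ec2_ssh to log in. Any extra arguments are passed to ssh as a remote command, so ec2_ssh uptime runs a single command instead of opening an interactive session.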

Using R on the AMI.

  1. Change to the examples directory and start R (the # is the root shell prompt, not part of the command):

    # cd examples
    # R
  2. Test R by entering an R expression, such as:

    > mean(1:100)
    [1] 50.5
    >
  3. From within R, you can also source one of the example scripts to see some time series computations:


    > source('NYSE.r')

  4. After a minute or so, you should see a graph on your screen. Once the graph has been drawn, you should see a prompt:

    CR to continue

    Press Enter and you should see another graph. You will need to press Enter 8 times to complete the script (you can also interrupt the script, for example with Control-C, if you get bored with all the graphs).
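  5. If you don’t have X11, or want to keep a plot, you can write it to a file instead. For example, to plot the time series xts.return and write the result to a PDF file called ‘ret-plot.pdf’, use: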

    > pdf("ret-plot.pdf")
    > plot(xts.return)
    > dev.off()

    You can then copy the file from the instance to your local machine by running the following command on your local machine (not in the ssh session):

    scp -i testkp.pem root@ec2-67-202-44-197.compute-1.amazonaws.com:/root/examples/ret-plot.pdf ret-plot.pdf
  6. When you are done, exit your R session with Control-D (or q()). Exit your ssh session with “exit” and terminate your instance from the Amazon AWS console. You can also choose to leave your instance running (it is only a few dollars a day).
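The scp step from item 5 can also be wrapped in a small function that you run on your local machine. This is a sketch: EC2_HOST and KEY are example values you should replace with your own, and fetch_result and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: copy a result file from the instance to the local machine.
# Run this locally, not inside the ssh session.
# ASSUMPTIONS: EC2_HOST is your instance's public DNS name (the default
# is only the example host from this post) and KEY is your key file.
# fetch_result and DRY_RUN are hypothetical names used for illustration.
EC2_HOST=${EC2_HOST:-ec2-67-202-44-197.compute-1.amazonaws.com}
KEY=${KEY:-testkp.pem}

# fetch_result REMOTE_PATH [LOCAL_PATH]
# LOCAL_PATH defaults to the base name of REMOTE_PATH.
# With DRY_RUN=1, print the scp command instead of running it.
fetch_result() {
  remote_path=$1
  local_path=${2:-$(basename "$remote_path")}
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo scp -i "$KEY" "root@$EC2_HOST:$remote_path" "$local_path"
  else
    scp -i "$KEY" "root@$EC2_HOST:$remote_path" "$local_path"
  fi
}
```

For example, fetch_result /root/examples/ret-plot.pdf copies the PDF written in item 5 to ret-plot.pdf in the current directory. Remember to copy your results before terminating the instance.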

Acknowledgements: Steve Vejcik from Open Data Group wrote the R scripts and configured the AMI.

One-day course. I’ll be covering this example, as well as several other case studies, in a one-day course taking place in San Mateo on July 14. See the courses page for more details.

11 Responses to Running R on Amazon’s EC2

  1. […] Running R on Amazon’s EC2 « From Data to Decisions – This is a note for those who use R, but haven’t yet used Amazon’s (EC2 cloud services. […]

  2. […] Running R on Amazon’s EC2 « From Data to Decisions This is a note for those who use R, but haven’t yet used Amazon’s (EC2 cloud services. […]

  3. rgrossman says:

    For those interested in parallel R, you may want to consider some of the products offered by REvolution Computing, which can be used easily with Amazon’s EC2 instances.

  4. fhamilton says:

    Robert, as someone new to R, I found this a great find and very helpful. It worked fine from my Mac, but we don’t have (and don’t want) X11 running locally on one of our Windows machines, so we thought we would run your scripts, save the plots as PDFs, and then download them from the server. However, that doesn’t seem to work with shell commands. Is there a way you would recommend to retrieve the PDF?

    Thanks, much.

  5. fhamilton,

    Thanks for your comment. I have updated the post to show how to do this by writing the plot to a PDF file and then retrieving the file using scp (see step 5 of “Using R on the AMI” above).

    –Bob

  6. mkhayter says:

    Robert,
    Thanks for the great information.
    The version of R installed as an AMI is 2.8.0
    Are there any AMI public images with R release 2.9.0 or 2.9.1?

  7. francescamoyse says:

    Hi Robert – agree with other commenters this is fun stuff :-).

    I’m biased, but would nevertheless love to know what you and readers of this blog think of our recently launched service Monkey Analytics (http://www.monkeyanalytics.com) which abstracts the AMI management and generation detailed here and delivers Octave, Python, and now R computation in the cloud on EC2 servers.

    At fhamilton – not sure if we solve your problem just yet, as we aren’t R experts (we spent more time in Matlab / Python / PV Wave in the past), but our approach with GNU Octave and Python is to wrap image / figure generation commands, spit out images, and deliver those in the browser via our web app.

    (R was the number one feature request post launch, and we just got it working a few days ago).

    We’re pretty excited about what we’re up to, and love being part of the discussion about how best to use cloud computing to get science computation done.


    Francesca Moyse | Founder, Monkey Analytics | francesca@monkeyanalytics.com

  8. […] Robert Grossman’s original article offers more detail on running R on EC2, and has some interesting discussion as […]

  9. […] up R on Amazon’s AWS EC2 By azol 1. Did all steps of  Robert Grossman’s Running R on Amazon’s EC2 .Because of win-XP client, on step “Accessing the AMI” -3  […]

  10. vinhdizzo says:

    Hmmm, very interesting. I should contact Amazon instead, but since people here use R in particular, I have a question on pricing:

    http://aws.amazon.com/ec2/#pricing shows the pricing. Is it safe to assume that R users will be on the Standard on-demand, Small, Linux rate?

    If I were to use snowfall on 100 clusters for a total of 5 minutes, will I be billed for 5 hours for pricing? Or is each instance billed separately (rounded up to 1 hour) and I will be billed for 100 hours? I’m asking this because of this note:

    Pricing is per instance-hour consumed for each instance type, from the time an instance is launched until it is terminated. Each partial instance-hour consumed will be billed as a full hour.

    Let me know before I give this a try. Thanks!
