Cloud computing has become so broad that it is not practical to gather all the key papers and projects in one place, but here is a short list that you may find helpful.
What is cloud computing? Cloud computing doesn’t yet have a standard definition, but a good working definition is that clouds are racks of commodity computers that provide on-demand resources and services over a network, usually the Internet, with the scale and the reliability of a data center.
- NIST has written a white paper that defines some of the basic characteristics of cloud computing. You can find it on their cloud computing web page.
Introductions to Cloud Computing. I wrote two introductions to cloud computing:
- Robert L. Grossman, The Case for Cloud Computing, IT Professional, volume 11, number 2, March/April 2009, pages 23-27.
- Robert L. Grossman and Yunhong Gu, On the Varieties of Clouds for Data Intensive Computing, Bulletin of the Technical Committee on Data Engineering, volume 32, number 1, March 2009, pages 44-50.
Infrastructure as a Service (IaaS) Clouds.
Infrastructure as a Service or IaaS refers to computer infrastructure (think virtualized computer hardware) delivered as a service over a network, often the Internet. Usually you can pay for IaaS as you need it, using what is called a utility computing payment model.
- Amazon is the leading provider of infrastructure as a service (one type of cloud computing). You can find more information about Amazon Machine Images (AMI), their Elastic Compute Cloud (EC2), their Simple Storage Service (S3), and related services at aws.amazon.com.
- Eucalyptus is an open source project that supports cloud services with the same APIs as Amazon’s EC2 and S3 web services. Using Eucalyptus, you can build your own private clouds.
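To make the utility computing payment model described above concrete, here is a minimal sketch that compares paying only for the instance-hours used with paying for peak capacity around the clock. The workload, the rate, and both function names are hypothetical, and the sketch assumes the same price per instance-hour in both cases; it does not reflect any provider's actual pricing.

```python
def pay_as_you_go_cost(hourly_usage, rate_per_instance_hour):
    """Charge only for the instance-hours actually consumed."""
    return sum(hourly_usage) * rate_per_instance_hour


def fixed_capacity_cost(hourly_usage, rate_per_instance_hour):
    """Charge for peak capacity in every hour, whether or not it is used."""
    return max(hourly_usage) * len(hourly_usage) * rate_per_instance_hour


# Hypothetical workload: instances needed in each of six consecutive hours.
usage = [2, 2, 10, 10, 2, 2]
# Hypothetical price per instance-hour, in arbitrary currency units.
rate = 0.5

on_demand = pay_as_you_go_cost(usage, rate)     # bills 28 instance-hours
provisioned = fixed_capacity_cost(usage, rate)  # bills 60 instance-hours
print(on_demand, provisioned)
```

The more a workload varies over time, the larger the gap between the two figures becomes under these assumptions.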
Large Data Clouds.
Some of the first clouds were developed as private clouds by companies, such as Google and Yahoo, that used them internally to process and analyze very large datasets.
Google’s internal cloud consists of the Google File System (cloud storage service), BigTable (cloud data services) and MapReduce (cloud compute services). Excellent descriptions of each of these layers are available from the Google Labs web site:
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29–43, New York, NY, USA, 2003. ACM. Google Technical Report.
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters, in OSDI’04: Sixth Symposium on Operating System Design and Implementation, 2004.
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Bigtable: A distributed storage system for structured data. In OSDI’06: Seventh Symposium on Operating System Design and Implementation, 2006.
Google’s software implementing GFS/MapReduce/BigTable is proprietary, but there are some open source alternatives available:
- There is an open source Java-based cloud for large data called Hadoop that is broadly based upon the first two Google Technical Reports listed above.
- There is also an open source C++-based cloud called Sector that is designed to operate across wide area networks and supports a generalization of MapReduce in which a User Defined Function or UDF can be applied to all the data managed by the cloud. Sector also includes a security infrastructure.
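The idea of applying a UDF to all the data, with MapReduce as one special case, can be sketched in a few lines of plain Python. This is only an illustration running on a local list: it is not the Sector or Hadoop API, and every name in it (apply_udf, map_reduce, count_words_mapper, sum_reducer) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_udf(segments: Iterable[T], udf: Callable[[T], R]) -> List[R]:
    """Apply a user-defined function to every data segment.

    In a large data cloud each segment would typically be processed near
    where it is stored; here the segments are just items in a local list.
    """
    return [udf(segment) for segment in segments]


def map_reduce(segments, mapper, reducer):
    """MapReduce written in terms of apply_udf: map, group by key, reduce."""
    # Map phase: the mapper is a UDF that returns (key, value) pairs.
    mapped = apply_udf(segments, mapper)

    # Shuffle phase: collect all values that share a key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: the reducer is a UDF applied to each (key, values) group.
    reduced = apply_udf(
        sorted(groups.items()),
        lambda item: (item[0], reducer(item[0], item[1])),
    )
    return dict(reduced)


def count_words_mapper(segment: str):
    """Emit (word, 1) for every word in a text segment."""
    return [(word.lower(), 1) for word in segment.split()]


def sum_reducer(key, values):
    """Add up the counts collected for one key."""
    return sum(values)


segments = ["the cloud stores data", "the cloud processes data"]

# MapReduce case: word count over all segments.
word_counts = map_reduce(segments, count_words_mapper, sum_reducer)

# General UDF case: any function can be applied, with no map/reduce structure.
segment_lengths = apply_udf(segments, len)

print(word_counts)
print(segment_lengths)
```

In this sketch the mapper and reducer are simply two particular UDFs with a group-by-key step between them, while apply_udf on its own accepts any function.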
Some information about Sector can be found in these technical reports:
- Yunhong Gu and Robert L. Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429–2445, 2009.
- Yunhong Gu and Robert L. Grossman, Lessons Learned From a Year’s Worth of Benchmarks of Large Data Clouds, Proceedings of the Second Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2009), ACM, 2009.
- This is a nice white paper giving an overview of cloud computing service providers: Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia, Above the Clouds: A Berkeley View of Cloud Computing, February 10, 2009.
- A mind map is an interesting way to organize certain types of information. Jean-Lou Dupont has prepared a cloud computing mind map of the various vendors, projects and service providers in cloud computing.