[0:11] We start this introductory tutorial from
the Amazon Web Services home page
http://aws.amazon.com/
where special offers and events are announced for all of
Amazon's web services.
But the core service is the Elastic Compute Cloud
(called EC2 for the two C's), which Amazon describes on its
Frequently Asked Questions page at http://aws.amazon.com/ec2/faqs
The contribution of this tutorial is not to simply repeat
but to analyze
the mountains of product and technical information,
filter out what is duplicate or obsolete,
and integrate 3rd party tools and ideas
so you'll have a logical foundation to understand
the current situation.
We'll talk a bit about
security precautions,
then dive into using this
EC2 Console Dashboard
https://console.aws.amazon.com/ec2/home
and other ways to make use of Amazon's many web services.
With so many AWS products and supporting services, I felt the need to figure out how the various pieces relate to each other in a diagram, like a map of the subway under London, New York, or Boston.
This series of tutorials will give you a tour of each part of the system.
What we want to end up with (and quickly) are trouble-free server instances that lots of visitors like.
We'll need to frequently make use of this Dashboard, so to get back to it quickly, I recommend we create a browser favorite or even a shortcut on the Desktop.
We should also bookmark the EC2 Management Console,
which can be reached if we,
at the upper-right corner, click the Navigation Toggle Icon button to reveal the menu.
The AWS Management Console can be reached at
http://aws.amazon.com/console/ or
http://console.aws.amazon.com/ which may be a bit faster.
From the Management Console,
pull down Support >
Technical FAQs to see questions commonly asked by
those new to the product.
[] FAQs can be rather random.
So I created this tutorial you're viewing so you won't be intimidated by this mountain of information.
With this tutorial I aim to describe the
essence of all these documents
so that you know where to go when you need more detail.
[1:43] Notice from the bread-crumbs at the top of this page that Technical FAQs are part of Articles and Tutorials, a list several pages long. We can get back to the full list of Articles by pulling down the Resources menu for Articles specific to EC2.
[] These articles, and the Sample Code & Libraries associated with each, are more conveniently categorized by topic and programming language in the companion webpage to this video.
Together, I hope to give you an
orderly introduction to the
mountains of information on this technology.
[] To keep up-to-date, go to Amazon's Support Forums for on-going discussions for each AWS technology.
Other on-line hang-outs:
[] Pull down the About AWS menu
for Newsletters from Amazon.
[] To get notified of known availability problems at Amazon, the Support > Service Health Dashboard has RSS feeds you can subscribe to. Page down to scan Status History.
[] Amazon doesn't currently offer reliability calculations such as mean time between disruptions, mean time to repair, or common Service Level Agreement statistics.
EC2 Getting Started Guide
EC2 User Guide
EC2 Developer Guide
EC2 API Reference
EC2 Command Line Tools Reference
EC2 API Quick Reference Card
[] Just pull down Resources > Documentation
for the guides that programmers reference
to create, configure, and manage instances of servers.
Amazon offers the documentation for each of its services in several formats.
In real life, once familiar with the Developer Guide, most developers refer to the Quick Reference because it's organized by object rather than alphabetically by command, as in the Command Reference.
HTML web pages are convenient even though you may see a momentary lag going from page to page.
Many prefer PDF files for their ability to scale quickly and gracefully while we zoom in and out and search the whole document for a word.
When we click on a Microsoft CHM (Compiled HTML Help) file we need to wait for the download, then extract it to a local folder. Since these files are downloaded in their entirety, they usually offer fast response time. The other advantage for this bit of hassle is that they are heavily indexed, so that when searching for a word, all topics containing that word are listed.
In reading postings on forums about EC2, I notice that Amazon's security infrastructure has been a source of confusion for many.
To actually do something, Amazon requires us to
sign up for AWS
using the email and password we use to buy books and other stuff on Amazon.
One of the main reasons why people go to Amazon instead of its upstart competitors is the trusted brand name that Amazon has earned over time.
[0:21] So when working with AWS, security-conscious people prefer to use a new email address different from their personal email, creating an email name containing numbers as well as letters.
Emails to that account can then be forwarded to a common email address.
All this applies even if you use a multi-factor authentication token from Amazon.
But many prefer to use other console clients Amazon also implemented based on its API.
Elasticfox Firefox Extension for Amazon EC2
as an add-in to the Mozilla Firefox browser.
Like other Firefox add-ons, click
download from this link to Amazon when you are in Firefox, which will recognize the
xpi file extension and install it automatically.
Later, Firefox will also automatically update this add-in when one becomes available.
JavaScript Scratchpad for EC2,
CloudWatch,
Auto Scaling, and
Load Balancing
Several third-party developers have also used Amazon's API to implement clients:
Several clients use Amazon to offer interoperability with clouds other than EC2:
Help! If you know of other AWS clients, please let me know so I can add them to this list.
Back at the AWS Management Console, before signing into the EC2 Dashboard, page down to take a tour of it.
The EC2 Console is used to launch, reboot, and terminate (or kill) instances of EC2 servers.
The software available on the server at time of launch is defined by what is bundled in the Amazon Machine Image (AMI) used to launch the instance.
Each instance launched is identified by an Instance ID.
Most people make use of Amazon EBS (Elastic Block Store) volumes, which are like SAN volumes that look like local hard drives. Snapshots can be taken to back up each volume at a moment in time.
Amazon first offered Linux images, then added Windows images.
Instead of using passwords, Amazon associates a pair of keys to each server.
But regardless of what you use, you need to be at the Management Console for account-level information and documentation on EC2.
[1:04] In the sign-up page, Amazon at this point offers instances in two regions, one in the eastern United States (us-east) and one in Western Europe (eu-west), each divided into Availability Zones.
The identifying code for the Availability Zone where each particular instance is located (such as "us-east-1a") consists of several parts: a location (such as us or eu), a region (such as east-1 or west-1), and the zone code (from a to, in June 2009, d).
As of August 2009, Availability Zones are not the same across accounts. The Availability Zone us-east-1a for account A is not necessarily the same as us-east-1a for account B. Zone assignments are mapped independently for each account.
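The naming scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name and the regular expression are my own assumptions based on names such as "us-east-1a", not something Amazon publishes.

```python
import re

# Assumed pattern, inferred from names like "us-east-1a":
# location, then region (word plus number), then a single zone letter.
ZONE_PATTERN = re.compile(
    r"^(?P<location>[a-z]+)-(?P<region>[a-z]+-\d+)(?P<zone>[a-z])$"
)

def parse_availability_zone(name):
    """Split a zone identifier into its location, region, and zone code."""
    match = ZONE_PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognized Availability Zone name: %r" % name)
    return match.groupdict()

print(parse_availability_zone("us-east-1a"))
# {'location': 'us', 'region': 'east-1', 'zone': 'a'}
```

Remember that the zone letter is only meaningful within one account, since zone assignments are mapped independently for each account.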
??? One example of a service_endpoint is
https://us-east-1.ec2.amazonaws.com
This is set using the EC2_URL environment variable.
BTW, you can see the exponential growth in overall AWS usage since 2005 by tracking traffic stats to the amazonaws.com domain name.
Instances located in Europe cost more than those in the US.
Each image can be applied to instances of different sizes.
[1:04] Windows instances cost slightly more than Linux instances. The big difference between what Amazon offers and what traditional internet service providers offer is billing by the hour rather than renting a month or year at a time.
Scaling Activities (to add or remove instances) occur according to LaunchConfiguration specifications for each AutoScaling Group within each Availability Zone.
Scaling activities actually occur when a scaling trigger is activated, such as when a metric's upper or lower boundary threshold is breached for the duration previously specified as significant.
People have found wide variation in the time it takes for EC2 instances
to progress through the various states over the lifespan of each instance:
Pending, InService, Terminating, and Terminated.
To prevent triggers from continuously being re-activated, a cooldown period (set by parameter) gives the system time to perform and adjust to new scale-in and scale-out activities that affect capacity size.
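To make the interplay of thresholds, breach duration, and cooldown concrete, here is a small simulation in Python. It is a sketch of the idea only. The class, its parameter names, and the use of plain seconds as timestamps are all my own assumptions, and it does not call any Amazon API.

```python
class ScalingTrigger:
    """Toy model of a scaling trigger with a breach duration and a cooldown."""

    def __init__(self, upper, lower, breach_duration, cooldown):
        self.upper = upper                      # scale out above this value
        self.lower = lower                      # scale in below this value
        self.breach_duration = breach_duration  # seconds a breach must last
        self.cooldown = cooldown                # seconds to wait after acting
        self._direction = None
        self._breach_started = None
        self._last_action = None

    def observe(self, timestamp, value):
        """Return 'scale_out', 'scale_in', or None for one metric sample."""
        if value > self.upper:
            direction = "scale_out"
        elif value < self.lower:
            direction = "scale_in"
        else:
            direction = None
        # A new (or ended) breach restarts the duration clock.
        if direction != self._direction:
            self._direction = direction
            self._breach_started = timestamp if direction else None
        if direction is None:
            return None
        if timestamp - self._breach_started < self.breach_duration:
            return None  # breach has not yet lasted long enough
        if (self._last_action is not None
                and timestamp - self._last_action < self.cooldown):
            return None  # still cooling down from the previous activity
        self._last_action = timestamp
        return direction


trigger = ScalingTrigger(upper=80, lower=20, breach_duration=120, cooldown=300)
for second, cpu in [(0, 90), (60, 95), (120, 92), (180, 93), (420, 91)]:
    print(second, trigger.observe(second, cpu))
```

In this run the trigger fires at second 120 (the breach has lasted long enough), stays quiet at second 180 (cooldown), and fires again at second 420.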
Auto Scaling is a free service, but it is dependent on CloudWatch services for which there is a fee.
In addition to the usual trio of name, unit of measure (such as seconds or bytes), and an optional timestamp, CloudWatch stores each measure (observed value) with a set of dimensions used to aggregate measures.
The dimensions include Amazon Machine Image, Availability Zone, and AutoScalingGroup.
When actions are performed using Amazon's web page, we provide a password we specified during account set-up. Since we came up with the password and are supposed to keep it secret, we cannot repudiate (deny) we actually made the request.
Programs (such as Elasticfox) making requests over the internet need an equivalent of providing a password. But because others can read what is sent over public data lines, Amazon uses a clever but complex technique to not require revealing a shared secret.
[3:01] Unlike with a password, Amazon can warn us never to email our Secret Access Key to others, even to Amazon itself, because Amazon uses a clever technique so that we never have to reveal it.
This approach uses a pair of keys, a private and a public key, generated during account set-up in the Access Identifiers page.
The 20-character Access ID is (Public) in that Amazon wants you to include it, as shown, in every Amazon web service request made.
[2:28] To make this information available to programs making calls to Amazon, we create a folder such as "C:\EC2" and Show, double click, and Ctrl+C or right-click to copy, then paste. ???
Programs can find the location of these files through environment variables such as:
set EC2_PRIVATE_KEY=c:\ec2\pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem
set EC2_CERT=c:\ec2\cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem
The 20-character, alphanumeric Access Key ID Amazon generates for each user account should be copied and pasted into application source code making calls. This is so that Amazon can (if it needs to) get in touch with the developer behind each request.
[3:11] Some companies issue corporate policy statements dictating that keys should not be given to public web apps which ask for them in order to communicate with Amazon on our behalf.
[3:21] This may be a bit too drastic, because unlike social security numbers, another key can be generated, and Amazon will take care of invalidating obsoleted keys.
Key pairs can be generated using this API command:
ec2-add-keypair keypair_name privatekey_file_name
The 40-character (Private) Secret Access Key is NOT like a password commonly presented to validate our identity.
Amazon's
HIPAA
compliance white paper
describes the key pair as "2048-bit
RSA".
This clarifies information in Amazon's
Developer Guide for EC2 and other AWS services.
[3:30] When Amazon generates the Secret Key, they keep a copy of it.
In programs we write to communicate with Amazon, we combine the Secret Access Key with the data we send to Amazon in a hash formula to create a digest. This digital signature is what is sent, not the Secret Access Key.
When Amazon receives the payload, it extracts the data portion and uses its own copy of our secret key to re-calculate the digital signature.
If Amazon ends up with the same hash we gave it, Amazon knows the request is verified genuine.
[4:06] All this only works if everyone uses the same technique for creating the hash digest. In fact, Amazon's approach is public: HMAC-SHA1. HMAC (Hash-based Message Authentication Code) was defined in 1997 as RFC 2104 by clever folks at IBM and UCSD. SHA1, the Secure Hash Algorithm version 1, was published in 2001 as RFC 3174 and is used in place of the MD5 algorithm commonly used to detect whether all pieces of a downloaded file were received.
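The sign-and-verify round trip described above can be sketched with Python's standard library. Treat this as an illustration of HMAC-SHA1 only. The key and the string being signed are made-up placeholders, and the exact way Amazon assembles the string to sign for each service is not reproduced here.

```python
import base64
import hashlib
import hmac

def sign(secret_key, string_to_sign):
    """Return the Base64-encoded HMAC-SHA1 digest of string_to_sign."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")

def verify(secret_key, string_to_sign, signature):
    """What the receiving side does: recompute the digest and compare."""
    expected = sign(secret_key, string_to_sign)
    return hmac.compare_digest(expected, signature)

# Placeholder values for illustration only, not a real key or request format.
secret = "example-secret-key-not-real"
request = "Action=DescribeImages&Timestamp=2009-08-01T12:00:00Z"

signature = sign(secret, request)
print(verify(secret, request, signature))                 # True
print(verify(secret, request + "&tampered=1", signature)) # False
```

Notice that only the signature and the request travel over the wire. The secret key itself is never sent, which is the whole point of the technique.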
[4:37] One may wonder whether all this hashing business slows down the computer even a little? Well, in the words of my teen-age daughter: "YEAH!" But, in the words of my teen-age son: "SO? I have monster hardware."
Note that the hashing described above is a less complicated subset of the full PKI (Public Key Infrastructure) which enables traceability of authenticity back to a public CA such as Network Solutions.
But AWS doesn't check X.509 certificate revocation lists (CRLs) to determine if the certificate has been revoked, nor does AWS validate the certificate with a certificate authority (CA) or any trusted third parties.
AWS uses X.509 certificates only as carriers for public keys and does not trust or use any identity binding that may be included in X.509 certificates.
So Amazon uses information within the certificate
only to authenticate
requests.
To authenticate requests using the SOAP protocol (instead of using the Web Reference tool built into Visual Studio), the WS-Security 1.0 specification requires you to sign the SOAP message with your private key and include the X.509 certificate in the SOAP message header. Specifically, you must represent the X.509 certificate as a BinarySecurityToken as described in the WS-Security X.509 token profile (also available at the OASIS-Open web site).
When this tutorial was written, X.509 certificates could not be used with requests sent using JSON and other REST formats.
The Security Group associated with an instance defines the ports and protocols allowed.
People generally specify Remote Desktop (RDP) port 3389 and SSH port 22 (which is also used to transfer files). Without these, you won't be able to get into the server, because you can't physically touch the machine.
[] Production systems should limit the public IP address ranges allowed in (using a slash to designate the IP range, as in 103.55.22.234/32).
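If the slash notation is unfamiliar, Python's standard ipaddress module shows what a range such as 103.55.22.234/32 actually covers. This is just a local illustration of the notation. The helper name is my own, and it does not talk to EC2 or change any Security Group.

```python
import ipaddress

def is_allowed(source_ip, allowed_range):
    """Return True if source_ip falls inside the slash-notation range."""
    network = ipaddress.ip_network(allowed_range, strict=False)
    return ipaddress.ip_address(source_ip) in network

# A /32 range matches exactly one address.
print(is_allowed("103.55.22.234", "103.55.22.234/32"))  # True
print(is_allowed("103.55.22.235", "103.55.22.234/32"))  # False

# A /24 range matches the 256 addresses sharing the first three numbers.
print(is_allowed("103.55.22.7", "103.55.22.0/24"))      # True
print(ipaddress.ip_network("103.55.22.0/24").num_addresses)  # 256
```

The smaller the number after the slash, the wider the range, with 0.0.0.0/0 meaning anyone on the internet.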
The Security Groups assigned to an instance cannot be changed after it is launched. So I recommend that you create a unique security group for each instance.
Each instance launched restores to disk a specific AMI (Amazon Machine Image) on a virtual machine with the specified hardware type.
The
Security Whitepaper outlining Amazon's internal security practices reveals that
the open-source
Xen hypervisor
(not VMWare) is used to virtualize servers.
QUIZ: How many instances can be created per account by default?
ANSWER: A request for more than 20 instances at once requires an additional request form.
To get a list of AMI files available and their image ID's:
ec2-describe-images -o self -H --show-empty-fields
Commands to EC2 can also be issued from your client machine's command line if you download the ec2-api-tools.zip file, which contains executables that implement the Amazon EC2 API.
This API enables us to create batch command files to automate our work. But remember that its executables are in the bin folder below the folder installed from Amazon and specified in the "EC2_HOME" environment variable. So add the bin folder to the Windows PATH environment variable.
EC2_HOME= ec2_tools folder (above bin sub-folder)
JAVA_HOME= where java 1.5.0 JVM is located
set EC2_PRIVATE_KEY=c:\ec2\pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem
set EC2_CERT=c:\ec2\cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem
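Before running a batch file, it can save some head-scratching to check that these variables are actually set. The following Python sketch does that. The function name is hypothetical, and the list of required names is simply the four variables mentioned in this tutorial.

```python
import os

# The four variables named in this tutorial.
REQUIRED_VARIABLES = ("EC2_HOME", "JAVA_HOME", "EC2_PRIVATE_KEY", "EC2_CERT")

def missing_ec2_settings(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]

missing = missing_ec2_settings()
if missing:
    print("Please set: " + ", ".join(missing))
else:
    print("All EC2 environment variables are set.")
```

Passing a plain dictionary instead of os.environ makes the check easy to try out without touching your real settings.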
TBD
You may want to set up a Secure FTP server in your instance, then use WinSSH or another tool to transfer the files.
The FTP service on Windows 2003 machines is off by default because the FTP protocol transmits passwords in clear text over the public internet.
So most system administrators prefer to install an SFTP service to transfer files into Windows servers. This is a subset of the SSH (Secure Shell) service.
The additional security provided by SFTP and SSH requires a certificate to authenticate each user.
SSH references key information in the Key Pair files created by Amazon (with file extension "pem").
However, the format of the certificate used for this purpose is not the "PPK" format that the SSH program expects,
so a conversion program needs to be run.
Instead of a .PEM file, more recent versions of software instead look for a .PPK file.
Sample C# code to call EC2 SOAP are here and monitoring code
boto (named after a type of dolphin in the Amazon river) provides an open-source library for Python. Its creator, Mitch Garnaat, offers this mashup of ripping videos on EC2 from files in S3, all coordinated using SQS. [ Comments on this at Amazon]
The nice thing about EC2 is that rather than rebooting an instance, we are better off switching to a new instance. This overlap buys us the option of going back to the first server.
Similarly, rather than updating a server, we can take a more disciplined approach of tweaking the configurations that build a server.
But we can describe the API commands that do it.
ec2-run-instances imageID -n 2 -k keypair_name
The -n parameter specifies the number of instances.
The -d parameter allows a small amount of data to be passed to new instances, such as command line arguments or the address of a database server.
The -f parameter specifies a file containing more parameters. This is especially handy for Windows users, who cannot use the \ line continuation character as Linux users can.
The problem with commodity cloud services is that hardware is usually limited.
For one thing, you are limited to a single Ethernet interface.
Another thing is that newer transport protocols such as SCTP or DCCP are not possible, because only the established transport layer protocols (TCP and UDP) are supported.
With Amazon, one doesn't have the flexibility to add encryption boards or separate hardware for firewall, load balancing, and other specialized functions perhaps handled more efficiently by dedicated hardware appliances.
Amazon offers a massively parallel Hadoop MapReduce batch service only on Linux instances. Alternatives are CouchDB and SimpleDB.
The password used to access the server's all-powerful Administrator account is decrypted using the private key portion of the Key Pair specified for the instance.
Once in the Windows server, create regular users with the password of your choosing.
For security, some companies change the default account name Administrator to something else. Is this possible ???
This initial setup on each machine is performed by the ec2config service running (from the Program Files\Amazon folder) as "LocalSystem". It syncs the instance clock with a time server.
Within a command prompt, an ipconfig lists the IP Address.
Right-click on the Computer icon. Memory
If you need environment variables, I suggest running a command batch script.
The snapshot can be a default such as one of these:
| Location | 32-bit Enterprise | 32-bit DataCenter | 64-bit Enterprise | 64-bit DataCenter |
|---|---|---|---|---|
| US | snap-bb10f6d2 | snap-8010f6e9 | snap-d010f6b9 | snap-a310f6ca |
| EU | snap-a4bb5ecd | snap-b8bb5ed1 | snap-a6bb5ecf | snap-babb5ed3 |
When attaching a volume to a Windows instance, a device id is specified.
On Linux machines devices have names such as /dev/sdh.
On Windows machines devices have names from xvdf through xvdp.
The volume should now be visible to diskmgmt.msc MMC and to API command ec2-describe-volumes, which displays the ID and Size (in GB) of volumes attached to the instance where the command is run.
QUIZ: How many volumes can be allocated per account?
ANSWER: If an account wishes to allocate more than 20 volumes, request more.
QUESTION: What happens to data stored on the C:\ drive of an instance when it terminates?
ANSWER: It disappears. The drive goes away when the instance evaporates. This is why most people store data in Amazon's EBS (Elastic Block Store). It is named "block" because volumes transfer data in small blocks as local drives do. The hardware behind EBS is a SAN (Storage Area Network).
This way, data is independent from the lifetime of instances.
SAN drives are usually much faster and bigger than local drives. Each volume can individually hold up to 1TB of data, but more can be accommodated by configuring a RAID0 drive across up to 40 volumes, which will cost about $4,000 per month.
A volume should be detached from a Windows instance (using the ec2-detach-volume command) only after running the Unix-like utilities (from sysinternals) handle (to see if any file handles are open on the EBS Volume) and sync (to flush all file system data to disk), followed by a Windows command to /dismount, such as this example for volume e:
Data is backed up from an EBS Volume by issuing a command such as:
Snapshot files are stored with other flat files in Amazon's S3 file servers.
The response contains the snapshot ID and a "pending" flag until the snapshot is "completed"
The status of all snapshots is obtained using the list command
Restoring a particular snapshot back to the volume would bring the volume to the point in time when the snapshot was taken.
Behind the scenes, Amazon has a way of allowing several snapshots to be taken without using up disk space for entire separate copies of a volume. It's a form of incremental backup. Internally, Amazon keeps track of two lists. One is when each snapshot was taken, and another list is when each separate block of data bits in the volume was changed.
So if several snapshots are taken when nothing happened on the server, the snapshot doesn't really use up any more space to store the backup.
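The two-list bookkeeping described above can be modelled in a few lines of Python. To be clear, this is a toy model of incremental backup in general, using a simple counter as the clock. The class and method names are mine, and Amazon's actual internal implementation is not public in this detail.

```python
class ToyVolume:
    """Toy volume that stores only changed blocks in each snapshot."""

    def __init__(self):
        self.blocks = {}      # block number -> current data
        self.changed_at = {}  # block number -> time of last change
        self.snapshots = []   # list of (time taken, changed blocks)
        self.clock = 0        # simple counter standing in for time

    def write(self, block, data):
        self.clock += 1
        self.blocks[block] = data
        self.changed_at[block] = self.clock

    def snapshot(self):
        """Save only blocks changed since the previous snapshot."""
        previous = self.snapshots[-1][0] if self.snapshots else 0
        delta = {block: self.blocks[block]
                 for block, when in self.changed_at.items()
                 if when > previous}
        self.clock += 1
        self.snapshots.append((self.clock, delta))
        return len(self.snapshots) - 1  # snapshot number

    def restore(self, snapshot_number):
        """Rebuild the volume contents as of the given snapshot."""
        contents = {}
        for _, delta in self.snapshots[:snapshot_number + 1]:
            contents.update(delta)
        return contents


volume = ToyVolume()
volume.write(0, "a")
volume.write(1, "b")
first = volume.snapshot()   # stores two blocks
idle = volume.snapshot()    # nothing changed, stores no blocks
volume.write(1, "B")
latest = volume.snapshot()  # stores only the one changed block
print([len(delta) for _, delta in volume.snapshots])  # [2, 0, 1]
print(volume.restore(first))   # {0: 'a', 1: 'b'}
print(volume.restore(latest))  # {0: 'a', 1: 'B'}
```

The second snapshot, taken while nothing was happening, stores no blocks at all, which is the behavior described above.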
QUIZ: What command is used to restore a snapshot to a volume?
Some create a service that kicks off automatically to perform snapshots on a regular basis.
To store individual media files, use Amazon's S3 (Simple Storage Service). Instead of folders, S3 uses the word "buckets" because S3 doesn't offer a directory one can traverse. With S3's flat file system, the maximum of 255 characters identifying each bucket includes the bucket's path.
Each S3 bucket can contain any number of "objects"
Several web apps let you view S3 buckets from a web browser:
But I don't use them because they ask for my Secret Access Key.
Visitors enjoy a faster experience if images and other resources are served from a different host than the HTML page. Traditionally this means standing up another server.
But Amazon's CloudFront service offers an alternative by serving resources from S3 (Simple Storage Service) buckets in duplicate locations throughout the world at the edge close to users, much like what Amazon's competitor Akamai has been doing.
The problem is that it may take up to 24 hours for files to propagate.
Another alternative for reducing data transfer bills, if you can get visitors to use BitTorrent client software, is to have them access peer machines mirroring *.torrent files originating from Amazon (anonymously).
Logs are ideally stored in a way that enables lots of data from many sources to be stored quickly, such as Amazon's SimpleDB, which stores items not in a database schema but in a domain containing attribute-value pairs such as a time-stamp and associated message. Since values in SimpleDB are text-only, special programming is needed to zero-pad numbers, mask negative numbers, structure dates in sortable ISO 8601 format, etc.
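The "special programming" mentioned above exists because text values compare character by character, so "9" would sort after "10". Here is a sketch of the usual workarounds in Python. The function names, the default width, and the offset value are my own choices for illustration and are not part of any SimpleDB library.

```python
from datetime import datetime, timezone

def encode_number(value, digits=10, offset=0):
    """Zero-pad an integer so that text order matches numeric order.

    The offset is added first so that negative numbers become
    non-negative before padding.
    """
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset too small for value %d" % value)
    text = str(shifted)
    if len(text) > digits:
        raise ValueError("value %d does not fit in %d digits" % (value, digits))
    return text.zfill(digits)

def encode_timestamp(moment):
    """Format a timezone-aware datetime as sortable ISO 8601 in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(encode_number(42, digits=5))                # 00042
print(encode_number(-5, digits=5, offset=1000))   # 00995
print(encode_timestamp(datetime(2009, 10, 27, 12, 0, 0, tzinfo=timezone.utc)))
# 2009-10-27T12:00:00Z
```

The same width and offset must be used for every value of an attribute, and subtracted again when reading the number back.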
Since storing data in SimpleDB involves sending data over Amazon's internal network, some latency is added. This is why Amazon Architect Werner Vogels claims not instant but "eventually consistent" updates in SimpleDB.
Each item in a SimpleDB domain can be associated with up to 256 text attributes of 1,024 bytes or less.
These are reasons why some prefer a hybrid solution by using SimpleDB commands to store data in a local drive on the server using
Once data is safely on the local drive's M/DB, it can then be replicated to SimpleDB in real-time or on schedule after hours.
Collections within SimpleDB are accessed using Amazon's own lightweight query language to get back XML. Results from queries are returned with a BoxUsage value which shows the amount of system resources used by the operation.
When Amazon announced its Relational Database Service (RDS) on October 27, 2009, it offered various sizes of MySQL 5.1 (an open-source product owned by Sun, which was acquired by Oracle).
I wish Amazon would put up PostgreSQL as well, because MySQL does not offer some enterprise features.
At launch, Amazon assigns to each instance two sets of DNS URLs (Uniform Resource Locators) and IP addresses: one for use internally and one for use on the public internet.
The Private (Internal) URL Amazon generates at launch (like "domU-12-31-35-00-35-F3.compute-1.internal") is used to access an instance serving as a database server. (Good to know since Amazon doesn't charge for data transferred among Internal URLs within the same Zone. But you will be charged for network usage if an instance references a public IP address.)
The Public (external) URL Amazon also generates at launch (like ec2-72-44-45-204.compute-1.amazonaws.com) sysadmins use with Windows Remote Desktop (RDP) to reach into the instance.
When a visitor types the Public URL into an internet browser, the public internet routes the visitor to the Amazon.com DNS (Domain Name System) server, which then retrieves the Public IP address associated with the Public URL.
The Public IP is a globally unique IP directly addressable from anywhere on the Internet.
But, unlike with dedicated servers not in a virtual cloud, this address is not assigned to an interface card on the virtual machine instance.
Within the Amazon data network, NAT (Network Address Translation) occurs (per RFC 1631) to map IPs to the actual physical address of the server.
Amazon assigns an RFC 1918 (private) address.
By default each Amazon account can allocate 5 Elastic IPs, but more can be requested.
An Elastic IP can be associated with the Public IP of an individual instance or a Load Balancer designed as a single entry point to forward traffic among several instances.
This can be done using the command
ec2-associate-address
Once associated, get the DNS name (such as "ec2-75-101-137-243.compute-1.amazonaws.com") for the instance associated with the Elastic IP address using the ec2-describe-instances command. This is the name to use for domain name resolution.
When an EC2 instance queries this external DNS name of an Elastic IP, the EC2 DNS server returns the internal IP address of the instance to which the Elastic IP address is currently assigned.
Amazon's approach may be slower than dedicated servers in taking over an IP. This may be critical for high-availability environments.
There are separate dashboards for other Amazon services such as
MapReduce batch processing and CloudFront, which we'll be describing later.
Although the public URL or Public IP address can be given to visitors to reach a particular instance, this is generally not done because they will change when the instance goes away.
We generally want visitors to reach our website by specifying our own domain name in their internet browser.
This is arranged by first purchasing your domain from a Registrar, then telling them the Elastic (static) IP you associated with the Load Balancer in front of your application instance.
But if you only have one server, you would provide the Public IP Amazon assigned to your instance, and then change it whenever you create a new instance, since the IP address of an instance changes.
Your Registrar will propagate this information to DNS (Domain Name System) servers throughout the internet. This usually takes several hours.
To enable visitors to use the HTTPS protocol over port 443, one needs to buy an SSL (Secure Sockets Layer) certificate from a root Certificate Authority recognized by Microsoft, Firefox, and other browsers. This proof of identity assures visitors of the ownership of the domain name.
Know of other bloggers? Please let me know!
Know of others? Please let me know!
Amazon began offering web services in Fall 2006.
Book Cloud Architectures (April 2009) by George Reese.
Book Dot Cloud: The 21st Century Business Platform Built on Cloud Computing by Peter Fingar (author of Extreme Competition)
The Big Switch by Nicholas Carr.
Google Book: Programming Amazon Web Services by James Murty [ Example Code]
How to Install DotNetNuke 4.3.x from AWS EC2 Creative magazine
Video: Cloud Computing Explained
Video: Windows On EC2 at Cloud Camp Atlanta 09 by Sudesh Oudi of Service Xen Part 1, Part 2, Part 3, Part 4, Part 5
Creating Applications with Amazon EC2 and S3 (O'Reilly OnLamp) by Judith Myerson (05/13/2008)
This is a larger HD version of tektut's video on YouTube dated January 26, 2009.
Open Grid Forum BOF Session March 2009
REST in practice for IT and Cloud management (part 1: Cloud APIs) by William Vambenepe
Video: Amazon's EC2 Tutorial (Linux) by Mike Culver, Web Services Evangelist
Private Cloud within a Public Cloud