EC2 Storage design
Abstract
Storage design on EC2 presents several unique challenges and opportunities. The elastic nature of EC2 systems makes using traditional storage schemes problematic because:
- Upscaling storage is a complex manual task on NFS/CIFS and SAN, with regard to both capacity and performance
- Service locations are not static on EC2 (IP addresses may change frequently)
- Low-level storage is usually fixed and does not allow easy adjustments
These constraints stem from the characteristics of the system using the storage. In addition, the storage mechanisms available on EC2/AWS are very different in nature from traditional storage devices, so using them efficiently may require different strategies.
Overview of EC2/AWS Storage services
* EBS - nonvolatile high performance storage volumes that can be allocated with size ranging from 1GB to 1TB.
  * Cannot be resized
  * Cost is calculated per GB/month, no dependency on the number of EBS volumes
  * Physically mirrored
  * Can be snapshotted to S3 and restored from such a snapshot
* Instance volatile storage
* S3 - nonvolatile storage with a SOAP/HTTP interface. There are no storage limits, and the interface is file based, so no partitioning or filesystem is necessary. Applications usually need some middleware or API library to use it; S3 FUSE filesystems exist but are limited and generally unstable. S3 can be (and often is) used to serve web content directly.
Distributed file systems
NFS
NFS on EC2 is not a good idea. While it does work, an HA setup is impossible (there is no IP failover), and the volatile nature of EC2 instances makes NFS a nightmare to administer. Also, with large datasets you will run into performance and administration issues, because NFS doesn't deal with volume management, and replica selection is a lame excuse for real parallelism.
If you really must use NFS, follow these guidelines:
- Mount with the soft,intr options, because your server will die eventually
- Export with async
- Export to 10.0.0.0/8, since you don't know which subnet your instances will be on, or better yet, use Kerberos
- For performance, use a high-CPU instance and RAID
EBS RAID
When storage requirements exceed 1TB and S3 is not a viable option, or when performance really counts, EBS volumes can be aggregated into a RAID volume. Amazon currently does not offer RAID through its API, so mdadm (Linux software RAID) must be used. Depending on the load and the number of drives, md RAID may require heavy CPU usage, so using at least a medium instance is a must. In my experience, CPU activity caused by IO is a major concern on EC2 instances: just have a look at top (check the st, steal time, value!) or iostat while running a benchmark or an IO-intensive task.
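As a sketch, assuming the EBS volumes are already attached and appear as /dev/sdf through /dev/sdi (hypothetical device names), the mdadm invocation for a RAID 0 array can be assembled as below. mdadm_create_cmd only builds the argument list; create_array is a hypothetical wrapper that actually runs it and needs root.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def mdadm_create_cmd(
    md_device: str,
    level: int,
    devices: Sequence[str],
    chunk_kib: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `mdadm --create`; nothing is executed here."""
    if not devices:
        raise ValueError("at least one EBS device is required")
    cmd = [
        "mdadm",
        "--create",
        md_device,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
    ]
    if chunk_kib is not None:
        cmd.append(f"--chunk={chunk_kib}")  # mdadm's --chunk is given in KiB
    return cmd + list(devices)


def create_array(cmd: List[str]) -> None:
    """Hypothetical wrapper: run the command for real (requires root)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device names for four attached EBS volumes.
    ebs_devices = ["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi"]
    print(shlex.join(mdadm_create_cmd("/dev/md0", 0, ebs_devices)))
```

Building the command separately from running it makes it easy to log or review the exact invocation before touching the volumes.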
Important note: EBS volumes are accessed through a 1 Gbit/s interface. This means that adding volumes to your RAID stops improving performance once you have exhausted that bandwidth, which usually happens (under heavy load) after 2 to 3 volumes.
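The arithmetic behind this note, as a rough sketch: 1 Gbit/s is 125 MB/s before protocol overhead, so the number of volumes that fills the link depends on what each volume delivers. The 50 MB/s per-volume figure used in the example is a hypothetical assumption, not a measured EBS number; substitute your own iostat results.

```python
import math

LINK_GBIT_PER_S = 1.0  # EBS interface bandwidth, per the note above


def link_mb_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to MB/s (decimal units, protocol overhead ignored)."""
    return gbit_per_s * 1000 / 8


def volumes_to_saturate(per_volume_mb_s: float, gbit_per_s: float = LINK_GBIT_PER_S) -> int:
    """Smallest number of volumes whose combined throughput fills the link."""
    return math.ceil(link_mb_per_s(gbit_per_s) / per_volume_mb_s)


def raid_throughput(
    n_volumes: int, per_volume_mb_s: float, gbit_per_s: float = LINK_GBIT_PER_S
) -> float:
    """Ideal striped throughput, capped by the shared link."""
    return min(n_volumes * per_volume_mb_s, link_mb_per_s(gbit_per_s))
```

With 50 MB/s per volume, volumes_to_saturate(50) returns 3, and raid_throughput(4, 50) is capped at 125.0, which is the kind of plateau the note describes.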
RAID levels
The age-old question of which RAID level to use is somewhat more complex with EBS volumes, because EBS volumes are already mirrored under the hood. In fact, when creating RAID 0 you are actually creating a kind of RAID 1+0. This sounds very good, but don't make the mistake of treating this setup as RAID 1+x, because you have no control over the mirroring part. As far as you're concerned, you're using regular disks that happen to be very reliable. Without any knowledge of the physical layer, any assumption about the reliability or performance of the RAID may be false: some EBS volumes might share the same bus or even the same physical disk, a software bug in Amazon's system may kill all of your EBS volumes, or all of the underlying disks may come from the same defective manufacturing batch. You get the point.
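A small helper can make the capacity and fault-tolerance trade-off concrete. It follows the advice above and treats every EBS volume as a single ordinary disk, so EBS's own mirroring is deliberately not counted. The minimum volume counts are the conventional ones, not necessarily mdadm's hard limits, and only levels 0, 1, 5 and 6 are covered.

```python
from typing import NamedTuple


class RaidLayout(NamedTuple):
    usable_gb: int
    tolerated_failures: int  # volumes that can be lost without losing data


# Conventional minimum number of volumes for each supported level.
MIN_VOLUMES = {0: 2, 1: 2, 5: 3, 6: 4}


def raid_layout(level: int, n_volumes: int, volume_gb: int) -> RaidLayout:
    """Usable size and fault tolerance of an md array of equal EBS volumes.

    EBS's internal mirroring is ignored on purpose: each volume is treated
    as one ordinary (if very reliable) disk.
    """
    if level not in MIN_VOLUMES:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_volumes < MIN_VOLUMES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_VOLUMES[level]} volumes")
    if level == 0:  # striping only: all capacity, no redundancy
        return RaidLayout(n_volumes * volume_gb, 0)
    if level == 1:  # n-way mirror: one volume's capacity, survives n-1 losses
        return RaidLayout(volume_gb, n_volumes - 1)
    parity = 1 if level == 5 else 2  # RAID 5: one parity block, RAID 6: two
    return RaidLayout((n_volumes - parity) * volume_gb, parity)
```

For example, four 1000 GB volumes give 4000 GB with no tolerance at RAID 0, 3000 GB surviving one lost volume at RAID 5, and 2000 GB surviving two at RAID 6.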
Snapshotting EBS RAID
Snapshotting an EBS RAID can be quite a pain. Consistency must be ensured for both the filesystem and the RAID volume. Usually freezing the filesystem/device mapper is enough, but when RAID is used there are additional IO buffers that need to be flushed. So the process for a snapshot is:
- Freeze filesystem/device mapper
- Wait for IO buffers to sync (blockdev --flushbufs, and maybe also sleep 1)
- Initiate snapshots on all EBS volumes participating in the RAID
- Unfreeze the filesystem
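The four steps can be sketched in Python as below, assuming util-linux fsfreeze for the freeze step (xfs_freeze or dmsetup suspend/resume would be alternatives) and blockdev --flushbufs on the md device, in the order given above. The snapshot call is passed in as a function because it is environment specific; one possible implementation would wrap the EC2 CreateSnapshot API (for example boto3's create_snapshot), but that wiring is an assumption and is not shown. The mount point, md device and volume IDs are hypothetical.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command and fail loudly (needs root)."""
    subprocess.run(list(cmd), check=True)


def snapshot_raid(
    mount_point: str,
    md_device: str,
    volume_ids: Iterable[str],
    snapshot_volume: Callable[[str], str],
    runner: Runner = run,
    settle_seconds: float = 1.0,
) -> List[str]:
    """Freeze, flush, snapshot every member volume, then always unfreeze."""
    runner(["fsfreeze", "-f", mount_point])  # 1. freeze the filesystem
    try:
        runner(["blockdev", "--flushbufs", md_device])  # 2. flush IO buffers
        time.sleep(settle_seconds)  # "and maybe also sleep 1"
        # 3. initiate a snapshot of every EBS volume in the RAID set
        return [snapshot_volume(vol) for vol in volume_ids]
    finally:
        runner(["fsfreeze", "-u", mount_point])  # 4. unfreeze, even on error
```

The try/finally is the important design choice: if any snapshot request fails, the filesystem is still unfrozen instead of leaving applications blocked on a frozen mount.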
Note 1: Performance is degraded while an EBS snapshot is being created, and RAID is no exception. With RAID 4, 5 or 6, however, the degradation may be extremely severe, because these levels perform an additional XOR (parity) operation on every write and must read from every volume in the stripe on each read operation whose size exceeds the stripe size.
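To illustrate where the extra write cost in Note 1 comes from, here is a toy model of single-parity (RAID 4/5 style) stripes on in-memory byte strings. It is not how md lays data out on disk, and RAID 6's second syndrome is not modeled. Parity is the XOR of the data chunks, so a write smaller than a full stripe has to read the old chunk and the old parity before it can write the new ones.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: List[bytes]) -> bytes:
    """Single parity: XOR of all data chunks in a stripe."""
    return reduce(xor_bytes, chunks)


def small_write_parity(old_chunk: bytes, new_chunk: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: new parity = old parity ^ old data ^ new data.

    This costs two reads (old chunk, old parity) and two writes (new chunk,
    new parity) where RAID 0 would need a single write.
    """
    return xor_bytes(xor_bytes(old_parity, old_chunk), new_chunk)


def rebuild(surviving_chunks: List[bytes], parity_chunk: bytes) -> bytes:
    """Recover the single missing data chunk from the survivors and parity."""
    return reduce(xor_bytes, surviving_chunks, parity_chunk)
```

The same XOR property is what lets a single-parity array survive the loss of exactly one member.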
Note 2: The resulting snapshots form an atomic set and cannot be used if some are missing (with RAID 5 you can lose one, with RAID 6 two). When restoring, you need to recreate the entire RAID set as EBS volumes.
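Before a restore, it may be worth checking that a snapshot set is still usable. This sketch assumes you keep your own record of which volume each snapshot came from (a hypothetical mapping of volume ID to snapshot ID) and that all snapshots belong to the same snapshot run; mixing runs would give an inconsistent array even if nothing is missing.

```python
from typing import Iterable, Mapping


def snapshot_set_usable(
    level: int,
    expected_volumes: Iterable[str],
    snapshots_by_volume: Mapping[str, str],
) -> bool:
    """True if enough member snapshots exist to recreate the array."""
    members = list(expected_volumes)
    # Members that may be missing: none for RAID 0, one for RAID 5, two for RAID 6.
    tolerance = {0: 0, 1: len(members) - 1, 5: 1, 6: 2}
    if level not in tolerance:
        raise ValueError(f"unsupported RAID level: {level}")
    missing = [vol for vol in members if vol not in snapshots_by_volume]
    return len(missing) <= tolerance[level]
```

Note that a RAID 5 or 6 set restored with missing members starts out degraded, so it still has to be rebuilt onto fresh EBS volumes.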
Multilevel storage
TODO
Performance considerations
- EBS volume IO incurs a performance penalty on small instances, due to CPU ticks stolen by the underlying host's IO
- It's easy to saturate the underlying host's network when using multiple EBS volumes: EBS is a SAN and is probably connected over a gigabit link
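To put a number on the stolen ticks mentioned above (the same st value top shows), a sketch can sample the aggregate cpu line of /proc/stat twice and compare the steal counter with the total. This assumes a Linux guest whose kernel reports the steal field (the eighth counter on that line); the sampling interval is an arbitrary choice.

```python
import time
from typing import Sequence, Tuple

STEAL_INDEX = 7  # user nice system idle iowait irq softirq steal guest guest_nice


def read_cpu_times(path: str = "/proc/stat") -> Tuple[int, ...]:
    """Return the counters (in USER_HZ ticks) of the aggregate 'cpu' line."""
    with open(path) as stat:
        for line in stat:
            if line.startswith("cpu "):
                return tuple(int(field) for field in line.split()[1:])
    raise RuntimeError(f"no aggregate cpu line in {path}")


def steal_percent(before: Sequence[int], after: Sequence[int]) -> float:
    """Percentage of elapsed CPU ticks stolen by the hypervisor.

    Only the first eight counters are summed: guest time is already counted
    inside user/nice, so adding it again would double count.
    """
    if len(before) <= STEAL_INDEX or len(after) <= STEAL_INDEX:
        raise ValueError("kernel does not report steal time")
    deltas = [new - old for old, new in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[STEAL_INDEX] / total if total > 0 else 0.0


def sample_steal(interval_s: float = 5.0) -> float:
    """Sample /proc/stat twice, interval_s apart, and return the steal share."""
    first = read_cpu_times()
    time.sleep(interval_s)
    return steal_percent(first, read_cpu_times())
```

Calling sample_steal(5.0) while a benchmark or IO-intensive task is running gives a rough figure to compare between instance sizes.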
--
AvishaiIshShalom - 12 Nov 2009