Membase In The Cloud: Part 1

Membase is an open source, high performance, highly available, distributed NoSQL key-value database management system (see NoSQL In The Cloud) optimized for persisting data behind modern SaaS applications. Companies using Membase include Zynga, PayPal, Vodafone and Microsoft.

Membase processes client requests with low (sub-millisecond) latency and high sustained throughput, with the ability to scale from a single Membase server to a cluster of potentially thousands of servers.

Memcached compatibility

Memcached is a distributed memory caching technology and a core infrastructure component behind 18 of the top 20 most heavily-trafficked websites (including Google, Facebook, Twitter and Wikipedia). It is often deployed alongside relational databases, caching data and objects in RAM and reducing the number of times the relational database must be read.

Membase is a drop-in replacement for Memcached. On-the-wire compatibility means that existing Memcached code works as is. Membase is also:

Simple

Everything about Membase is designed to be easy and low maintenance: getting, installing, managing, expanding and using it. As a NoSQL database, there is no need to create and manage schemas, and also no need to normalize, shard or tune the database.

Fast

Membase is arguably the lowest latency, highest throughput NoSQL database technology available.

Elastic

Scaling a Membase cluster in the cloud is as easy as starting up (or terminating) virtual machine instances and hitting the rebalance button. Data is automatically re-distributed across the cluster, increasing or reducing aggregate I/O and storage capacity. From a high availability perspective, there are no single points of failure in a Membase cluster.
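
To make the rebalancing idea concrete, here is a minimal sketch, not Membase's actual implementation. It assumes keys hash to a fixed number of partitions (NUM_PARTITIONS, the CRC32 hash and the server names are all assumptions made for the example) and that a rebalance only reassigns partitions whose owner is gone or holds more than its share.

```python
import zlib

NUM_PARTITIONS = 64  # hypothetical value chosen for the example


def partition_for(key: str) -> int:
    """Map a key to a partition; this does not change when servers are added."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def rebalance(partition_map: dict, servers: list) -> dict:
    """Return a new partition -> server map spread evenly over `servers`.

    A partition keeps its current owner unless that owner has left the
    cluster or already holds its fair share.
    """
    base, extra = divmod(NUM_PARTITIONS, len(servers))
    # The first `extra` servers take one partition more than the rest.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    counts = {s: 0 for s in servers}
    new_map = {}
    homeless = []
    for p in range(NUM_PARTITIONS):
        owner = partition_map.get(p)
        if owner in counts and counts[owner] < target[owner]:
            new_map[p] = owner
            counts[owner] += 1
        else:
            homeless.append(p)
    for p in homeless:
        # Hand the partition to the server furthest below its target.
        dest = max(servers, key=lambda s: target[s] - counts[s])
        new_map[p] = dest
        counts[dest] += 1
    return new_map


two_nodes = rebalance({}, ["server-a", "server-b"])
three_nodes = rebalance(two_nodes, ["server-a", "server-b", "server-c"])
moved = sum(1 for p in two_nodes if two_nodes[p] != three_nodes[p])
print(f"{moved} of {NUM_PARTITIONS} partitions moved")
```

In this sketch, growing from two to three servers moves only the partitions the new server takes over; every key still hashes to the same partition, so clients only need the updated partition map.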

Membase Concepts

You can interact with and administer a Membase cluster through a web console, a command line interface or a REST API.

Cluster Manager

The Cluster Manager runs on each Membase Server and provides the following services:

  • Cluster Management
  • Node startup and shutdown
  • Node monitoring
  • Gathering of statistics
  • Logging
  • Security

Client applications access these services via the admin port (8091) and data port (11211).
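
As a rough illustration of what travels over the data port, the sketch below builds and parses messages in the Memcached text protocol (the `set` and `get` commands). The helper names are hypothetical and nothing here is specific to Membase; it only shows the framing that a Memcached-compatible client sends and receives.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a `set` command: header line, data block, trailing CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a single-key `get` command."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes):
    """Return the value from a single-key `get` response, or None on a miss."""
    if data == b"END\r\n":
        return None  # key not found
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response header: {header!r}")
    length = int(parts[3])  # VALUE <key> <flags> <bytes>
    return rest[:length]


print(build_set("greeting", b"hello"))
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))
```

To try it against a real node you could open a TCP connection with socket.create_connection((host, 11211)), where host is whatever address your own server uses, send the framed bytes and read the reply.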

Node Memory and Disk Management

Membase automatically manages storing objects between disk and memory. For performance reasons object metadata are always kept in memory.

A memory quota is set through configuration and should not be more than 80% of the total physical RAM on the node. The quota set for the first node in the cluster is inherited by all nodes subsequently joining the cluster. Membase automatically migrates data from memory to disk when the quota is reached.
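
The 80% guideline is simple arithmetic. The helper below is a hypothetical convenience, not part of any Membase tool, and uses integer math to avoid floating point surprises.

```python
def max_memory_quota_mb(total_ram_mb: int, ceiling_percent: int = 80) -> int:
    """Largest quota (in MB) that stays within the given share of physical RAM."""
    return total_ram_mb * ceiling_percent // 100


# A node with 16 GB (16384 MB) of RAM.
print(max_memory_quota_mb(16384))
```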

Data Buckets

Data management services are provided through virtual data containers called buckets. A bucket is a logical grouping of physical resources within a cluster.

Membase bucket types:

  • Memcached
    Distributed, in-memory, key-value cache. Memcached buckets are designed to be used alongside relational databases, caching frequently-used data, thereby reducing database server load.
  • Membase
    Highly-available and horizontally scalable data storage. Membase buckets are 100% protocol compatible with Memcached.

Membase bucket-type capabilities:

  • Persistence
    Data objects can be persisted asynchronously to hard-disk resources from memory.
  • Replication
    Replica servers can receive copies of data objects in the bucket. A replica server can be promoted if the host server fails, providing a highly available cluster via fail-over.
  • Rebalancing
    Rebalancing enables dynamic addition or removal of buckets and servers in the cluster.

In Part 2 I will look at installing Membase on Amazon EC2.

NoSQL In The Cloud

A fundamental mismatch between the architectural requirements of modern SaaS applications and traditional relational database technologies is driving high-tech internet companies to look elsewhere, and NoSQL database technologies are emerging as the preferred data persistence stores, helping to overcome the constraints, complexity and costs of relational databases.

NoSQL is a term used to classify next generation databases that differ from traditional RDBMSs (Relational Database Management Systems) like MySQL, Oracle, SQL Server and DB2 in some of the following ways.

  • Horizontal Scalability
  • Very High Availability
  • Massive Data Stores
  • Schema-free
  • Distributed Architecture, MapReduce support (software framework introduced by Google to support distributed computing on large datasets)
  • BASE (Basically Available, Soft state, Eventual consistency) as opposed to ACID (Atomicity, Consistency, Isolation, Durability)
  • Open Source
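
For readers new to MapReduce, the following single-process sketch shows the shape of the model using the classic word count example. It is an illustration of the programming model only; real frameworks run the map and reduce phases in parallel across many machines, and the function names here are my own.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["scale out", "scale up"]))
```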

Why NoSQL?

Modern SaaS applications are built to scale out (horizontal scale): simply add more application servers behind a load balancer to support more users. Scaling out is also a fundamental concept of cloud computing. With an elastic compute cloud, virtual machine instances can be easily added or removed to match demand, and elastic cloud architectures naturally require an elastic data persistence tier.

Relational databases, optimized for client-server applications, are primarily a scale-up (vertical scale) technology (see MySQL In The Cloud: Part 1). Scaling up means you have to get a bigger server, and bigger servers tend to be complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically deployed in SaaS and cloud architectures. Also, there is a limit to how big a server you can purchase.

As organizations start re-architecting for the cloud, it is critical to consider all the cloud computing requirements, and specifically the need for an elastic data persistence tier.

Polyglot Persistence

Having said all of the above, and also considering that arguably the majority of data behind modern web applications will benefit from elastic data solutions, there are exceptions. For data where relational technology is a better match, relational databases can and should be used.

Most of the production NoSQL implementations that I know of take a polyglot persistence approach, whereby NoSQL databases are used in combination with relational databases. The term "no SQL" can be misleading, and I personally prefer to read NoSQL as "not only SQL".

Scaling SQL

It is also important to note that relational databases can be scaled horizontally (by adding read replicas and sharding, for example; see MySQL In The Cloud: Part 1). The point, however, is that relational databases are notoriously difficult to scale out, whereas NoSQL architectures, with horizontal scalability built in from day one, are designed to scale out with far less effort.
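
One reason application-level sharding is painful can be shown in a few lines. The sketch below assumes the simplest possible scheme, shard = hash(key) modulo the number of shards, with CRC32 as the hash and made-up key names; it is not how any particular product shards data.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Naive sharding: hash the key and take it modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys that land on a different shard after resizing."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys change shard going from 4 to 5")
```

With this scheme most keys map to a different shard as soon as the shard count changes, which means a large data migration every time you add a database server.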

An interesting recent development is a focus on new technologies that will help you scale SQL in the cloud. Some of the technologies that I am keeping an eye on: Akiban Technologies, ScaleBase, and Xeround.

Amazon Relational Database Service (RDS) also goes a long way to help you scale SQL in the cloud (see MySQL In The Cloud: Part 2).

NoSQL Databases

Some of the more notable NoSQL databases categorized by their manner of implementation:

Document Store

Key Value Store

Wide Column Store

Graph Databases

In future posts I will be looking at some of these NoSQL databases in detail.

Amazon Elastic Load Balancer

Before starting up an Elastic Load Balancer (ELB), let's take a quick look at ELB features.

Elastic Load Balancer Features

  • Elastic Load Balancing distributes incoming traffic across multiple Amazon EC2 instances. ELB can be enabled within a single Availability Zone (independent infrastructure in a physically separate location) or across multiple zones.
  • ELB can detect the health of Amazon EC2 instances. When it detects unhealthy load-balanced Amazon EC2 instances, it no longer routes traffic to those instances, instead spreading the load across the remaining healthy Amazon EC2 instances.
  • Elastic Load Balancing supports the ability to stick user sessions to specific EC2 instances.
  • ELB supports SSL termination at the Load Balancer, including offloading SSL decryption from application instances and providing centralized management of SSL certificates.
  • Elastic Load Balancing metrics such as request count and request latency are reported by Amazon CloudWatch.
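
The health check behaviour in the list above can be pictured with a small sketch. This is a generic round-robin balancer that skips unhealthy targets, written purely for illustration; the class, its methods and the instance IDs are hypothetical and say nothing about how ELB is implemented internally.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out instances in turn, skipping any marked unhealthy."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._ring = cycle(self._instances)

    def mark_unhealthy(self, instance):
        self._healthy.discard(instance)

    def mark_healthy(self, instance):
        if instance in self._instances:
            self._healthy.add(instance)

    def next_instance(self):
        if not self._healthy:
            raise RuntimeError("no healthy instances available")
        while True:
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate


balancer = RoundRobinBalancer(["i-aaa", "i-bbb", "i-ccc"])
balancer.mark_unhealthy("i-bbb")
print([balancer.next_instance() for _ in range(4)])
```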

SSL Termination

Amazon recently announced support for SSL termination on Elastic Load Balancer. Main advantages:

  • Encryption and decryption load is removed from your instances and handled by ELB.
  • Reduced administrative complexity of your X.509 certificates. Instead of installing and maintaining certificates on all your web server instances, you only have to install them on your Load Balancer.

Another very useful ELB feature is being able to tell whether the HTTP requests arriving at your EC2 instances were transmitted across the Internet using HTTPS. The following headers are injected when the request arrives at the Elastic Load Balancer:

  • X-Forwarded-Proto specifies the protocol ("http" or "https") of the original request made to the Elastic Load Balancer.
  • X-Forwarded-Port specifies the port of the original request.
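
On the application side, checking these headers takes only a few lines. The sketch below assumes your web framework exposes request headers as a plain dictionary; the function name is hypothetical.

```python
def original_request_was_https(headers: dict) -> bool:
    """True if the client spoke HTTPS to the load balancer.

    Header names are matched case-insensitively, since HTTP header names
    are not case sensitive.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-forwarded-proto", "").strip().lower() == "https"


print(original_request_was_https({"X-Forwarded-Proto": "https", "X-Forwarded-Port": "443"}))
print(original_request_was_https({"X-Forwarded-Proto": "http", "X-Forwarded-Port": "80"}))
```

A typical use is redirecting plain HTTP requests to HTTPS while SSL is terminated at the load balancer.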

Starting Elastic Load Balancing

Launch the Amazon EC2 Management Console, select "Load Balancers" from the left column of the console, and then select "Create Load Balancer". This will start the Create New Load Balancer wizard.

On page one configure the ports and protocols for your load balancer. Traffic from your clients can be routed from any load balancer port to any port on your EC2 instances.

If you include a secure HTTP (HTTPS) listener, you will have to upload your SSL Certificate in the next screen.

Next configure the Load Balancer health check rules.

Select the EC2 Instances to add to your Load Balancer.

Finally, you can review your Load Balancer configuration before launching.

GlusterFS 3.1 on Amazon EC2: Part 4

In Part 3 I finished the configuration and started the GlusterFS replicated volume. The Gluster storage cluster volume can now be mounted on client machines.

Gluster Clients

Gluster offers three different ways for client machines to access volumes in the storage cluster.

  • NFS - The open standard Network File System protocol originally developed by Sun Microsystems. One of the advantages of using NFS is that no client side Gluster installation is required.
  • CIFS - Common Internet File System, also known as SMB (Server Message Block). The recommended method for accessing volumes when using Microsoft Windows-based clients and SAMBA clients in GNU/Linux environments.
  • Gluster Native Client - The Gluster Native Client is a POSIX conformant, FUSE-based client running in user space. Gluster Native Client is the recommended method for accessing volumes when high concurrency and high performance are required.

Installation

I will be using the Gluster Native Client. Before you begin installing the native client you need to verify that the FUSE module is loaded on the client.

Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a "bridge" to the actual kernel interfaces.

Add the FUSE loadable kernel module (LKM) to the Linux kernel using the following command.

$ sudo modprobe fuse

Verify that the FUSE module is loaded.

$ dmesg | grep -i fuse
[ 1030.962206] fuse init (API version 7.13)

Install GlusterFS 3.1 on the client machine.

$ cd /tmp
$ wget http://download.gluster.com/pub/gluster/glusterfs/3.1/3.1.0/Ubuntu/glusterfs_3.1.0-1_amd64.deb
$ sudo dpkg -i glusterfs_3.1.0-1_amd64.deb

Create a mount directory and then mount the Gluster volume (created in Part 1 - Part 3).

$ sudo mkdir /mnt/glusterfs 
$ sudo mount -t glusterfs 10.229.79.229:/gluster-volume \
       /mnt/glusterfs

Test the mounted volume.

$ sudo mount
.
10.229.79.229:/gluster-volume on /mnt/glusterfs type fuse.glusterfs

As a final test I usually copy a file to the mounted volume and verify that the file is replicated to both server bricks.

Firewall Settings

According to the GlusterFS documentation, ports 111, 24007 and 24008, plus ports 24009 through (24009 + number of volumes), must be open on all Gluster servers. For NFS, ports 38465 through (38465 + number of Gluster servers) must also be open.
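
If you maintain firewall rules by hand, the port rule above is easy to turn into a list. The helper below is a hypothetical sketch; it reads each "base + count" expression as an inclusive range starting at the base port, which is my interpretation of the documentation rather than something stated in it.

```python
def gluster_server_ports(volume_count: int, server_count: int, nfs: bool = False) -> list:
    """Ports to open on every Gluster server, per the rule quoted above."""
    ports = [111, 24007, 24008]
    # 24009 up to and including 24009 + number of volumes (assumed inclusive).
    ports += range(24009, 24009 + volume_count + 1)
    if nfs:
        # 38465 up to and including 38465 + number of servers (assumed inclusive).
        ports += range(38465, 38465 + server_count + 1)
    return ports


# One volume on a two-server cluster, with NFS access enabled.
print(gluster_server_ports(volume_count=1, server_count=2, nfs=True))
```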

In the Amazon Cloud I create Security Groups for each functional cluster (for example AppServer Cluster and Storage Cluster) and then allow connections via these Groups. For example, by allowing the AppServer Cluster Security Group to connect to the Storage Cluster Security Group, all the ports will automatically be accessible.

GlusterFS 3.1 on Amazon EC2: Part 3

In Part 2 I prepared the EBS volumes, attached them to the storage bricks, formatted, and then mounted the volumes.

In Part 3 I will complete the configuration of the GlusterFS distributed replicated volume and then start the volume.

Create export directories on both instances and bind them to your EBS mounts. On Brick 1:

$ sudo mkdir /export
$ sudo mkdir /export/brick01
$ sudo mount --bind /mnt/ebs_disk01 /export/brick01

On Brick 2:

$ sudo mkdir /export
$ sudo mkdir /export/brick02
$ sudo mount --bind /mnt/ebs_disk01 /export/brick02

Create the Gluster volume using the Gluster command line interface on Brick 1.

$ sudo gluster volume create gluster-volume \
       replica 2 transport tcp \
       10.229.79.219:/export/brick01 \
       10.229.122.230:/export/brick02
Creation of volume gluster-volume has been successful

Now start the volume.

$ sudo gluster volume start gluster-volume
Starting volume gluster-volume has been successful

You can check the volume status with the gluster volume info command.

$ sudo gluster volume info
.
Volume Name: gluster-volume
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.229.79.219:/export/brick01
Brick2: 10.229.122.230:/export/brick02

The Gluster distributed replicated volume is now up and running. In Part 4 I will mount the Gluster volume on client machines.

GlusterFS on Amazon EC2: Part 2

Updated: October 18, 2010 (Original Post: August 17, 2010)

Part 2 is shared by both the GlusterFS 3.1 series (Parts 1-4) and the GlusterFS 3.0.5 series.

For Amazon EC2 file system storage volumes you can either use the ephemeral local instance stores or off-instance Elastic Block Store (EBS) volumes that persist independently from the life of an instance.

Amazon Elastic Block Store (EBS)

Because you have multiple replicated copies of files in the GlusterFS storage cluster you can potentially use the local instance stores, provided that you keep backup copies in a different availability zone. I will however be using EBS volumes, mainly for the following reasons:

  • Each storage volume is automatically replicated within the same availability zone which improves EBS volume resiliency (prevents data loss due to failure of any single hardware component). With an annual failure rate (AFR) of between 0.1% and 0.5% EBS volumes are 10 times more reliable than typical commodity disk drives.
  • Amazon EBS provides the ability to create point-in-time snapshots of volumes, which are persisted to Amazon S3. These snapshots can be used as the starting point for new EBS volumes and protect data for long-term durability.
  • Amazon CloudWatch exposes performance metrics for EBS volumes, giving you insight into bandwidth, throughput, latency and queue depth.
  • You can attach multiple volumes to an instance and stripe across the volumes to improve I/O performance.
  • I find this one a bit surprising and plan on running my own benchmarks, but according to Amazon: "The latency and throughput of Amazon EBS is designed to be significantly better than the Amazon EC2 instance stores in nearly all cases."

Other interesting facts about EBS:

  • An Amazon EBS volume can be anywhere between 1 GB and 1 TB in size. Multiple volumes can be mounted to the same instance.
  • Amazon EBS volumes can only be attached to instances in the same availability zone and only to one instance at a time.
  • Amazon EBS snapshots are differential backups, i.e. only the blocks on the device that have changed since your last snapshot will be incrementally saved.
  • Snapshots can be used to expand the size of a volume or move volumes across availability zones.
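
The idea behind differential snapshots can be sketched in a few lines. The example below treats a volume as a byte string split into fixed-size blocks and saves only the blocks whose digest differs from the previous snapshot. The tiny block size, the SHA-256 comparison and the function names are all assumptions made for the illustration; the post does not describe how EBS tracks changed blocks internally.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, to keep the example readable


def block_digests(volume: bytes) -> list:
    """One digest per block, in block order."""
    return [
        hashlib.sha256(volume[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(volume), BLOCK_SIZE)
    ]


def incremental_snapshot(volume: bytes, previous_digests: list) -> dict:
    """Return {block_index: block_bytes} for blocks changed since the last snapshot."""
    changed = {}
    for index, digest in enumerate(block_digests(volume)):
        if index >= len(previous_digests) or previous_digests[index] != digest:
            start = index * BLOCK_SIZE
            changed[index] = volume[start:start + BLOCK_SIZE]
    return changed


before = b"aaaabbbbcccc"
after = b"aaaaBBBBcccc"
print(incremental_snapshot(before, []))                    # first snapshot: every block
print(incremental_snapshot(after, block_digests(before)))  # second: only the changed block
```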

Preparing the File System Volumes

First you need to create the EBS volumes (one per storage cluster instance). You can either create the volumes through the Amazon EC2 Management Console or the EC2 Command Line Tools. Remember to create the volumes in the same availability zone that your cluster instances will be running in.

Using the EC2 Management Console you can right-click on the EBS volume you created and then select Attach Volume to attach it to any running instance in the same availability zone. I am attaching the EBS volumes as device /dev/sdf1.

Log in to the instances, format the newly attached devices and mount (you can do a sudo fdisk -l to check that the volume was successfully attached).

$ sudo mkfs -t ext4 /dev/sdf1
$ sudo mkdir /mnt/ebs_disk01
$ sudo mount /dev/sdf1 /mnt/ebs_disk01

Run a sudo mount to make sure that the device was successfully mounted.

$ sudo mount
.
/dev/sdf1 on /mnt/ebs_disk01 type ext4 (rw)

Remember to add the new device to /etc/fstab to automatically mount during a reboot - detailed fstab information can be found in the Ubuntu Community Documentation.

GlusterFS 3.1 Part 3, GlusterFS 3.0.5 Part 3

GlusterFS 3.1 on Amazon EC2: Part 1

I will be setting up a mirrored dual server GlusterFS storage cluster with automatic failover and load balancing. (See Scalable Storage In The Cloud for more information about GlusterFS as well as other storage cluster options.)

Start by launching two 64-bit Ubuntu Server 10.04 LTS Amazon EC2 instances (EU Region ami-10794c64). Why Ubuntu on EC2? See Juiced For The Cloud!

Note: If you want to run GlusterFS on 32-bit instances you will have to install from source. Although it is not officially supported, running GlusterFS 3.1 in a 32-bit environment should work just fine.

Install GlusterFS

Log into the instances and execute the following on both. First update Ubuntu.

$ sudo apt-get update
$ sudo apt-get upgrade 

Install GlusterFS 3.1

$ cd /tmp
$ wget http://download.gluster.com/pub/gluster/glusterfs/3.1/3.1.0/Ubuntu/glusterfs_3.1.0-1_amd64.deb
$ sudo dpkg -i glusterfs_3.1.0-1_amd64.deb

And start the glusterd daemon.

$ sudo /etc/init.d/glusterd start
 * Starting glusterd service glusterd
 ...done.

Create a Trusted Storage Pool

Before configuring a GlusterFS volume, you need to create a trusted storage pool consisting of the storage servers that will comprise the volume.

A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of that server alone. To add additional storage servers to the storage pool, you can use the probe command from a storage server that is already trusted.

From the first server, probe the second server you want to add to the storage pool.

$ sudo gluster peer probe 10.235.6.233
Probe successful 

Verify the peer status from the first server.

$ sudo gluster peer status
Number of Peers: 1
Hostname: 10.235.6.233
Uuid: 99097fe8-2ec5-45cc-bfcf-0e34284d7f24
State: Peer in Cluster (Connected)

In Part 2 I will look at creating and attaching Elastic Block Store (EBS) volumes before configuring GlusterFS Distributed Replicated volumes.

Scalable Storage In The Cloud

Updated: October 14, 2010 (Original Post: August 13, 2010)

On October 12, 2010 Gluster announced the general availability of GlusterFS 3.1, adding a lot of new capabilities. This post is now updated to reflect the 3.1 changes. I will also publish a new series detailing the installation of GlusterFS 3.1 on Amazon EC2 (the GlusterFS on Amazon EC2 Part 1 to 4 series published in August installs version 3.0.5).

One of the challenges when moving to the Cloud is provisioning a highly available and scalable storage (NAS) solution: a file storage solution that must be easily accessible from transient server clusters, in our scenario an auto-scaling application cluster. Because the application server instances are ephemeral in nature, you cannot rely on the local instance stores for any files or data that you want to persist.

I started by looking at the classic highly available Linux file server solution, namely NFS server used in combination with DRBD and Heartbeat. From my experience however, and also reading about what other Architects and Engineers are doing, I came to the conclusion that NFS and the Cloud are not a natural fit. Therefore I started researching alternative distributed file system solutions allowing for scalable storage.

From all the options I looked at (Ceph, Lustre, MooseFS, etc) GlusterFS stood out as the best product relative to our requirements, i.e. (in order of priority):

  • High Availability
  • Ease of Use
  • Performance
  • Scalability

GlusterFS Architecture

One of the main differences between GlusterFS and other distributed file systems is the unique architecture that eliminates the need for a metadata server. This fundamental shift in distributed file system design simplifies server setup, removes a potential bottleneck and also eliminates a central point of failure. Deployment is a breeze and you can literally get a petabyte-scale storage solution up and running in less than an hour.

Gluster does not depend on metadata in any way; instead it generates the equivalent information on-the-fly using an algorithm, the Gluster Elastic Hash. The results of these calculations are dynamic values acquired whenever needed in each of the nodes of a Gluster deployment. The algorithms are universal and omnipresent across the distributed architecture and therefore cannot be out of sync. Impressive in theory and we are about to find out how it performs in production.
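
The principle of computing a file's location instead of looking it up can be illustrated with a short sketch. This is not Gluster's actual Elastic Hash algorithm; the SHA-256 hash, the modulo placement and the brick names are assumptions chosen only to show that every client arrives at the same answer without asking a metadata server.

```python
import hashlib


def brick_for(path: str, bricks: list) -> str:
    """Pick a brick for a file path using nothing but the path and the brick list."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(bricks)
    return bricks[index]


# Hypothetical brick names for the example.
bricks = ["server1:/export/brick01", "server2:/export/brick02", "server3:/export/brick03"]

# Two independent "clients" compute the location and necessarily agree.
client_a = brick_for("/reports/2010/q3.pdf", bricks)
client_b = brick_for("/reports/2010/q3.pdf", bricks)
print(client_a, client_a == client_b)
```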

Other Features and Benefits

  • Data mirroring & replication (availability)
  • Real time self-healing (availability)
  • Volume failover (availability)
  • Automatic load balancing (performance)
  • Stripe files across storage blocks (performance)
  • Scales to hundreds of petabytes (scalability)

What's new in Gluster 3.1?

(From the Gluster product page.)
Gluster Storage Platform v3.1 introduces new capabilities that ensure storage is a fit with the modern data center that needs to scale on demand, is highly virtualized, requires large scale automation, and is increasingly deployed in the cloud. These capabilities can be categorized as follows:

Gluster Elastic Volume Manager
Storage volumes are abstracted from the underlying hardware and can grow, shrink, or be migrated across physical systems as necessary. Storage servers can be added or removed on-the-fly with data automatically rebalanced across the cluster. Data is always online and there is no application downtime. File system configuration changes are accepted at runtime and propagated throughout the cluster allowing changes to be made dynamically as workloads fluctuate or for performance tuning.

Gluster Console Manager
The Command Line Interface (CLI), Application Programming Interface (API) and shell are merged into a single powerful interface, enabling automation by giving the CLI higher level API’s and scripting capabilities. Languages such as Python, Ruby or PHP can be used to script a series of commands that are invoked through the command line. This new tool requires no new APIs and is able to script out and rapidly automate any information inserted in the CLI allowing cloud administrators the ability to simply automate large scale operations.

Native NFS
Gluster now includes a native NFS v3 module which allows storage servers to communicate natively with NFS clients directly to any storage server in the cluster and simultaneously communicates NFS and the Gluster protocol. NFS requires no specialized training, making it simple and easy to deploy. NFS is suitable for most workloads including virtual machine support. Workloads that are highly parallelized should still use the Gluster native protocol where all clients can communicate with any server in the cluster.

Ceph

Another distributed network file system that I will be watching closely as it matures is Ceph. Together with GlusterFS, Ceph was included as a standard component with Ubuntu Server Edition 10.10. The Ceph client is also part of the Linux kernel since 2.6.34.

At the time of this writing Ceph is not yet ready for production. Ceph is also more complex to install and maintain than GlusterFS.

Why not Amazon S3?

As we are running in the Amazon Cloud, we will be using S3 in combination with our Gluster Storage Cluster. Because of our phase one forklift approach we want to keep the changes to our existing code to a minimum. Files on the Storage Cluster can easily be accessed by our existing code through a mount on the application servers. Accessing files on S3 through the API will obviously require code changes (and an S3 lock-in). I realize that S3 buckets can also be mounted through an available FUSE module, and although I haven't tested it yet, I seriously doubt whether an S3 FUSE mount will meet our performance requirements (I will test at a later stage and provide feedback via this blog).

Therefore, to begin with we will mainly be using S3 for backups and potentially our static web application content to improve performance through the global network of edge locations provided by the Amazon CloudFront CDN.

In my next post I will look at the details of installing GlusterFS on Amazon EC2 (GlusterFS 3.1, GlusterFS 3.0.5).

Cloud Security Best Practices

From the Amazon Paper - Architecting for the Cloud: Best Practices

In a multi-tenant environment, cloud architects often express concerns about security. Security should be implemented in every layer of the cloud application architecture. Physical security is typically handled by your service provider, which is an additional benefit of using the cloud. Network and application-level security is your responsibility and you should implement the best practices as applicable to your business.

Protect your data in transit

If you need to exchange sensitive or confidential information between a browser and a web server, configure SSL on your server instance. You’ll need a certificate from an external certification authority like VeriSign or Entrust. The public key included in the certificate authenticates your server to the browser and serves as the basis for creating the shared session key used to encrypt the data in both directions.

Create a Virtual Private Cloud by making a few command line calls (using Amazon VPC). This will enable you to use your own logically isolated resources within the AWS cloud, and then connect those resources directly to your own datacenter using industry-standard encrypted IPSec VPN connections.

You can also set up an OpenVPN server on an Amazon EC2 instance and install the OpenVPN client on all user PCs.

Protect your data at rest

If you are concerned about storing sensitive and confidential data in the cloud, you should encrypt the data (individual files) before uploading it to the cloud. For example, encrypt the data using any open source or commercial PGP-based tools before storing it as Amazon S3 objects and decrypt it after download. This is often a good practice when building HIPAA-compliant applications that need to store Protected Health Information (PHI).

On Amazon EC2, file encryption depends on the operating system. Amazon EC2 instances running Windows can use the built-in Encrypting File System (EFS) feature. This feature will handle the encryption and decryption of files and folders automatically and make the process transparent to the users. However, despite its name, EFS doesn’t encrypt the entire file system; instead, it encrypts individual files. If you need a full encrypted volume, consider using the open-source TrueCrypt product; this will integrate very well with NTFS-formatted EBS volumes. Amazon EC2 instances running Linux can mount EBS volumes using encrypted file systems using a variety of approaches (EncFS, Loop-AES, dm-crypt, TrueCrypt). Likewise, Amazon EC2 instances running OpenSolaris can take advantage of ZFS Encryption Support. Regardless of which approach you choose, encrypting files and volumes in Amazon EC2 helps protect files and log data so that only the users and processes on the server can see the data in clear text, while anything or anyone outside the server sees only encrypted data.

No matter which operating system or technology you choose, encrypting data at rest presents a challenge: managing the keys used to encrypt the data. If you lose the keys, you will lose your data forever and if your keys become compromised, the data may be at risk. Therefore, be sure to study the key management capabilities of any products you choose and establish a procedure that minimizes the risk of losing keys.

Besides protecting your data from eavesdropping, also consider how to protect it from disaster. Take periodic snapshots of Amazon EBS volumes to ensure your data is highly durable and available. Snapshots are incremental in nature and stored on Amazon S3 (separate geo-location) and can be restored back with a few clicks or command line calls.

Protect your AWS credentials

AWS supplies two types of security credentials: AWS access keys and X.509 certificates. Your AWS access key has two parts: your access key ID and your secret access key. When using the REST or Query API, you have to use your secret access key to calculate a signature to include in your request for authentication. To prevent in-flight tampering, all requests should be sent over HTTPS.
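
The signing step can be sketched with Python's standard library. This shows only the generic HMAC part: the secret key never leaves your machine, and only the resulting signature is sent with the request. The exact string that must be signed is defined per service in the AWS documentation and is not reproduced here; the key and string below are placeholders, and the function names are my own.

```python
import base64
import hashlib
import hmac


def sign_request(secret_access_key: str, string_to_sign: str) -> str:
    """Return a base64-encoded HMAC-SHA256 signature of `string_to_sign`."""
    digest = hmac.new(
        secret_access_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def signatures_match(expected: str, received: str) -> bool:
    """Compare signatures in constant time, as a verifying party would."""
    return hmac.compare_digest(expected, received)


signature = sign_request("placeholder-secret-key", "placeholder-string-to-sign")
print(signature)
```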

If your Amazon Machine Image (AMI) is running processes that need to communicate with other AWS web services (for polling the Amazon SQS queue or for reading objects from Amazon S3, for example), one common design mistake is embedding the AWS credentials in the AMI. Instead of embedding the credentials, they should be passed in as arguments during launch and encrypted before being sent over the wire.

If your secret access key becomes compromised, you should obtain a new one by rotating to a new access key ID. As a good practice, it is recommended that you incorporate a key rotation mechanism into your application architecture so that you can use it on a regular basis or occasionally (for example, when a disgruntled employee leaves the company) to ensure compromised keys can’t last forever.

Alternately, you can use X.509 certificates for authentication to certain AWS services. The certificate file contains your public key in a base64-encoded DER certificate body. A separate file contains the corresponding base64-encoded PKCS#8 private key.

AWS supports multi-factor authentication as an additional protector for working with your account information on aws.amazon.com and AWS Management Console.

Secure your Application

Every Amazon EC2 instance is protected by one or more security groups, named sets of rules that specify which ingress (i.e., incoming) network traffic should be delivered to your instance. You can specify TCP and UDP ports, ICMP types and codes, and source addresses. Security groups give you basic firewall-like protection for running instances.
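
To picture how a set of ingress rules is evaluated, here is a small sketch. The IngressRule class and is_allowed function are hypothetical and model only TCP/UDP port ranges with CIDR sources; traffic is delivered only if some rule allows it, mirroring the firewall-like behaviour described above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str      # "tcp" or "udp"
    from_port: int     # first port in the allowed range
    to_port: int       # last port in the allowed range (inclusive)
    source_cidr: str   # e.g. "10.0.0.0/8"


def is_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """True if any rule permits this incoming connection; otherwise deny."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (
            rule.protocol == protocol
            and rule.from_port <= port <= rule.to_port
            and address in ipaddress.ip_network(rule.source_cidr)
        ):
            return True
    return False


rules = [
    IngressRule("tcp", 443, 443, "0.0.0.0/0"),       # HTTPS from anywhere
    IngressRule("tcp", 24007, 24010, "10.0.0.0/8"),  # storage ports, internal only
]
print(is_allowed(rules, "tcp", 443, "203.0.113.9"))
print(is_allowed(rules, "tcp", 24009, "203.0.113.9"))
```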

Another way to restrict incoming traffic is to configure software-based firewalls on your instances. Windows instances can use the built-in firewall. Linux instances can use netfilter and iptables.

Over time, errors in software are discovered and require patches to fix. You should ensure the following basic guidelines to maximize security of your application:

  • Regularly download patches from the vendor's web site and update your AMIs
  • Redeploy instances from the new AMIs and test your applications to ensure the patches don't break anything
  • Ensure that the latest AMI is deployed across all instances
  • Invest in test scripts so that you can run security checks periodically and automate the process
  • Ensure that third-party software is configured to the most secure settings
  • Never run your processes as root or Administrator unless absolutely necessary

All the standard security practices of the pre-cloud era, such as adopting good coding practices and isolating sensitive data, still apply and should be implemented.

In summary, the cloud abstracts the complexity of physical security away from you and gives you control through tools and features so that you can secure your application.

MySQL In The Cloud: Part 2

Updated: October 5, 2010 (Original Post: October 4, 2010)
The day after my original post Amazon released Read Replicas. Post updated accordingly.

In Part 1 I looked at the three phases of scaling MySQL in the Cloud using a shared nothing architecture. In Part 2 I will show you how you can easily deploy a highly available and scalable MySQL solution using Amazon's Relational Database Service (RDS).

Amazon Relational Database Service (RDS)

Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. RDS is a Database as a Service (DaaS) solution that gives you access to the full capabilities of a MySQL 5.1 database running on your own database instance.

RDS is potentially a great solution if you want the features and capabilities of a relational database, or if you want to migrate your existing LAMP software stack to the Cloud. All the code, applications, and tools that you use today with your existing MySQL databases will work seamlessly with Amazon RDS.

Amazon RDS Highlights

  • Simple to Deploy. Amazon RDS makes it easy to deploy a production-ready relational database without worrying about infrastructure provisioning or installing and maintaining database software.
  • Automated Backups. The automated backup feature enables point-in-time recovery for your DB Instance. Amazon RDS will back up your database and transaction logs and store both for a user-specified retention period. This allows you to restore your DB Instance to any second during your retention period, up to the last five minutes. Your automatic backup retention period can be set to up to eight days.
  • Snapshots. DB Snapshots are user-initiated backups of your DB Instance. These full database backups will be stored by Amazon RDS until you explicitly delete them. You can create a new DB Instance from a DB Snapshot whenever you desire.
  • Patch Management. Amazon RDS automatically handles database patch management.
  • Scale Up. Easily scale from a minimum 1.7 GB memory, 1 ECU DB Instance up to a maximum Instance with 64 GB of memory and 26 ECUs.
  • Scale Out (Phase One). When you create your DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically provision and maintain a synchronous “standby” replica in a different Availability Zone. In the event of planned database maintenance or unplanned service disruption, Amazon RDS will automatically fail over to the up-to-date standby so that database operations can resume quickly without administrative intervention.
  • Scale Out (Phase Two). Read Replicas make it easy to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. You can create one or more replicas of a given source DB Instance and serve high-volume application read traffic from multiple copies of your data, thereby increasing aggregate read throughput.

Launching a MySQL DB Instance

Let's take a quick look at how simple it is to launch a new MySQL DB Instance on RDS. Open the Amazon RDS Management Console and click on Launch DB Instance. This will start the Launch DB Instance Wizard.

Rds_launch1

As you can see, creating a Phase One standby server is as simple as setting Multi-AZ Deployment to Yes. Select the size of your DB Instance and fill in the storage to be allocated (up to 1 TB).

Rds_launch2

Enter your Database Name and select an Availability Zone as well as a Security Group.

Rds_launch3

Backup Retention Period. The automated backup feature of Amazon RDS enables point-in-time recovery of your DB Instance. When automated backups are turned on for your DB Instance (retention period set to 1-8 days), Amazon RDS automatically performs a full daily backup of your data (during your preferred backup window) and captures transaction logs (as updates to your DB Instance are made). When you initiate a point-in-time recovery, transaction logs are applied to the most appropriate daily backup in order to restore your DB Instance to the specific time you requested.

Amazon RDS retains backups of a DB Instance for a limited, user-specified period of time called the retention period, which by default is one day but can be set to up to eight days. You can initiate a point-in-time restore and specify any second during your retention period, up to the Latest Restorable Time, which is typically within the last five minutes.
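
The recovery logic described above can be sketched in a few lines. This example assumes that "the most appropriate daily backup" means the latest backup taken at or before the requested time; that reading, the function names and the sample timestamps are assumptions for illustration, not a description of Amazon's internal implementation.

```python
from datetime import datetime
from typing import List


def pick_backup(daily_backups: List[datetime], target: datetime) -> datetime:
    """Return the latest daily backup taken at or before the target time."""
    candidates = [b for b in daily_backups if b <= target]
    if not candidates:
        raise ValueError("target time is earlier than the oldest retained backup")
    return max(candidates)


def logs_to_replay(log_times: List[datetime], backup: datetime, target: datetime) -> List[datetime]:
    """Return, in order, the transaction log entries between backup and target."""
    return [t for t in sorted(log_times) if backup < t <= target]


# Hypothetical data: backups at 03:00 each day, restore requested for Oct 2 at 15:30.
EXAMPLE_BACKUPS = [datetime(2010, 10, d, 3, 0) for d in (1, 2, 3)]
EXAMPLE_LOGS = [
    datetime(2010, 10, 2, 1, 0),
    datetime(2010, 10, 2, 9, 0),
    datetime(2010, 10, 2, 15, 0),
    datetime(2010, 10, 2, 18, 0),
]
EXAMPLE_TARGET = datetime(2010, 10, 2, 15, 30)

chosen_backup = pick_backup(EXAMPLE_BACKUPS, EXAMPLE_TARGET)
replayed = logs_to_replay(EXAMPLE_LOGS, chosen_backup, EXAMPLE_TARGET)
```

Here the restore starts from the October 2 backup and replays only the two log entries recorded after it and no later than the requested time.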

Backup Window. The preferred backup window is the user-defined period of time during which your DB Instance is backed up. During the backup window, storage I/O may be suspended while your data is being backed up. This I/O suspension typically lasts a few minutes at most.

Note: This I/O suspension is avoided with Multi-AZ DB deployments, since the backup is taken from the standby.

Maintenance Window. Setting the maintenance window lets you control when DB Instance modifications (such as scaling the DB Instance class, which usually takes only a few minutes) and software patching (security and durability related patches) occur, in the event either is requested or required. If a maintenance event is scheduled for a given week, it will be initiated and completed at some point during the four-hour maintenance window you identify.

Note: Running your DB Instance as a Multi-AZ deployment can further reduce the impact of a maintenance event, as Amazon RDS will conduct maintenance via the following steps:

  1. Perform maintenance on standby
  2. Promote standby to primary
  3. Perform maintenance on old primary, which becomes the new standby.
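
The three steps above can be modeled as a small simulation. The Node class, the version strings and the instance names are hypothetical; the sketch only shows the order of operations and why the primary role is always held by a node that is not being patched.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    version: str


def rolling_maintenance(primary: Node, standby: Node, new_version: str) -> Tuple[Node, Node, List[str]]:
    """Apply the three-step sequence and return (new_primary, new_standby, log)."""
    log = []

    # Step 1: perform maintenance on the standby while the primary keeps serving.
    standby.version = new_version
    log.append(f"patched standby {standby.name}")

    # Step 2: promote the patched standby to primary.
    primary, standby = standby, primary
    log.append(f"promoted {primary.name} to primary")

    # Step 3: perform maintenance on the old primary, which is now the standby.
    standby.version = new_version
    log.append(f"patched old primary {standby.name}")

    return primary, standby, log


# Hypothetical instances and versions, for illustration only.
node_a = Node("db-a", "5.1.45")
node_b = Node("db-b", "5.1.45")
new_primary, new_standby, steps = rolling_maintenance(node_a, node_b, "5.1.50")
```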

Rds_launch4

Finally, review the details and click Launch DB Instance. After only a few minutes your primary and standby servers will be up and running.

Creating a Read Replica

On the Amazon RDS Management Console, first select the DB Instance you wish to replicate, and then click on the Create Read Replica button. This will open the following dialog.

Read_replica

Specify your new DB Instance identifier (the replica) as well as the new instance class. Click on Create and you will have a replicated DB Instance up and running within minutes.

About Read Replicas

Amazon RDS uses MySQL 5.1’s built-in replication to propagate changes made to a source DB Instance to Read Replicas. Updates are applied to Read Replicas after they occur on the source DB Instance (asynchronous replication), and replication lag can vary significantly. By contrast, the replication used by Multi-AZ deployments is synchronous, meaning that all database writes are concurrent on the primary and standby.

See Fighting MySQL Replication Lag and Managing Slave Lag with MySQL Replication for help with managing the replication lag. If you want to measure replication lag, Maatkit has a great tool for real latency measurement.

For read transactions where you cannot afford any chance of working with stale data, you will have to read from your Primary DB Instance.
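
One way an application might apply this rule is to route each query at connection time, as in the sketch below. The choose_endpoint function, its parameters and the hostnames are hypothetical and not part of any RDS client library; the point is only that writes and must-be-fresh reads go to the primary, while other reads may go to a replica.

```python
import random
from typing import Sequence


def choose_endpoint(
    is_write: bool,
    needs_latest: bool,
    primary: str,
    replicas: Sequence[str],
    rng: random.Random = random.Random(),
) -> str:
    """Pick the database host for one query.

    Writes, and reads that cannot tolerate replication lag, use the primary.
    All other reads are spread across the Read Replicas, if any exist.
    """
    if is_write or needs_latest or not replicas:
        return primary
    return rng.choice(list(replicas))


# Hypothetical endpoints, for illustration only.
PRIMARY = "mydb-primary.example.internal"
REPLICAS = ["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"]

fresh_read_host = choose_endpoint(False, True, PRIMARY, REPLICAS)
lag_tolerant_host = choose_endpoint(False, False, PRIMARY, REPLICAS)
```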