master-master – Life Scaling

Maintaining a MySQL high availablity cluster is one of the first missions encountered when scaling web applications. Very quickly your application gets to the point where the one database machine you have is not enough to handle the load, and you need to make sure that when failure happens (and it always happens), your cluster is ready to failover gracefully.

Some basic MySQL replication paradigms

MySQL master-slave replication was one of the first architectures used for failovers. The rationale is that if a master fails. a slave can be promoted to master, and start handle the writes. For this you could use several combinations of ip tools and monitoring software, for example, iproute and nagios, or heartbeat and mon.

However, master-slave architecure for MySQL replication has several flaws, most notable are:

The need to manually take care of bringing the relegated master back to life, as a slave to the now-promoted master (this can be scripted, but usually contains many pitfalls when trying to automate as a script).
The possibility of failover during a crash, which can result in same transaction being committed both on the old master and the new master. Good luck then, when trying to bring back the master as a slave. You’ll most likely get some duplicate key failure because of auto increments on the last transaction when starting replication again, and then the whole database on the relegated master is useless.
The inablity to switch roles quickly. Say the master is on a better machine than the slave, and now there was a failover. How can you easily restore the situation the way it was before, with the master being on the better machine? Double the headache.

Along came master-master architecture, which in essence is an architecture which keeps two live masters at all times, with one being a hot standby for the other, and switching between them is painless. (Baron Schwartz has a very interesting post about why referring to master-master replication as a “hot” standby could be dangerous, but this is out of the scope of this post). One of the important things that lies in the bottom of this paradigm, is that every master works in its own scope in regards to auto-increment keys, thanks to the configuration settings auto_increment_increment and auto_increment_offset. For example, say you have two masters, call them db1 and db2, then db1 works on the odd auto-increments, and db2 on the even auto-increments. Thus the problem of duplication on auto increment keys is avoided.

Managing master-master replication with MMM

Master-master replication is easily managed by a great piece of perl code called MMM. Written initially by Alexey Kovyrin, and now maintained and being installed daily in production environments by percona, MMM is a management daemon for master-master clusters. It’s a convenient and reliable cluster management tool, which simplifies switching roles between machines in the cluster, takes care of their monitoring and health issues, and prevents you from doing stupid mistakes (like making an offline server an active master…).

And now comes the hard, EC2 part. Managing high availability MySQL clusters is always based on the ability to control machines ip addresses in the internal network. You set up a floating internal ip address for each of the masters, configure your router to handle these addresses, and you’re done. When the time comes to failover, the passive master sends ARP and takes over the active master’s ip address, and everything swtiches smoothly. It’s just that on EC2, the router part or any internal ip address can not be determined by you (I was told FlexiScale gives you the option to control ip addresses on the internal network, but I never got to testing it).

So how can we use MMM on EC2 for master-master replication?

One way is to try using EC2’s elastic ip feature. The problem with it, is that currently moving an elastic ip address from one instance to another, takes several minutes. Imagine a failover from active master to passive master in which you would have to wait several minutes for the application to respond again — not acceptable.

Another way is to use name resolving instead of ip addresses. This has major drawbacks, especially the non reliable nature of the DNS. But it seems to work if you can set up a name resolver that serves your application alone, and use the MMM ns_agent contribution I wrote. You can check out MMM source, and install the ns_agent module according to contrib/ns_agent/README.

I am happy to say that I currently have a testing cluster set up on EC2 using this feature, and up until now it worked as expected, with the exception of several false-positive master switching due to routing issues (ping failed). Any questions or comments on the issue are welcome, and you can also post to the devolpement group.