Apache mod_ssl Makes Your Clients Crawl Compared To Nginx SSL Module

The SSL/TLS process is a heavy one: it involves algorithm negotiation between client and server, key exchange, encryption, decryption and authentication. What’s surprising is that the server you’re connecting to can directly influence the performance of your client and its CPU consumption.

I had a PHP command-line process spawning child processes that connected over SSL to a web server, in two scenarios. In the first, the server was an out-of-the-box Apache httpd with mod_ssl; in the second, an out-of-the-box Nginx with the SSL module. Both servers ran on the exact same box, and “out of the box” means I used the default configuration for both.

In the first scenario I was able to spawn no more than 6 (!) PHP processes before the box running them began to show load and the CPU run queue started to fill up. Each PHP child was taking between 15% and 30% CPU at any given moment.

In the second scenario, I was able to spawn 40 (!!) PHP child processes without the box being loaded. Each PHP child was taking around 1.5% CPU.

I’m no SSL expert, and there might be a way to configure Apache to inflict less load on the connecting client. There is also SSLSessionCache which might relieve load from both the server and the client. But the “out of the box” configuration shows that Nginx is a real winner again.
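For reference, mod_ssl's session cache is switched on with directives along these lines. This is an untested sketch, not a benchmarked fix: the cache path, size and timeout are assumptions, and on Apache 2.4 the shmcb cache also needs mod_socache_shmcb to be loaded.

```apache
# Global server config (not inside a <VirtualHost>).
# Shared-memory session cache, so returning clients can resume a TLS session
# instead of repeating the full handshake. Path and size are assumptions.
SSLSessionCache        "shmcb:/var/run/apache2/ssl_scache(512000)"
# Seconds a cached session stays valid (assumed value).
SSLSessionCacheTimeout 300
```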

If you can, avoid SSL altogether. If not, terminate it at a front-end before proceeding to Apache.
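A minimal sketch of such a front-end, assuming Nginx terminates SSL on port 443 and Apache listens for plain HTTP on 127.0.0.1:8080. The server name, certificate paths and backend address are all placeholders.

```nginx
server {
    listen 443 ssl;
    server_name example.com;                          # placeholder

    ssl_certificate     /etc/nginx/ssl/example.com.crt;   # placeholder path
    ssl_certificate_key /etc/nginx/ssl/example.com.key;   # placeholder path

    # Let returning clients resume sessions instead of re-handshaking.
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;

    location / {
        # Plain HTTP to the Apache backend (assumed address).
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```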

Connecting Several PHP Processes To The Same MySQL Transaction

The following concept is still under examination, but my initial tests proved successful, so I thought it’s time to share.

Here is a problem: There is a batch job reading remote data through XML-RPC and updating a local MySQL database according to the XML-RPC responses. The tables are InnoDB, and the entire batch job should run in a single transaction. That is, on any failure there should be a rollback, and on success a commit.

So, the simple way is of course a single process script that uses a linear workflow:

1. START TRANSACTION locally.
2. Make XML-RPC request, fetch data.
3. Insert data into db as needed.
4. Repeat 2-3 until Error or Finish.
5. If Error ROLLBACK, if Finish COMMIT.
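The steps above can be sketched in a few lines. This is an illustration only, assuming PHP 7 or later and PDO rather than the mysql_* functions used later in this post. The `items` table and the `$fetchBatch` callback (a stand-in for the XML-RPC request) are hypothetical names.

```php
<?php
// Sketch of the linear workflow. $fetchBatch is a hypothetical stand-in for
// the XML-RPC request: it returns an array of names to insert, or null when
// there is nothing left to fetch. It may throw to signal an error.
function runLinearBatch(PDO $db, callable $fetchBatch): bool
{
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->beginTransaction();                          // step 1
    try {
        // "items" is a hypothetical table with a single "name" column.
        $insert = $db->prepare('INSERT INTO items (name) VALUES (?)');
        while (($rows = $fetchBatch()) !== null) {    // step 2
            foreach ($rows as $name) {
                $insert->execute(array($name));       // step 3
            }
        }                                             // step 4: loop
        $db->commit();                                // step 5: Finish
        return true;
    } catch (Throwable $e) {
        if ($db->inTransaction()) {
            $db->rollBack();                          // step 5: Error
        }
        return false;
    }
}
```

With a real MySQL connection the PDO object would be created from a `mysql:` DSN. The point of the sketch is only that every insert sits between one `beginTransaction()` and one `commit()` or `rollBack()`.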

This works, but you may notice a bottleneck: the XML-RPC request. It uses HTTP and connects to a remote server. Sometimes the XML-RPC server also takes time to perform the work that generates the response. Add the network latency, and you get a single process that sits idle most of the time, waiting for a response.

So if we have a process that just sits and waits most of the time, let’s spread its work over several processes, and assume that while most of the processes will be waiting, at least one can be free to deal with the local database. This way we will get maximum utilization of our resources.

So the multi-process workflow:

1. START TRANSACTION locally.
2. Fork children as necessary.
3. From child, make XML-RPC request, fetch data.
4. From child, acquire database access through semaphore.
5. From child, insert data into db as needed.
6. From child, release database access through semaphore.
7. From child, repeat 3-6 until Error or Finish.
8. From parent, monitor children until Error or Finish.
9. From parent, if Error ROLLBACK, if Finish COMMIT.

Now, the workflow looks fine in theory, but can it work in practice? Can we connect to the same transaction from several different PHP processes?

I was surprised to find out that the answer is yes. As long as all processes share the same connection resource, they all use the same connection. And in MySQL, the same connection means the same transaction, given that a transaction was started and not yet committed or rolled back (either explicitly or implicitly).

The secret is to create the connection resource in the parent, so that the forked children hold a reference to the same connection. The caveat is that they must access the resource atomically, otherwise unexpected behavior occurs (usually the connection hangs; I am guessing this happens when one child tries to read() from the socket while another tries to write() to it). So in order to serialize access to the db connection, we use a semaphore. Each child can use the connection only when it’s available, and blocks otherwise.
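The locking pattern on its own can be sketched as a small helper, assuming PHP 5.6 or later with the sysvsem extension. `withSemaphore` and the key value used with it are hypothetical names, not part of the code further down. The `finally` block is there so that the semaphore is released even when the guarded code throws.

```php
<?php
// Hypothetical helper: run $criticalSection while holding a System V
// semaphore, and release the semaphore even if the callback throws.
function withSemaphore(int $key, callable $criticalSection)
{
    // max_acquire = 1: at most one process may hold the semaphore at a time.
    $sem = sem_get($key, 1);
    if ($sem === false || !sem_acquire($sem)) {   // blocks until available
        throw new RuntimeException('could not acquire semaphore');
    }
    try {
        return $criticalSection();
    } finally {
        sem_release($sem);
    }
}
```

A child would then wrap each round trip on the shared connection, for example `withSemaphore(123456, function () use ($dbh, $sql) { ... });`, so that only one process reads from or writes to the socket at any moment.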

At the end of the workflow, our parent process acts much like a Transaction Manager in an XA transaction: according to what the children report, it decides whether to commit or roll back.

Here is proof-of-concept code (not tested in this version, but similar code was tested and worked):

The DBHandler Class

// NOTE -- uses the legacy mysql_* functions (ext/mysql), which were removed in PHP 7.0.
class DBHandler
{
	private $link;
	private $result;
	private $sem;

	const SEMKEY = 123456; // sem_get() expects an integer key

	public function __construct($host, $dbname, $user, $pass, $new_link = false, $client_flags = 0)
	{
		$this->link = mysql_connect($host, $user, $pass, $new_link, $client_flags);
		if (!$this->link)
			throw new Exception ('Could not connect to db. MySQL error was: '. mysql_error());
		$isDb = mysql_select_db($dbname,$this->link);
		if (!$isDb)
			throw new Exception ('Could not select db. MySQL error was: '. mysql_error());
	}

	private function enterSemaphore()
	{
		$this->sem = sem_get(self::SEMKEY,1);
		sem_acquire($this->sem);
	}

	private function exitSemaphore()
	{
		sem_release($this->sem);
	}


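	public function query($sql)
	{
		// only one process at a time may talk on the shared connection
		$this->enterSemaphore();

		$this->result = mysql_unbuffered_query($sql, $this->link);
		if (!$this->result)
		{
			// capture the error and release the semaphore before throwing,
			// otherwise every other process blocks forever in sem_acquire()
			$error = mysql_error($this->link);
			$this->exitSemaphore();
			throw new Exception ('Could not query: {' . $sql . '}. MySQL error was: '. $error);
		}
		if ($this->result === true)
		{
			// INSERT, UPDATE, etc..., no result set
			$ret = true;
		}
		else
		{
			// SELECT etc..., we have a result set; read it fully while we
			// still hold the semaphore (the query is unbuffered)
			$retArray = array();
			while ($row = mysql_fetch_assoc($this->result))
				$retArray[] = $row;
			mysql_free_result($this->result);
			$ret = $retArray;
		}

		$this->exitSemaphore();

		return $ret;
	}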

	public function beginTransaction()
	{
		$this->query('SET AUTOCOMMIT = 0');
		$this->query('SET NAMES utf8');
		$this->query('START TRANSACTION');
	}

	public function rollback()
	{
		$this->query('ROLLBACK');
	}

	public function commit()
	{
		$this->query('COMMIT');
	}
}

The Forking Process

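$pid = 'initial'; // any truthy value, so the fork loop below runs at least once
$maxProcs = isset($argv[1]) ? (int) $argv[1] : 0;
if ($maxProcs < 1)
{
	 $maxProcs = 3;
}
$runningProcs = array(); // will be $runningProcs[pid] = status;
// children report their status by changing their own priority (nice value)
// NOTE -- lowering the nice value below 0 normally requires root privileges
define('PRIORITY_SUCCESS', -20);
define('PRIORITY_FAILURE', -19);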

try
{
	$dbh = new DBHandler(DBHOST,DBNAME,DBUSER,DBPASS);

	$dbh->beginTransaction();

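		// fork all needed children
		$currentProcs = 0;
		while ( ($pid) && ($currentProcs < $maxProcs))
		{
			$pid = pcntl_fork();
			if ($pid == -1)
				break; // fork failed; handled right after the loop
			$currentProcs++;
			if ($pid)
				$runningProcs[$pid] = 0; // only the parent tracks children
		}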

		if ($pid==-1)
		{
			throw new Exception ("fork failed");
		}
		elseif ($pid)
		{
			// parent
			echo "+++ in parent +++\n";
			echo "+++ children are: " . implode(",",array_keys($runningProcs)) . "\n";

			// wait for children
			// NOTE -- here we do it with priority signaling
			// @TBD -- posix signaling or IPC signaling.
			while (in_array(0,$runningProcs))
			{
				if (in_array(PRIORITY_FAILURE,$runningProcs))
				{
					echo "+++ some child failed, finish waiting for children +++\n";
					break;
				}
				foreach ($runningProcs as $child_pid => $status)
				{
					// poll the child's current priority and print the fresh value
					$runningProcs[$child_pid] = pcntl_getpriority($child_pid);
					echo "+++ children status: $child_pid, {$runningProcs[$child_pid]} +++\n";
				}
				echo "\n";
				sleep(1);
			}

			echo "+++ checking if should commit or rollback +++\n";
			if (in_array(PRIORITY_FAILURE,$runningProcs) || in_array(0,$runningProcs))
			{
				echo "+++ some child had problem! rollback! +++\n";
				$dbh->rollback();
			}
			else
			{
				echo "+++ all my sons successful! committing! +++\n";
				$dbh->commit();
			}

			// signal all children to exit
			foreach ($runningProcs as $child_pid => $status)
			{
				echo "+++ killing child $child_pid +++\n";
				posix_kill($child_pid,SIGTERM);
				// reap the child so it does not linger as a zombie
				pcntl_waitpid($child_pid,$waitStatus);
			}
		}
		else
		{
			// child
			$mypid = getmypid();
			echo "--- in child $mypid ---\n";
			//sleep(1);
			echo "--- child $mypid current priority is " . pcntl_getpriority() . " ---\n";

			// NOTE -- the following query is a placeholder, for example only
			$dbh->query("select ...");

			echo "--- child $mypid finished, setting priority to success and halting ---\n";
			pcntl_setpriority(PRIORITY_SUCCESS);
			// stay alive until the parent has read the priority and sends SIGTERM
			while (true)
			{
				echo "--- child $mypid waiting to be killed ---\n";
				sleep(1);
			}
		}

}
catch (Exception $e)
{
	// output error
	print "Error!: " . $e->getMessage() . "\n";

	// if parent -- rollback, signal children to exit
	// if child  -- make priority failure to signal
	if ($pid)
	{
		// rollback (only if the connection was actually established)
		if (isset($dbh))
			$dbh->rollback();
		foreach ($runningProcs as $child_pid => $status)
		{
			// never signal pid 0 or -1: those address whole groups of processes
			if ($child_pid > 0)
				posix_kill($child_pid,SIGTERM);
		}
	}
	else
	{
		pcntl_setpriority(PRIORITY_FAILURE);
		$mypid = getmypid();
		while (true)
		{
			echo "--- child $mypid waiting to be killed ---\n";
			sleep(1);
		}
	}

}
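The "@TBD" comment in the parent loop leaves IPC signaling open. One possible shape for it, as an untested sketch rather than a drop-in replacement, is a socket pair per child on which the child writes a single status line. `runJobsAndCollect` is a hypothetical helper and assumes PHP 7 or later with the pcntl and posix extensions. The child kills itself instead of calling exit() so that PHP's normal shutdown never runs in the child. My guess is that a normal shutdown could close resources inherited from the parent, such as the shared connection.

```php
<?php
// Hypothetical helper: run each job in a forked child and collect one status
// line per child over a socket pair, instead of encoding status in priority.
// Returns array(jobName => bool success).
function runJobsAndCollect(array $jobs): array
{
    $channels = array();
    foreach ($jobs as $name => $job) {
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        if ($pair === false) {
            throw new RuntimeException('could not create socket pair');
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // child: do the work, report one line, then disappear
            fclose($pair[0]);
            try {
                $ok = (bool) $job();
            } catch (Throwable $e) {
                $ok = false;
            }
            fwrite($pair[1], $ok ? "OK\n" : "FAIL\n");
            fclose($pair[1]);
            posix_kill(getmypid(), SIGKILL);
            exit(1); // not reached unless the kill failed
        }
        // parent: keep only the reading end
        fclose($pair[1]);
        $channels[$name] = array($pid, $pair[0]);
    }

    $results = array();
    foreach ($channels as $name => $channel) {
        list($pid, $sock) = $channel;
        // blocks until the child reports, or returns false if it died silently
        $line = fgets($sock);
        $results[$name] = ($line !== false && trim($line) === 'OK');
        fclose($sock);
        pcntl_waitpid($pid, $status); // reap the child
    }
    return $results;
}
```

The parent's decision then reduces to `$commit = !in_array(false, $results, true);`. A child that dies without reporting counts as a failure.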

Well, all of this sounds good, and it also worked well in a development environment. But it should be taken out of the lab and tested in a production environment. Once I give it a shot, I will update with benchmarks.

PEAR’s Services_Twitter and the Source Option

I recently started using PEAR’s Services_Twitter package (still in beta) written by Joe Stump (from Digg) and David Jean Louis. My natural choice would have been Zend Framework’s Zend_Service_Twitter, but I wasn’t aware that it was out of the incubator by the time I started.

Using and integrating the Services_Twitter package was very easy, and didn’t require much more than copying and pasting the code from the class documentation:

require_once 'Services/Twitter.php';
$username = 'Your_Username';
$password = 'Your_Password';

try {
    $twitter = new Services_Twitter($username, $password);
    $msg = $twitter->statuses->update("I'm coding with PEAR right now!");
    print_r($msg); // Should be a SimpleXMLElement structure
} catch (Services_Twitter_Exception $e) {
    echo $e->getMessage();
}

However, once I requested the “source” option from Twitter (that nifty thing that says which application a tweet was sent via), things got complicated. Apparently, when the Twitter.php factory builds an instance of an API driver, it instantiates a new object and doesn’t pass it the current factory options, so this didn’t work:

    $twitter = new Services_Twitter($username, $password);
    $twitter->setOptions('source','myapp');
    $msg = $twitter->statuses->update("I'm coding with PEAR right now!");
    print_r($msg); // Should be a SimpleXMLElement structure

The status was updated, but with no source. A short debugging session and a look through the source confirmed that factory behavior (not passing the current options to the newly instantiated statuses object). I don’t really see the rationale behind this, so I filed a bug report.

What did work was instantiating a Services_Twitter_Statuses object and working with it directly, like this:

    include_once "Services/Twitter/Statuses.php";

    $twitter = new Services_Twitter_Statuses($username, $password);
    $twitter->setOptions('source','myapp');
    $msg = $twitter->update("I'm coding with PEAR right now!");

Now you can Tweet away with source!
