TVinci's Break

It’s always nice to hear about people you personally know who succeed in the world of hi-tech and venture capital. That is the case with the guys over at TVinci, who just secured $1.6M in funding from angels Zohar Gilon and Ron Tamir.

Although their blog is down (probably due to the massive TechCrunch, VC Cafe and TechAviv coverage), TVinci offers something that is usually overlooked in product development – a full-blown user experience. With a background in creative UI solutions development at frido, the team that founded TVinci is all about user experience and user interface.

Sometimes a product succeeds because it is groundbreaking – a completely new idea, or a phenomenon that appeals to a large target audience and is viral in its essence (did somebody say Twitter?). But sometimes it is enough to take something very basic, give it a different angle or a never-before-seen added value, and come out with a winning product.

And I think this is the case with TVinci. They take a very basic service, video consumption, and turn it into a turnkey solution for the publisher on the one hand, and a full-blown video experience for the consumer on the other. They’ve already done so on Reshet (an Israeli TV broadcaster), MTV Israel, MTV Poland and Orange Israel.

I am not fully aware of all the competition in this space of turnkey video solutions for media organizations (Qumu might be one), but TVinci sure looks like a very appealing product that maximizes the viewers’ video consumption experience and thus helps publishers retain and engage their customers more effectively.

I am not sure where they are going to direct this round of funding, but wherever they do, I am sure they can take their product sky high.

Congratulations to Ofer (and Moran)!


Memcached Storage Class for Zend_OpenId_Provider

I experimented a bit with creating an OpenID provider entity using Zend_OpenId_Provider. It was not a hard task to implement, but seeing that the default storage class is file-based made me shiver. There are two reasons why I hate anything to do with local disk access:

  • It’s s-l-o-w-w-w. Disk I/O is the pitfall of performance for web applications. Avoid when possible.
  • It’s usually a bad fit for clustered environments. If you have a cluster of application servers (running PHP, for example) and you are using disk access, a request will only update the disk on the application node it was directed to. The next request to the application might not be directed to the same server (if the load balancing is not IP-hashed or session-based). Of course this is not always the case – sometimes there’s network storage, sometimes several directories can be rsynced across the cluster – but as a rule of thumb, local disk access is not good for clustered environments.

So the obvious thing when implementing an OpenID provider using Zend Framework is to change the default Storage class, and use a storage that’s not a traditional filesystem. Before jumping into using a MySQL backend for this, and coming up with a full blown OpenID provider, I needed something quick that will replace disk storage, but will also work on a clustered environment. So it was really natural to turn to memcached.

I am not sure that using memcached as a final storage engine for an OpenID provider is really a good call. Caches expire, keys get purged, and whole memcached nodes can evaporate. However, it might fit a provider that is not a full-blown OpenID service. If you can find a way to addUser() to the storage every time before a user starts an authentication attempt (and it’s not that difficult, considering the 10-stage authentication process), and if you can handle associations and other info being deleted from time to time (and if your users can handle it…) – memcached storage can be what you need.
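To illustrate the general shape of such a class – this is a rough sketch, not the actual class I wrote; the method names and signatures are my approximation of the ZF 1.x Zend_OpenId_Provider_Storage interface, and the key names are invented:

```php
<?php
// Sketch only: a memcached-backed storage for Zend_OpenId_Provider.
// Assumes the ZF 1.x storage interface; signatures are approximations.
require_once 'Zend/OpenId/Provider/Storage.php';

class My_OpenId_Storage_Memcached extends Zend_OpenId_Provider_Storage
{
    private $mc;

    public function __construct(Memcache $mc)
    {
        $this->mc = $mc;
    }

    // Associations have a natural lifetime -- hand it to memcached
    // so expired entries clean themselves up.
    public function addAssociation($handle, $macFunc, $secret, $expires)
    {
        $data = serialize(array($macFunc, $secret, $expires));
        return $this->mc->set("openid:assoc:$handle", $data, 0, $expires);
    }

    public function getAssociation($handle, &$macFunc, &$secret, &$expires)
    {
        $data = $this->mc->get("openid:assoc:$handle");
        if ($data === false) {
            return false; // purged or expired -- caller must re-associate
        }
        list($macFunc, $secret, $expires) = unserialize($data);
        return true;
    }

    // ... addUser(), checkUser(), trusted-site methods follow the same
    // pattern of serialize/set and get/unserialize.
}
```

The important property is the one discussed above: every operation must tolerate the key simply not being there anymore.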

In any case, even if just for testing purposes, here’s a memcached storage class for Zend Framework’s OpenID Provider I wrote (it’s a plain text file, apologies for the doc/msword file type).

MMM (Mysql Master-Master Replication) on EC2

Maintaining a MySQL high availability cluster is one of the first missions encountered when scaling web applications. Very quickly your application gets to the point where the one database machine you have is not enough to handle the load, and you need to make sure that when failure happens (and it always happens), your cluster is ready to fail over gracefully.

Some basic MySQL replication paradigms

MySQL master-slave replication was one of the first architectures used for failovers. The rationale is that if a master fails, a slave can be promoted to master and start handling the writes. For this you could use several combinations of IP tools and monitoring software – for example, iproute and Nagios, or Heartbeat and mon.

However, the master-slave architecture for MySQL replication has several flaws, the most notable being:

  • The need to manually take care of bringing the relegated master back to life as a slave to the now-promoted master (this can be scripted, but automating it is full of pitfalls).
  • The possibility of a failover during a crash, which can result in the same transaction being committed both on the old master and the new master. Good luck then, when trying to bring back the master as a slave. You’ll most likely get a duplicate key failure because of auto increments on the last transaction when starting replication again, and then the whole database on the relegated master is useless.
  • The inability to switch roles quickly. Say the master is on a better machine than the slave, and now there was a failover. How can you easily restore the situation the way it was before, with the master on the better machine? Double the headache.

Along came the master-master architecture, which in essence keeps two live masters at all times, one being a hot standby for the other, with painless switching between them. (Baron Schwartz has a very interesting post about why referring to master-master replication as a “hot” standby could be dangerous, but that is out of the scope of this post.) One of the important things at the bottom of this paradigm is that every master works in its own scope with regard to auto-increment keys, thanks to the configuration settings auto_increment_increment and auto_increment_offset. For example, say you have two masters, db1 and db2: db1 works on the odd auto-increments, and db2 on the even ones. Thus the problem of duplicate auto-increment keys is avoided.
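To make that concrete, here is roughly what the two settings look like in my.cnf on each master (hostnames are the db1/db2 from the example above, values illustrative):

```ini
# In db1's my.cnf – db1 hands out the odd auto-increment values (1, 3, 5, ...)
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# In db2's my.cnf – db2 hands out the even auto-increment values (2, 4, 6, ...)
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2
```

With an increment equal to the number of masters and a distinct offset per master, no two masters can ever generate the same auto-increment key.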

Managing master-master replication with MMM

Master-master replication is easily managed by a great piece of Perl code called MMM. Written initially by Alexey Kovyrin, and now maintained and installed daily in production environments by Percona, MMM is a management daemon for master-master clusters. It’s a convenient and reliable cluster management tool, which simplifies switching roles between machines in the cluster, takes care of their monitoring and health issues, and prevents you from making stupid mistakes (like turning an offline server into an active master…).

And now comes the hard, EC2 part. Managing high availability MySQL clusters has always been based on the ability to control machines’ IP addresses in the internal network. You set up a floating internal IP address for each of the masters, configure your router to handle these addresses, and you’re done. When the time comes to fail over, the passive master sends an ARP announcement and takes over the active master’s IP address, and everything switches smoothly. It’s just that on EC2, the router part, or any internal IP address, cannot be determined by you (I was told FlexiScale gives you the option to control IP addresses on the internal network, but I never got to testing it).

So how can we use MMM on EC2 for master-master replication?

One way is to try using EC2’s Elastic IP feature. The problem is that currently, moving an Elastic IP address from one instance to another takes several minutes. Imagine a failover from active master to passive master in which you would have to wait several minutes for the application to respond again – not acceptable.

Another way is to use name resolving instead of IP addresses. This has major drawbacks, especially the unreliable nature of DNS. But it seems to work if you can set up a name resolver that serves your application alone, and use the MMM ns_agent contribution I wrote. You can check out the MMM source, and install the ns_agent module according to contrib/ns_agent/README.

I am happy to say that I currently have a testing cluster set up on EC2 using this feature, and up until now it has worked as expected, with the exception of several false-positive master switches due to routing issues (ping failed). Any questions or comments on the issue are welcome, and you can also post to the development group.

Blackhole Name Servers

If you are running a name server that’s serving your application or inner network in some way, and you start seeing a slowdown in reverse name resolution, you should check your logs (or if no name server logs, you can tcpdump port 53), and search for requests to BLACKHOLE-1.IANA.ORG (192.175.48.6) or BLACKHOLE-2.IANA.ORG (192.175.48.42).

When I saw these for the first time I thought it was some Chris Cornell joke.

If you’re seeing these and experience a slowdown, you have a problem — your name server is recursing and trying to resolve addresses in the reserved private space, instead of replying with an authoritative answer, or at least replying with a redirection.

There are two solutions (assuming you are using BIND):

  1. Configure your name server to be authoritative for the reserved space:
    In /etc/named.conf:

    zone "0.0.10.in-addr.arpa" {
        type master;
        file "/var/named/0.0.10.in-addr.arpa.zone";
    };

    And in the zone file /var/named/0.0.10.in-addr.arpa.zone, if for example you want 10.0.0.3 to resolve to web.example.com:

    $TTL 14400
    @ IN SOA ns1.example.com. admin.example.com. (
        2009012501 ; serial
        28800      ; refresh
        604800     ; retry
        604800     ; expire
        86400 )    ; minimum

    IN NS ns1.example.com.
    3 IN PTR web.example.com.
  2. If you know (or can assume) there’s a name server along the way that is configured to reply authoritatively for these queries, configure your name server to not perform recursion. This way it replies to the query with “I don’t know who’s 10.0.0.3, go look for yourself, here’s a hint”. In /etc/named.conf, add in the options context:
    recursion no;
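For clarity, that line sits in the top-level options block of named.conf (a minimal illustration, not a complete configuration):

```
options {
    // answer authoritatively for our own zones only;
    // never recurse on behalf of clients
    recursion no;
};
```
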

Since there was indeed a name server properly configured to reply for all the 10.0.0.0/8 addresses in my network, and I had only configured the inner name server to reply for what the application needed, adding the no-recursion option solved the problem in my case.

By the way, adding “recursion no” to a name server that is only there to serve some specific application need is good practice both security-wise and performance-wise.

Oh, and here’s what IANA has to say about the blackhole servers. Creepy.

Upgrading Mootools 1.11 to 1.2 Is Between Hard and Impossible

When it comes to the client side, Javascript frameworks are one of the first things that accelerate your development of web applications. Before knowing about the existence of Javascript frameworks, I was coding raw Javascript, in strict functional programming, trying to solve the never-ending war of cross-browser compatibility with every new line of code.

And then along came frameworks like Scriptaculous, jQuery, Dojo, and Mootools. These frameworks all tackle the everyday problems of Javascript (cross-browser issues being the most important part), build and enforce an object-oriented way of programming Javascript, and create easy-to-use classes for complicated tasks (like evaluating JSON responses from XmlHttpRequests). On top of that, they add cool and customizable graphics and effects for web UI.

History has its way of determining what framework you eventually use. They are all really similar, and when you’re just in the stage of picking your framework, it usually comes down to the one that has an example in its documentation that is most relevant to your current problem and seems that it solves it with the easiest code. Sometimes though, it’s just a matter of what framework is showcased better, and which has the coolest graphics.

In my case, it was Mootools, and I really can’t remember why. I started using it when it was in version 1.11, and I have been truly happy with it (except for a really nasty bug with the https protocol and Internet Explorer). Mootools by version 1.11 was already a robust library, and didn’t require much more development. But of course open source projects progress and develop, and have to catch up with new browsers and new technologies, and so Mootools released version 1.2.

It’s already been six months or so since it was released, and I still haven’t upgraded. And it’s not that I didn’t try – I think I have already tried three times – but the upgrade process from 1.11 to 1.2 is probably the hardest upgrade I’ve ever encountered. Version 1.2 is not backward compatible with version 1.11. There are compatibility packages, and several attempts by the community to build on top of these packages, but they never seem to cover all the compatibility needed. There are always a few lines of code you wrote using 1.11 that, even with compatibility packages, 1.2 will just break on.

Now don’t get me wrong, I am addicted to Mootools. They say that its core is stable, and they’re right. I just hope that I will never be forced to try again to upgrade to 1.2, or that there will be an easy drop-in way to get code written for 1.11 run on 1.2.

Nginx and Weird "400 Bad Request" Responses

Most of the LAMP clusters I deal with on a daily basis use the same basic stack – an nginx at the front as a load balancer proxy, apache as the application server (running php), and mysql as the backend. I just realized that this stack might as well be called LNAMP or LAMPN, since nginx plays a big part in it.

The problem with LAMPN is that it sometimes takes its new stack name seriously, and starts limpin’. Nginx is an excellent front end – it’s highly configurable and highly efficient. But it has a nasty habit of not letting your requests go through to the application servers if a request is hard for it to understand. Instead, it returns a “400 Bad Request” and dies.

So what was a request that was “hard to understand” in my case? It was a request with a very large “Cookie” header. If the cookies for a certain domain add up to more than large_client_header_buffers, your request is simply rejected by nginx. The default for large_client_header_buffers is:

large_client_header_buffers 4 4k;
# 4k being the page size of the system; can be any size depending on the OS

The Cookie header sent from my browser to the domain I tried to access was 4.4k in size, larger than the default, so nginx flipped. I read that it is only a Firefox issue (does IE split the Cookie into chunks? Is it even possible to send multiple chunks of a Cookie header?), but it might as well happen with other browsers and different requests. The following setting in the http context will solve the problem:

large_client_header_buffers 4 8k;

Actually, nginx documentation mentions this problem, but it’s one of those default settings you think to yourself you’ll never have to change. Well, apparently, sometimes you do.
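If you want to check whether a given nginx is affected, a quick reproduction from the command line might look like this (the curl line and hostname are illustrative, not from my original debugging session):

```shell
# Build a cookie value just over the default 4k buffer; an nginx with
# default large_client_header_buffers would answer such a request with
# "400 Bad Request".
COOKIE=$(printf 'a%.0s' $(seq 1 5000))
echo "cookie length: ${#COOKIE}"
# Illustrative only – point it at a host you control:
# curl -s -o /dev/null -w '%{http_code}\n' -H "Cookie: big=$COOKIE" http://your-host/
```
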


MySQL off-site replication and timezones

Recently we were doing some testing on a MySQL slave server located at an off-site location from the master. While the on-site slave was having no problems replicating, the off-site slave would break replication once a day with a duplicate entry error on a unique key insert.

This was very weird, especially because the on-site slave was having no trouble at all. After some brainstorming aided by Percona, we realized the cause: the off-site slave was in a different timezone than the master and the on-site slave. This fact, together with the fact that we had a unique key that contained curdate(), caused the following scenario:

  • On January 3, 23:57, there was an insert on table t; the unique key was 2009-01-03.
  • On January 4, 00:01, there was an insert on table t; the unique key was 2009-01-04.

No problem there – but times are replicated as timestamps, so on the off-site slave, which was one hour ahead (EST instead of CST):

  • The first insert was replicated as January 4, 00:57 – an insert on table t with unique key 2009-01-04.
  • The second insert was replicated as January 4, 01:01 – an insert on table t, and an error on the unique key for 2009-01-04.

The solution is either to set the whole machine’s timezone to the timezone of the master you’re replicating from, or to use the same timezone and default-time-zone settings for the MySQL server.
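The latter would look roughly like this in my.cnf (values illustrative; using a named zone such as 'America/Chicago' instead of a fixed offset requires the MySQL time zone tables to be loaded):

```ini
# On the off-site slave (and ideally on every server in the replication
# topology), pin mysqld to the master's timezone:
[mysqld]
default-time-zone = '-06:00'   # CST, matching the master
```
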

PEAR's Services_Twitter and the Source Option

I recently started using PEAR’s Services_Twitter package (still in beta), written by Joe Stump (from Digg) and David Jean Louis. My natural choice would have been Zend Framework’s Zend_Service_Twitter, but I wasn’t aware that it was out of the incubator by the time I started.

Using and integrating the Services_Twitter package was very easy, and didn’t require much more than copying and pasting the code from the class documentation:

require_once 'Services/Twitter.php';
$username = 'Your_Username';
$password = 'Your_Password';

try {
    $twitter = new Services_Twitter($username, $password);
    $msg = $twitter->statuses->update("I'm coding with PEAR right now!");
    print_r($msg); // Should be a SimpleXMLElement structure
} catch (Services_Twitter_Exception $e) {
    echo $e->getMessage();
}

However, once I requested the “source” option from Twitter (that nifty thing that says what the tweet was sent via), things got complicated. Apparently, when the Twitter.php factory builds an instance of an API driver, it instantiates a new object and doesn’t pass it the current factory options, so this didn’t work:

    $twitter = new Services_Twitter($username, $password);
    $twitter->setOptions('source','myapp');
    $msg = $twitter->statuses->update("I'm coding with PEAR right now!");
    print_r($msg); // Should be a SimpleXMLElement structure

The status was updated, but with no source. A short debug and a look through the source revealed the factory behavior (not passing the current options to the newly instantiated statuses object). I don’t really see the rationale behind this, so I filed a bug report.

What did work, was instantiating a Twitter_Statuses object, and working with it directly, like this:

    include_once "Services/Twitter/Statuses.php";

    $twitter = new Services_Twitter_Statuses($username, $password);
    $twitter->setOptions('source','myapp');
    $msg = $twitter->update("I'm coding with PEAR right now!");

Now you can Tweet away with source!