[This post was previously published on Nicolas Cynober's blog. Nicolas is an R&D engineer and co-founder of Pearltrees.]
Many of you have asked us about the technologies behind all these pearls. The web, like the internet or even computer science as a whole, is a succession of technologies stacked one on top of another, so we’ll list here some of the essential technologies we use and give a simple picture of our production architecture.
Flex is an ActionScript (Flash) framework that we use to develop the user interface of the web version. It makes it possible to quickly develop a rich interface and to deploy it on a large number of devices. Today, our main challenge is to manage the size of the generated application, which is a key factor in app loading time.
HTML is used mainly to feed the search engines and to provide an alternative version for devices that don’t support Flash. All the content in Pearltrees is available in HTML format. We also worked a lot on HTML when developing the Pearltrees embed. To help us resolve integration issues in an uncontrolled environment, we use HTML5 properties.
RDF is a semantic web format that we use for exporting data from Pearltrees. Rather than creating yet another XML schema, we chose RDF because it is better suited to sharing linked data, which is crucial since pearls are both linked together and linked to URLs.
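To make the idea of pearls as linked data more concrete, here is a minimal sketch of how a pearl could be written out as RDF statements in N-Triples syntax. The `example.org` identifiers and the `inTree` / `pointsTo` properties are hypothetical names invented for this illustration; they are not the vocabulary of the actual Pearltrees export.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Serialize one RDF statement as an N-Triples line.

    Simplified: only backslashes and double quotes are escaped in literals.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical vocabulary and identifiers, for illustration only.
EX = "http://example.org/vocab#"
pearl = "http://example.org/pearl/1"
tree = "http://example.org/tree/42"

triples = [
    # The pearl is linked to the pearltree that contains it...
    ntriple(pearl, EX + "inTree", tree),
    # ...and to the URL it points to.
    ntriple(pearl, EX + "pointsTo", "http://www.w3.org/RDF/"),
    # A plain literal, using the Dublin Core "title" property.
    ntriple(pearl, "http://purl.org/dc/elements/1.1/title",
            "RDF home page", literal=True),
]

print("\n".join(triples))
```

Each line is a subject, a predicate and an object, so both kinds of links (pearl to pearltree, pearl to URL) are expressed the same way without a custom XML schema.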
XUL is the presentation format used by Mozilla. We use it in Pearltrees’ extension for Firefox.
The IE extension has been entirely developed in C++.
Soon, you will know why we have used Objective-C. We use the cocos2d game framework to handle various animations and effects.
We use Java 6 and Tomcat to run the Pearltrees application and our ranking algorithm, “Tree Rank”. Thanks to its “secret recipe”, Tree Rank lets us calculate the relations between pearltrees and makes it easy to discover related content. Tree Rank runs on a Hazelcast cluster hosted in the Cloud. We also use Lucene as the basis for our search engine, and we are currently experimenting with Cassandra.
Initially we tested PHP, but Java is now used in most of our server software.
Part of our backend still uses PHP. Our server code first ran on Zend Framework and Apache, but most of it is now in Java. We moved from Zend Framework to Java because we had performance issues with the former. However, we still have some applications in Zend Framework, since we are using Piwik for our stats.
MySQL 5.1 is our main database. The core database has 35 tables and more than 60 million rows. It runs on two machines in a master/slave setup (48 GB of RAM and 32 CPUs).
Our file server used to rely on NFS v3 to share logos, avatars and thumbshots, but we recently moved and mirrored our 200 GB of assets into several Amazon S3 locations. Although we use Amazon CloudFront, we also publish some of our files through the Level 3 CDN.
Xen handles the virtualization of our “Fetch” software, which runs on 16 virtual machines. Fetch manages the creation of thumbnails and prepares the pre-loading of web pages when browsing a pearltree. Today, we can process 10,000 URLs per hour, which means we can process our whole thumbshot base in 15 days.
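As a rough check on these figures, the arithmetic can be sketched as follows. It assumes that Fetch runs around the clock at a constant rate and that the load is spread evenly over the virtual machines; neither assumption is stated above, so the results are orders of magnitude rather than measured values.

```python
URLS_PER_HOUR = 10_000   # throughput quoted above
VIRTUAL_MACHINES = 16    # number of Fetch VMs quoted above
DAYS_FOR_FULL_PASS = 15  # time quoted for the whole thumbshot base

# Assumption: continuous operation, 24 hours a day.
urls_per_day = URLS_PER_HOUR * 24
implied_thumbshot_base = urls_per_day * DAYS_FOR_FULL_PASS

# Assumption: the load is evenly balanced across the VMs.
urls_per_vm_per_hour = URLS_PER_HOUR / VIRTUAL_MACHINES

print(urls_per_day)            # 240000
print(implied_thumbshot_base)  # 3600000
print(urls_per_vm_per_hour)    # 625.0
```

Under those assumptions, a full pass in 15 days corresponds to a base of about 3.6 million URLs, with each virtual machine handling roughly 625 URLs per hour.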
The administration of our servers is mostly done with scripts written in Bash and Python. We also use Python in the software that detects URLs that can’t be viewed within an iframe.
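The detection method is not described here, so the following is only a sketch of one common approach: looking at the HTTP response headers that tell browsers not to render a page inside a frame (`X-Frame-Options` and the `frame-ancestors` directive of `Content-Security-Policy`). The function names are hypothetical, and this approach would not catch pages that break out of frames with JavaScript.

```python
from urllib.request import Request, urlopen


def blocks_framing(headers):
    """Return True if the response headers restrict embedding in a frame.

    `headers` is a mapping of header names to values. Any restriction is
    treated as blocking, which is a deliberately conservative simplification.
    """
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}

    xfo = normalized.get("x-frame-options", "")
    if xfo in ("deny", "sameorigin") or xfo.startswith("allow-from"):
        return True

    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0] == "frame-ancestors":
            # Only a bare wildcard allows embedding from any site.
            return parts[1:] != ["*"]

    return False


def url_blocks_framing(url, timeout=10):
    """Fetch only the headers of `url` (HEAD request) and check them."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return blocks_framing(dict(response.headers.items()))


print(blocks_framing({"X-Frame-Options": "DENY"}))    # True
print(blocks_framing({"Content-Type": "text/html"}))  # False
```

Keeping the header check separate from the network call makes it easy to run the check on headers that were already collected, for example while generating thumbnails. Some servers reject HEAD requests, so a real crawler would probably need a fallback to GET.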
We are using Bugzilla to define the priority of bugs and to follow their resolution.
Here’s the state of our bug list:
All of our code is under version control and split into branches with SVN. To date, we have made more than 23,000 commits.
Here’s a simplified version of our current architecture. We also have pre-production and development architectures that are similar to this one. UPDATE Sept. 2011: The following sketch does not really reflect our current architecture, as we have moved some services into the Cloud over the past several months.