LZSQL compiler hacking and future VoltDB applications

We’ve been working on a compiler for our own language, named "lzsql". It’s for our nginx-based web service platform that drives our data product lz.taobao.com. Our "lzsql" compiler can now emit Lua code that has passed lots of real-world tests.

We can now decide whether to run a sql query at a remote mysql node or at the nginx core, all in the lzsql language.

For "local sql queries", we’ve implemented a full-fledged sql engine in pure Lua. It’s damn fast, especially with LuaJIT. 6k queries per second for a single nginx worker process is not uncommon in our benchmarks.

And we’ve introduced a type system in our language so that it can handle sql quoting rules automatically. The typechecker ensures that an lzsql variable of a specific type is used correctly in the context of the sql query. SQL is part of the lzsql language itself, so queries are never assembled from raw strings by hand and every variable is quoted according to its type. Therefore, sql injection cannot happen.
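To make the idea of type-directed quoting a bit more concrete, here is a minimal Python sketch. It is not the lzsql implementation: the real typechecker works at compile time and emits Lua, while this sketch checks at run time. The type names `int` and `symbol`, the helper names, and the simplified escaping rules are all assumptions made for illustration (only `text` and `location` appear in this post).

```python
import re

def quote_text(value: str) -> str:
    """Render a `text` value as a single-quoted SQL string literal.

    Simplified escaping (an assumption, not the full MySQL rules):
    backslashes are doubled and single quotes are doubled.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"

def quote_int(value: str) -> str:
    """Render an `int` value, refusing anything that is not a plain integer."""
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        raise ValueError(f"not an integer: {value!r}")
    return value

def quote_symbol(value: str) -> str:
    """Render a `symbol` value (table or column name) as a backquoted identifier."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", value):
        raise ValueError(f"not a valid identifier: {value!r}")
    return "`" + value + "`"

# Hypothetical mapping from declared type to quoting rule.
QUOTERS = {"text": quote_text, "int": quote_int, "symbol": quote_symbol}

def render(template: str, decls: dict, values: dict) -> str:
    """Replace each $name in the template, quoted according to its declared type."""
    def substitute(match):
        name = match.group(1)
        if name not in decls:
            raise TypeError(f"undeclared variable ${name}")
        return QUOTERS[decls[name]](values[name])
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", substitute, template)
```

With `pattern` declared as `text`, a value such as `o'hara` comes out as a properly escaped string literal, and a value like `1; drop table cats` bound to an `int` variable is rejected instead of being pasted into the query.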

We mostly use the "local sql engine" for merging data from completely different data sources, for example mysql and a non-relational data source. We do have some non-relational data sources, like our real time stats services and other Java-powered web services from other departments of Taobao.com.

Here’s a small example:

   text $pattern;
   location $mysql_node;

   @a := select count(id) as count
         from cats
         where name contains $pattern
         group by park
         at $mysql_node;

   @b := select count(id) as count
         from other_service.some_api($pattern)
         group by park;

   return (@a union all @b);

In this sample, "other_service.some_api" is a non-blocking call to some remote non-relational data source. The first SQL query runs on a remote mysql node specified by the variable $mysql_node, while the other two statements (the second select and the final union all) both run directly in the nginx core on our sql engine written in Lua.
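As a rough illustration of what the local engine has to do for the last two statements, here is a small Python sketch of a group-by count followed by a union all. It is only a model of the semantics, not our Lua engine: the function names, the row format (lists of dicts) and the sample rows are all made up for this example.

```python
def count_by(rows, key, counted="id"):
    """Rough equivalent of `select count(<counted>) as count ... group by <key>`.

    Like SQL's count(col), rows whose counted column is None are not counted,
    but their group still appears in the result.
    """
    counts = {}
    for row in rows:
        group = row[key]
        counts.setdefault(group, 0)
        if row.get(counted) is not None:
            counts[group] += 1
    # Only the `count` column is selected, so the group key is dropped.
    return [{"count": n} for n in counts.values()]

def union_all(*results):
    """Concatenate result sets without removing duplicates."""
    merged = []
    for result in results:
        merged.extend(result)
    return merged

# Hypothetical data: `a` stands in for the rows the mysql node returned,
# `api_rows` for the raw rows returned by other_service.some_api.
a = [{"count": 5}]
api_rows = [
    {"id": 1, "park": "north"},
    {"id": 2, "park": "north"},
    {"id": 3, "park": "south"},
    {"id": None, "park": "south"},
]
b = count_by(api_rows, "park")
result = union_all(a, b)
```

The point is that the mysql node only ever sees its own query; the grouping over the API rows and the final merge happen in the web server process.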

The .lzsql source file is compiled down to (very compact) Lua code before deploying to our production servers. Because it is a true ahead-of-time compiler, its own speed does not affect the running service, so we can afford to implement the whole thing in Perl 5, one of the not-so-fast scripting languages (approximately 3k lines of hand-written code). Perl modules like Moose and Parse::RecDescent have made the compiler construction process quite enjoyable 🙂

In the future, the lzsql compiler is also expected to optimize the sql queries automatically for a specific remote sql engine, like mysql’s.

The lzsql compiler will eventually be released under an open source license with the name "RestyScript", once we decouple our specific business logic from the compiler. For now, we hardcode some business logic into the compiler for the sake of convenience. We’re going to move it into compiler plugins or language extensions and make the lzsql toolchain itself more general.

My intern students became very productive once they started using the lzsql language 😉 The old system they’re replacing is written in tons of ugly PHP code, oh well 😉 We’ve cut the codebase size by 90% and the new system is also 20 to 30 times faster 😀

We’re also wrapping our heads around VoltDB, a really nice in-memory database. We’re looking forward to rewriting the "real time stats services" mentioned above using VoltDB together with Erlang, Lua, or something similar. An nginx upstream module for the VoltDB binary protocol is also on chaoslawful’s and my TODO list.

The only sad part regarding VoltDB is that it’s written in Java, but it’s not a very big issue for us. It has some ugly limitations regarding its sql and interfaces, but we can work around those details at the level of our lzsql language and just use it (combined with Java) as the runtime.

It’s already starting to become more and more interesting 🙂

Stay tuned!


3 Responses to LZSQL compiler hacking and future VoltDB applications

  1. Daniele Testa says:

    Did you ever create a VoltDB client module for Lua? If you did, or are planning to, could you please make it open source? I would love to use Lua with VoltDB using the binary protocol instead of the HTTP/JSON interface they are working on.
