LZSQL compiler hacking and future VoltDB applications

We’ve working on a compiler for our own language named "lzsql". It’s for our nginx-based web service platform that drives our data product lz.taobao.com. Our "lzsql" compiler can now emit lua code that has passed lots of real world tests.

We can now decide whether to run a sql query at a remote mysql node or at the nginx core, all in the lzsql language.

For "local sql queries", we’ve implemented a full-fledged sql engine in pure lua. It’s damn fast, especially using LuaJIT. 6k q/s for a single nginx worker process is not uncommon in our benchmark.

And we’ve introduced a type system in our language such that it can handle sql quoting rules automatically. The typechecker can ensure that a lzsql variable with a specific type is used correctly in the context of the sql query. The sql language is part of the language anyway. Therefore, sql injection cannot happen.

We mostly use the "local sql engine" for merging data from completely different data sources, like those from both mysql and a non-relational data source. We do have some non-relational data sources like our real time stats services and other Java-powered web services from other departments of Taobao.com.

Here’s a small example:

   text $pattern;
   location $mysql_node;

   @a := select count(id) as count
         from cats
         where name contains $pattern
         group by park
         at $mysql_node;

   @b := select count(id) as count
            from other_service.some_api($pattern)
            group by park;

   return (@a union all @b);

In this sample, "other_service.some_api" is a non-blocking call to some remote non-relational data source. And the first SQL query runs on a remote mysql node specified by the variable $mysql_node while the last two both run directly in the nginx core by our sql engine written in Lua.

The .lzsql source file is compiled down to (very compact) Lua code before deploying to our production servers. Because it is a true compiler, we use Perl 5, one of the not-so-fast scripting languages, to implement the whole compiler (approximately 3k lines of hand-written code). Perl modules like Moose and Parse::Descent have made the compiler construction process quite enjoyable 🙂

In the future, the lzsql compiler is also expected to optimize the sql queries automatically for specific remote sql engine, like mysql’s.

The lzsql compiler will be eventually be released under an opensource license with the name "RestyScript" when we decouple those our specific business logic from the compiler. For now, we hardcode some business logic into the compiler for the sake of convenience. We’re going to move them into compiler plugins or language extensions and make the lzsql toolchain itself more general.

My intern students become very productive when they start using the lzsql language 😉 The old system they’re replacing is written in tons of ugly php code, oh well 😉 we’ve cut off 90% of the codebase size and also got 20 ~ 30 times faster 😀

We’re also puting our heads around VoltDB, a really nice memory database. And we’re also looking forward to rewriting our "real time stats services" mentioned above using VoltDB and Erlang or Lua or etc. An nginx upstream module for the VoltDB binary protocol is also on chaoslawful’s and my TODO list.

The only sad part regarding VoltDB is that it’s written in Java, but it’s not a very big issue for us. It has some ugly limitations regarding its sql and interfaces, but we can work around those details on the level of our lzsql language and just use it (combined with java) as the runtime.

It’s already starting to become more and more interesting 🙂

Stay tuned!

Posted in Uncategorized | 3 Comments

ngx_lua module updated

chaoslawful++ has just added several new features to our ngx_lua module. Checkout his blog article for details:

    http://chaoslawful.javaeye.com/blog/755013

ngx_lua embeds the power of the Lua language into the nginx core and can be used to construct high performance web services and web applications:

    http://github.com/chaoslawful/lua-nginx-module

Happy lua hacking! 😀

Posted in Uncategorized | Leave a comment

ngx_set_misc v0.14: extending ngx_rewrite’s “set” directive

I’m happy to announce the first public release of our Taobao.com ngx_set_misc module, v0.14.

ngx_set_misc is an nginx module that extends the standard ngx_rewrite module’s "set" directive to support various advanced functionalities like MD5, SHA1, json/mysql/postgresql string literal quoting, URI escaping/unescaping, default variable value assignment, upstream hashing based on a custom key, base32 encoding/decoding, and more 🙂

Please see the project homepage for more details:

    http://github.com/agentzh/set-misc-nginx-module

And the release tarball can be downloaded from

    http://github.com/agentzh/set-misc-nginx-module/tarball/v0.14

Various (funny) use cases can be found in my "nginx.conf scripting" talk’s slides:

    http://agentzh.org/misc/slides/nginx-conf-scripting/  (use the arrow keys on your keyboard to switch pages)

I must thank my colleagues shrimp and calio for their work on polishing this module in the last few months.

This module won’t be possible if Marcus Clyne does not publish his crazy Nginx Development Kit (NDK) project:

    http://github.com/simpl-it/ngx_devel_kit

And it’s a prerequisite for this module 🙂

I know that this module has a really terrible name, but it’s been there for months already 😛

We’ve been using it extensively in our products of Taobao.com. And Qunar.com is also using it heavily in their production environment.

Enjoy!

Posted in Uncategorized | Leave a comment

ngx_drizzle v0.0.12: better timeout control

I’m delighted to announce the v0.0.12 release of ngx_drizzle, a non-blocking upstream module that helps nginx talk directly to mysql, drizzle, and sqlite3 servers (and with an optional connection pool). The project source repository and the homepage is on GitHub:

   http://github.com/chaoslawful/drizzle-nginx-module

and the release tarball can be downloaded from

   http://github.com/chaoslawful/drizzle-nginx-module/tarball/v0.0.12

This release has the highlight of several new config directives that control the various timeout settings used by ngx_drizzle:

  drizzle_connect_timeout <time>
  drizzle_send_query_timeout <time>
  drizzle_recv_cols_timeout <time>
  drizzle_recv_rows_timeout <time>

The default timeout values for them are all "60 s", i.e., 60 seconds, which may be too long for many real world applications.

Thanks my colleague shrimp++ for testing them and fixing bugs in the original (undocumented) implementation 🙂

Also thanks Piotr Sikora for his tireless improving the test suite to fit our grand test build farm and his fixes for the recent nginx 0.8.47+ releases.

As always, special thanks go to my colleague and closest friend, chaoslawful++, for creating such an excellent module in the first place 😉

FWIW, we’re already using this module (combined with ngx_rds_json as well as many other modules) in production, in particular, the Taobao.com LiangZi Shop Stats website:

   http://lz.taobao.com/

Have fun!

 

Posted in Uncategorized | Leave a comment

cheater: yet another rule-driven tool to generate random databases

cheater 是我前一阵子开发的规则驱动的随机数据库生成器,是量子开发、测试工具链的组成部分。

cheater 已在量子店铺统计的前端开发中得到了广泛的应用,为前端开发提供大量的指定模式的伪造数据,从而大大减轻了对后端和真实数据的依赖,同时也可以得到比较理想的数据覆盖率。

cheater 工具的源码仓库位于下面这个位置:

    http://github.com/agentzh/cheater/

相比我们 QA 部门的 xdata,Ruby 世界的 faker 和 Perl 世界的 Data::Faker 等同类型的工具,cheater 具有以下优点:

  1. 能自动处理表间的关联和外键约束,因此是真正的“数据库实例生成器”
  2. 定义了一种类 SQL 的小语言来表达期望生成的数据模型
  3. 支持强大的 {a, b, c} 离散集合,数值/时间/日期区间记法 a..b,Perl 正则表达式模式 /regex/,常量值 ‘string’, 1.32 等方式来表达数据字段的值域
  4. 能直接生成 JSON 或者 SQL insert 语句,便于导入到 mysql/Pg 等数据库或者 hive 等其他系统

下面是一个很简单的示例,演示其基本用法。

我们首先在一个工作目录(比如 ~/work/ 下)新建一个 .cht 文件,用来描述我们想生成的数据模型。假设我们有一个 company.cht 文件:

    # Empolyee table
table employees ( id serial;
name text /[A-Z]a-z{2,5} [A-Z]a-z{2,7}/ not null unique;
age integer 18..60 not null;
tel text /1[35]8\d{8}/;
birthday date;
height real 1.50 .. 1.90 not null;
grades text {'A','B','C','D','E'} not null;
department references departments.id;
)

# Department table
table departments (
id serial;
name text /\w{2,10}/ not null;
)

10 employees;
2 departments;

这里我们使用的是 cheater 自己的小语言,其含义几乎是一眼就明了的,特别地,最后两行是说随机生成符合规则的 10 条 employees 表的记录和 2 条 departments 表的记录。

然后我们使用 cht-compile 命令来编译我们的 company.cht 生成随机的数据库实例:

    $ cht-compile company.cht
Wrote ./data/departments.rows.json
Wrote ./data/employees.rows.json

我们看到它分别为 departments 和 employees 这两张表生成了两个 JSON 格式的数据文件。其中的 data/employees.rows.json 文件在我机器上的此次运行是这个样子的:

    $ cat data/employees.rows.json
[["id","name","age","tel","birthday","height","grades","department"],
["7606","Kxhwcn Cflub",54,"15872171866","2011-04-01","1.67276","D","408862"],
["63649","Whf Iajgw",55,"13850771916",null,"1.65297","E","844615"],
["348161","Nnwe Obfkln",27,"15801601215","2011-03-06","1.69275","D","408862"],
["353404","Shgpak Xvqxw",28,"15816453097",null,"1.67796","A","408862"],
["445500","Bdt Mhepht",47,"13855517847",null,"1.89943","C","844615"],
["513515","Ipsa Mcbtk",25,"13874017694","2011-01-06","1.79534","A","844615"],
["658009","Lboe Etqo",27,null,"2011-04-14","1.85162","E","408862"],
["716899","Gey Elacflr",18,"15804516095","2011-02-27","1.75681","A","844615"],
["945911","Hsuz Qcmky",39,"13862516775","2011-05-31","1.75947","B","408862"],
["960643","Qbmbe Ijnbqsb",24,"15872418765","2011-04-11","1.78864","B","844615"]]

最后,为导入到关系数据库,我们可以使用 cht-rows2sql 命令将得到的 .json 数据文件转换为 .sql 文件:

    $ cht-rows2sql data/*.rows.json
Wrote ./sql/departments.rows.sql
Wrote ./sql/employees.rows.sql

其中的 sql/departments.rows.sql 在我这里是这样的:

    $ cat sql/departments.rows.sql
insert into departments (id,name) values
(408862,'dJRq7LCXL'),
(844615,'G_m9Nkh3q');

这样我们就可以直接往 mysql 这样的数据库里导入数据了:

    $ mysql -u monty test -p < sql/departments.rows.sql
目前 cheater 仍处于比较活跃的开发阶段,缺乏比较完整的文档。最完整的文档是它的自动化测试集: 

     http://github.com/agentzh/cheater/tree/master/t/

点开其中的 .t 文件,便可以看到一个个的声明性的测试用例

如果您在使用过程中发现任何 bug 或者有任何功能提议,请在 GitHub 上创建相应的 ticket:

     http://github.com/agentzh/cheater/issues

Enjoy!

Posted in Uncategorized | Leave a comment

A patch for libdrizzle to fix issues on Mac OS X

libdrizzle is an excellent piece of software but we’ve noticed that it does not compile on Mac OS X due to its use of the new bool type in C:

  /usr/local/include/libdrizzle/result.h:69: error: syntax error before ‘drizzle_result_eof’

The following small patch for libdrizzle 0.8 fixes this (as well as another bug regarding streaming parsing on the TCP protocol level, as reported by my colleague chaoslawful++ months ago):

   http://agentzh.org/misc/nginx/libdrizzle-0.8-parsebug_and_mac_fixes.patch

I’m looking forward to the next libdrizzle release or I’ll have to keep my patch for my  ngx_drizzle module’s users 😉

Posted in Uncategorized | Leave a comment

ngx_chunkin v0.20: fixed some memory bugs and added support for chunked PUT

I’ve just pushed a new release (v0.20) for my ngx_chunkin module:

  http://github.com/agentzh/chunkin-nginx-module/tarball/v0.20

Here’s the changes included in this version:

  •   fixed a bug that may read incomplete chunked body. thanks Gong Kaihui (龚开晖).
  •   fixed various memory issues in the implementation which may cause nginx processes to crash.
  •   added support for chunked POST requests.
  •   now we always require "error_page 411 @resume" and no default (buggy) magic any more. thanks Gong Kaihui (龚开晖).

This module adds HTTP 1.1 chunked input support for Nginx without the need of patching the Nginx core.

Behind the scene, it registers an access-phase handler that will eagerly read and decode incoming request bodies when a "Transfer-Encoding: chunked" header triggers a 411 error page in Nginx. For requests that are not in the chunked transfer encoding, this module is a "no-op".

To enable the magic, just turn on the chunkin config option and define a custom 411 error_page using chunkin_resume, like this:

     server {

       chunkin on;

       error_page 411 = @my_411_error;

       location @my_411_error {
           chunkin_resume;

       }

       …

     }

See its wiki page for the full documentation:

  http://wiki.nginx.org/NginxHttpChunkinModule

Note that nginx 0.8.41 and 0.7.67 are confirmed to work with this version. Nginx releases newer than 0.8.41 will not work due to this
issue:

  http://forum.nginx.org/read.php?29,103078

There’s no reply from the nginx author Igor Sysoev yet 😦

Enjoy!

Update: I’ve just kicked ngx_chunkin v0.21 out of the door, which applies a patch from Gong Kaihui that fixed a small bug:

   http://wiki.nginx.org/NginxHttpChunkinModule#v0.21

   http://github.com/agentzh/chunkin-nginx-module/tarball/v0.21

Posted in Uncategorized | Leave a comment