Co2y's Blog

Phoenix的一些操作记录

在phoenix上做了TPC-C测试,记录一些SQL的转换经历。

insert转换

phoenix本身是不支持insert语法的,取而代之的是upsert,关于upsert的用法官方有两种:

  1. upsert into table select
  2. upsert into table values

update转换

phoenix把insert和update统一成了upsert,这就使得原先rdbms中常用的一些写法如:

1
2
3
UPDATE bmsql_district SET d_next_o_id = d_next_o_id + 1 WHERE d_w_id = ? AND d_id = ?;
UPDATE bmsql_stock SET s_quantity = ?, s_ytd = s_ytd + ?, s_order_cnt = s_order_cnt + 1, s_remote_cnt = s_remote_cnt + ? WHERE s_w_id = ? AND s_i_id = ?;

改写成

1
2
3
4
UPSERT INTO bmsql_district(d_w_id, d_id, d_next_o_id)
SELECT d_w_id, d_id, d_next_o_id+1 FROM bmsql_district WHERE d_w_id = ? AND d_id = ?;
UPSERT INTO bmsql_stock(s_w_id,s_i_id,s_quantity,s_ytd,s_order_cnt,s_remote_cnt) select s_w_id,s_i_id,?, s_ytd + ?, s_order_cnt + 1, s_remote_cnt + ? FROM bmsql_stock WHERE s_w_id = ? AND s_i_id = ?;

没错,从上面还可以看出,所有的upsert在phoenix中必须指明rowkey。

join转换

phoenix支持两种join,hash join和sort merge join。 但是phoenix不支持非等值join。 例如:

1
2
3
4
5
6
7
8
9
SELECT count(*) AS low_stock FROM (
SELECT s_w_id, s_i_id, s_quantity FROM bmsql_stock
WHERE s_w_id = ? AND s_quantity < ? AND s_i_id IN (
SELECT ol_i_id FROM bmsql_district
JOIN bmsql_order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
AND ol_o_id >= d_next_o_id - 20
AND ol_o_id < d_next_o_id WHERE d_w_id = ? AND d_id = ? )
);

必须改写成

1
2
3
4
5
6
7
8
9
SELECT count(*) AS low_stock FROM (
SELECT s_w_id, s_i_id, s_quantity FROM bmsql_stock
WHERE s_w_id = ? AND s_quantity < ? AND s_i_id IN (
SELECT ol_i_id FROM bmsql_district
JOIN bmsql_order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
WHERE d_w_id = ? AND d_id = ? AND ol_o_id >= d_next_o_id - 20
AND ol_o_id < d_next_o_id)
);

但是这样phoenix会去全表扫描bmsql_order_line表,效率低下。 但是从上面的SQL中可以看出,其实是可以直接定位到bmsql_order_line表中的具体范围的。具体如下:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT ol_i_id FROM bmsql_district
INNER JOIN bmsql_order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
WHERE d_w_id = 3 AND d_id = 4 AND ol_o_id >= d_next_o_id - 20
AND ol_o_id < d_next_o_id;
SELECT ol_i_id FROM bmsql_district
INNER JOIN bmsql_order_line ON ol_w_id = 3
AND ol_d_id = 4
WHERE d_w_id = 3 AND d_id = 4 AND ol_o_id >= d_next_o_id - 20
AND ol_o_id < d_next_o_id;

分别对应如下执行计划

1
2
3
CLIENT 16-CHUNK PARALLEL 16-WAY ROUND ROBIN FULL SCAN OVER BMSQL_DISTRICT
CLIENT 1-CHUNK 1 ROWS 682 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER BMSQL_DISTRICT

从上面还可以看出phoenix的join执行顺序,具体见这里