Co2y's Blog

Introduction to Tephra

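Tephra provides globally consistent transactions on top of Apache HBase. HBase offers strongly consistent ACID operations at the row and region level, but sacrifices support for operations that span regions. Application developers therefore have to put considerable effort into keeping operations consistent across region boundaries. Tephra adds global transaction support that spans regions, tables, and multiple RPCs, which simplifies application development.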

Architecture

Tephra leverages HBase’s native data versioning to provide multi-versioned concurrency control (MVCC) for transactional reads and writes. With MVCC capability, each transaction sees its own consistent “snapshot” of data, providing snapshot isolation of concurrent transactions.

Tephra consists of three main components:

  • Transaction Server - maintains global view of transaction state, assigns new transaction IDs and performs conflict detection;
  • Transaction Client - coordinates start, commit, and rollback of transactions;
  • TransactionProcessor Coprocessor - applies filtering to the data read (based on a given transaction’s state) and cleans up any data from old (no longer visible) transactions.

Transaction Server

A central transaction manager generates a globally unique, time-based transaction ID for each transaction that is started, and maintains the state of all in-progress and recently committed transactions for conflict detection. While multiple transaction server instances can be run concurrently for automatic failover, only one server instance is actively serving requests at a time. This is coordinated by performing leader election amongst the running instances through Apache ZooKeeper. The active transaction server instance will also register itself using a service discovery interface in ZooKeeper, allowing clients to discover the currently active server instance without additional configuration.
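
To make the two server responsibilities above (handing out time-based IDs and detecting conflicts) more concrete, here is a minimal in-memory sketch. It is not Tephra's implementation: the class name ToyTransactionManager, the MAX_TX_PER_MS multiplier, and the idea of tracking changes as plain string keys are all assumptions made for illustration, and pruning of old change sets is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy in-memory stand-in for a transaction manager (hypothetical, not a Tephra class). */
class ToyTransactionManager {
    // Assumption: IDs are wall-clock milliseconds scaled up, leaving room for many IDs per millisecond.
    static final long MAX_TX_PER_MS = 1_000_000L;

    private long lastId = 0;
    // IDs of transactions that have started but not yet committed.
    private final Set<Long> inProgress = new HashSet<>();
    // Commit ID -> keys changed by that commit, kept for conflict detection.
    private final Map<Long, Set<String>> committedChanges = new HashMap<>();

    /** Returns the next ID: time-based, and strictly greater than any ID issued before. */
    private long nextId() {
        lastId = Math.max(lastId + 1, System.currentTimeMillis() * MAX_TX_PER_MS);
        return lastId;
    }

    /** Starts a transaction and returns its ID. */
    synchronized long start() {
        long txId = nextId();
        inProgress.add(txId);
        return txId;
    }

    /**
     * Tries to commit. Returns false if a transaction that committed after this one
     * started changed any of the same keys; the caller is then expected to roll back.
     */
    synchronized boolean commit(long txId, Set<String> changedKeys) {
        if (!inProgress.remove(txId)) {
            throw new IllegalStateException("unknown transaction " + txId);
        }
        for (Map.Entry<Long, Set<String>> e : committedChanges.entrySet()) {
            boolean committedAfterStart = e.getKey() > txId;
            if (committedAfterStart && !Collections.disjoint(e.getValue(), changedKeys)) {
                return false; // write-write conflict
            }
        }
        committedChanges.put(nextId(), new HashSet<>(changedKeys));
        return true;
    }

    public static void main(String[] args) {
        ToyTransactionManager tm = new ToyTransactionManager();
        long t1 = tm.start();
        long t2 = tm.start();
        System.out.println(tm.commit(t1, Collections.singleton("row1"))); // true
        System.out.println(tm.commit(t2, Collections.singleton("row1"))); // false, overlaps with t1
    }
}
```

The two transactions overlap in time and touch the same key, so the first commit wins and the second is rejected. A transaction started after the first commit would be free to change row1 again.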

Transaction Client

A client makes a call to the active transaction server in order to start a new transaction. This returns a new transaction instance to the client, with a unique transaction ID (used to identify writes for the transaction), as well as a list of transaction IDs to exclude for reads (from in-progress or invalidated transactions). When performing writes, the client overrides the timestamp for all modified HBase cells with the transaction ID. When reading data from HBase, the client skips cells associated with any of the excluded transaction IDs. The read exclusions are applied through a server-side filter injected by the TransactionProcessor coprocessor.
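
The read and write rules in this paragraph can be sketched with a toy multi-versioned cell. Everything here is hypothetical (ToyVersionedCell and ToySnapshot are illustration-only names, and a TreeMap stands in for HBase's versioned storage); the point is only to show how stamping writes with the transaction ID and skipping excluded IDs yields a consistent snapshot.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of one HBase column: versions keyed by timestamp, here the writer's transaction ID. */
class ToyVersionedCell {
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    /** Write: the cell timestamp is overridden with the writing transaction's ID. */
    void put(long txId, String value) {
        versions.put(txId, value);
    }

    /** Read: the newest version the snapshot is allowed to see, or null if there is none. */
    String get(ToySnapshot snapshot) {
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (snapshot.isVisible(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyVersionedCell cell = new ToyVersionedCell();
        cell.put(100L, "a");                 // committed before our transaction started
        cell.put(200L, "b");                 // written by a transaction that is still in progress
        ToySnapshot tx = new ToySnapshot(300L, Collections.singleton(200L));
        System.out.println(cell.get(tx));    // prints "a": version 200 is excluded
        cell.put(tx.txId, "c");
        System.out.println(cell.get(tx));    // prints "c": a transaction sees its own writes
    }
}

/** Hypothetical snapshot handed to the client when its transaction starts. */
class ToySnapshot {
    final long txId;          // this transaction's ID, used as the timestamp of its writes
    final Set<Long> excluded; // in-progress or invalid transaction IDs to skip on reads

    ToySnapshot(long txId, Set<Long> excluded) {
        this.txId = txId;
        this.excluded = excluded;
    }

    /** A version is visible if it is our own write, or older than us and not excluded. */
    boolean isVisible(long version) {
        if (version == txId) {
            return true;
        }
        return version < txId && !excluded.contains(version);
    }
}
```

Because IDs are issued in increasing order, anything written by a transaction that started later carries a larger timestamp and is skipped automatically, so the exclude list only needs to cover older transactions that are unfinished or invalid.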

TransactionProcessor Coprocessor

The TransactionProcessor coprocessor is loaded on all HBase tables where transactional reads and writes are performed. When clients read data, it coordinates the server-side filtering performed based on the client transaction’s snapshot. Data cells from any transactions that are currently in-progress or those that have failed and could not be rolled back (“invalid” transactions) will be skipped on these reads. In addition, the TransactionProcessor cleans up any data versions that are no longer visible to any running transactions, either because the transaction that the cell is associated with failed or a write from a newer transaction was successfully committed to the same column.
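
The cleanup rule at the end of this paragraph can be illustrated on a single column. This is only a sketch under stated assumptions: ToyVersionCleaner and the visibilityBound parameter are hypothetical, and the bound is assumed to be an ID below every in-progress transaction, so that every remaining version at or below it is committed and readable by all running transactions.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy version cleanup for one column (hypothetical, not the coprocessor's actual code). */
class ToyVersionCleaner {
    /**
     * Prunes a column's versions in place.
     *
     * @param versions        timestamp (transaction ID) to value
     * @param invalid         IDs of failed transactions that could not be rolled back
     * @param visibilityBound assumed bound: all running transactions can read every
     *                        valid version at or below this ID
     */
    static void prune(NavigableMap<Long, String> versions, Set<Long> invalid, long visibilityBound) {
        // 1. Cells written by invalid transactions are never visible to anyone.
        versions.keySet().removeAll(invalid);
        // 2. At or below the bound, only the newest version can still be read;
        //    everything older has been overwritten by a committed write.
        Long newestReadable = versions.floorKey(visibilityBound);
        if (newestReadable != null) {
            versions.headMap(newestReadable, false).clear();
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(100L, "a");
        versions.put(200L, "b");
        versions.put(300L, "bad"); // written by an invalid transaction
        versions.put(400L, "c");
        versions.put(500L, "d");   // above the bound, so it is kept untouched
        prune(versions, Collections.singleton(300L), 450L);
        System.out.println(versions); // prints {400=c, 500=d}
    }
}
```

Versions above the bound are left alone because some running transaction may still need either them or the version just below them.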

Installation

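For installing Phoenix, see https://phoenix.apache.org/installation.html. In short: unpack the distribution, copy phoenix-4.7.0-HBase-0.98-server.jar into HBase's lib directory, restart HBase, and set the environment variables. You can then run sqlline.py slave to enter the shell (here slave is the host name passed as the ZooKeeper quorum).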

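This section covers installing Tephra into an existing Phoenix environment (see the reference here). You may run into a Guava version mismatch: Tephra depends on Guava 13+, while HBase itself ships with Guava 12+. Download a recent Guava 13+ jar and put it early on the classpath so that it is loaded first. For example, create a lib directory under the Phoenix directory and place the jar there.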

Add the following properties to hbase-site.xml under hbase/conf, inside the <configuration> element:
<property>
<name>phoenix.transactions.enabled</name>
<value>true</value>
</property>
<property>
<name>data.tx.snapshot.dir</name>
<value>/tmp/tephra/snapshots</value>
</property>
<property>
<name>data.tx.timeout</name>
<value>60</value>
</property>

Development

http://tephra.incubator.apache.org/GettingStarted.html

References

Official site: http://tephra.incubator.apache.org/
Related presentations are listed here:
http://tephra.incubator.apache.org/Presentations.html