
Setting up an HBase cluster with Docker

I couldn't find a ready-made Docker image for this on GitHub, so I built the cluster myself. This post only records the setup steps; the related scripts and the Docker image itself are not public for now.

Environment

Two machines, 192.168.0.1 (hostname server1) and 192.168.0.2 (hostname server2), form a two-node HBase cluster: server1 acts as master + slave, server2 as slave.

Steps

The cluster is built with Docker, although Docker is not strictly required; setting up Docker itself is not covered here. Docker just adds a layer of packaging, which makes migration and cleanup a bit easier.

Preparation

First, pull the ubuntu 14.04 image on both machines.

sudo docker pull ubuntu:14.04

Start the corresponding container on each of the two machines. The network mode used here is host mode, so accessing a port on the host is the same as accessing the container, which also means the cluster is reachable from anywhere on the LAN.

On server1:
sudo docker run -it --name hbase-master -v /home/co2y/hbase-master:/home --net=host ubuntu:14.04
(press Ctrl+P then Ctrl+Q to detach back to the host)
On server2:
sudo docker run -it --name hbase-slave -v /home/co2y/hbase-slave:/home --net=host ubuntu:14.04
(press Ctrl+P then Ctrl+Q to detach back to the host)

Environment setup

Next, set up the environment inside the containers. The following uses hbase-master on server1 as the example; the steps in the two containers are essentially the same.

sudo docker exec -it hbase-master bash

First update the package sources and install vim.

apt-get update
apt-get install vim

Configure the Java environment

apt-get install openjdk-7-jdk
vim /etc/profile and ~/.bashrc (write all environment variables into both files)
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME CLASSPATH PATH
source /etc/profile
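
A quick sanity check that the JDK is actually visible before moving on (not part of the original steps):

java -version
echo $JAVA_HOME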

Configure SSH

apt-get install openssh-server
Change the port in /etc/ssh/ssh_config and /etc/ssh/sshd_config to 2222.
Set up passwordless SSH login:
ssh-keygen -t rsa
Add the other machine's public key, as well as your own, to authorized_keys:
cat id_rsa.pub >> ~/.ssh/authorized_keys
sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
sed -ri 's/#UsePAM no/UsePAM no/g' /etc/ssh/sshd_config
/etc/init.d/ssh start
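
Before moving on, it is worth verifying that passwordless login works on the non-default port (an extra check, not in the original post):

ssh -p 2222 server2 hostname
# should print the remote hostname without prompting for a password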

Configure /etc/hosts

vim /etc/hosts
192.168.0.1 server1
192.168.0.2 server2

Configure Hadoop

Download

apt-get install wget
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
tar -zxvf hadoop-2.3.0.tar.gz

Configure the Hadoop environment variables

vim /etc/profile (also add these to ~/.bashrc)
HADOOP_HOME=/root/hadoop-2.3.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME PATH
source /etc/profile
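
A quick check that the Hadoop binaries are now on the PATH (not part of the original steps):

hadoop version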

Configure the Hadoop configuration files

The files below live under $HADOOP_HOME/etc/hadoop (copy mapred-site.xml from mapred-site.xml.template if it does not already exist).

vim slaves
server2
server1
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://server1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/</value>
  </property>
</configuration>
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>server1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>server1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>server1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>server1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>server1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>server1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>server1:19888</value>
  </property>
</configuration>

Edit the environment variables: vim ~/.bashrc

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
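
One thing that can bite here: start-dfs.sh and start-yarn.sh log in to each node over non-interactive ssh, where /etc/profile is not sourced, and sshd in this setup listens on 2222. Setting JAVA_HOME and the ssh options directly in etc/hadoop/hadoop-env.sh should cover both; a sketch, assuming the paths used above:

vim /root/hadoop-2.3.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_SSH_OPTS="-p 2222"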

scp the configured directory to the slave:

scp -r hadoop-2.3.0 server2:/root/

Start Hadoop

hdfs namenode -format
start-dfs.sh
start-yarn.sh

jps should show the following processes; if any of them are missing, check the corresponding logs to find the cause.

9379 ResourceManager
8904 NameNode
9183 SecondaryNameNode
28013 Jps
9021 DataNode
9478 NodeManager
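
Besides jps, the cluster state can be checked with an HDFS report or the NameNode web UI (extra checks, not from the original post):

hdfs dfsadmin -report
# both DataNodes should be listed as live; the NameNode web UI is at http://server1:50070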

Configure ZooKeeper

Download

wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar -zxvf zookeeper-3.4.5.tar.gz

Environment variables

ZOOKEEPER_HOME=/root/zookeeper-3.4.5
PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
export ZOOKEEPER_HOME PATH

Configuration

cd zookeeper-3.4.5/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/home/zookeeper/data
clientPort=2181
server.1=server1:2888:3888
server.2=server2:2888:3888
mkdir -p /home/zookeeper/data
echo "1" > /home/zookeeper/data/myid
(on the other machine: echo "2" > /home/zookeeper/data/myid)

scp the configured directory to the slave:

scp -r zookeeper-3.4.5 server2:/root/

Start ZooKeeper (run this on both machines)

zkServer.sh start
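
To confirm each instance is up and see which one became leader (extra checks; the second command assumes netcat is installed):

zkServer.sh status
echo ruok | nc server1 2181
# a healthy server answers imok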

Configure HBase

Download

wget https://archive.apache.org/dist/hbase/0.98.16.1/hbase-0.98.16.1-hadoop2-bin.tar.gz
tar -zxvf hbase-0.98.16.1-hadoop2-bin.tar.gz

Environment variables

HBASE_HOME=/root/hbase-0.98.16.1-hadoop2
PATH=$PATH:$HBASE_HOME/bin:$HBASE_HOME/conf
export HBASE_HOME PATH

Edit conf/hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://server1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>server1:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>server1,server2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/zookeeper/data</value>
  </property>
</configuration>
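
Since this cluster runs its own ZooKeeper rather than letting HBase manage one, and sshd listens on 2222, conf/hbase-env.sh likely needs a few extra lines as well. A minimal sketch, assuming the paths used above:

vim conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_MANAGES_ZK=false
export HBASE_SSH_OPTS="-p 2222"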

Edit conf/regionservers

server1
server2

scp to server2:

scp -r hbase-0.98.16.1-hadoop2 server2:/root/

Start HBase

bin/start-hbase.sh

Setup complete && testing

bin/hbase shell
hbase(main):001:0> list
test
1 row(s) in 0.8560 seconds
=> ["test"]
hbase(main):002:0> status
2 servers, 0 dead, 1.5000 average load
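
The list output above assumes a table named test was created beforehand; a minimal example of creating and writing to such a table from the shell:

hbase(main):003:0> create 'test', 'cf'
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):005:0> scan 'test'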

Configure Hive

wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
tar -xzvf apache-hive-1.2.1-bin.tar.gz
HIVE_HOME=/root/apache-hive-1.2.1-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
export HADOOP_USER_CLASSPATH_FIRST=true

Replace Derby with MySQL as the metastore database by editing hive-site.xml.
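
A sketch of the server-side hive-site.xml entries for a MySQL-backed metastore; the database name and credentials below are placeholders, and the MySQL JDBC driver JAR has to be dropped into $HIVE_HOME/lib:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://server1:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>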

To run against the cluster, scp a copy to the client and edit the client's hive-site.xml:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://server1:9083</value>
  </property>
</configuration>

On the master node, run hive --service metastore &

The client can then simply run hive and it will use the metastore on the master node.
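
Once the metastore service is running, a quick client-side check (an assumed example, not output from the original):

hive -e "show databases;"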