hadoop -- hive pseudo-distributed setup


I am someone who often smiles, but not someone who is often happy. -- Anohana


Environment

  • ubuntu 16.04
  • hadoop 2.9
  • hive 2.3.3

Download Hive

http://mirrors.hust.edu.cn/apache/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz

Extract it into /opt and rename the directory to hive for easier handling:

pinsily@zhu:/opt$ sudo tar -zxvf apache-hive-2.3.3-bin.tar.gz
pinsily@zhu:/opt$ sudo mv apache-hive-2.3.3-bin hive

Add Hive to the environment variables:

pinsily@zhu:/opt$ vim ~/.bashrc
# hive
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
pinsily@zhu:/opt$ source ~/.bashrc

Install MySQL

Installing the server package is enough:

pinsily@zhu:~$ sudo apt-get install mysql-server

Download the JDBC connector package from https://dev.mysql.com/downloads/connector/j/ and extract it

Copy the extracted mysql-connector-java-5.1.46.jar into $HIVE_HOME/lib:

pinsily@zhu:/opt$ sudo cp mysql-connector-java-5.1.46.jar /opt/hive/lib/

Create a dedicated hive user in MySQL for Hive

# create the hive database
mysql> create database hive;

# create the hive user and grant privileges
mysql> create user 'hive'@'localhost' identified by 'hive';

mysql> grant all privileges on *.* to 'hive'@'localhost' with grant option;

mysql> show grants for 'hive'@'localhost';

mysql> flush privileges;
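To confirm the account and its grants actually work, log back in as the new user (password hive, as set above) and look for the database:

```sql
-- from the shell: mysql -u hive -phive
mysql> show databases;   -- 'hive' should appear in the list
```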

Edit the configuration files

In the conf directory, generate the following four files from their templates:

pinsily@zhu:/opt/hive/conf$ sudo cp hive-env.sh.template hive-env.sh
pinsily@zhu:/opt/hive/conf$ sudo cp hive-log4j2.properties.template hive-log4j2.properties
pinsily@zhu:/opt/hive/conf$ sudo cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
pinsily@zhu:/opt/hive/conf$ sudo cp hive-default.xml.template hive-site.xml

Add the environment variables:

pinsily@zhu:/opt/hive/conf$ sudo vim hive-env.sh

# set env variable pinsily 2018.08.05
export HADOOP_HOME=/opt/hadoop
export JAVA_HOME=/opt/java
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=/opt/hive/conf
export HIVE_AUX_JARS_PATH=/opt/hive/lib
pinsily@zhu:/opt/hive/conf$ sudo vim hive-site.xml

Replace the following values:
${system:java.io.tmpdir}/${system:user.name}
${system:java.io.tmpdir}/${hive.session.id}_resources
${system:java.io.tmpdir}/${system:user.name}/operation_logs

with, respectively:
/tmp/hive
/tmp/hive/resources
/tmp/hive/operation_logs
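The three substitutions can also be applied in one pass with sed instead of editing by hand -- a sketch, assuming the stock 2.3.3 template and the paths used in this guide; it keeps a backup first. Note the two longer patterns must be replaced before the bare ${system:user.name} one, or they would never match:

```shell
# Back up hive-site.xml, then replace the ${system:...} placeholders
# with fixed /tmp/hive paths. Longer patterns first so the bare
# ${system:user.name} rule does not clobber them.
cd /opt/hive/conf
sudo cp hive-site.xml hive-site.xml.bak
sudo sed -i \
  -e 's|\${system:java.io.tmpdir}/\${hive.session.id}_resources|/tmp/hive/resources|g' \
  -e 's|\${system:java.io.tmpdir}/\${system:user.name}/operation_logs|/tmp/hive/operation_logs|g' \
  -e 's|\${system:java.io.tmpdir}/\${system:user.name}|/tmp/hive|g' \
  hive-site.xml
```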

Then create the corresponding directories:

pinsily@zhu:/tmp$ sudo mkdir hive
pinsily@zhu:/tmp$ sudo mkdir hive/resources
pinsily@zhu:/tmp$ sudo mkdir hive/operation_logs

Configure Hive to use MySQL

Reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin

Set the values of the following properties in hive-site.xml:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
</property>

Note that in XML an & must be written as &amp;; otherwise Hive fails with an error at startup.
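A quick way to catch such escaping mistakes before launching Hive is to run the file through an XML parser; Python's standard library, preinstalled on Ubuntu, is enough. The path below assumes the layout used in this guide:

```shell
# Parse hive-site.xml; a bare '&' (instead of '&amp;') makes this fail
# with a ParseError pointing at the offending line.
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' \
  /opt/hive/conf/hive-site.xml \
  && echo "hive-site.xml is well-formed" \
  || echo "hive-site.xml failed to parse -- check for unescaped '&'"
```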


Start Hive

pinsily@zhu:/opt/hive/conf$ hive


Troubleshooting

  1. Class path contains multiple SLF4J bindings

Fix: there are two conflicting SLF4J bindings on the classpath; keep Hadoop's and delete Hive's copy:

pinsily@zhu:/opt/hive$ sudo rm -f lib/log4j-slf4j-impl-2.6.2.jar

  2. Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Permission denied

Fix: give the temporary directory the right ownership:

pinsily@zhu:/opt/hive$ sudo chown pinsily -R /tmp/hive

  3. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Fix: check that the database named in the javax.jdo.option.ConnectionURL value in hive-site.xml (hive here) exists in MySQL, then initialize the metastore schema:

pinsily@zhu:/opt/hive/conf$ schematool -dbType mysql -initSchema
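If initialization succeeds, the metastore tables appear in the hive database; a quick look from the mysql shell confirms it (table names as created by the 2.3.x schema scripts):

```sql
mysql> use hive;
mysql> show tables;   -- should list metastore tables such as DBS, TBLS and COLUMNS_V2
```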