Hive on Spark reports an error when inserting data into a table
Problem description
A Hive on Spark cluster has been set up and a database already exists; inserting data into a table fails with an error.

create table student(id int, name string);
insert into table student values(1002, 'lisi');
hive> insert into student values(1002,'lisi');
Query ID = atguigu_20220505185820_0f520b40-7a94-4485-b5d1-3d85ec50ad7e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 964d8908-3adb-46db-899f-3a2186c87976)
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 964d8908-3adb-46db-899f-3a2186c87976
The job is recorded in YARN, but it never runs successfully. The application log (screenshot not reproduced here) shows that the failure is caused by a connection-related problem.
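To pull that log from the command line instead of the YARN web UI, YARN can dump it by application id. A minimal sketch; the application id below is a placeholder and must be replaced with the id of the failed job:

# List applications and note the failing one's id (placeholder id below)
yarn application -list -appStates ALL
# Dump the aggregated log for that application
yarn logs -applicationId application_1651744628xxx_0001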
Cause analysis:
It eventually turned out that the NameNode port in the Spark configuration did not match the NameNode port configured for the Hadoop cluster.
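A quick way to confirm which port the cluster actually uses is to ask Hadoop itself. A sketch using the standard hdfs CLI; the expected value comes from this cluster's core-site.xml shown below:

# Print the fs.defaultFS value the Hadoop cluster is really using
hdfs getconf -confKey fs.defaultFS
# Expected output for this cluster: hdfs://hadoop102:9820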
Solution:
Change the configuration so that the NameNode port in the Spark configuration matches the NameNode port in the Hadoop cluster configuration.

hadoop-3.1.3/etc/hadoop/core-site.xml (this file defines the NameNode address and port, and all later configuration must agree with it; note that the port here is 9820):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9820</value>
</property>
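With that value in place, HDFS should answer on the same host and port. A quick connectivity check, using only the address from core-site.xml:

# If the port is right, this lists the HDFS root instead of refusing the connection
hadoop fs -ls hdfs://hadoop102:9820/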
hive/conf/hive-site.xml:
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:9820/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
<!-- Hive-to-Spark connection timeout -->
<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>10000ms</value>
</property>
</configuration>
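spark.yarn.jars only works if the Spark jars have actually been uploaded to that HDFS path. A sketch of that one-time step, assuming the Spark installation directory is in $SPARK_HOME (an assumed environment variable, not from the original post):

# One-time upload of the Spark jars referenced by spark.yarn.jars
hadoop fs -mkdir -p /spark-jars
hadoop fs -put $SPARK_HOME/jars/* /spark-jars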
hive/conf/spark-defaults.conf (this is the file where the port had been misconfigured as 8020, which caused the error; changing it to 9820 fixed it):
spark.master=yarn
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://hadoop102:9820/spark-history
spark.executor.memory=1g
spark.driver.memory=1g
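spark.eventLog.dir must point at an existing HDFS directory, so create it if it does not exist yet. A sketch, reusing the path from the file above:

# Create the Spark event-log directory referenced by spark.eventLog.dir
hadoop fs -mkdir -p /spark-history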
With the port corrected, the insert finally succeeds.
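A quick end-to-end check from the shell (a sketch; hive -e runs statements non-interactively):

# Re-run the insert and read the row back
hive -e "insert into table student values(1002, 'lisi'); select * from student;"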