Ambari Spark 2.3.2: problems encountered when integrating Hive and ES

1. beeline fails with org.apache.thrift.TApplicationException: Required field client_protocol is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})

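The hive-cli and related jars bundled with Spark SQL are version 1.2, so the beeline client and the server do not speak the same protocol version. Either replace those jars, or connect with the beeline that ships under spark/bin.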

2. Newer Spark versions do not use Hive's metastore tables

Set metastore.catalog.default to hive.
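In Hadoop-style configuration files such as hive-site.xml, a setting like this is a single `<property>` element. As a minimal sketch, assuming the key belongs in the hive-site.xml that Spark reads, the Python helper below prints the fragment for it. The name `render_property` is made up for illustration; on an Ambari-managed cluster the same key and value would normally be entered through the Ambari configuration page rather than edited by hand.

```python
from xml.sax.saxutils import escape


def render_property(name: str, value: str) -> str:
    """Return a Hadoop-style <property> element for one setting."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# The key and value come from the fix above; the helper itself is illustrative.
fragment = render_property("metastore.catalog.default", "hive")
print(fragment)
```

The service that reads the file most likely needs a restart before the new value takes effect.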

3. Class-not-found errors when the Ambari Spark Thrift Server queries an ES table mapped in Hive

In spark2-thrift-sparkconf, append the ES jar to spark.sql.hive.metastore.jars (note that entries are separated by ":"), and also copy the jar into /usr/hdp/3.1.0.0-78/spark2/jars/.
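Because the value is a ":"-separated list, it is easy to overwrite the existing entries or add the same jar twice. The sketch below shows one way to append to such a list safely. The function name `add_to_classpath`, the existing entry, and the ES jar file name are all placeholders, not values taken from a real cluster.

```python
from typing import Iterable


def add_to_classpath(existing: str, new_jars: Iterable[str]) -> str:
    """Append jars to a ':'-separated classpath string, skipping duplicates."""
    entries = [entry for entry in existing.split(":") if entry]
    for jar in new_jars:
        if jar not in entries:
            entries.append(jar)
    return ":".join(entries)


# Placeholder values: substitute the current setting and the real ES jar path.
current = "/path/to/existing/metastore/jars/*"
es_jar = "/usr/hdp/3.1.0.0-78/spark2/jars/elasticsearch-hadoop-X.Y.Z.jar"
print(add_to_classpath(current, [es_jar]))
```

The printed string is what would go into spark.sql.hive.metastore.jars; the existing entries are kept and the ES jar is added at the end.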

4. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Table default.partition_test failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.)

Set hive.strict.managed.tables to false in hive-site.xml (note: this is Hive's own hive-site.xml, not Spark's copy).

5. org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Load Data failed for hdfs://***:8020/warehouse/tablespace/managed/hive/***/.hive-staging_hive_2019-07-02_18-17-08_028_419193115114639265-1/-ext-10000/part-00000-1f0e8f19-6a12-448f-ba18-a2319711c0aa-c000 as the file is not owned by hive and load data is also not ran as hive;

Add hive.load.data.owner=spark to Spark's hive-site.xml, replacing spark with the user that actually runs the job.
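Several of the fixes above come down to a key having the right value in a hive-site.xml, so it can help to check the file before restarting services. The following is a rough sketch of such a check. The helper names `read_properties` and `find_mismatches` and the sample XML are invented for illustration; in practice the text would be read from the hive-site.xml that Spark actually loads.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(xml_text: str, expected: dict) -> dict:
    """Return {name: (actual, wanted)} for settings that differ or are missing."""
    actual = read_properties(xml_text)
    return {
        name: (actual.get(name), wanted)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Sample content standing in for Spark's hive-site.xml (not from a real cluster).
sample = """<configuration>
  <property><name>metastore.catalog.default</name><value>spark</value></property>
</configuration>"""

# Values from items 2 and 5; "spark" stands for the user that runs the job.
expected = {
    "metastore.catalog.default": "hive",
    "hive.load.data.owner": "spark",
}
print(find_mismatches(sample, expected))
```

A missing key is reported with an actual value of None, which distinguishes "never set" from "set to the wrong value".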

6. org.apache.spark.sql.AnalysisException: java.lang.NullPointerException: null;

at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)

Cause: Spark SQL does not support Hive's OrcInputFormat.
