The difference between saveAsTable and insertInto when writing Spark data to a Hive table

saveAsTable

// API documentation (from the Spark source):

Saves the content of the `DataFrame` as the specified table.

In the case the table already exists, behavior of this function depends on the
save mode, specified by the `mode` function (default to throwing an exception).
When `mode` is `Overwrite`, the schema of the `DataFrame` does not need to be
the same as that of the existing table.

When `mode` is `Append`, if there is an existing table, we will use the format and options of
the existing table. The column order in the schema of the `DataFrame` doesn't need to be the same
as that of the existing table. Unlike `insertInto`, `saveAsTable` will use the column names to
find the correct column positions.
In other words: when the target table already exists in Hive, the DataFrame's schema does not have to be identical to the table's, regardless of whether the SaveMode is Append or Overwrite. With Append, as long as the column names exist in the table, the data is matched to the correct columns by name; with Overwrite, the table definition itself is replaced.
For example, when the Hive table's columns are i and j:
scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
scala> Seq((3, 4)).toDF("j", "i").write.mode("append").saveAsTable("t1")
scala> sql("select * from t1").show
+---+---+
|  i|  j|
+---+---+
|  1|  2|
|  4|  3|
+---+---+
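The Overwrite case goes further: because Overwrite replaces the table definition itself, the new DataFrame's schema can differ completely from the old one. A minimal sketch of this (the columns x and y are made up for illustration; behavior as described in the API doc above):

scala> Seq(("a", true)).toDF("x", "y").write.mode("overwrite").saveAsTable("t1")
scala> sql("select * from t1").show
+---+----+
|  x|   y|
+---+----+
|  a|true|
+---+----+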

insertInto

// API documentation (from the Spark source):
Inserts the content of the `DataFrame` to the specified table. It requires that
the schema of the `DataFrame` is the same as the schema of the table.

@note Unlike `saveAsTable`, `insertInto` ignores the column names and just uses position-based
resolution. For example:
In other words: when the target table already exists in Hive, the DataFrame's schema must line up with the table's regardless of SaveMode: same number of columns, with positionally compatible types. Column names are ignored entirely; values are matched to table columns purely by position.
For example, when the Hive table's columns are i and j, with schema (int, int):

scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
scala> Seq((3, 4)).toDF("j", "i").write.insertInto("t1")
scala> Seq((5, 6)).toDF("a", "b").write.insertInto("t1")
scala> sql("select * from t1").show
+---+---+
|  i|  j|
+---+---+
|  5|  6|
|  3|  4|
|  1|  2|
+---+---+
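Note what happened: the second and third DataFrames used the column names (j, i) and (a, b), yet the values landed strictly by position, not by name. Since resolution is purely positional, inserting a DataFrame with a different number of columns should fail outright. A minimal sketch of that case (assumed behavior; the exact error message differs across Spark versions):

scala> Seq((7, 8, 9)).toDF("a", "b", "c").write.insertInto("t1")
// expected to throw an AnalysisException: t1 has 2 columns but the
// inserted DataFrame has 3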
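Both sets of examples above assume a spark-shell started with Hive support. For completeness, here is a minimal standalone sketch of the same flow (the object name HiveWriteDemo is made up; adjust the session configuration for your environment):

import org.apache.spark.sql.SparkSession

object HiveWriteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("saveAsTable-vs-insertInto")
      .enableHiveSupport()   // needed so tables are registered in the Hive metastore
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope

    Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
    Seq((3, 4)).toDF("j", "i").write.mode("append").saveAsTable("t1") // matched by name
    Seq((5, 6)).toDF("a", "b").write.insertInto("t1")                 // matched by position
    spark.sql("select * from t1").show()

    spark.stop()
  }
}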