Iceberg 基础
跳到导航
跳到搜索
Iceberg 表版本(V1和V2):
- V1 定义了如何使用不可变类型的文件(Parquet、ORC、AVRO)来管理大型分析型的表,包括元数据文件、属性、数据类型、表的模式,分区信息,以及如何写入与读取
- V2 在 V1 的基础上增加了如何通过这些类型的表实现行级别的更新与删除功能。其最主要的改变是引入了 delete file 记录需要删除的行数据,这样可以在不重写原有(数据)文件的前提下,实现行数据的更新与删除
一般来说:
- V1 表只支持增量数据插入,适合做纯增量写入场景,如埋点数据表
- V2 表支持行级更新,适合做状态变化的更新,如用户表、订单表
TBLPROPERTIES ( 'format-version'='2', )
hive -> iceberg
hive
# 分隔符为空格,字符串中有空格用双引号引起来 CREATE TABLE test1 ( col1 STRING, col2 INT, col3 STRING, col4 STRING, col5 STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = " ", "quoteChar" = "\"", "serialization.encoding"="UTF-8" ) STORED AS TEXTFILE; # hdfs dfs -put test1.csv /tmp/data/ LOAD DATA INPATH '/tmp/data/test1.csv' OVERWRITE INTO TABLE test1; select col4, count(*) cs from test1 group by col4 limit 10; ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 30 30 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 99.62 s ---------------------------------------------------------------------------------------------- INFO : Completed executing command(queryId=hdfs_20241224111257_4f4363b0-d489-4bac-99d8-86451fa0a45c); Time taken: 99.725 seconds +----------------+-----------+ | col4 | cs | +----------------+-----------+ | xeron x5 3708 | 40075712 | +----------------+-----------+
iceberg
# 分区字段不能出现在建表字段中 CREATE TABLE test_ice ( col1 STRING, col2 INT, col4 STRING, col5 STRING ) PARTITIONED BY (col3 STRING) STORED by iceberg; insert into test_ice(col1, col2, col3, col4, col5) select col1, col2, col3, col4, col5 from test1; select col4, count(*) cs from test_ice group by col4 limit 10; ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 4 4 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 22.68 s ---------------------------------------------------------------------------------------------- INFO : Completed executing command(queryId=hdfs_20241224111143_d436171c-f9fe-4e9a-87e1-0e2ea9b48b1b); Time taken: 26.945 seconds
运行时
# vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 24 2 0 4735832 1816 6360404 0 0 3 25 4 23 2 1 97 0 0 31 0 0 4675836 1816 6360596 0 0 0 41 2591 3733 85 15 0 0 0 44 0 0 4615600 1816 6360656 0 0 0 80 2943 5270 88 11 1 0 0 65 0 0 4531204 1816 6360548 0 0 0 24 3148 4973 86 14 1 0 0 38 0 0 4477960 1816 6360580 0 0 0 28 2751 6194 88 12 0 0 0 ... 1 3 0 3117460 1708 7576584 0 0 0 0 2368 3685 1 1 49 49 0 0 3 0 3117144 1708 7576596 0 0 0 16384 2550 3815 1 1 50 49 0 ... 0 0 0 3062816 1708 8845520 0 0 0 108 4263 7792 3 1 95 1 0 0 0 0 3062804 1708 8845684 0 0 0 100 4112 8209 2 1 97 1 0 # top Tasks: 259 total, 2 running, 257 sleeping, 0 stopped, 0 zombie %Cpu(s): 92.3 us, 6.7 sy, 0.0 ni, 0.1 id, 0.0 wa, 0.0 hi, 0.9 si, 0.0 st KiB Mem : 16266524 total, 273132 free, 9388948 used, 6604444 buff/cache KiB Swap: 0 total, 0 free, 0 used. 6373620 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 23006 hdfs 20 0 2536576 551248 25184 S 57.8 3.4 0:28.10 java 22892 hdfs 20 0 2507524 491888 25188 S 55.8 3.0 0:28.05 java 23020 hdfs 20 0 2534372 636328 25168 S 55.1 3.9 0:23.80 java 22887 hdfs 20 0 2532756 646060 25176 S 46.8 4.0 0:27.88 java 22895 hdfs 20 0 2537776 662796 25184 S 40.9 4.1 0:30.90 java 22959 hdfs 20 0 2506228 502380 25180 S 40.5 3.1 0:28.43 java 22897 hdfs 20 0 2507996 486764 25188 S 39.5 3.0 0:28.39 java 23004 hdfs 20 0 2504672 652876 25176 S 35.5 4.0 0:30.46 java