Iceberg

From 牛奶河Wiki
Revision as of 09:46, 13 December 2024 (Fri) by 阿奔 (page created)

Apache Iceberg™ is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Expressive SQL

Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates.

MERGE INTO prod.nyc.taxis pt
USING (SELECT * FROM staging.nyc.taxis) st
ON pt.id = st.id
WHEN NOT MATCHED THEN INSERT *;
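
The update and delete paths mentioned above might look like the following in Spark SQL. This is a sketch under assumptions: the Iceberg SQL extensions are enabled, the table uses format version 2 (needed for delete deltas), `id` is numeric, and `trip_distance` is the column from the rename example further down. The `write.*.mode` table properties choose between eager rewrites (`copy-on-write`) and delete deltas (`merge-on-read`).

```sql
-- Update existing rows (the filter condition is illustrative only).
UPDATE prod.nyc.taxis
SET trip_distance = 0.0
WHERE trip_distance < 0;

-- Targeted delete; assumes id is a numeric column.
DELETE FROM prod.nyc.taxis
WHERE id = 42;

-- Pick the write strategy per operation:
--   copy-on-write  = rewrite affected data files eagerly (faster reads)
--   merge-on-read  = write delete deltas (faster updates)
ALTER TABLE prod.nyc.taxis SET TBLPROPERTIES (
    'write.update.mode' = 'merge-on-read',
    'write.delete.mode' = 'merge-on-read',
    'write.merge.mode'  = 'copy-on-write'
);
```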

Full Schema Evolution

Schema evolution just works. Adding a column won't bring back "zombie" data. Columns can be renamed and reordered. Best of all, schema changes never require rewriting your table.

ALTER TABLE taxis RENAME COLUMN trip_distance TO distance;
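
The other schema changes described above (adding, reordering, dropping) could be sketched like this in Spark SQL. The `tip_amount` column is hypothetical and used only for illustration. Iceberg tracks columns by ID rather than by name, which is why a newly added column does not pick up old data.

```sql
-- Add a column; rows written before the change read it as NULL.
-- tip_amount is a hypothetical column name.
ALTER TABLE taxis ADD COLUMN tip_amount double;

-- Reorder: place the new column directly after distance.
ALTER TABLE taxis ALTER COLUMN tip_amount AFTER distance;

-- Drop it again; like the changes above, this only updates table metadata.
ALTER TABLE taxis DROP COLUMN tip_amount;
```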

Hidden Partitioning

Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and table layout can be updated as data or queries change.
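
As a rough illustration in Spark SQL, a table can be partitioned by a transform of a column, and queries then filter on the column itself. The table name `taxis_by_day` and the `pickup_time` column are assumptions made for this sketch, and `ADD PARTITION FIELD` requires the Iceberg SQL extensions.

```sql
-- Partition by a transform of pickup_time instead of a separate derived column.
-- taxis_by_day and pickup_time are hypothetical names.
CREATE TABLE prod.nyc.taxis_by_day (
    id bigint,
    pickup_time timestamp,
    trip_distance double)
USING iceberg
PARTITIONED BY (days(pickup_time));

-- A plain filter on the source column is enough; Iceberg derives the
-- matching day partitions and skips the rest.
SELECT count(*)
FROM prod.nyc.taxis_by_day
WHERE pickup_time >= TIMESTAMP '2022-01-01 00:00:00'
  AND pickup_time <  TIMESTAMP '2022-01-02 00:00:00';

-- Change the layout later; data already written keeps its old layout.
ALTER TABLE prod.nyc.taxis_by_day ADD PARTITION FIELD bucket(16, id);
```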

Time Travel and Rollback

Time-travel enables reproducible queries that use exactly the same table snapshot, or lets users easily examine changes. Version rollback allows users to quickly correct problems by resetting tables to a good state.

SELECT count(*) FROM nyc.taxis FOR TIMESTAMP AS OF TIMESTAMP '2022-01-01 00:00:00.000000 Z';
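
Rollback could be sketched as follows in Spark SQL (Spark 3.3 or later for `VERSION AS OF`). The snapshot ID is a placeholder, and `prod` is assumed to be the catalog name, as in the `MERGE` example above.

```sql
-- List snapshots to find a known-good snapshot_id.
SELECT committed_at, snapshot_id, operation
FROM prod.nyc.taxis.snapshots
ORDER BY committed_at;

-- Read the table as of one snapshot; the ID below is a placeholder.
SELECT count(*) FROM prod.nyc.taxis VERSION AS OF 10963874102873;

-- Reset the table's current state to that snapshot.
CALL prod.system.rollback_to_snapshot('nyc.taxis', 10963874102873);
```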

Data Compaction

Data compaction is supported out of the box, and you can choose among rewrite strategies such as bin-packing or sorting to optimize file layout and size.

CALL system.rewrite_data_files('nyc.taxis');
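
The two strategies named above can be selected through named arguments to the same procedure. This sketch assumes Spark SQL with `prod` as the catalog name; the option value and sort column are illustrative choices, not recommendations.

```sql
-- Bin-packing (used when no strategy is given); only rewrite file groups
-- that contain at least 2 input files.
CALL prod.system.rewrite_data_files(
    table => 'nyc.taxis',
    options => map('min-input-files', '2')
);

-- Sort strategy: rewrite files with rows ordered by id.
CALL prod.system.rewrite_data_files(
    table => 'nyc.taxis',
    strategy => 'sort',
    sort_order => 'id DESC NULLS LAST'
);
```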