[[File:Iceberg-logo.png|right|frameless]]
Apache Iceberg™ is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Netflix developed Iceberg to solve the problem of fitting data storage to compute engines. The project entered the Apache Incubator on November 16, 2018, and graduated on May 19, 2020, becoming an Apache top-level project.

Iceberg is an open table format for large-scale analytic workloads. A table format can be understood as a way of organizing metadata and data files; it sits below the compute framework (Flink, Spark, Hive) and above the data files.

=== Expressive SQL ===
Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates.
<blockquote>MERGE INTO prod.nyc.taxis pt USING (SELECT * FROM staging.nyc.taxis) st ON pt.id = st.id WHEN NOT MATCHED THEN INSERT *;</blockquote>

=== Full Schema Evolution ===
Schema evolution just works. Adding a column won't bring back "zombie" data. Columns can be renamed and reordered. Best of all, schema changes never require rewriting your table.
<blockquote>ALTER TABLE taxis RENAME COLUMN trip_distance TO distance;</blockquote>

=== Hidden Partitioning ===
Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and table layout can be updated as data or queries change.

=== Time Travel and Rollback ===
Time travel enables reproducible queries that use exactly the same table snapshot, or lets users easily examine changes. Version rollback allows users to quickly correct problems by resetting tables to a good state.
<blockquote>SELECT count(*) FROM nyc.taxis FOR TIMESTAMP AS OF TIMESTAMP '2022-01-01 00:00:00.000000 Z';</blockquote>

=== Data Compaction ===
Data compaction is supported out-of-the-box and you can choose from different rewrite strategies such as bin-packing or sorting to optimize file layout and size.
<blockquote>CALL system.rewrite_data_files("nyc.taxis");</blockquote>

[[Category:Develop]]
[[Category:Hadoop]]
[[Category:Iceberg]]
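To give a rough idea of what a bin-packing rewrite strategy does, the following is a toy sketch in Python. It is not Iceberg's actual planner or API: the function name <code>bin_pack</code>, the file sizes, and the 512 MB target are all assumptions chosen for illustration, and the grouping uses a simple first-fit-decreasing heuristic.

```python
from typing import List


def bin_pack(file_sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Group file sizes so each group's total stays at or under target_mb.

    Toy first-fit-decreasing heuristic (hypothetical helper, not Iceberg code).
    A single file larger than target_mb ends up alone in its own group.
    """
    bins: List[List[int]] = []
    # Place the largest files first, then fill the gaps with smaller ones.
    for size in sorted(file_sizes_mb, reverse=True):
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing group has room, so start a new one.
            bins.append([size])
    return bins


# Hypothetical data file sizes in MB; each resulting group would be
# rewritten into one larger file by a compaction job.
small_files = [40, 10, 200, 120, 60, 30, 500]
groups = bin_pack(small_files, target_mb=512)
for group in groups:
    print(group, "->", sum(group), "MB")
```

With these assumed inputs, seven files collapse into two groups, so a rewrite would leave two data files instead of seven. A sort-based strategy would additionally reorder rows inside the rewritten files, which this sketch does not model.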