Release Note 3.1.0

Thanks to our dedicated developers and supportive community, the much-anticipated Apache Doris 3.1.0 is now available!

## VARIANT

### Sparse Columns and Sub-columns With Vertical Compaction

Traditional OLAP systems often encounter metadata bloat, compaction amplification, and query degradation when dealing with "extremely wide tables/excessive columns" (ranging from thousands to tens of thousands). Doris 3.1 leverages the sparsity of VARIANT sub-columns and sub-column-level Vertical Compaction to increase the manageable column limit to the order of tens of thousands.

Through in-depth optimizations at the storage layer, VARIANT delivers the following benefits to users:

- Stable support for "thousands to tens of thousands" of sub-columns (columnar storage), with smoother query and compaction latencies.
- Controllable metadata and indexes, avoiding exponential growth.
- Proven capability to extract over 10,000 sub-columns (columnar storage) with efficient Compaction performance.
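The idea of exploiting sub-column sparsity can be illustrated with a toy model. The sketch below assumes that the most frequently seen JSON paths get dedicated columns and that the long tail shares one sparse store; this note does not describe the actual storage mechanism, and the function name `split_subcolumns` and its `max_subcolumns` parameter are hypothetical.

```python
from collections import Counter


def split_subcolumns(rows, max_subcolumns):
    """Toy model: frequent paths become dense columns, the rest stay sparse.

    This is an illustration only, not how Doris lays out VARIANT data.
    """
    # Count how many rows contain each path.
    counts = Counter(path for row in rows for path in row)
    # Rank by frequency; ties are broken by path name for determinism.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    dense = set(ranked[:max_subcolumns])
    # One list per dense path, with None where a row lacks the path.
    columns = {p: [row.get(p) for row in rows] for p in sorted(dense)}
    # Everything else is kept per row in a shared sparse store.
    sparse = [{p: v for p, v in row.items() if p not in dense} for row in rows]
    return columns, sparse


rows = [
    {"user": "a", "status": 200},
    {"user": "b", "status": 404, "debug.trace": "x1"},
    {"user": "c", "status": 200, "geo.city": "Oslo"},
]
columns, sparse = split_subcolumns(rows, max_subcolumns=2)
```

In this model the number of physical columns is bounded by `max_subcolumns` no matter how many distinct paths appear in the data, which is the property that keeps metadata growth under control.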

### Schema Template

Using Schema Template provides the following benefits when working with the VARIANT data type:

- Type Stability: Critical sub-paths can have their types fixed in the DDL, preventing query errors, index invalidation, and overhead from implicit conversions caused by type drift.
- Faster and More Accurate Retrieval: Inverted indexing strategies (tokenized/non-tokenized, parsers, phrase search, etc.) can be customized for different sub-paths, resulting in lower latency and more stable hit rates for common queries.
- Controllable Indexing and Costs: Moves away from "uniform column-wide index inheritance" (an approach in 2.1 that easily leads to bloat) to "fine-grained configuration by sub-path," significantly reducing the number of indexes, write amplification, and storage costs.
- Improved Maintainability and Collaboration: Equivalent to adding a "data contract" to JSON, ensuring semantic consistency across teams; type and index states are more observable, making issues easier to diagnose.
- Evolution-Friendly: Core high-frequency paths can be templated with optional indexing, while long-tail fields retain flexible extensibility, preserving scalability.
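The "data contract" idea behind Schema Template can be sketched in a few lines. The example below is a conceptual model only: the `TEMPLATE` mapping, the glob-style pattern, and the choice to cast values (rather than reject them) are assumptions for illustration and do not reproduce Doris DDL or its runtime behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical template: sub-path pattern -> Python type standing in for a column type.
TEMPLATE = {
    "user.id": int,
    "metrics.*": float,
}


def apply_template(path, value, template=TEMPLATE):
    """Cast value to the type pinned for path; untemplated paths pass through."""
    for pattern, pinned_type in template.items():
        if fnmatchcase(path, pattern):
            # A value that cannot be cast raises ValueError here, so type
            # drift surfaces at write time instead of at query time.
            return pinned_type(value)
    # Long-tail fields keep whatever type they arrived with.
    return value


user_id = apply_template("user.id", "42")
latency = apply_template("metrics.latency", 3)
tag = apply_template("tags", "x")
```

Templated paths always come out with one type, while paths outside the template remain flexible, mirroring the split between core high-frequency paths and long-tail fields described above.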

## Inverted Index

### Inverted Index Storage Format V3

V3 further optimizes storage compared to V2: index files are smaller, which reduces disk usage and I/O overhead. In tests on the httplogs and logsbench datasets, V3 reduced storage space by up to 20%, making it well suited to large-scale text data and log analytics.

### New Tokenizers

- ICU (International Components for Unicode) Tokenizer: handles internationalized text with complex writing systems; particularly suitable for documents that mix multiple languages.
- IK Tokenizer: a Chinese tokenizer that combines dictionary and statistical models.
- Basic Tokenizer: basic tokenization that segments text by character type.

### Custom Tokenizer
Custom tokenization lets users combine character filters, tokenizers, and token filters to fit their own requirements, overcoming the limitations of the built-in tokenizers and further improving text retrieval recall. Because this combination defines precisely how text is segmented into searchable terms, it directly determines the relevance of search results and the accuracy of data analysis.
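The three-stage combination of character filters, a tokenizer, and token filters can be sketched as a small pipeline. This is a generic illustration of the pattern, not Doris's API: every function name here (`strip_html`, `word_tokenizer`, `lowercase`, `drop_stopwords`, `analyze`) and the stopword list are hypothetical.

```python
import re


def strip_html(text):
    # Character filter: replace anything that looks like a tag with a space.
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    # Tokenizer: runs of letters, digits, or underscores become tokens.
    return re.findall(r"\w+", text)


def lowercase(tokens):
    # Token filter: normalize case so "GET" and "get" match.
    return [t.lower() for t in tokens]


def drop_stopwords(tokens, stopwords=frozenset({"the", "a", "of"})):
    # Token filter: remove very common words that add little to retrieval.
    return [t for t in tokens if t not in stopwords]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "<b>The</b> Status of GET /api",
    char_filters=[strip_html],
    tokenizer=word_tokenizer,
    token_filters=[lowercase, drop_stopwords],
)
```

Swapping any stage changes which terms end up searchable, which is why the choice of combination affects both recall and relevance.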

## LakeHouse

### Asynchronous Materialized Views Fully Support Data Lakes
In version 3.1, asynchronous materialized views fully support partition-level incremental builds and partition-aware transparent rewriting for Paimon, Iceberg, and Hudi.

### Iceberg
Version 3.1.0 introduces multiple optimizations and enhanced capabilities for the Iceberg table format, keeping pace with Iceberg's latest features.

- Supports full lifecycle management of Branches and Tags
- Supports querying Iceberg system tables
- Supports querying Iceberg views
- Supports modifying Iceberg table schema via ALTER statements

### Paimon
Version 3.1.0 introduces multiple feature updates and capability enhancements for the Paimon table format, based on real user scenarios.

- Supports Paimon Batch Incremental Query
- Supports reading Branches and Tags
- Supports querying Paimon system tables

### Data Lake Query Performance
Version 3.1.0 introduces multiple deep optimizations for query performance on data lake table formats, aiming to provide users with more stable and efficient data lake analytics capabilities in real production environments.

- Dynamic Partition Pruning
- Batch Shard Execution

## Storage

- Flexible Column Updates
- Optimized Merge-on-Write (MOW) performance in high-concurrency scenarios in compute-storage decoupled mode

## Query Performance

- Enhanced partition pruning performance and expanded applicability
- Provides the capability to optimize queries leveraging data characteristics

## Behavior Changed

### VARIANT

- variant_max_subcolumns_count constraint. Within the same table, the variant_max_subcolumns_count setting for all Variant columns must be either all 0 or all greater than 0. Mixing these values will result in an error during table creation or schema change.
- The new VARIANT read/write/serde and Compaction paths are compatible with existing data. However, queries on VARIANT data upgraded from older versions may exhibit format differences (e.g., additional whitespace, or the use of the '.' delimiter causing unintended hierarchical structure creation, resulting in extra levels).
- When creating an Inverted Index on a VARIANT column, if no fields in the data meet the indexing criteria, an empty index file will still be generated. This is the expected behavior.
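The `variant_max_subcolumns_count` rule above (all 0 or all greater than 0 within one table) can be expressed as a simple pre-flight check. The sketch below is illustrative only: `check_variant_limits` is a hypothetical helper, not part of Doris, and it treats negative values as invalid by assumption.

```python
def check_variant_limits(limits):
    """Validate per-column variant_max_subcolumns_count values for one table.

    limits maps each VARIANT column name to its configured value.
    """
    negative = sorted(c for c, n in limits.items() if n < 0)
    if negative:
        # Assumption: negative values are never meaningful.
        raise ValueError(f"negative value for columns: {negative}")
    zero = sorted(c for c, n in limits.items() if n == 0)
    positive = sorted(c for c, n in limits.items() if n > 0)
    if zero and positive:
        # Mixing 0 and non-zero settings is what the rule forbids.
        raise ValueError(
            f"mixed settings: {zero} use 0 but {positive} use a positive value"
        )


check_variant_limits({"payload": 0, "headers": 0})        # all zero: accepted
check_variant_limits({"payload": 2048, "headers": 100})   # all positive: accepted
```

Running a check like this before issuing a CREATE TABLE or schema change would surface the conflict earlier than the error Doris returns.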

### Permissions
- The permission required for "SHOW TRANSACTION" has changed from ADMIN_PRIV to LOAD_PRIV on the database that the import targets.
- The permissions for SHOW FRONTENDS / BACKENDS and the NODE Restful API have been unified. Access to these interfaces now requires SELECT_PRIV on the information_schema database.




Release Note 3.1.0 #55502
