Behavior Changes
Query Optimizer
- Added the variable `use_max_length_of_varchar_in_ctas` to control the length behavior of the VARCHAR type when executing `CREATE TABLE AS SELECT` (CTAS) operations. #37069
  - This variable is set to true by default.
  - When set to true, if the VARCHAR type column originates from a table, the derived length is used; otherwise, the maximum length is used.
  - When set to false, the VARCHAR type will always use the derived length.
- All data types will now be displayed in lowercase to maintain compatibility with MySQL format. #38012
- Multiple query statements in the same query request must now be separated by semicolons. #38670
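The length rule for `use_max_length_of_varchar_in_ctas` can be read as a small decision function. The following is a minimal sketch of that rule, not Doris code: the function name, its parameters, and the `MAX_VARCHAR_LENGTH` value are assumptions introduced for illustration.

```python
# Illustrative model of the CTAS VARCHAR length rule described above.
# MAX_VARCHAR_LENGTH is an assumed value for the maximum VARCHAR length.
MAX_VARCHAR_LENGTH = 65533


def ctas_varchar_length(derived_length: int,
                        from_table_column: bool,
                        use_max_length_of_varchar_in_ctas: bool = True) -> int:
    """Return the VARCHAR length a CTAS target column would get."""
    if not use_max_length_of_varchar_in_ctas:
        # Variable off: always keep the derived length.
        return derived_length
    if from_table_column:
        # Variable on, column comes straight from a table: keep the derived length.
        return derived_length
    # Variable on, column is computed (for example by an expression): use the maximum.
    return MAX_VARCHAR_LENGTH
```

With the default setting, a column copied from a `varchar(20)` table column stays at 20, while a computed string column gets the maximum length.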
Query Execution
- The default number of parallel tasks after shuffle operations in the cluster is set to 100, which will improve query stability and concurrent processing capability in large clusters. #38196
Storage
- The default value of `trash_file_expire_time_sec` has been changed from 86400 seconds to 0 seconds, which means that if files are deleted by mistake and the FE trash is cleared, the data cannot be recovered.
- The table attribute `enable_mow_delete_on_delete_predicate` (introduced in version 3.0.0) has been renamed to `enable_mow_light_delete`.
- Explicit transactions are now prohibited from performing delete operations on tables with written data.
- Heavy schema change operations are prohibited on tables with auto-increment fields.
New Features
Job Scheduling
- Optimized the execution logic of internal scheduling jobs, decoupling the strong association between start time and immediate execution parameters. Now, tasks can be created with a specified start time or selected for immediate execution, without conflict, enhancing scheduling flexibility. #36805
Compute-Storage Decoupled
- Supports dynamic modification of the upper limit for file cache usage. #37484
- Recycler now supports object storage rate limiting and server-side rate limiting retry functionality. #37663 #37680
Lakehouse
- Added the session variable `serde_dialect` to set the output format for complex types. #37039
- SQL interception now supports external tables.
  - For more information, refer to the documentation on SQL Interception.
- Insert overwrite now supports Iceberg tables. #37191
Asynchronous Materialized Views
- Supports partition roll-up and build at the hourly level. #37678
- Supports atomic replacement of asynchronous materialized view definition statements. #36749
- Transparent rewriting now supports Insert statements. #38115
- Transparent rewriting now supports the VARIANT type. #37929
Query Execution
- The `group_concat` function now supports DISTINCT and ORDER BY options. #38744
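To make the effect of the DISTINCT and ORDER BY options concrete, here is a minimal Python model of group concatenation over the values of one group. It is a sketch, not the Doris implementation: the function signature, the `", "` default separator, and the treatment of NULL (`None`) values are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional


def group_concat(values: Iterable[Optional[str]],
                 separator: str = ", ",
                 distinct: bool = False,
                 order_key: Optional[Callable[[str], object]] = None) -> Optional[str]:
    """Concatenate the non-NULL values of one group into a single string."""
    items = [v for v in values if v is not None]  # assumption: NULLs are skipped
    if distinct:
        # Drop duplicates while keeping the first occurrence of each value.
        items = list(dict.fromkeys(items))
    if order_key is not None:
        # ORDER BY: sort the remaining values before joining them.
        items = sorted(items, key=order_key)
    # Assumption: a group with no non-NULL values yields NULL.
    return separator.join(items) if items else None
```

Without the options, the result depends on input order and may repeat values; with both options the output is deduplicated and deterministic.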
Semi-Structured Data Management
- The ES Catalog now maps `nested` or `object` types in Elasticsearch to the JSON type in Doris. #37101
- Added the `MULTI_MATCH` function, which supports matching keywords across multiple fields and can leverage inverted indexes to accelerate searches. #37722
- Added the `explode_json_object` function, which can unfold objects in JSON data into multiple rows. #36887
- Inverted indexes now support memtable advancement, requiring index construction only once during multi-replica writes, reducing CPU consumption and improving performance. #35891
- Added `MATCH_PHRASE` support for positive slop, e.g., `msg MATCH_PHRASE 'a b 2+'` can match instances containing the words a and b with a slop of no more than two and with a appearing before b; regular slop without the final `+` does not guarantee this order. #36356
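The difference between regular slop and positive (ordered) slop in `MATCH_PHRASE` can be sketched with a simplified model. This is an assumption-laden illustration, not the inverted index algorithm: the function name is hypothetical, and the distance is modeled simply as the number of tokens strictly between the two words.

```python
from itertools import product
from typing import List


def phrase_match(tokens: List[str], first: str, second: str,
                 slop: int, ordered: bool) -> bool:
    """Check whether `first` and `second` occur within `slop` tokens of each other.

    Simplified model: the distance is the number of tokens strictly between
    the two words. With ordered=True, `first` must appear before `second`.
    """
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    for i, j in product(first_pos, second_pos):
        if i == j:
            continue  # the same token cannot play both roles
        if ordered and i > j:
            continue  # ordered slop rejects occurrences in reverse order
        if abs(j - i) - 1 <= slop:
            return True
    return False
```

Under this model, the text `b x a` matches an unordered slop of two for the words a and b, but not the ordered form, because b precedes a.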
Other
- Added the FE parameter `skip_audit_user_list`, where user operations specified in this configuration will not be recorded in the audit log. #38310
  - For more information, refer to the documentation on Audit Plugin.
Improvements
Storage
- Reduced the likelihood of write failures caused by disk balancing within a single BE. #38000
- Decreased memory consumption by the memtable limiter. #37511
- Moved old partitions to the FE trash during partition replacement operations. #36361
- Optimized memory consumption during compaction. #37099
- Added a session variable to control audit logs for JDBC PreparedStatement, with default setting to not print. #38419
- Optimized the logic for selecting BEs for group commits. #35558
- Improved the performance of column updates. #38487
- Optimized the use of `delete bitmap cache`. #38761
- Added a configuration to control query affinity during hot and cold tiering. #37492
Compute-Storage Decoupled
- Implemented automatic retries when encountering object storage server rate limiting. #37199
- Adapted the number of threads for memtable flush in the compute-storage decoupled mode. #38789
- Added Azure as a compile option to support compilation in environments without Azure support.
- Optimized the observability of object storage access rate limiting. #38294
- Allowed the file cache TTL queue to perform LRU eviction, enhancing TTL queue usability. #37312
- Optimized the number of balance writeeditlog IO operations in the storage and compute separation mode. #37787
- Improved table creation speed in the storage and compute separation mode by sending tablet creation requests in batches. #36786
- Optimized read failures caused by potential inconsistencies in the local file cache through backoff retries. #38645
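Several items in this list rely on retrying after object storage rate limiting or after a failed local cache read. The general pattern is retry with exponential backoff, sketched below. Everything here is hypothetical and for illustration only: the `RateLimitedError` class, the function name, and the default delays are assumptions, not Doris internals.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when a server rejects a request due to rate limiting."""


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay_s: float = 0.1,
                       max_delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying on RateLimitedError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on every retry and is capped at max_delay_s.
            sleep(min(base_delay_s * (2 ** attempt), max_delay_s))
    raise AssertionError("the loop always returns or raises")
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting; a production version would usually also add random jitter to the delay.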
Lakehouse
- Optimized memory statistics for Parquet/ORC format read and write operations. #37234
- Trino Connector Catalog now supports predicate pushdown. #37874
- Added a session variable `enable_count_push_down_for_external_table` to control whether to enable `count(*)` pushdown optimization for external tables. #37046
- Optimized the read logic for Hudi snapshot reads, returning an empty set when the snapshot is empty, consistent with Spark behavior. #37702
- Improved the read performance of partition columns for Hive tables. #37377
Asynchronous Materialized Views
- Improved transparent rewrite plan speed by 20%. #37197
- Eliminated roll-up during transparent rewrite if the group key satisfies data uniqueness for better nested matching. #38387
- Transparent rewrite now performs better aggregation elimination to improve the matching success rate of nested materialized views. #36888
MySQL Compatibility
- Now correctly populates the database name, table name, and original name in the MySQL protocol result columns. #38126
- Supported the hint format `/*+ func(value) */`. #37720
Query Optimizer
- Significantly improved the plan speed for complex queries. #38317
- Adaptively chose whether to perform bucket shuffle based on the number of data buckets to avoid performance degradation in extreme cases. #36784
- Optimized the cost estimation logic for SEMI / ANTI JOIN. #37951 #37060
- Supported pushing Limit down to the first stage of aggregation to improve performance. #34853
- Partition pruning now supports filter conditions containing the `date_trunc` or `date` function. #38025 #38743
- SQL cache now supports query scenarios that include user variables. #37915
- Optimized error messages for invalid aggregation semantics. #38122
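Partition pruning on a `date_trunc` filter works because the truncated predicate is equivalent to a range on the underlying column. The sketch below models that idea for month truncation over range partitions. It is an illustration under stated assumptions, not the optimizer's code: the `Partition` tuple layout and all function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# A range partition as (name, inclusive lower bound, exclusive upper bound).
Partition = Tuple[str, date, date]


def month_start(d: date) -> date:
    """Model date_trunc(d, 'month'): the first day of d's month."""
    return d.replace(day=1)


def next_month_start(d: date) -> date:
    """First day of the month after d's month."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def prune_for_month(partitions: List[Partition], target: date) -> List[str]:
    """Keep partitions that may hold rows with date_trunc(col, 'month') = target."""
    if target != month_start(target):
        return []  # a truncated value is always a month start, so nothing matches
    lo, hi = target, next_month_start(target)
    # The predicate is equivalent to lo <= col < hi; keep overlapping ranges only.
    return [name for name, p_lo, p_hi in partitions if p_lo < hi and lo < p_hi]
```

For monthly partitions, a filter on one truncated month keeps exactly one partition, so the other partitions never need to be scanned.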
Query Execution
- Adapted AggState compatibility from 2.1 to 3.x and fixed Coredump issues. #37104
- Refactored the strategy selection for local shuffle without Join. #37282
- Modified the scanner for internal table queries to be asynchronous to prevent stalling during such queries. #38403
- Optimized the block merge process during Hash table construction for Join operators. #37471
- Optimized the duration of lock holding for MultiCast. #37462
- Optimized gRPC keepAliveTime and added link monitoring to reduce the probability of query failure due to RPC errors. #37304
- Cleaned up all dirty pages in jemalloc when memory limits were exceeded. #37164
- Optimized the processing performance of `aes_encrypt` / `decrypt` functions for constant types. #37194
- Optimized the processing performance of the `json_extract` function for constant data. #36927
- Optimized the processing performance of the `ParseUrl` function for constant data. #36882
Semi-Structured Data Management
- Bitmap indexes now default to using inverted indexes, with `enable_create_bitmap_index_as_inverted_index` set to true by default. #36692
- In the compute-storage decoupled mode, DESC can now view sub-columns of VARIANT type. #38143
- Removed the step of checking file existence during inverted index queries to reduce access latency to remote storage. #36945
- Complex types ARRAY / MAP / STRUCT now support `replace_if_not_null` for AGG tables. #38304
- Escape characters for JSON data are now supported. #37176 #37251
- Inverted index queries now behave consistently on MOW tables and DUP tables. #37428
- Optimized the performance of inverted index acceleration for IN queries. #37395
- Reduced unnecessary memory allocation during TOPN queries to improve performance. #37429
- When creating an inverted index with tokenization, the `support_phrase` option is now automatically enabled to accelerate `match_phrase` series phrase queries. #37949
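The `replace_if_not_null` aggregation mentioned above for ARRAY / MAP / STRUCT columns can be pictured as a fold over the values loaded for one key. This is a minimal sketch of the semantics the name suggests, with `None` standing in for NULL; the Python function is illustrative and is not how the storage engine implements it.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def replace_if_not_null(values: Iterable[Optional[T]]) -> Optional[T]:
    """Fold the values for one key in load order: a later value replaces the
    current one only if it is not NULL (None)."""
    current: Optional[T] = None
    for value in values:
        if value is not None:
            current = value
    return current
```

A NULL arriving after a real value therefore leaves the earlier value in place, which is what makes this aggregation useful for partial updates.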
Other
- Audit logs can now record SQL types. #37790
- Added support for `information_schema.processlist` to show all FE. #38701
- Cached ranger's `datamask` and `rowpolicy` to accelerate query efficiency. #37723
- Optimized metadata management in job manager to release locks immediately after modifying metadata, reducing lock holding time. #38162
Bug Fixes
Upgrade
- Fix the issue where `mtmv load` fails during upgrade from version 2.1. #38799
- Resolve the issue where `null_type` cannot be found during the upgrade to version 2.1. #39373
- Address the compatibility issue with permission persistence during the upgrade from version 2.1 to 3.0. #39288
Load
- Fix the issue where parsing fails when the newline character is surrounded by delimiters in CSV format parsing. #38347
- Resolve potential exception issues when FE forwards group commit. #38228 #38265
- Group commit now supports the new optimizer. #37002
- Fix the issue where group commit reports data errors when JDBC setNull is used. #38262
- Optimize the retry logic for group commit when encountering `delete bitmap lock` errors. #37600
- Resolve the issue where routine load cannot use CSV delimiters and escape characters. #38402
- Fix the issue where routine load job names with mixed case cannot be displayed. #38523
- Optimize the logic for actively recovering routine load during FE master-slave switching. #37876
- Resolve the issue where routine load pauses when all data in Kafka is expired. #37288
- Fix the issue where `show routine load` returns empty results. #38199
- Resolve the memory leak issue during multi-table stream import in routine load. #38255
- Fix the issue where stream load does not return the error URL. #38325
- Resolve potential load channel leak issues. #38031 #37500
- Fix the issue where no error may be reported when importing fewer segments than expected. #36753
- Resolve the load stream leak issue. #38912
- Optimize the impact of offline nodes on import operations. #38198
- Fix the issue where transactions do not end when inserting into empty data. #38991
Storage
Backup and Restoration
- Fix the issue where tables cannot be written after backup and restoration. #37089
- Resolve the issue where view database names are incorrect after backup and restoration. #37412
Compaction
- Fix the issue where cumu compaction handles delete errors incorrectly during ordered data compaction. #38742
- Resolve the issue of duplicate keys in aggregate tables caused by sequential compaction optimization. #38224
- Fix the issue where compaction operations cause coredump in large wide tables. #37960
- Resolve the compaction starvation issue caused by inaccurate concurrent statistics of compaction tasks. #37318
MOW Unique Key
- Resolve the issue of inconsistent data between replicas caused by cumulative compaction deleting the delete sign. #37950
- MOW delete now uses partial column updates with the new optimizer. #38751
- Fix the potential duplicate key issue in MOW tables under compute-storage decoupled. #39018
- Resolve the issue where MOW unique and duplicate tables cannot modify column order. #37067
- Fix the potential data correctness issue caused by segcompaction. #37760
- Resolve the potential memory leak issue during column updates. #37706
Other
- Fix the small probability of exceptions in TOPN queries. #39119 #39199
- Resolve the issue where auto-increment IDs may duplicate during FE restart. #37306
- Fix the potential queuing issue in the delete operation priority queue. #37169
- Optimize the delete retry logic. #37363
- Resolve the issue with `bucket = 0` in table creation statements under the new optimizer. #38971
- Fix the issue where FE reports success incorrectly when image generation fails. #37508
- Resolve the issue where using the wrong nodename during FE offline nodes may cause inconsistent FE members. #37987
- Fix the issue where CCR partition addition may fail. #37295
- Resolve the `int32` overflow issue in inverted index files. #38891
- Fix the issue where TRUNCATE TABLE failure may cause BE to fail to go offline. #37334
- Resolve the issue where publish cannot continue due to null pointers. #37724 #37531
- Fix the potential coredump issue when manually triggering disk migration. #37712
Compute-Storage Decoupled
- Fixed the issue where `show create table` might display the `file_cache_ttl_seconds` attribute twice. #38052
- Fixed the issue where segment Footer TTL was not set correctly after setting file cache TTL. #37485
- Fixed the issue where file cache might cause coredump due to massive conversion of cache types. #38518
- Fixed the potential file descriptor (fd) leak in file cache. #38051
- Fixed the issue where schema change Job overwriting compaction Job prevented base tablet compaction from completing normally. #38210
- Fixed the potential inaccuracy of base compaction score due to data race. #38006
- Fixed the issue where error messages from imports might not be uploaded correctly to object storage. #38359
- Fixed the inconsistency in return information between compute-storage decoupled mode and storage and compute integration mode for 2PC imports. #38076
- Fixed the issue where an incorrect file size setting during file cache warm-up led to coredump. #38939
- Fixed the issue where partial column updates did not correctly dequeue delete operations. #37151
- Fixed compatibility issues with permission persistence in compute-storage decoupled mode. #38136 #37708
- Fixed the issue where observer did not retry correctly when encountering a `-230` error. #37625
- Fixed the issue where `show load` with conditions did not perform correct analysis. #37656
- Fixed the issue where `show streamload` in compute-storage decoupled mode caused BE coredump. #37903
- Fixed the issue where `copy into` did not correctly verify column names in strict mode. #37650
- Fixed the issue where multi-stream imports into a single table lacked permissions. #38878
- Fixed the potential overflow issue in `getVersionUpdateTimeMs`. #38074
- Fixed the issue where FE azure blob list was not implemented correctly. #37986
- Fixed the issue where inaccurate azure blob recycling time calculation prevented recycling. #37535
- Fixed the issue where inverted index files were not deleted in compute-storage decoupled mode. #38306
Lakehouse
- Fixed the issue with reading binary data from Oracle Catalog. #37078
- Fixed the potential deadlock issue when acquiring external table metadata in multi-FE scenarios. #37756
- Fixed the issue where JNI scanner failure caused BE nodes to crash. #37697
- Fixed the issue with slow reading of date types from Trino Connector Catalog. #37266
- Optimized kerberos authentication logic for Hive Catalog. #37301
- Fixed the issue where region attributes might be parsed incorrectly when parsing MinIO properties. #37249
- Fixed the issue where creating too many FileSystems by FE caused memory leaks. #36954
- Fixed the issue where incorrect time zone information was read from Paimon. #37716
- Fixed the potential thread leak issue caused by Hive write-back operations. #36990
- Fixed the null pointer issue caused by enabling Hive metastore event synchronization. #38421
- Fixed the issue where error messages were unclear or caused stalling when creating catalogs. #37551
- Fixed the issue where reading Hive text format tables behaved differently from Hive. #37638
- Fixed the logic error when switching between catalogs and databases. #37828
MySQL Compatibility
- Fixed the issue where certain flags in the MySQL protocol were set incorrectly when SSL was enabled. #38086
Asynchronous Materialized Views
- Fixed the issue where construction might fail when the base table had a very large number of partitions. #37589
- Fixed the issue where nested materialized views incorrectly performed full table refreshes even when partition refreshes were possible. #38698
- Fixed the issue where partition refresh could not handle the simultaneous existence of valid and invalid dependencies when analyzing partition dependencies. #38367
- Fixed the issue where the final result containing NULL type might cause asynchronous materialized views to fail. #37019
- Fixed the planning error that might occur during transparent rewriting when both synchronous and asynchronous materialized views with the same name were present. #37311
Synchronous Materialized Views
- The rewritten synchronous materialized views now can correctly perform partition pruning. #38527
- When rewriting synchronous materialized views, those with unready data are no longer selected. #38148
Query Optimizer
- Fixed the deadlock issue that might occur when queries and delete operations are performed simultaneously. #38660
- Fixed the issue where bucket pruning might incorrectly prune on decimal column buckets. #37889
- Fixed the issue where planning might be incorrect when mark Join participates in Join reorder. #39152
- Fixed the issue where the result is incorrect when the correlation condition of a correlated subquery is not a simple column. #37644
- Fixed the issue where partition pruning cannot correctly handle OR expressions. #38897
- Fixed the planning error that might occur when optimizing the execution order of JOIN and AGG. #37343
- Fixed the issue where `str_to_date` performs incorrect constant folding calculations on DATEV1 types. #37360
- Fixed the issue where the ACOS function's constant folding returns non-NaN values. #37932
- Fixed the occasional planning error: "The children format needs to be [WhenClause+, DefaultValue?]". #38491
- Fixed the issue where planning might be incorrect when the projection includes window functions and there is both the original column and its alias. #38166
- Fixed the issue where planning might report an error when the aggregation parameter contains a lambda expression. #37109
- Fixed the Insert error that might occur in extreme cases: "MultiCastDataSink cannot be cast to DataStreamSink". #38526
- Fixed the issue where the new optimizer does not correctly handle `char(0)/varchar(0)` when creating a table. #38427
- Fixed the incorrect behavior of `char(255) toSql`. #37340
- Fixed the issue where the nullable attribute within the `agg_state` type might lead to planning errors. #37489
- Fixed the issue where row count statistics are inaccurate during mark Join. #38270
Query Execution
- Fixed issues where the Pipeline execution engine was stuck, causing queries to not end, in multiple scenarios. #38657, #38206, #38885, #38151, #37297
- Fixed the coredump issue caused by null and non-null columns during set difference calculations. #38750
- Fixed the error when using the DECIMAL type with pure decimals in delete statements. #37801
- Fixed the issue where the `width_bucket` function returned incorrect results. #37892
- Fixed the query error when a single row of data was very large and the result set was also large (exceeding 2GB). #37990
- Fixed the coredump issue caused by incorrect release of rpc connections during single-replica imports. #38087
- Fixed the coredump issue caused by processing NULL values with the `foreach` function. #37349
- Fixed the issue where stddev returned incorrect results for DECIMALV2 types. #38731
- Fixed the slow performance of `bitmap union` calculations. #37816
- Fixed the issue where RowsProduced for aggregation operators was not set in the profile. #38271
- Fixed the overflow issue when calculating the number of buckets for the hash table under hash join. #37193, #37493
- Fixed the inaccurate recording of the `jemalloc cache memory tracker`. #37464
- Added the `enable_stacktrace` configuration option, allowing users to control whether exception stacks are output in BE logs. #37713
- Fixed the issue where Arrow Flight SQL did not work correctly when `enable_parallel_result_sink` was set to false. #37779
- Fixed the incorrect use of colocate Join. #37361, #37729
- Fixed the calculation overflow issue of the `round` function on DECIMAL128 types. #37733, #38106
- Fixed the coredump issue when passing a const string to the `sleep` function. #37681
- Increased the queue length for audit logs, solving the issue where audit logs could not be recorded normally under high concurrency scenarios with thousands of concurrent connections. #37786
- Fixed the issue where creating a workload group caused too many threads, leading to BE coredump. #38096
- Fixed the coredump issue caused by the `MULTI_MATCH_ANY` function. #37959
- Fixed the transaction rollback issue caused by `insert overwrite auto partition`. #38103
- Fixed the issue where the TimeUtils formatter did not use the correct time zone. #37465
- Fixed the issue where results were incorrect under constant folding scenarios for week/yearweek. #37376
- Fixed the issue where the `convert_tz` function returned incorrect results. #37358, #38764
- Fixed the coredump issue when using the `collect_set` function with window functions. #38234
- Fixed the coredump issue caused by `percentile_approx` during rolling upgrades. #39321
- Fixed the coredump issue caused by the `mod` function when encountering abnormal input. #37999
- Fixed the issue where the hash table was not fully built when the broadcast Join probe started running. #37643
- Fixed the issue where executing the same expression in multithreaded environments might lead to incorrect results for Java UDFs. #38612
- Fixed the overflow issue caused by incorrect return types of the `conv` function. #38001
- Fixed the issue where the `json_replace` function returned incorrect types. #3701
- Fixed the issue where the nullable attribute setting was unreasonable for the `percentile` aggregation function. #37330
- Fixed the issue where the results of the `histogram` function were unstable. #38608
- Fixed the issue where Task State was displayed incorrectly in the profile. #38082
- Fixed the issue where some queries were incorrectly canceled when the system just started. #37662
Semi-Structured Data Management
- Fix some issues with time series compaction. #39170 #39176
- Fix the issue of incorrect index size statistics during compaction. #37232
- Fix the potential incorrect matching of ultra-long strings without tokenization in inverted indexes. #37679 #38218
- Fix the high memory usage issue of `array_range` and `array_with_const` functions when dealing with large data volumes. #38284 #37495
- Fix the potential coredump issue when selecting columns of ARRAY / MAP / STRUCT types. #37936
- Fix the import failure issue caused by simdjson parsing errors when specifying jsonpath in Stream Load. #38490
- Fix the exception handling issue when there are duplicate keys in JSON data. #38146
- Fix the potential query error after DROP INDEX. #37646
- Fix the error return issue in row merging checks during index compaction. #38732
- Inverted index v2 format now supports renaming columns. #38079
- Fix the coredump issue when the `MATCH` function matches an empty string without an index. #37947
- Fix the handling of NULL values in inverted indexes. #37921 #37842 #38741
- Fix the incorrect `row_store_page_size` after FE restart. #38240
Other
- Fix the timezone configuration issue. The default timezone is no longer fixed at UTC+8 and is now obtained from system configuration. #37294
- Fix the class conflict issue when using ranger due to multiple JSR specification implementations. #37575
- Fix the potential uninitialized field issue in some BE code. #37403
- Fix the error in delete statements for random distributed tables. #37985
- Fix the incorrect requirement for `alter_priv` permission on the base table when creating a synchronized materialized view. #38011
- Fix the issue of not authenticating resources when used in TVF. #36928
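The timezone fix in this list replaces a hard-coded UTC+8 default with the host's own configuration. The contrast can be sketched in a few lines. This is only an illustration of the idea using Python's standard library; the names `FIXED_UTC8` and `system_utc_offset` are hypothetical, and it does not reflect how FE or BE actually read the system timezone.

```python
from datetime import datetime, timedelta, timezone

# The previously hard-coded default described above: a fixed UTC+8 offset.
FIXED_UTC8 = timezone(timedelta(hours=8))


def system_utc_offset() -> timedelta:
    """Return the UTC offset configured on the host running this code."""
    # astimezone() with no argument attaches the local system timezone.
    offset = datetime.now().astimezone().utcoffset()
    if offset is None:
        raise RuntimeError("the system timezone could not be determined")
    return offset
```

On a host configured for UTC, `system_utc_offset()` returns a zero offset, whereas the fixed default would still report eight hours.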