Release Note 3.0.1 #39570

@gavinchou

Behavior Changes

Query Optimizer

  • Added the variable use_max_length_of_varchar_in_ctas to control the length behavior of VARCHAR type when executing CREATE TABLE AS SELECT (CTAS) operations. #37069
    • This variable is set to true by default.
    • When set to true, if the VARCHAR type column originates from a table, the derived length is used; otherwise, the maximum length is used.
    • When set to false, the VARCHAR type will always use the derived length.
  • All data types will now be displayed in lowercase to maintain compatibility with MySQL format. #38012
  • Multiple query statements in the same query request must now be separated by semicolons. #38670
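
The use_max_length_of_varchar_in_ctas rule above can be read as a small decision function. The following is an illustrative Python sketch, not Doris code: the function name, its parameters, and the constant 65533 (assumed here to be the maximum VARCHAR length) are assumptions made for the example.

```python
# Assumed maximum VARCHAR length; used here only for illustration.
MAX_VARCHAR_LENGTH = 65533


def ctas_varchar_length(derived_length, from_table_column, use_max_length=True):
    """Return the VARCHAR length a CTAS target column would get under the rule above.

    derived_length: length inferred from the SELECT expression.
    from_table_column: True if the column originates from a table.
    use_max_length: models use_max_length_of_varchar_in_ctas (true by default).
    """
    if not use_max_length:
        # Variable is false: always keep the derived length.
        return derived_length
    # Variable is true: keep the derived length only for table columns.
    return derived_length if from_table_column else MAX_VARCHAR_LENGTH
```

With the default setting, a column copied from a table keeps its derived length, while a computed expression falls back to the maximum length.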

Query Execution

  • The default number of parallel tasks after shuffle operations in the cluster is set to 100, which will improve query stability and concurrent processing capability in large clusters. #38196

Storage

  • The default value of trash_file_expire_time_sec has been changed from 86400 seconds to 0 seconds, which means that if files are deleted by mistake and the FE trash is cleared, the data cannot be recovered.
  • The table attribute enable_mow_delete_on_delete_predicate (introduced in version 3.0.0) has been renamed to enable_mow_light_delete.
  • Explicit transactions are now prohibited from performing delete operations on tables with written data.
  • Heavy schema change operations are prohibited on tables with auto-increment fields.

New Features

Job Scheduling

  • Optimized the execution logic of internal scheduling jobs, decoupling the strong association between start time and immediate execution parameters. Now, tasks can be created with a specified start time or selected for immediate execution, without conflict, enhancing scheduling flexibility. #36805

Compute-Storage Decoupled

  • Supports dynamic modification of the upper limit for file cache usage. #37484
  • Recycler now supports object storage rate limiting and server-side rate limiting retry functionality. #37663 #37680

Lakehouse

  • Added the session variable serde_dialect to set the output format for complex types. #37039
  • SQL interception now supports external tables.
  • Insert overwrite now supports Iceberg tables. #37191

Asynchronous Materialized Views

  • Supports partition roll-up and build at the hourly level. #37678
  • Supports atomic replacement of asynchronous materialized view definition statements. #36749
  • Transparent rewriting now supports Insert statements. #38115
  • Transparent rewriting now supports the VARIANT type. #37929

Query Execution

  • The group_concat function now supports the DISTINCT and ORDER BY options. #38744
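
As a rough model of what DISTINCT and ORDER BY mean for a concatenating aggregate, the Python sketch below processes the values of a single group. It is not the Doris implementation; the helper name, the ", " default separator, and the skipping of NULL (None) values are assumptions modeled on common SQL behavior.

```python
def group_concat(values, sep=", ", distinct=False, order_key=None, descending=False):
    """Concatenate the non-NULL values of one group into a single string."""
    items = [v for v in values if v is not None]  # NULLs are skipped (assumption)
    if distinct:
        # Drop duplicates while keeping the first occurrence of each value.
        items = list(dict.fromkeys(items))
    if order_key is not None:
        # ORDER BY: sort the remaining values before concatenating.
        items = sorted(items, key=order_key, reverse=descending)
    return sep.join(str(v) for v in items)
```

For example, group_concat(["b", "a", "b"], distinct=True, order_key=lambda v: v) removes the duplicate "b" and then sorts the rest before joining.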

Semi-Structured Data Management

  • The ES Catalog now maps nested or object types in Elasticsearch to the JSON type in Doris. #37101
  • Added the MULTI_MATCH function, which supports matching keywords across multiple fields and can leverage inverted indexes to accelerate searches. #37722
  • Added the explode_json_object function, which can unfold objects in JSON data into multiple rows. #36887
  • Inverted indexes now support memtable advancement, requiring index construction only once during multi-replica writes, reducing CPU consumption and improving performance. #35891
  • Added MATCH_PHRASE support for positive slop. For example, msg MATCH_PHRASE 'a b ~2+' matches text that contains the words a and b with a slop of at most two and with a appearing before b; regular slop without the trailing + does not guarantee this order. #36356
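
The difference between regular slop and positive (ordered) slop can be illustrated with a deliberately simplified two-term model. The sketch below is not how the inverted index evaluates phrases: it assumes slop means "at most this many tokens between the two terms", and the function name and parameters are hypothetical.

```python
def phrase_match_two_terms(tokens, first, second, slop, ordered):
    """Simplified model of a two-term phrase match with slop.

    Returns True if `first` and `second` both occur in `tokens` with at most
    `slop` tokens between them. When `ordered` is True, `first` must also
    appear before `second` (the positive-slop case).
    """
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    for i in pos_first:
        for j in pos_second:
            if i == j:
                continue
            gap = abs(j - i) - 1  # number of tokens between the two terms
            if gap > slop:
                continue
            if ordered and i > j:
                continue  # wrong order for positive slop
            return True
    return False
```

In this model, the text "b x a" matches the terms a and b with slop 2 when order is ignored, but not when a is required to precede b.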

Other

  • Added the FE parameter skip_audit_user_list, where user operations specified in this configuration will not be recorded in the audit log. #38310
    • For more information, refer to the documentation on Audit Plugin.

Improvements

Storage

  • Reduced the likelihood of write failures caused by disk balancing within a single BE. #38000
  • Decreased memory consumption by the memtable limiter. #37511
  • Moved old partitions to the FE trash during partition replacement operations. #36361
  • Optimized memory consumption during compaction. #37099
  • Added a session variable to control audit logs for JDBC PreparedStatement, with default setting to not print. #38419
  • Optimized the logic for selecting BEs for group commits. #35558
  • Improved the performance of column updates. #38487
  • Optimized the use of delete bitmap cache. #38761
  • Added a configuration to control query affinity during hot and cold tiering. #37492

Compute-Storage Decoupled

  • Implemented automatic retries when encountering object storage server rate limiting. #37199
  • Adapted the number of threads for memtable flush in the compute-storage decoupled mode. #38789
  • Added Azure as a compile option to support compilation in environments without Azure support.
  • Optimized the observability of object storage access rate limiting. #38294
  • Allowed the file cache TTL queue to perform LRU eviction, enhancing TTL queue usability. #37312
  • Optimized the number of balance writeeditlog IO operations in compute-storage decoupled mode. #37787
  • Improved table creation speed in compute-storage decoupled mode by sending tablet creation requests in batches. #36786
  • Reduced read failures caused by potential inconsistencies in the local file cache by adding backoff retries. #38645
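
Several items above rely on retrying after rate limiting or transient read failures. The general backoff-retry pattern can be sketched as follows. This is a generic Python illustration, not the Doris implementation: the function name, the default delays, and the jitter range are all assumptions.

```python
import random
import time


def retry_with_backoff(operation, is_retryable, max_attempts=5,
                       base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation() until it succeeds, waiting longer after each retryable failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or when attempts are exhausted.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with random jitter
            # so that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
```

The sleep parameter is injectable so that the waiting behavior can be observed or replaced in tests without real delays.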

Lakehouse

  • Optimized memory statistics for Parquet/ORC format read and write operations. #37234
  • Trino Connector Catalog now supports predicate pushdown. #37874
  • Added a session variable enable_count_push_down_for_external_table to control whether to enable count(*) pushdown optimization for external tables. #37046
  • Optimized the read logic for Hudi snapshot reads, returning an empty set when the snapshot is empty, consistent with Spark behavior. #37702
  • Improved the read performance of partition columns for Hive tables. #37377

Asynchronous Materialized Views

  • Improved transparent rewrite plan speed by 20%. #37197
  • Eliminated roll-up during transparent rewrite if the group key satisfies data uniqueness for better nested matching. #38387
  • Transparent rewrite now performs better aggregation elimination to improve the matching success rate of nested materialized views. #36888

MySQL Compatibility

  • Now correctly populates the database name, table name, and original name in the MySQL protocol result columns. #38126
  • Supported the hint format /*+ func(value) */. #37720

Query Optimizer

  • Significantly improved the plan speed for complex queries. #38317
  • Adaptively chose whether to perform bucket shuffle based on the number of data buckets to avoid performance degradation in extreme cases. #36784
  • Optimized the cost estimation logic for SEMI / ANTI JOIN. #37951 #37060
  • Supported pushing Limit down to the first stage of aggregation to improve performance. #34853
  • Partition pruning now supports filter conditions containing the date_trunc or date function. #38025 #38743
  • SQL cache now supports query scenarios that include user variables. #37915
  • Optimized error messages for invalid aggregation semantics. #38122
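
To illustrate why a date_trunc filter can drive partition pruning, the sketch below turns a condition such as date_trunc(dt, 'month') = some month start into a half-open date range and keeps only the partitions that overlap it. This is a simplified Python model, not the optimizer's code; the function names and the partition layout are hypothetical.

```python
from datetime import date


def month_range(month_start):
    """Return the half-open range [start, end) of the month beginning at month_start.

    Assumes month_start is the first day of a month.
    """
    if month_start.month == 12:
        return month_start, date(month_start.year + 1, 1, 1)
    return month_start, date(month_start.year, month_start.month + 1, 1)


def prune_partitions(partitions, month_start):
    """Keep partitions that may hold rows where date_trunc(dt, 'month') == month_start.

    partitions: dict mapping partition name -> (lower, upper), half-open date ranges.
    """
    lo, hi = month_range(month_start)
    # A partition survives only if its range overlaps [lo, hi).
    return [name for name, (p_lo, p_hi) in partitions.items()
            if p_lo < hi and lo < p_hi]
```

Given monthly partitions for June, July, and August 2024, a filter on the month starting 2024-07-01 leaves only the July partition to scan.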

Query Execution

  • Adapted AggState compatibility from 2.1 to 3.x and fixed Coredump issues. #37104
  • Refactored the strategy selection for local shuffle without Join. #37282
  • Modified the scanner for internal table queries to be asynchronous to prevent stalling during such queries. #38403
  • Optimized the block merge process during Hash table construction for Join operators. #37471
  • Optimized the duration of lock holding for MultiCast. #37462
  • Optimized gRPC keepAliveTime and added link monitoring to reduce the probability of query failure due to RPC errors. #37304
  • Cleaned up all dirty pages in jemalloc when memory limits were exceeded. #37164
  • Optimized the processing performance of aes_encrypt/decrypt functions for constant types. #37194
  • Optimized the processing performance of the json_extract function for constant data. #36927
  • Optimized the processing performance of the ParseUrl function for constant data. #36882

Semi-Structured Data Management

  • Bitmap indexes now default to using inverted indexes, with enable_create_bitmap_index_as_inverted_index set to true by default. #36692
  • In the compute-storage decoupled mode, DESC can now view sub-columns of VARIANT type. #38143
  • Removed the step of checking file existence during inverted index queries to reduce access latency to remote storage. #36945
  • Complex types ARRAY / MAP / STRUCT now support replace_if_not_null for AGG tables. #38304
  • Escape characters for JSON data are now supported. #37176 #37251
  • Inverted index queries now behave consistently on MOW tables and DUP tables. #37428
  • Optimized the performance of inverted index acceleration for IN queries. #37395
  • Reduced unnecessary memory allocation during TOPN queries to improve performance. #37429
  • When creating an inverted index with tokenization, the support_phrase option is now automatically enabled to accelerate match_phrase series phrase queries. #37949

Other

  • Audit logs can now record SQL types. #37790
  • Added support for information_schema.processlist to show processes from all FEs. #38701
  • Cached Ranger's datamask and rowpolicy to accelerate queries. #37723
  • Optimized metadata management in job manager to release locks immediately after modifying metadata, reducing lock holding time. #38162

Bug Fixes

Upgrade

  • Fix the issue where mtmv load fails during upgrade from version 2.1. #38799
  • Resolve the issue where null_type cannot be found during the upgrade to version 2.1. #39373
  • Address the compatibility issue with permission persistence during the upgrade from version 2.1 to 3.0. #39288

Load

  • Fix the issue where parsing fails when the newline character is surrounded by delimiters in CSV format parsing. #38347
  • Resolve potential exception issues when FE forwards group commit. #38228 #38265
  • Group commit now supports the new optimizer. #37002
  • Fix the issue where group commit reports data errors when JDBC setNull is used. #38262
  • Optimize the retry logic for group commit when encountering delete bitmap lock errors. #37600
  • Resolve the issue where routine load cannot use CSV delimiters and escape characters. #38402
  • Fix the issue where routine load job names with mixed case cannot be displayed. #38523
  • Optimize the logic for actively recovering routine load during FE master-slave switching. #37876
  • Resolve the issue where routine load pauses when all data in Kafka is expired. #37288
  • Fix the issue where show routine load returns empty results. #38199
  • Resolve the memory leak issue during multi-table stream import in routine load. #38255
  • Fix the issue where stream load does not return the error URL. #38325
  • Resolve potential load channel leak issues. #38031 #37500
  • Fix the issue where no error may be reported when importing fewer segments than expected. #36753
  • Resolve the load stream leak issue. #38912
  • Optimize the impact of offline nodes on import operations. #38198
  • Fix the issue where transactions do not end when inserting into empty data. #38991

Storage

Backup and Restoration

  • Fix the issue where tables cannot be written after backup and restoration. #37089
  • Resolve the issue where view database names are incorrect after backup and restoration. #37412

Compaction

  • Fix the issue where cumulative compaction handles delete errors incorrectly during ordered data compaction. #38742
  • Resolve the issue of duplicate keys in aggregate tables caused by the sequential compaction optimization. #38224
  • Fix the issue where compaction operations cause coredump on large wide tables. #37960
  • Resolve the compaction starvation issue caused by inaccurate concurrency statistics of compaction tasks. #37318

MOW Unique Key

  • Resolve the issue of inconsistent data between replicas caused by cumulative compaction removing the delete sign. #37950
  • MOW delete now uses partial column updates with the new optimizer. #38751
  • Fix the potential duplicate key issue in MOW tables in compute-storage decoupled mode. #39018
  • Resolve the issue where MOW unique and duplicate tables cannot modify column order. #37067
  • Fix the potential data correctness issue caused by segcompaction. #37760
  • Resolve the potential memory leak issue during column updates. #37706

Other

  • Fix the small probability of exceptions in TOPN queries. #39119 #39199
  • Resolve the issue where auto-increment IDs may duplicate during FE restart. #37306
  • Fix the potential queuing issue in the delete operation priority queue. #37169
  • Optimize the delete retry logic. #37363
  • Resolve the issue with bucket = 0 in table creation statements under the new optimizer. #38971
  • Fix the issue where FE reports success incorrectly when image generation fails. #37508
  • Resolve the issue where using the wrong nodename when taking FE nodes offline may cause inconsistent FE members. #37987
  • Fix the issue where CCR partition addition may fail. #37295
  • Resolve the int32 overflow issue in inverted index files. #38891
  • Fix the issue where TRUNCATE TABLE failure may cause BE to fail to go offline. #37334
  • Resolve the issue where publish cannot continue due to null pointers. #37724 #37531
  • Fix the potential coredump issue when manually triggering disk migration. #37712

Compute-Storage Decoupled

  • Fixed the issue where show create table might display the file_cache_ttl_seconds attribute twice. #38052
  • Fixed the issue where segment Footer TTL was not set correctly after setting file cache TTL. #37485
  • Fixed the issue where file cache might cause coredump due to massive conversion of cache types. #38518
  • Fixed the potential file descriptor (fd) leak in file cache. #38051
  • Fixed the issue where schema change Job overwriting compaction Job prevented base tablet compaction from completing normally. #38210
  • Fixed the potential inaccuracy of base compaction score due to data race. #38006
  • Fixed the issue where error messages from imports might not be uploaded correctly to object storage. #38359
  • Fixed the inconsistency in return information between compute-storage decoupled mode and compute-storage integrated mode for 2PC imports. #38076
  • Fixed the issue where an incorrect file size setting during file cache warm-up led to coredump. #38939
  • Fixed the issue where partial column updates did not correctly dequeue delete operations. #37151
  • Fixed compatibility issues with permission persistence in compute-storage decoupled mode. #38136 #37708
  • Fixed the issue where observer did not retry correctly when encountering a -230 error. #37625
  • Fixed the issue where show load with conditions did not perform correct analysis. #37656
  • Fixed the issue where show streamload in compute-storage decoupled mode caused BE coredump. #37903
  • Fixed the issue where copy into did not correctly verify column names in strict mode. #37650
  • Fixed the issue where multi-stream imports into a single table lacked permissions. #38878
  • Fixed the potential overflow issue in getVersionUpdateTimeMs. #38074
  • Fixed the issue where FE azure blob list was not implemented correctly. #37986
  • Fixed the issue where inaccurate azure blob recycling time calculation prevented recycling. #37535
  • Fixed the issue where inverted index files were not deleted in compute-storage decoupled mode. #38306

Lakehouse

  • Fixed the issue with reading binary data from Oracle Catalog. #37078
  • Fixed the potential deadlock issue when acquiring external table metadata in multi-FE scenarios. #37756
  • Fixed the issue where JNI scanner failure caused BE nodes to crash. #37697
  • Fixed the issue with slow reading of date types from Trino Connector Catalog. #37266
  • Optimized kerberos authentication logic for Hive Catalog. #37301
  • Fixed the issue where region attributes might be parsed incorrectly when parsing MinIO properties. #37249
  • Fixed the issue where creating too many FileSystems by FE caused memory leaks. #36954
  • Fixed the issue where incorrect time zone information was read from Paimon. #37716
  • Fixed the potential thread leak issue caused by Hive write-back operations. #36990
  • Fixed the null pointer issue caused by enabling Hive metastore event synchronization. #38421
  • Fixed the issue where error messages were unclear or caused stalling when creating catalogs. #37551
  • Fixed the issue where reading Hive text format tables behaved differently from Hive. #37638
  • Fixed the logic error when switching between catalogs and databases. #37828

MySQL Compatibility

  • Fixed the issue where certain flags in the MySQL protocol were set incorrectly when SSL was enabled. #38086

Asynchronous Materialized Views

  • Fixed the issue where construction might fail when the base table had a very large number of partitions. #37589
  • Fixed the issue where nested materialized views incorrectly performed full table refreshes even when partition refreshes were possible. #38698
  • Fixed the issue where partition refresh could not handle the simultaneous existence of valid and invalid dependencies when analyzing partition dependencies. #38367
  • Fixed the issue where the final result containing NULL type might cause asynchronous materialized views to fail. #37019
  • Fixed the planning error that might occur during transparent rewriting when both synchronous and asynchronous materialized views with the same name were present. #37311

Synchronous Materialized Views

  • The rewritten synchronous materialized views now can correctly perform partition pruning. #38527
  • When rewriting synchronous materialized views, those with unready data are no longer selected. #38148

Query Optimizer

  • Fixed the deadlock issue that might occur when queries and delete operations are performed simultaneously. #38660
  • Fixed the issue where bucket pruning might incorrectly prune on decimal column buckets. #37889
  • Fixed the issue where planning might be incorrect when mark Join participates in Join reorder. #39152
  • Fixed the issue where the result is incorrect when the correlation condition of a correlated subquery is not a simple column. #37644
  • Fixed the issue where partition pruning cannot correctly handle OR expressions. #38897
  • Fixed the planning error that might occur when optimizing the execution order of JOIN and AGG. #37343
  • Fixed the issue where str_to_date performs incorrect constant folding calculations on DATEV1 types. #37360
  • Fixed the issue where the ACOS function's constant folding returns non-NaN values. #37932
  • Fixed the occasional planning error: "The children format needs to be [WhenClause+, DefaultValue?]". #38491
  • Fixed the issue where planning might be incorrect when the projection includes window functions and there is both the original column and its alias. #38166
  • Fixed the issue where planning might report an error when the aggregation parameter contains a lambda expression. #37109
  • Fixed the Insert error that might occur in extreme cases: "MultiCastDataSink cannot be cast to DataStreamSink". #38526
  • Fixed the issue where the new optimizer does not correctly handle char(0)/varchar(0) when creating a table. #38427
  • Fixed the incorrect behavior of char(255) toSql. #37340
  • Fixed the issue where the nullable attribute within the agg_state type might lead to planning errors. #37489
  • Fixed the issue where row count statistics are inaccurate during mark Join. #38270

Query Execution

  • Fixed issues where the Pipeline execution engine was stuck, causing queries to not end, in multiple scenarios. #38657, #38206, #38885, #38151, #37297
  • Fixed the coredump issue caused by null and non-null columns during set difference calculations. #38750
  • Fixed the error when using the DECIMAL type with pure decimals in delete statements. #37801
  • Fixed the issue where the width_bucket function returned incorrect results. #37892
  • Fixed the query error when a single row of data was very large and the result set was also large (exceeding 2GB). #37990
  • Fixed the coredump issue caused by incorrect release of rpc connections during single-replica imports. #38087
  • Fixed the coredump issue caused by processing NULL values with the foreach function. #37349
  • Fixed the issue where stddev returned incorrect results for DECIMALV2 types. #38731
  • Fixed the slow performance of bitmap union calculations. #37816
  • Fixed the issue where RowsProduced for aggregation operators was not set in the profile. #38271
  • Fixed the overflow issue when calculating the number of buckets for the hash table under hash join. #37193, #37493
  • Fixed the inaccurate recording of the jemalloc cache memory tracker. #37464
  • Added the enable_stacktrace configuration option, allowing users to control whether exception stacks are output in BE logs. #37713
  • Fixed the issue where Arrow Flight SQL did not work correctly when enable_parallel_result_sink was set to false. #37779
  • Fixed the incorrect use of colocate Join. #37361, #37729
  • Fixed the calculation overflow issue of the round function on DECIMAL128 types. #37733, #38106
  • Fixed the coredump issue when passing a const string to the sleep function. #37681
  • Increased the queue length for audit logs, solving the issue where audit logs could not be recorded normally under high concurrency scenarios with thousands of concurrent connections. #37786
  • Fixed the issue where creating a workload group caused too many threads, leading to BE coredump. #38096
  • Fixed the coredump issue caused by the MULTI_MATCH_ANY function. #37959
  • Fixed the transaction rollback issue caused by insert overwrite auto partition. #38103
  • Fixed the issue where the TimeUtils formatter did not use the correct time zone. #37465
  • Fixed the issue where results were incorrect under constant folding scenarios for week/yearweek. #37376
  • Fixed the issue where the convert_tz function returned incorrect results. #37358, #38764
  • Fixed the coredump issue when using the collect_set function with window functions. #38234
  • Fixed the coredump issue caused by percentile_approx during rolling upgrades. #39321
  • Fixed the coredump issue caused by the mod function when encountering abnormal input. #37999
  • Fixed the issue where the hash table was not fully built when the broadcast Join probe started running. #37643
  • Fixed the issue where executing the same expression in multithreaded environments might lead to incorrect results for Java UDFs. #38612
  • Fixed the overflow issue caused by incorrect return types of the conv function. #38001
  • Fixed the issue where the json_replace function returned incorrect types. #3701
  • Fixed the issue where the nullable attribute setting was unreasonable for the percentile aggregation function. #37330
  • Fixed the issue where the results of the histogram function were unstable. #38608
  • Fixed the issue where Task State was displayed incorrectly in the profile. #38082
  • Fixed the issue where some queries were incorrectly canceled when the system just started. #37662

Semi-Structured Data Management

  • Fix some issues with time series compaction. #39170 #39176
  • Fix the issue of incorrect index size statistics during compaction. #37232
  • Fix the potential incorrect matching of ultra-long strings without tokenization in inverted indexes. #37679 #38218
  • Fix the high memory usage issue of array_range and array_with_const functions when dealing with large data volumes. #38284 #37495
  • Fix the potential coredump issue when selecting columns of ARRAY / MAP / STRUCT types. #37936
  • Fix the import failure issue caused by simdjson parsing errors when specifying jsonpath in Stream Load. #38490
  • Fix the exception handling issue when there are duplicate keys in JSON data. #38146
  • Fix the potential query error after DROP INDEX. #37646
  • Fix the error return issue in row merging checks during index compaction. #38732
  • Inverted index v2 format now supports renaming columns. #38079
  • Fix the coredump issue when the MATCH function matches an empty string without an index. #37947
  • Fix the handling of NULL values in inverted indexes. #37921 #37842 #38741
  • Fix the incorrect row_store_page_size after FE restart. #38240

Other

  • Fix the timezone configuration issue. The default timezone is no longer fixed at UTC+8 and is now obtained from system configuration. #37294
  • Fix the class conflict issue when using ranger due to multiple JSR specification implementations. #37575
  • Fix the potential uninitialized field issue in some BE code. #37403
  • Fix the error in delete statements for random distributed tables. #37985
  • Fix the incorrect requirement for alter_priv permission on the base table when creating a synchronized materialized view. #38011
  • Fix the issue of not authenticating resources when used in TVF. #36928
