keanji-x
Contributor

Proposed changes

These optimizations allow the findValidItems method to correctly handle circular dependencies while maintaining the required output slots. The code is now more efficient and ensures that the necessary edges and items are preserved during the traversal process.
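As a rough illustration of the idea described above, here is a minimal sketch of a cycle-safe traversal that keeps only the dependency edges between required output slots. It is not the actual findValidItems implementation in Doris; the function name `find_valid_items`, the `edges` dictionary, and the slot names are all hypothetical and chosen only for this example.

```python
def find_valid_items(edges, required):
    """Collect (source, target) pairs between required slots.

    edges:    dict mapping a slot to the set of slots it determines
              (hypothetical representation of dependency edges).
    required: set of slots that must be kept in the output.

    A pair is kept when the target is reachable from the source,
    possibly through slots that are not required themselves.
    """
    valid = set()
    for start in sorted(required):
        visited = {start}          # per-start visited set keeps the walk finite
        stack = [start]
        while stack:
            slot = stack.pop()
            for target in edges.get(slot, ()):
                if target in visited:
                    continue       # already seen: this is what breaks cycles
                visited.add(target)
                if target in required:
                    valid.add((start, target))
                stack.append(target)
    return valid


# a <-> b is a circular dependency; c is only reachable through x.
edges = {"a": {"b"}, "b": {"a", "x"}, "x": {"c"}}
print(find_valid_items(edges, {"a", "c"}))
print(find_valid_items(edges, {"a", "b"}))
```

With the circular pair `a` and `b`, the visited set stops the walk from looping, while the edge from `a` to `c` survives even though the intermediate slots are not required.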

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@keanji-x
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40295 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cf2b6f76901f4549ede7c50d16a8383023782cbc, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17616	4394	4281	4281
q2	2007	185	188	185
q3	10450	1120	1041	1041
q4	10180	864	781	781
q5	7717	2659	2684	2659
q6	217	130	132	130
q7	943	600	616	600
q8	9231	2066	2051	2051
q9	8901	6540	6488	6488
q10	8923	3720	3749	3720
q11	476	241	243	241
q12	457	221	231	221
q13	18220	3010	3003	3003
q14	280	222	229	222
q15	523	491	484	484
q16	491	391	374	374
q17	950	700	698	698
q18	8355	7876	7872	7872
q19	8371	1417	1333	1333
q20	657	326	308	308
q21	4875	3245	3962	3245
q22	430	374	358	358
Total cold run time: 120270 ms
Total hot run time: 40295 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5270	4239	4286	4239
q2	397	272	267	267
q3	3136	2944	2862	2862
q4	1990	1743	1658	1658
q5	5525	5418	5535	5418
q6	222	130	129	129
q7	2238	1850	1887	1850
q8	3268	3419	3388	3388
q9	8775	8720	8786	8720
q10	3885	3772	3845	3772
q11	586	508	516	508
q12	810	663	648	648
q13	16204	3222	3129	3129
q14	295	283	295	283
q15	512	496	465	465
q16	489	424	430	424
q17	1790	1520	1490	1490
q18	8148	7903	7654	7654
q19	1847	1762	1584	1584
q20	3188	1873	1840	1840
q21	8013	4617	4924	4617
q22	793	566	555	555
Total cold run time: 77381 ms
Total hot run time: 55500 ms
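
The reported totals can be reproduced from the per-query rows. The sketch below re-adds the Round 1 TPC-H numbers, assuming (this is inferred from the totals, not stated by the bot) that the first timing column is the cold run, the next two are hot runs, and the last is the faster of the two hot runs. The helper name `summarize` is made up for this example.

```python
ROUND_1 = """
q1 17616 4394 4281 4281
q2 2007 185 188 185
q3 10450 1120 1041 1041
q4 10180 864 781 781
q5 7717 2659 2684 2659
q6 217 130 132 130
q7 943 600 616 600
q8 9231 2066 2051 2051
q9 8901 6540 6488 6488
q10 8923 3720 3749 3720
q11 476 241 243 241
q12 457 221 231 221
q13 18220 3010 3003 3003
q14 280 222 229 222
q15 523 491 484 484
q16 491 391 374 374
q17 950 700 698 698
q18 8355 7876 7872 7872
q19 8371 1417 1333 1333
q20 657 326 308 308
q21 4875 3245 3962 3245
q22 430 374 358 358
"""


def summarize(table):
    """Return (total cold ms, total hot ms) for a whitespace-separated table."""
    cold_total = 0
    hot_total = 0
    for line in table.strip().splitlines():
        name, cold, hot1, hot2, best = line.split()
        # Assumed column meaning: the last column is the faster hot run.
        if int(best) != min(int(hot1), int(hot2)):
            raise ValueError(f"unexpected best-hot value for {name}")
        cold_total += int(cold)
        hot_total += int(best)
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND_1)
print(cold_total, hot_total)  # 120270 40295
```

Both sums match the "Total cold run time" and "Total hot run time" lines reported for Round 1.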

github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jun 26, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 174048 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cf2b6f76901f4549ede7c50d16a8383023782cbc, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	919	401	366	366
query2	6350	2397	2396	2396
query3	6623	201	208	201
query4	19173	17342	17279	17279
query5	3584	466	455	455
query6	238	159	152	152
query7	4583	309	288	288
query8	311	282	273	273
query9	8566	2472	2444	2444
query10	557	321	274	274
query11	10419	10052	9968	9968
query12	117	82	85	82
query13	1635	377	372	372
query14	8881	7957	6960	6960
query15	239	189	193	189
query16	7843	278	270	270
query17	1881	571	536	536
query18	2002	285	282	282
query19	201	150	159	150
query20	92	85	82	82
query21	216	136	123	123
query22	4679	4079	4041	4041
query23	33869	33501	33791	33501
query24	10904	2852	2911	2852
query25	676	401	394	394
query26	1387	162	161	161
query27	3094	323	329	323
query28	7720	2159	2148	2148
query29	962	636	637	636
query30	260	153	161	153
query31	1037	746	767	746
query32	100	55	58	55
query33	770	320	308	308
query34	1098	502	486	486
query35	781	647	679	647
query36	1146	985	964	964
query37	170	72	79	72
query38	2977	2824	2835	2824
query39	933	803	796	796
query40	220	128	124	124
query41	55	47	49	47
query42	123	107	103	103
query43	610	555	547	547
query44	1213	739	735	735
query45	211	165	165	165
query46	1083	712	698	698
query47	1862	1817	1783	1783
query48	372	298	301	298
query49	862	411	419	411
query50	777	393	399	393
query51	6887	6761	6752	6752
query52	110	92	95	92
query53	360	289	293	289
query54	915	456	450	450
query55	75	76	73	73
query56	301	354	260	260
query57	1160	1072	1026	1026
query58	240	248	233	233
query59	3417	3289	3206	3206
query60	300	268	276	268
query61	89	107	90	90
query62	583	447	441	441
query63	315	284	284	284
query64	8890	2247	1746	1746
query65	3144	3122	3239	3122
query66	1069	324	326	324
query67	15705	14880	14984	14880
query68	8543	547	530	530
query69	724	454	386	386
query70	1418	1164	1177	1164
query71	526	275	277	275
query72	8965	5337	5708	5337
query73	2223	329	370	329
query74	5845	5458	5510	5458
query75	5269	2649	2659	2649
query76	5141	1041	923	923
query77	755	311	315	311
query78	10492	9962	9912	9912
query79	8851	513	526	513
query80	1028	476	460	460
query81	554	217	221	217
query82	628	105	100	100
query83	344	167	164	164
query84	275	88	85	85
query85	1012	284	269	269
query86	352	300	334	300
query87	3351	3129	3098	3098
query88	4621	2464	2446	2446
query89	524	374	382	374
query90	2075	189	183	183
query91	129	97	100	97
query92	59	49	52	49
query93	6232	502	507	502
query94	1362	189	185	185
query95	405	315	318	315
query96	599	271	271	271
query97	3224	3015	3046	3015
query98	206	198	193	193
query99	1221	869	849	849
Total cold run time: 294894 ms
Total hot run time: 174048 ms

@doris-robot

ClickBench: Total hot run time: 30.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cf2b6f76901f4549ede7c50d16a8383023782cbc, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.50	0.48	0.49
query6	1.13	0.73	0.72
query7	0.01	0.01	0.02
query8	0.05	0.05	0.04
query9	0.54	0.48	0.49
query10	0.54	0.54	0.54
query11	0.15	0.12	0.11
query12	0.15	0.11	0.11
query13	0.59	0.58	0.58
query14	0.76	0.77	0.82
query15	0.84	0.81	0.80
query16	0.35	0.36	0.38
query17	1.03	0.96	0.99
query18	0.21	0.25	0.24
query19	1.79	1.74	1.71
query20	0.01	0.01	0.01
query21	15.43	0.66	0.64
query22	4.04	7.51	1.79
query23	18.42	1.38	1.33
query24	2.09	0.23	0.22
query25	0.15	0.10	0.09
query26	0.26	0.18	0.18
query27	0.08	0.08	0.09
query28	13.31	1.02	1.00
query29	12.58	3.28	3.32
query30	0.27	0.06	0.07
query31	2.86	0.39	0.39
query32	3.27	0.48	0.46
query33	2.90	2.92	2.96
query34	17.12	4.44	4.42
query35	4.53	4.44	4.50
query36	0.64	0.48	0.46
query37	0.19	0.16	0.15
query38	0.16	0.14	0.15
query39	0.05	0.03	0.03
query40	0.17	0.14	0.14
query41	0.09	0.04	0.05
query42	0.05	0.04	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.37 s
Total hot run time: 30.45 s

keanji-x merged commit 015f051 into apache:master on Jun 26, 2024
dataroaring pushed a commit that referenced this pull request Jun 28, 2024
…ndencies (#36839)

## Proposed changes

These optimizations allow the findValidItems method to correctly handle
circular dependencies while maintaining the required output slots. The
code is now more efficient and ensures that the necessary edges and
items are preserved during the traversal process.
morrySnow pushed a commit that referenced this pull request Jul 5, 2024
…eliminate fail (#36888)

this depends on 
#36839
#36886

Suppose a low-level materialized view has 5 group-by dimensions and the
query also has 5 group-by dimensions, so the two are equal. In this
case, no aggregate is added on top of the mv when the query is rewritten
by the materialized view.
But if the query effectively uses only 4 group-by dimensions because the
remaining dimension can be eliminated, the query is changed to 4
group-by dimensions. This causes an aggregate to be added on top of the
mv, which later makes the high-level materialized view rewrite fail.

Solution:
In the aggregate rewrite by materialized view, we try to eliminate the
mv's group-by dimensions according to the dimensions the query uses. If
the elimination succeeds, the high-level rewrite can continue.
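
A minimal sketch of the elimination idea, under stated assumptions: it is not the Doris implementation, and the function `eliminate_group_by` and the `determined_by` map (which dimension is functionally determined by which other dimensions, for instance through a join equality) are hypothetical names introduced only for illustration.

```python
def eliminate_group_by(mv_dims, query_dims, determined_by):
    """Try to reduce the mv's group-by dimensions to those the query uses.

    mv_dims:       group-by dimensions of the materialized view.
    query_dims:    group-by dimensions the query actually uses.
    determined_by: dict mapping a dimension to the set of dimensions that
                   functionally determine it (hypothetical input).

    Returns the reduced dimension list, or None when an extra mv dimension
    cannot be eliminated (an aggregate on top of the mv would be needed).
    """
    query_set = set(query_dims)
    kept = []
    for dim in mv_dims:
        if dim in query_set:
            kept.append(dim)
            continue
        determinants = determined_by.get(dim)
        if determinants and determinants <= query_set:
            # dim is fixed by dimensions the query already groups by,
            # so dropping it does not change the groups.
            continue
        return None
    return kept


mv_dims = ["l_orderkey", "l_partkey", "l_suppkey", "o_orderkey", "o_custkey"]
query_dims = ["l_orderkey", "l_partkey", "l_suppkey", "o_custkey"]
# Assumption for the example: the join condition l_orderkey = o_orderkey
# means o_orderkey is determined by l_orderkey.
deps = {"o_orderkey": {"l_orderkey"}}

print(eliminate_group_by(mv_dims, query_dims, deps))
print(eliminate_group_by(mv_dims, query_dims, {}))
```

When the extra dimension is determined by dimensions the query already groups by, the mv's group-by list collapses to the query's 4 dimensions and no additional aggregate is required; without such a dependency the sketch reports failure with `None`.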


For example, the low-level mv definitions are as follows:

    def join_mv_1 = """
        select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
        sum(o_totalprice) as sum_total, 
        max(o_totalprice) as max_total, 
        min(o_totalprice) as min_total, 
        count(*) as count_all, 
        bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, 
        bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2 
        from lineitem_1
        inner join orders_1
        on lineitem_1.l_orderkey = orders_1.o_orderkey
        where lineitem_1.l_shipdate >= "2023-10-17"
        group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
        """
    def join_mv_2 = """
        select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey,
        t.agg1 as agg1, 
        t.sum_total as agg3,
        t.max_total as agg4,
        t.min_total as agg5,
        t.count_all as agg6,
        cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
        from ${mv_1} as t
        inner join partsupp_1
        on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
        where partsupp_1.ps_suppkey > 1
        group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
        """

The high-level mv definition is as follows:

    def join_mv_3 = """
        select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
        from ${mv_2} as t1
        left join ${mv_2} as t2
        on t1.l_orderkey = t2.l_orderkey
        where t1.l_orderkey > 1
        group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
        """

If we run the following query, it can hit mv3:

select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
        from (
            select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, 
            t.agg1 as agg1, 
            t.sum_total as agg3,
            t.max_total as agg4,
            t.min_total as agg5,
            t.count_all as agg6,
            cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
            from (
                select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
                sum(o_totalprice) as sum_total, 
                max(o_totalprice) as max_total, 
                min(o_totalprice) as min_total, 
                count(*) as count_all, 
                bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
                bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
                from lineitem_1
                inner join orders_1
                on lineitem_1.l_orderkey = orders_1.o_orderkey
                where lineitem_1.l_shipdate >= "2023-10-17"
                group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
            ) as t
            inner join partsupp_1
            on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
            where partsupp_1.ps_suppkey > 1
            group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
        ) as t1
        left join (
            select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, 
            t.agg1 as agg1, 
            t.sum_total as agg3,
            t.max_total as agg4,
            t.min_total as agg5,
            t.count_all as agg6,
            cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
            from (
                select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1,
                sum(o_totalprice) as sum_total, 
                max(o_totalprice) as max_total, 
                min(o_totalprice) as min_total, 
                count(*) as count_all, 
                bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
                bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
                from lineitem_1
                inner join orders_1
                on lineitem_1.l_orderkey = orders_1.o_orderkey
                where lineitem_1.l_shipdate >= "2023-10-17"
                group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
            ) as t
            inner join partsupp_1
            on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
            where partsupp_1.ps_suppkey > 1
            group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
        ) as t2
        on t1.l_orderkey = t2.l_orderkey
        where t1.l_orderkey > 1
        group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6

---------

Co-authored-by: xiejiann <[email protected]>
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
…eliminate fail (#36888)

Labels
approved Indicates a PR has been approved by one committer. dev/3.0.0-merged not-merge/2.1 reviewed

5 participants