@suxiaogang223 (Contributor) commented Oct 22, 2024

Proposed changes

OpenCSVSerde Properties:

| Property | Description | Default value | Supported in Doris |
|---|---|---|---|
| `separatorChar` | Character that separates fields (columns) in a CSV file. | `,` | Yes |
| `quoteChar` | Character used to quote fields that contain special characters, such as the separator. | `"` | Yes |
| `escapeChar` | Character used to escape special characters, including quotes and separators. | `"` | Yes |
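To make the three properties concrete, here is a minimal Python sketch of how a reader could apply them when splitting a single line. `split_line` is a hypothetical helper, not Doris's or Hive's parser. It uses the defaults from the table and assumes that, when `escapeChar` equals `quoteChar`, a doubled quote inside a quoted field stands for one literal quote.

```python
def split_line(line, separator=",", quote='"', escape='"'):
    """Split one CSV line into fields (hypothetical sketch, not Doris code).

    Defaults mirror the OpenCSVSerde defaults listed in the table.
    """
    fields, buf = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        c = line[i]
        nxt = line[i + 1] if i + 1 < len(line) else None
        if escape == quote:
            # Assumption: with escape == quote, "" inside quotes is one literal quote.
            is_escape = in_quotes and c == quote and nxt == quote
        else:
            # Otherwise the escape char protects a following quote, escape, or separator.
            is_escape = c == escape and nxt in (quote, escape, separator)
        if is_escape:
            buf.append(nxt)  # keep the escaped character literally
            i += 2
            continue
        if c == quote:
            in_quotes = not in_quotes  # enter or leave a quoted field
        elif c == separator and not in_quotes:
            fields.append("".join(buf))  # an unquoted separator ends the field
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append("".join(buf))
    return fields


print(split_line('1,"Smith, John",ok'))    # ['1', 'Smith, John', 'ok']
print(split_line('"say ""hi""",x'))        # ['say "hi"', 'x']
print(split_line(r'a\,b,c', escape="\\"))  # ['a,b', 'c']
```

The quote check runs only after the escape check, so an escaped quote never toggles the quoted state.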

Explanation:

  • separatorChar: The character that separates columns in the CSV file. The default is a comma (,).
  • quoteChar: The character used to enclose fields that contain special characters, such as the separator. For example, a field that contains a comma must be enclosed in quotes ("), as in "Smith, John".
  • escapeChar: The character used to escape special characters, such as quotes or the separator, inside a field. The default is a double quote ("); a backslash (\) is also a common choice.
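Going the other way, the quoting rule described above (a field containing the separator is enclosed in `quoteChar`) can be sketched as a small writer. `format_field` and `format_line` are hypothetical helpers, not part of Doris or Hive. The sketch also assumes that a field is quoted when it contains the quote character or a line break, and that each embedded quote is prefixed with `escapeChar`, which doubles it under the default `escapeChar` of `"`.

```python
def format_field(value, separator=",", quote='"', escape='"'):
    """Render one field for a CSV line (hypothetical sketch, not Doris code)."""
    # Assumption: quote when the field holds the separator, a quote, or a line break.
    needs_quotes = any(ch in value for ch in (separator, quote, "\n", "\r"))
    if escape != quote:
        # Double literal escape characters first so a reader does not misread them.
        value = value.replace(escape, escape + escape)
    # Prefix each embedded quote with the escape char ("" when escape == quote).
    value = value.replace(quote, escape + quote)
    return quote + value + quote if needs_quotes else value


def format_line(fields, separator=",", quote='"', escape='"'):
    """Render each field and join the results with the separator."""
    return separator.join(format_field(f, separator, quote, escape) for f in fields)


print(format_line(["1", "Smith, John", 'say "hi"']))
# 1,"Smith, John","say ""hi"""
print(format_line(['a"b', "c"], escape="\\"))
# "a\"b",c
```

Fields without special characters are written bare, which keeps the output identical to plain comma-separated text in the common case.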

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@suxiaogang223 (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 41344 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17780	7536	7341	7341
q2	2062	164	151	151
q3	10665	1140	1140	1140
q4	10268	822	843	822
q5	7696	3053	3002	3002
q6	242	153	153	153
q7	984	609	614	609
q8	9347	1943	1919	1919
q9	6588	6443	6418	6418
q10	7057	2388	2426	2388
q11	454	248	252	248
q12	405	222	228	222
q13	17783	3023	3003	3003
q14	249	216	209	209
q15	577	518	512	512
q16	637	599	600	599
q17	970	548	578	548
q18	7179	6782	6692	6692
q19	1349	919	935	919
q20	493	189	184	184
q21	3927	3287	3249	3249
q22	1078	1024	1016	1016
Total cold run time: 107790 ms
Total hot run time: 41344 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	7460	7205	7220	7205
q2	332	241	226	226
q3	2934	2753	2801	2753
q4	1942	1733	1752	1733
q5	5458	5516	5480	5480
q6	229	143	143	143
q7	2157	1710	1741	1710
q8	3254	3411	3409	3409
q9	8581	8557	8544	8544
q10	3538	3472	3433	3433
q11	570	498	487	487
q12	796	587	588	587
q13	11295	3006	3014	3006
q14	282	260	262	260
q15	562	516	508	508
q16	672	645	638	638
q17	1805	1604	1549	1549
q18	7781	7506	7584	7506
q19	1665	1462	1432	1432
q20	2044	1817	1819	1817
q21	5327	5263	5054	5054
q22	1138	1046	1016	1016
Total cold run time: 69822 ms
Total hot run time: 58496 ms
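The robot's tables do not label their columns. Checking the Round 1 numbers above suggests (an inference from the data, not documented robot behavior) that the first column is the cold run, the next two are hot runs, the last is the faster of the two hot runs, and the reported totals are column sums. A short Python check of that reading:

```python
# Round 1 rows copied from the TPC-H comment above:
# (query, cold run, hot run 1, hot run 2, reported best hot run), in ms.
ROUND1 = [
    ("q1", 17780, 7536, 7341, 7341),
    ("q2", 2062, 164, 151, 151),
    ("q3", 10665, 1140, 1140, 1140),
    ("q4", 10268, 822, 843, 822),
    ("q5", 7696, 3053, 3002, 3002),
    ("q6", 242, 153, 153, 153),
    ("q7", 984, 609, 614, 609),
    ("q8", 9347, 1943, 1919, 1919),
    ("q9", 6588, 6443, 6418, 6418),
    ("q10", 7057, 2388, 2426, 2388),
    ("q11", 454, 248, 252, 248),
    ("q12", 405, 222, 228, 222),
    ("q13", 17783, 3023, 3003, 3003),
    ("q14", 249, 216, 209, 209),
    ("q15", 577, 518, 512, 512),
    ("q16", 637, 599, 600, 599),
    ("q17", 970, 548, 578, 548),
    ("q18", 7179, 6782, 6692, 6692),
    ("q19", 1349, 919, 935, 919),
    ("q20", 493, 189, 184, 184),
    ("q21", 3927, 3287, 3249, 3249),
    ("q22", 1078, 1024, 1016, 1016),
]


def totals(rows):
    """Return (total cold, total hot), assuming hot = faster of the two hot runs."""
    cold = sum(r[1] for r in rows)
    hot = sum(min(r[2], r[3]) for r in rows)
    return cold, hot


# Under this reading, the last column always equals the faster hot run.
assert all(r[4] == min(r[2], r[3]) for r in ROUND1)
print(totals(ROUND1))  # (107790, 41344), matching the reported totals
```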

@doris-robot

TPC-DS: Total hot run time: 191462 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	980	369	376	369
query2	6528	2101	2030	2030
query3	6791	220	229	220
query4	33855	23491	23497	23491
query5	4326	463	478	463
query6	253	164	164	164
query7	4604	284	278	278
query8	278	246	241	241
query9	9751	2712	2710	2710
query10	465	267	276	267
query11	18013	15502	15297	15297
query12	154	108	99	99
query13	1662	440	397	397
query14	10173	7409	7087	7087
query15	313	176	178	176
query16	8014	462	471	462
query17	1739	563	575	563
query18	2005	311	299	299
query19	370	150	149	149
query20	120	106	108	106
query21	215	106	103	103
query22	4669	4166	4208	4166
query23	34978	34258	35402	34258
query24	11190	2765	2780	2765
query25	572	396	411	396
query26	961	162	160	160
query27	2750	283	296	283
query28	7964	2424	2441	2424
query29	683	432	423	423
query30	322	160	147	147
query31	1052	817	847	817
query32	101	57	58	57
query33	775	293	296	293
query34	954	505	508	505
query35	899	743	756	743
query36	1092	939	932	932
query37	147	88	88	88
query38	4151	3972	4077	3972
query39	1499	1453	1429	1429
query40	282	102	102	102
query41	49	48	48	48
query42	127	102	102	102
query43	536	488	487	487
query44	1282	810	810	810
query45	198	167	165	165
query46	1116	690	694	690
query47	1931	1835	1838	1835
query48	420	323	317	317
query49	1174	440	425	425
query50	834	380	393	380
query51	7129	6918	6915	6915
query52	103	87	89	87
query53	258	181	183	181
query54	1246	440	431	431
query55	82	77	80	77
query56	281	269	295	269
query57	1318	1163	1168	1163
query58	264	244	242	242
query59	3245	3065	2702	2702
query60	283	266	257	257
query61	101	99	104	99
query62	887	687	684	684
query63	219	184	184	184
query64	5200	642	615	615
query65	3300	3241	3210	3210
query66	1265	310	301	301
query67	16001	15898	15680	15680
query68	4937	568	540	540
query69	447	313	281	281
query70	1208	1130	1103	1103
query71	336	268	289	268
query72	6170	4140	3975	3975
query73	777	355	358	355
query74	10325	9071	9025	9025
query75	3445	2720	2669	2669
query76	2969	921	899	899
query77	437	294	293	293
query78	10570	9600	9572	9572
query79	2727	589	602	589
query80	1274	447	438	438
query81	564	244	242	242
query82	797	138	139	138
query83	229	134	136	134
query84	258	69	67	67
query85	1565	285	290	285
query86	448	300	298	298
query87	4449	4311	4299	4299
query88	3547	2190	2157	2157
query89	395	287	283	283
query90	2113	190	187	187
query91	139	102	103	102
query92	72	48	47	47
query93	1501	532	535	532
query94	1152	311	272	272
query95	360	246	243	243
query96	616	286	282	282
query97	3268	3119	3123	3119
query98	216	205	199	199
query99	1745	1323	1282	1282
Total cold run time: 303555 ms
Total hot run time: 191462 ms

@doris-robot

ClickBench: Total hot run time: 32.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.64	0.10	0.10
query5	0.51	0.50	0.49
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.06	0.03	0.03
query9	0.56	0.50	0.51
query10	0.54	0.56	0.53
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.59	0.59
query14	2.71	2.84	2.71
query15	0.92	0.84	0.84
query16	0.37	0.37	0.40
query17	1.05	1.08	1.05
query18	0.20	0.20	0.19
query19	1.95	1.76	2.05
query20	0.02	0.01	0.00
query21	15.37	0.58	0.60
query22	2.40	1.78	2.42
query23	17.13	1.04	0.79
query24	3.25	1.60	0.94
query25	0.22	0.11	0.12
query26	0.47	0.14	0.14
query27	0.04	0.04	0.04
query28	10.23	1.10	1.08
query29	12.56	3.23	3.20
query30	0.24	0.06	0.06
query31	2.87	0.38	0.38
query32	3.28	0.46	0.45
query33	3.01	3.00	3.05
query34	17.15	4.45	4.48
query35	4.54	4.51	4.45
query36	0.66	0.49	0.50
query37	0.08	0.06	0.06
query38	0.05	0.03	0.03
query39	0.03	0.02	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 106.8 s
Total hot run time: 32.48 s

@suxiaogang223 (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 41015 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17594	8410	7283	7283
q2	2044	154	175	154
q3	10558	1114	1150	1114
q4	10234	788	855	788
q5	7782	3071	3052	3052
q6	237	152	145	145
q7	1004	602	600	600
q8	9356	1886	1949	1886
q9	6543	6431	6437	6431
q10	7030	2401	2465	2401
q11	442	238	251	238
q12	409	218	219	218
q13	17765	2995	3026	2995
q14	263	206	214	206
q15	578	528	509	509
q16	637	594	585	585
q17	972	505	575	505
q18	7329	6755	6833	6755
q19	1357	860	1020	860
q20	497	189	192	189
q21	4051	3287	3094	3094
q22	1111	1007	1020	1007
Total cold run time: 107793 ms
Total hot run time: 41015 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	7266	7239	7230	7230
q2	322	233	227	227
q3	2934	2756	2754	2754
q4	1953	1675	1760	1675
q5	5465	5505	5512	5505
q6	226	140	143	140
q7	2114	1726	1696	1696
q8	3222	3438	3401	3401
q9	8545	8565	8571	8565
q10	3485	3446	3425	3425
q11	596	485	482	482
q12	782	610	573	573
q13	8478	3006	2988	2988
q14	309	267	277	267
q15	567	525	512	512
q16	677	635	630	630
q17	1809	1564	1596	1564
q18	7922	7458	7459	7458
q19	1663	1447	1510	1447
q20	2082	1805	1811	1805
q21	5364	5283	5273	5273
q22	1122	1027	1009	1009
Total cold run time: 66903 ms
Total hot run time: 58626 ms

@doris-robot

TPC-DS: Total hot run time: 191816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	983	366	369	366
query2	6522	2091	2048	2048
query3	6790	218	227	218
query4	33871	23642	23696	23642
query5	4412	464	452	452
query6	263	169	165	165
query7	4602	297	290	290
query8	290	237	230	230
query9	9587	2625	2638	2625
query10	484	271	289	271
query11	18033	15212	15325	15212
query12	157	101	100	100
query13	1685	411	408	408
query14	10446	7220	6696	6696
query15	264	167	175	167
query16	7755	476	470	470
query17	1630	569	582	569
query18	1417	296	317	296
query19	334	155	172	155
query20	120	108	109	108
query21	219	108	106	106
query22	4610	4541	4384	4384
query23	34955	34371	34203	34203
query24	11114	2747	2799	2747
query25	656	414	410	410
query26	1363	160	155	155
query27	2811	287	289	287
query28	8091	2394	2412	2394
query29	861	453	423	423
query30	316	158	157	157
query31	1037	803	806	803
query32	99	62	58	58
query33	776	299	300	299
query34	955	497	531	497
query35	929	756	757	756
query36	1100	942	950	942
query37	150	92	87	87
query38	4023	3921	3906	3906
query39	1486	1438	1444	1438
query40	278	101	101	101
query41	51	50	48	48
query42	124	99	95	95
query43	529	476	468	468
query44	1299	798	799	798
query45	194	164	165	164
query46	1151	697	701	697
query47	1946	1864	1893	1864
query48	419	327	338	327
query49	1187	428	457	428
query50	827	383	392	383
query51	7201	7041	6965	6965
query52	97	92	97	92
query53	254	178	179	178
query54	1254	452	429	429
query55	80	79	81	79
query56	282	270	267	267
query57	1320	1183	1162	1162
query58	286	284	238	238
query59	3148	3051	3135	3051
query60	279	266	260	260
query61	103	104	102	102
query62	885	665	693	665
query63	215	184	183	183
query64	5329	632	602	602
query65	3268	3236	3215	3215
query66	1285	293	307	293
query67	16092	16085	15915	15915
query68	4757	543	546	543
query69	452	280	296	280
query70	1195	1122	1122	1122
query71	333	272	277	272
query72	6322	3977	4069	3977
query73	779	371	366	366
query74	9986	8988	9071	8988
query75	3371	2708	2692	2692
query76	2839	919	903	903
query77	430	291	291	291
query78	10536	9761	9558	9558
query79	2260	596	607	596
query80	1917	463	443	443
query81	564	241	241	241
query82	1009	138	133	133
query83	272	136	135	135
query84	271	69	73	69
query85	1766	287	295	287
query86	490	284	299	284
query87	4380	4311	4342	4311
query88	4072	2196	2164	2164
query89	395	293	284	284
query90	2053	180	187	180
query91	132	100	99	99
query92	72	47	48	47
query93	2071	548	536	536
query94	1049	288	285	285
query95	349	241	249	241
query96	611	281	286	281
query97	3262	3109	3136	3109
query98	210	200	195	195
query99	1551	1339	1295	1295
Total cold run time: 304335 ms
Total hot run time: 191816 ms

@doris-robot

ClickBench: Total hot run time: 33.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2368408d5fb53dad6722122620878f57ff9dcf3d, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.06	0.06
query4	1.64	0.11	0.10
query5	0.53	0.52	0.51
query6	1.13	0.73	0.72
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.55	0.51	0.51
query10	0.55	0.55	0.56
query11	0.14	0.10	0.11
query12	0.13	0.11	0.11
query13	0.61	0.59	0.58
query14	2.75	2.73	2.73
query15	0.89	0.82	0.83
query16	0.39	0.38	0.39
query17	1.08	1.03	1.07
query18	0.24	0.23	0.22
query19	1.99	1.85	2.02
query20	0.01	0.01	0.02
query21	15.37	0.58	0.57
query22	2.73	1.66	2.49
query23	16.86	1.02	0.86
query24	3.32	1.39	1.35
query25	0.37	0.14	0.05
query26	0.36	0.14	0.13
query27	0.04	0.03	0.04
query28	9.97	1.09	1.07
query29	12.57	3.30	3.26
query30	0.25	0.07	0.06
query31	2.87	0.40	0.39
query32	3.26	0.46	0.45
query33	2.94	3.03	3.03
query34	16.95	4.50	4.48
query35	4.46	4.53	4.51
query36	0.66	0.49	0.49
query37	0.08	0.07	0.07
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.02	0.03
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.46 s
Total hot run time: 33.09 s

@morningman (Contributor) left a comment

LGTM


PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Oct 24, 2024

PR approved by anyone and no changes requested.

@CalvinKirs (Member)

run p0

@CalvinKirs merged commit 5f10b21 into apache:master Oct 24, 2024
26 of 28 checks passed
morningman pushed a commit to morningman/doris that referenced this pull request Oct 30, 2024
…42257)

morningman pushed a commit to morningman/doris that referenced this pull request Oct 30, 2024
…42257)

morningman added a commit that referenced this pull request Oct 31, 2024
morningman added a commit that referenced this pull request Oct 31, 2024
@gavinchou mentioned this pull request Nov 26, 2024
@suxiaogang223 deleted the support_open_csv_serde branch December 12, 2024 09:27
Labels: approved, dev/2.1.7-merged, dev/3.0.3-merged, reviewed

4 participants