Contributor

@zzzxl1993 commented Nov 11, 2024

What problem does this PR solve?

Problem Summary:

  1. This PR implements the `approx_top_sum` aggregate function. Example usage: `select approx_top_sum(c1, c2, c3, 10, 300) from tbl`.
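
To make the example concrete, here is a minimal C++ sketch of the exact result that `approx_top_sum(c1, c2, c3, 10, 300)` presumably approximates. It assumes (the PR description does not say this) that `c1` and `c2` are grouping columns, `c3` is the value being summed, `10` is the number of entries to return, and `300` is a space budget for the approximate algorithm, which this exact version ignores. `Row`, `TopSumEntry`, and `exact_top_sum` are hypothetical names, not Doris code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One input row: two grouping columns (c1, c2) and a numeric value (c3).
// ASSUMPTION: this column layout is inferred from the usage example only.
struct Row {
    std::string c1;
    std::string c2;
    std::int64_t c3;
};

// One output entry: a (c1, c2) combination and the total of c3 over its rows.
struct TopSumEntry {
    std::string c1;
    std::string c2;
    std::int64_t sum;
};

// Exact reference computation: sum c3 per (c1, c2), then keep the k largest sums.
// The real function is approximate; this only illustrates the intended result.
std::vector<TopSumEntry> exact_top_sum(const std::vector<Row>& rows, std::size_t k) {
    std::map<std::pair<std::string, std::string>, std::int64_t> sums;
    for (const Row& row : rows) {
        sums[{row.c1, row.c2}] += row.c3;
    }

    std::vector<TopSumEntry> result;
    result.reserve(sums.size());
    for (const auto& kv : sums) {
        result.push_back({kv.first.first, kv.first.second, kv.second});
    }

    // Largest sum first; stable_sort keeps key order among equal sums.
    std::stable_sort(result.begin(), result.end(),
                     [](const TopSumEntry& a, const TopSumEntry& b) { return a.sum > b.sum; });
    if (result.size() > k) {
        result.resize(k);
    }
    return result;
}
```

For rows `(a, x, 5)`, `(b, y, 7)`, `(a, x, 4)` and `k = 2`, this returns `(a, x)` with sum 9 followed by `(b, y)` with sum 7.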

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <rapidjson/encodings.h>
Contributor

warning: 'rapidjson/encodings.h' file not found [clang-diagnostic-error]

#include <rapidjson/encodings.h>
         ^

@zzzxl1993
Contributor Author

run buildall

@zzzxl1993
Contributor Author

run buildall

@zzzxl1993
Contributor Author

run buildall

Comment on lines -91 to -92
- qt_sql """ select approx_top_k(clientip) from ${tableName}; """
- qt_sql """ select approx_top_k(clientip, 10) from ${tableName}; """
Contributor

@superdiaodiao commented Nov 12, 2024

May I ask why these tests are being deleted in a feature PR?

Contributor Author

The function framework currently makes it difficult to give variable-argument functions default values, so users are expected to specify the last two parameters explicitly.

@zzzxl1993
Contributor Author

run buildall

@@ -43,7 +43,7 @@ class IDataType;

 struct AggregateFunctionAttr {
     bool enable_decimal256 {false};
-    std::vector<std::pair<std::string, bool>> column_infos;
+    std::vector<std::string> column_names;
Contributor

Why change this basic data structure?

Contributor Author

It simplifies a redundant data structure that was introduced with approx_top_k.

sub_writer.Key(_column_names[i].data(), _column_names[i].size());
sub_writer.String(row_str.data(), row_str.size());
}
sub_writer.Key("count");
Contributor

This key should be "sum", not "count".

Contributor Author

done
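
For illustration, the shape of one result entry after this rename might look like the sketch below. It is dependency-free C++ rather than the rapidjson writer used in the PR; `escape_json` and `entry_to_json` are hypothetical helpers, and emitting the sum as an unsigned integer is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Escape the two characters that would break a JSON string literal.
// Control characters are not handled; this is a sketch, not a full encoder.
std::string escape_json(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}

// Render one entry as a JSON object: each column name maps to its value
// (written as a string, as in the diff fragment), followed by the "sum" key.
// ASSUMPTION: the sum is emitted as an unsigned integer.
std::string entry_to_json(const std::vector<std::string>& column_names,
                          const std::vector<std::string>& values, std::uint64_t sum) {
    std::string out = "{";
    for (std::size_t i = 0; i < column_names.size() && i < values.size(); ++i) {
        out += "\"" + escape_json(column_names[i]) + "\":\"" + escape_json(values[i]) + "\",";
    }
    out += "\"sum\":" + std::to_string(sum) + "}";
    return out;
}
```

With column names `c1`, `c2`, values `a`, `x`, and a sum of 9, this yields `{"c1":"a","c2":"x","sum":9}`.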

@@ -55,9 +55,19 @@ public ApproxTopK(boolean distinct, boolean alwaysNullable, Expression... varArg

 @Override
 public void checkLegalityBeforeTypeCoercion() {
-    if (arity() < 1) {
+    if (arity() < 3) {
Contributor

Why change the minimum number of arguments from 1 to 3?

Contributor Author

The existing function framework cannot support default argument values for approx_top_k and approx_top_sum, so all arguments must be passed explicitly.
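
The effect of the new check can be sketched as follows. This is a C++ illustration of the rule, not the Java code from the PR; `check_min_arity` is a hypothetical helper, and reading the three required arguments as one column plus two trailing parameters is an assumption based on the replies in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the `arity() < 3` check in the diff above.
// ASSUMPTION: the three required arguments are at least one column followed
// by the two trailing parameters that have no default values.
void check_min_arity(const std::string& function_name, std::size_t arity) {
    constexpr std::size_t kMinArity = 3;
    if (arity < kMinArity) {
        throw std::invalid_argument(function_name + " requires at least " +
                                    std::to_string(kMinArity) + " arguments, got " +
                                    std::to_string(arity));
    }
}
```

Under this rule a call such as `approx_top_k(clientip, 10)` (arity 2) is rejected, which is consistent with the removal of the two regression queries earlier in this thread.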

@zzzxl1993
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.96% (9904/26090)
Line Coverage: 29.16% (82764/283794)
Region Coverage: 28.29% (42514/150280)
Branch Coverage: 24.86% (21556/86694)
Coverage Report: http://coverage.selectdb-in.cc/coverage/cf48a1bd8a8007fe40606b7a4d907a182403e85c_cf48a1bd8a8007fe40606b7a4d907a182403e85c/report/index.html

@zzzxl1993
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.96% (9903/26090)
Line Coverage: 29.16% (82762/283795)
Region Coverage: 28.29% (42519/150280)
Branch Coverage: 24.86% (21554/86694)
Coverage Report: http://coverage.selectdb-in.cc/coverage/adf71e19982471db714eaf0aa221f815962c102d_adf71e19982471db714eaf0aa221f815962c102d/report/index.html

Contributor

@xiaokang left a comment

LGTM

@github-actions bot added the `approved` label ("Indicates a PR has been approved by one committer.") on Nov 17, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@qidaye left a comment

LGTM

@qidaye merged commit 13ced20 into apache:master on Nov 17, 2024
25 of 28 checks passed
zzzxl1993 added a commit to zzzxl1993/doris that referenced this pull request Nov 17, 2024
…3643)

Problem Summary:

1. The function approx_top_sum has been implemented. Here is an example
of its usage: select approx_top_sum(c1, c2, c3, 10, 300) from tbl.

### Release note

Add new function `approx_top_sum`.
zzzxl1993 added a commit to zzzxl1993/doris that referenced this pull request Nov 17, 2024
…3643)

Problem Summary:

1. The function approx_top_sum has been implemented. Here is an example
of its usage: select approx_top_sum(c1, c2, c3, 10, 300) from tbl.

Add new function `approx_top_sum`.
Labels
approved ("Indicates a PR has been approved by one committer."), reviewed