@hawkyre hawkyre commented Jul 11, 2025

As commented in #258, this is another way to avoid sending parameters through the URL, and it should also work. It's based on the form example in https://clickhouse.com/docs/interfaces/http. For inserts and selects, I've only allowed multipart requests when the body is a binary and the multipart_request option is set to true. The other requests don't need to be sent as multipart, since all their resources are already in the body.

@hawkyre
Author

hawkyre commented Jul 11, 2025

Should we raise if someone attempts to binary insert or stream with multipart: true?

@ruslandoga
Contributor

ruslandoga commented Jul 14, 2025

👋

Thank you! I'll make sure to review it this week.

My first thought is that if multipart form works, we should maybe just default to it :)

@hawkyre
Author

hawkyre commented Jul 15, 2025

That sounds great, but it does come at the cost of a larger request since the multipart body includes a bunch of boundaries and extra content. Not sure how impactful that would be but I believe we might benefit from making the behaviour configurable in some way, maybe through the config instead of parameters. If need be, I can take a deeper look into adapting the inserts and streams into multipart reqs too.

@ruslandoga ruslandoga mentioned this pull request Jul 23, 2025
@ruslandoga
Contributor

ruslandoga commented Jul 23, 2025

👋 @hawkyre

Sorry for the delay!

Some preliminary notes:

  1. It appears that when we send a request with multipart/form-data, ClickHouse processes the settings and parameters from the multipart body and ignores the settings in the URL's query string, so we would need to move settings from the query string to the body as well.

  2. I think we can skip adding a new dependency, my take: multipart #269

  3. If we go with Multipart dependency, we need to make sure we pass binaries in there since it fails on iodata

  4. How should we handle raw binaries in parameters? Like %{raw: <<1, 2, 3>>}. Previously everything was percent encoded by URI.encode_query/1 but now it's probably on us. Right now tests in multipart #269 seem to be passing without any extra encoding.

    Copilot says it's OK to keep values as raw binaries

    Your current approach of keeping binaries as-is is correct for the values in a multipart/form-data body. Unlike URL encoding, you don't need to percent-encode the values. The multipart format is binary-safe.

    The Content-Disposition header for each part handles the naming, and the boundary separates the parts. The server will read the content of each part up to the next boundary.

    For text-based data, especially with Unicode, it's good practice to specify the charset in the Content-Type header of the part if possible, for example: Content-Type: text/plain; charset=UTF-8. However, for ClickHouse parameters, it expects UTF-8 strings by default, so your current implementation should work fine as long as your source strings are UTF-8 encoded. Your encode_param function seems to handle various data types correctly by converting them to their binary/string representation.


  5. I am not sure if we need to support INSERT multipart requests since it doesn't usually have many query string parameters (just settings), but it might be nice for consistency
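
To make note 4 concrete, here is a minimal sketch of a multipart/form-data encoder that writes values as raw binaries with no percent-encoding. The module name `MultipartSketch`, the fixed boundary, and the `raw` parameter are made up for illustration and are not taken from this PR or #269; the `param_` prefix follows the form example in the ClickHouse HTTP docs. It assumes that no value contains the boundary string.

```elixir
defmodule MultipartSketch do
  # Hypothetical helper for illustration only; not part of Ch.

  # Builds a multipart/form-data body (as iodata) from {name, value} pairs.
  # Values are written as-is: no percent-encoding is applied.
  def encode(parts, boundary) when is_binary(boundary) do
    encoded =
      Enum.map(parts, fn {name, value} ->
        [
          "--", boundary, "\r\n",
          "content-disposition: form-data; name=\"", to_string(name), "\"\r\n\r\n",
          value, "\r\n"
        ]
      end)

    # The closing delimiter is the boundary followed by two dashes.
    [encoded, "--", boundary, "--\r\n"]
  end
end

# Assumed fixed boundary; a real request would use a random one.
boundary = "sketch-boundary"

parts = [
  {"query", "select {raw:String}"},
  {"param_raw", <<1, 2, 3>>}
]

# Flatten to a binary in case the HTTP client does not accept iodata.
body = IO.iodata_to_binary(MultipartSketch.encode(parts, boundary))
IO.inspect(body, binaries: :as_binaries, limit: :infinity)
```

Flattening with `IO.iodata_to_binary/1` at the end is one way to address note 3 if the chosen encoder or client fails on iodata.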

@ruslandoga
Contributor

ruslandoga commented Jul 23, 2025

Re config option, right now I'm leaning towards making it the default

  • it solves the proxy issue automatically, no need for people to dig for a special setting once they hit the URL length limit error (it's probably a bad experience which I didn't realize people were having)
  • we only have to build, test, and document one implementation (e.g. with an optional multipart: true, it would probably take longer to discover bugs in its implementation than when everyone uses it, not just the "proxy" users)
  • performance seems OK, the slowest bit is :crypto.strong_rand_bytes/1 to generate the boundary (which is just a couple microseconds, meaning everything else in that encoding process is even faster), the payload size overhead is probably (?) negligible
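
A rough way to check the last point locally is sketched below. It generates a boundary with :crypto.strong_rand_bytes/1, times that call, and measures the fixed per-part overhead. The 16-byte length, the hex encoding, and the `param_a` part are assumptions chosen for illustration, not the values used in #269.

```elixir
# Assumption: 16 random bytes, hex-encoded so the boundary is header-safe.
boundary = Base.encode16(:crypto.strong_rand_bytes(16), case: :lower)
content_type = "multipart/form-data; boundary=" <> boundary

# One form part, in the usual multipart/form-data layout.
part = fn name, value ->
  [
    "--", boundary, "\r\n",
    "content-disposition: form-data; name=\"", name, "\"\r\n\r\n",
    value, "\r\n"
  ]
end

# Hypothetical parameter used only to measure the overhead.
name = "param_a"
value = "1"

# Bytes added per part on top of the name and value themselves.
overhead = IO.iodata_length(part.(name, value)) - byte_size(name) - byte_size(value)

# Time a single boundary generation, in microseconds.
{micros, _bytes} = :timer.tc(fn -> :crypto.strong_rand_bytes(16) end)

IO.puts("content-type: #{content_type}")
IO.puts("per-part overhead: #{overhead} bytes, boundary generation: #{micros} µs")
```

A single :timer.tc/1 call is noisy, so a proper benchmark would repeat the measurement many times.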

@hawkyre
Author

hawkyre commented Jul 29, 2025

@ruslandoga Let's go with your custom multipart implementation, it looks great. Did you manage to move the settings into the multipart body in your version as well? That seems to be the only thing pending. I agree on only sending select statements as multipart for now and making it the default, since those are the only ones causing any problems.

@hawkyre
Author

hawkyre commented Jul 29, 2025

@ruslandoga Settings seem to be working great, I added your custom implementation to this PR with some refactoring to make things cleaner. Give it a look!

@luis-serrano-l

Hi, @ruslandoga. I would love to see this merged 😉
Thanks for your help!

@ruslandoga
Contributor

ruslandoga commented Aug 11, 2025

👋 @luis-serrano-l

Sorry for the delay! I'm vacationing this week and will revisit this next Monday.

I think we still need to at least add settings support. This Ecto+Ch test fails with

  1) test LowCardinality insert_all (Ch.TypeTest)
     test/ch/type_test.exs:703
     ** (Ch.Error) Code: 455. DB::Exception: Creating columns of type LowCardinality(Date) is prohibited by default due to expected negative impact on performance. It can be enabled with the `allow_suspicious_low_cardinality_types` setting. (SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY) (version 25.6.4.12 (official build))

when using this PR's branch:

defp deps do
    [
      {:ch, github: "hawkyre/ch", branch: "send-params-in-multipart-form"},
      # ...

probably because right now the settings are going into the query string params (I haven't checked the new implementation, but something like this was the problem in #259 (comment)).
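
For context, this is roughly what "settings going into the query string" looks like. It is only a sketch: the host and port are the ClickHouse HTTP defaults, and how Ch actually assembles the URL internally is an assumption here.

```elixir
# The setting named in the error message above.
settings = [allow_suspicious_low_cardinality_types: 1]

# URI.encode_query/1 accepts any enumerable of {key, value} pairs whose
# keys and values implement String.Chars (atoms and integers do).
query_string = URI.encode_query(settings)

# Assumed base URL (default ClickHouse HTTP port).
url = "http://localhost:8123/?" <> query_string
IO.puts(url)
```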

@luis-serrano-l

Thanks for your answer @ruslandoga. Then, it's a good idea to revisit it from Monday on.
Enjoy your vacation!

@hawkyre
Author

hawkyre commented Aug 18, 2025

I've looked into this a bit more. Sending the settings as URL query parameters while the multipart body contains the query works, but I want to see if it's possible to also encode the settings in the body.

@hawkyre
Author

hawkyre commented Aug 18, 2025

Doing this in the multipart request pipeline also seems to work but it feels much cleaner to pass them as parameters

      |> then(
        &Enum.reduce(settings, &1, fn {key, value}, acc ->
          add_multipart_part(acc, to_string(key), to_string(value), enc_boundary)
        end)
      )

@ruslandoga
Contributor

ruslandoga commented Aug 19, 2025

Hm, you are right. Settings do get read from the query string while the multipart body contains the query.

# returns `async_insert	Bool	1`
curl -X POST \
  -F "query=show settings like 'async_insert'" \
  'http://localhost:8123/?async_insert=1'

But not for this particular case.

# works (old way)
curl -X POST \
  -d 'create table low1(i8 LowCardinality(Int8)) order by tuple()' \
  'http://localhost:8123/?allow_suspicious_low_cardinality_types=1'
  
# fails
curl -X POST \
  -F 'query=create table low2(i8 LowCardinality(Int8)) order by tuple()' \
  'http://localhost:8123/?allow_suspicious_low_cardinality_types=1'
  
# also fails (surprise)
# https://github.com/ClickHouse/ClickHouse/issues/85847
curl -X POST \
  -F 'query=create table low2(i8 LowCardinality(Int8)) order by tuple()' \
  -F 'allow_suspicious_low_cardinality_types=1' \
  'http://localhost:8123/'

Maybe we can limit multipart forms only to SELECT queries. Or queries with params.
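
Either rule could be expressed as a small predicate. The sketch below is hypothetical: the module and function names are invented, and the SELECT check is a naive prefix match that would miss statements starting with a comment or a WITH clause.

```elixir
defmodule MultipartChoiceSketch do
  # Hypothetical heuristics for illustration; not part of Ch.

  # Option 1: use multipart only for statements that start with SELECT.
  def select?(statement) when is_binary(statement) do
    statement
    |> String.trim_leading()
    |> String.downcase()
    |> String.starts_with?("select")
  end

  # Option 2: use multipart only when the query carries parameters.
  def has_params?(params) do
    params != [] and params != %{}
  end
end

IO.inspect(MultipartChoiceSketch.select?("  SELECT 1"))
IO.inspect(MultipartChoiceSketch.select?("create table low1(i8 LowCardinality(Int8)) order by tuple()"))
IO.inspect(MultipartChoiceSketch.has_params?(%{"a" => 1}))
```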

@hawkyre
Author

hawkyre commented Aug 19, 2025

The issue gets even weirder with select queries.

This doesn't work (it returns 5 rows):

curl -X POST \
  -F 'query=SELECT number, toString(number) FROM system.numbers LIMIT 5' \
  -F 'max_result_rows=3' \
  'http://localhost:8123/'

But this does (the setting is applied, so the query fails):

curl -X POST \
  -F 'query=SELECT number, toString(number) FROM system.numbers LIMIT 5' \
  'http://localhost:8123/?max_result_rows=3'

And even within the create query, the setting in the query string seems to be detected and parsed but is then ignored, because this request, which deliberately misspells the setting name, fails saying the setting doesn't exist:

curl -X POST \
  -F 'query=create table low2(i8 LowCardinality(Int8)) order by tuple()' \
  'http://localhost:8123/?allow_suspicious_low_cardinality_type=1'

When passed through the multipart form, it is just straight up ignored:

# Same error as before, not a missing setting error
curl -X POST \
  -F 'query=create table low2(i8 LowCardinality(Int8)) order by tuple()' \
  -F 'allow_suspicious_low_cardinality_type=1' \
  'http://localhost:8123/'

dependabot bot and others added 4 commits September 8, 2025 08:56
@hawkyre
Author

hawkyre commented Sep 8, 2025

@ruslandoga hey, it seems they fixed this 5 days ago: ClickHouse/ClickHouse#85570

But I'm running the tests locally against latest-alpine and head-alpine, and JSON numbers come back as strings. That looks unrelated to this PR and like intended behaviour according to the docs: https://clickhouse.com/docs/sql-reference/data-types/newjson

Should we address this somehow?
