Vulkan: add conv_transpose_2d operation #16022

relent95 · 2025-09-16T07:29:10Z

This PR adds conv_transpose_2d operation to Vulkan backend. The code are based on the implementation of the existing conv_2d operation. The shader supports strides (s0, s1), paddings (p0, p1) and dilations (d0, d1). But in ggml_vk_conv_transpose_2d(), they are constrained as s1 = s0, p0 = p1 = 0, d0 = d1 = 1, because of the existing GGML_OP_CONV_TRANSPOSE_2D interface.

…95/llama.cpp into add-ggml-vulkan-conv-transpose-2d

jeffbolznv · 2025-09-16T09:27:20Z

ggml/src/ggml-vulkan/ggml-vulkan.cpp

@@ -1117,6 +1119,56 @@ template <> void init_pushconst_fastdiv(vk_op_conv2d_push_constants &p) {
    init_fastdiv_values(p.OW*p.OH,  p.OWOHmp,  p.OWOHL);
 }

+struct vk_op_conv_transpose_2d_push_constants {


This uses more than 128B, which is the minimum required push constant size. 256B is pretty widely supported (and required in newer versions of vulkan). If you can't trim this down then there should be a check in supports_op.

Thanks for the review. In the commit f5ae689 below, I added checking the size limit at the pipeline creation time(ggml_vk_load_shaders()) and checking the pipeline pointer at the ggml_backend_vk_device_supports_op(). Also refactored shader code by merging conv_transpose_2d_mm.comp into conv2d_mm.comp, as suggessed by 0cc4m.

This is a bit different than how the rest of the code works - usually we put the detailed check in supports_op and then load_shaders doesn't need to check for support. It probably works OK this way, but I'll defer to @0cc4m to decide whether this should be changed.

Replaced checking pointer as checking push constants limit explicitly in the recent commit. I preserved the same check in load_shaders, because existing code(such as pipeline_multi_add) also check device limit in there via flags. It may be better to introduce a new flag in vk_device_struct. But I have no idea where to put it there(order of struct fields). What do you think?

0cc4m · 2025-09-16T09:44:58Z

If this is based on the existing conv_2d shader, shouldn't it be possible to use the existing .comp file and only change the parts that are actually different with a preprocessor variable?

…m.comp for conv_transpose_2d operation

etasnadi · 2025-09-18T08:30:46Z

If this is based on the existing conv_2d shader, shouldn't it be possible to use the existing .comp file and only change the parts that are actually different with a preprocessor variable?

You are right, if there will be different shaders with code repeats for each conv variant (transpose, gradient, channel-wise), the maintenance will be really though.

Also, there are still many ways left to optimize the conv kernel and I also have a few updates in my private repo that I might polish in the near future and submit.

0cc4m · 2025-09-18T08:35:26Z

@etasnadi Yeah, the PR has already been updated to do that. You could also do a review if you want, you know the shader better than me. I'll mostly make sure that the C++ code is fine and the shader passes on Intel, AMD and Nvidia.

etasnadi · 2025-09-18T08:54:54Z

@etasnadi Yeah, the PR has already been updated to do that. You could also do a review if you want, you know the shader better than me. I'll mostly make sure that the C++ code is fine and the shader passes on Intel, AMD and Nvidia.

Yes, I've just realized that it was updated since.

Now my problem is that the kernel is and will be too complicated (it was really complicated before too), so we might need to introduce some abstraction I guess. Maybe @jeffbolznv has some ideas.

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

etasnadi · 2025-09-18T09:02:50Z

ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp

@@ -225,10 +229,16 @@ void main() {
            uint32_t B_ly    = r_offset + Ar;
            uint32_t B_lx    = Ac;
            uint32_t K_idx   = B_idx_K * BS_K + B_ly; /* Global K_idx (row index of A)*/
-            uint32_t knl_idx = min(KW_idx_a + KH_idx_a * p.nb01 + Cin_idx_a * p.nb02 + K_idx * p.nb03, K * CRS - 1);


The index calculation is moved into the else, is it indended? Some of such optimizations backfired on a few GPUs #15056

Thanks for the review. It was a premature optimization. I reverted the order of the index calculation and the bound check.

0cc4m · 2025-09-18T09:08:02Z

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

etasnadi · 2025-09-18T09:08:27Z

ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp

            uint32_t H_idx = OH_idx * p.s1 + KH_idx_b * p.d1 - p.p1;
            uint32_t W_idx = OW_idx * p.s0 + KW_idx_b * p.d0 - p.p0;
-            uint32_t src_idx =


The same here, it can influence the perf, so if it is not intended, it is better to do the calculation in the same way as it was before.

Also reverted the order as the above change.

etasnadi · 2025-09-18T09:23:42Z

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

For example, it seems that it supports templates: https://devblogs.microsoft.com/directx/announcing-hlsl-2021/#template-functions-and-data-types - and templates alone would help a lot. I am not a fan of adding dependencies to projects governed by a single company either, but this kernel will be unmaintainable in the future at this abstraction level and GLSL have limited features to deal with the problem.

Sglang is also a good idea, however I do not know how much it is adopted and if it is mature enough. For example I tried to use coopmats with slang without success ~1 year ago - I guess their compiler does not support all extensions automatically?

etasnadi · 2025-09-18T10:53:12Z

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

Now I see that they don't have coopmat support but it's WIP. shader-slang/slang#7634 So I believe it is a good idea to add support for slang in the near future! Also, there are several NV suffixed accounts contributing to the project so I guess Nvidia has a bet on the project.

0cc4m · 2025-09-18T11:15:21Z

@etasnadi I think it's Khronos, not Nvidia specifically, but yeah. Coopmat should already be there, see shader-slang/slang#7170 (comment), but let's not sidetrack this PR. If you want to look into it, go ahead. If discussion is needed, please open an issue about it.

jeffbolznv · 2025-09-18T14:53:50Z

ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp

+#endif
+            float val;
+            if (CRS_idx_b >= CRS || NPQ_idx >= NPQ
+                || int32_t(H_idx) < 0 || H_idx >= p.H || int32_t(W_idx) < 0 || W_idx >= p.W


The < 0 checks aren't necessary.

Right. I missed that. What would you suggest between removing the < 0 expressions and preserving the old code(such as H_idx < 0)?

jeffbolznv · 2025-09-18T14:55:53Z

HLSL doesn't support spec constants which IMO is a deal breaker. It also only has coopmat1 level of support for use in Vulkan. slang supports coopmat2, spec constants, and generics, and there are cases where generics would be helpful.

…conv_2d shader

…v_transpose_2d operation.

relent95 added 2 commits September 16, 2025 15:58

Vulkan: add conv_transpose_2d operation

9d1b723

Merge branch 'ggml-org:master' into add-ggml-vulkan-conv-transpose-2d

4272243

relent95 requested a review from 0cc4m as a code owner September 16, 2025 07:29

relent95 added 3 commits September 16, 2025 16:39

Vulkan: fix typo in conv_transpose_2d shader(s0mp, s0L, s1mp, s1L)

5c888ca

Merge branch 'add-ggml-vulkan-conv-transpose-2d' of github.com:relent…

b02a5dd

…95/llama.cpp into add-ggml-vulkan-conv-transpose-2d

Vulkan: fix incorrect indentation in conv_transpose_2d shader

e029cda

github-actions bot added testing Everything test related Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Sep 16, 2025

jeffbolznv reviewed Sep 16, 2025

View reviewed changes

Vulkan: add checking the push constants size limit and reuse conv2d_m…

f5ae689

…m.comp for conv_transpose_2d operation

relent95 requested a review from jeffbolznv September 17, 2025 04:33

etasnadi reviewed Sep 18, 2025

View reviewed changes

jeffbolznv reviewed Sep 18, 2025

View reviewed changes

Vulkan: revert the order of the index calculation and bound check in …

12aaeae

…conv_2d shader

relent95 requested review from etasnadi and jeffbolznv September 19, 2025 02:08

Vulkan: explicity check push constants limit in supports_op() for con…

7e24d17

…v_transpose_2d operation.

Vulkan: add conv_transpose_2d operation #16022

Are you sure you want to change the base?

Vulkan: add conv_transpose_2d operation #16022

Conversation

relent95 commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

relent95 Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0cc4m commented Sep 16, 2025

Uh oh!

etasnadi commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

0cc4m commented Sep 18, 2025

Uh oh!

etasnadi commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0cc4m commented Sep 18, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

etasnadi commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

etasnadi commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

0cc4m commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

relent95 Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jeffbolznv commented Sep 18, 2025

Uh oh!

Uh oh!

relent95 commented Sep 16, 2025 •

edited

Loading

relent95 Sep 19, 2025 •

edited

Loading

etasnadi commented Sep 18, 2025 •

edited

Loading

etasnadi commented Sep 18, 2025 •

edited

Loading

etasnadi commented Sep 18, 2025 •

edited

Loading

etasnadi commented Sep 18, 2025 •

edited

Loading

0cc4m commented Sep 18, 2025 •

edited

Loading

relent95 Sep 19, 2025 •

edited

Loading