Skip to content

Conversation

relent95
Copy link
Contributor

@relent95 relent95 commented Sep 16, 2025

This PR adds conv_transpose_2d operation to Vulkan backend. The code are based on the implementation of the existing conv_2d operation. The shader supports strides (s0, s1), paddings (p0, p1) and dilations (d0, d1). But in ggml_vk_conv_transpose_2d(), they are constrained as s1 = s0, p0 = p1 = 0, d0 = d1 = 1, because of the existing GGML_OP_CONV_TRANSPOSE_2D interface.

@relent95 relent95 requested a review from 0cc4m as a code owner September 16, 2025 07:29
@github-actions github-actions bot added testing Everything test related Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Sep 16, 2025
@@ -1117,6 +1119,56 @@ template <> void init_pushconst_fastdiv(vk_op_conv2d_push_constants &p) {
init_fastdiv_values(p.OW*p.OH, p.OWOHmp, p.OWOHL);
}

struct vk_op_conv_transpose_2d_push_constants {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses more than 128B, which is the minimum required push constant size. 256B is pretty widely supported (and required in newer versions of vulkan). If you can't trim this down then there should be a check in supports_op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. In the commit f5ae689 below, I added checking the size limit at the pipeline creation time(ggml_vk_load_shaders()) and checking the pipeline pointer at the ggml_backend_vk_device_supports_op(). Also refactored shader code by merging conv_transpose_2d_mm.comp into conv2d_mm.comp, as suggessed by 0cc4m.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit different than how the rest of the code works - usually we put the detailed check in supports_op and then load_shaders doesn't need to check for support. It probably works OK this way, but I'll defer to @0cc4m to decide whether this should be changed.

Copy link
Contributor Author

@relent95 relent95 Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced checking pointer as checking push constants limit explicitly in the recent commit. I preserved the same check in load_shaders, because existing code(such as pipeline_multi_add) also check device limit in there via flags. It may be better to introduce a new flag in vk_device_struct. But I have no idea where to put it there(order of struct fields). What do you think?

@0cc4m
Copy link
Collaborator

0cc4m commented Sep 16, 2025

If this is based on the existing conv_2d shader, shouldn't it be possible to use the existing .comp file and only change the parts that are actually different with a preprocessor variable?

@relent95 relent95 requested a review from jeffbolznv September 17, 2025 04:33
@etasnadi
Copy link
Contributor

etasnadi commented Sep 18, 2025

If this is based on the existing conv_2d shader, shouldn't it be possible to use the existing .comp file and only change the parts that are actually different with a preprocessor variable?

You are right, if there will be different shaders with code repeats for each conv variant (transpose, gradient, channel-wise), the maintenance will be really though.

Also, there are still many ways left to optimize the conv kernel and I also have a few updates in my private repo that I might polish in the near future and submit.

@0cc4m
Copy link
Collaborator

0cc4m commented Sep 18, 2025

@etasnadi Yeah, the PR has already been updated to do that. You could also do a review if you want, you know the shader better than me. I'll mostly make sure that the C++ code is fine and the shader passes on Intel, AMD and Nvidia.

@etasnadi
Copy link
Contributor

etasnadi commented Sep 18, 2025

@etasnadi Yeah, the PR has already been updated to do that. You could also do a review if you want, you know the shader better than me. I'll mostly make sure that the C++ code is fine and the shader passes on Intel, AMD and Nvidia.

Yes, I've just realized that it was updated since.

Now my problem is that the kernel is and will be too complicated (it was really complicated before too), so we might need to introduce some abstraction I guess. Maybe @jeffbolznv has some ideas.

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

@@ -225,10 +229,16 @@ void main() {
uint32_t B_ly = r_offset + Ar;
uint32_t B_lx = Ac;
uint32_t K_idx = B_idx_K * BS_K + B_ly; /* Global K_idx (row index of A)*/
uint32_t knl_idx = min(KW_idx_a + KH_idx_a * p.nb01 + Cin_idx_a * p.nb02 + K_idx * p.nb03, K * CRS - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index calculation is moved into the else, is it indended? Some of such optimizations backfired on a few GPUs #15056

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. It was a premature optimization. I reverted the order of the index calculation and the bound check.

@0cc4m
Copy link
Collaborator

0cc4m commented Sep 18, 2025

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

uint32_t H_idx = OH_idx * p.s1 + KH_idx_b * p.d1 - p.p1;
uint32_t W_idx = OW_idx * p.s0 + KW_idx_b * p.d0 - p.p0;
uint32_t src_idx =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here, it can influence the perf, so if it is not intended, it is better to do the calculation in the same way as it was before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also reverted the order as the above change.

@etasnadi
Copy link
Contributor

etasnadi commented Sep 18, 2025

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

For example, it seems that it supports templates: https://devblogs.microsoft.com/directx/announcing-hlsl-2021/#template-functions-and-data-types - and templates alone would help a lot. I am not a fan of adding dependencies to projects governed by a single company either, but this kernel will be unmaintainable in the future at this abstraction level and GLSL have limited features to deal with the problem.

Sglang is also a good idea, however I do not know how much it is adopted and if it is mature enough. For example I tried to use coopmats with slang without success ~1 year ago - I guess their compiler does not support all extensions automatically?

@etasnadi
Copy link
Contributor

etasnadi commented Sep 18, 2025

What do you think about adding support for HLSL shaders? As far as I know thae glslangValidatior already has basic support but in the meantime we could use dxc.

What's the advantage of HLSL over GLSL? I'm not familiar with it, and not a fan of Microsoft dependencies. It would probably make maintenance harder. If you want to look into a shader language with more modern features, wouldn't slang be more interesting and more open?

Personally I'm hoping one of the projects looking into a C++-based compute shader syntax (similar to CUDA and ROCm) pans out. For now GLSL is good enough for me.

Now I see that they don't have coopmat support but it's WIP. shader-slang/slang#7634 So I believe it is a good idea to add support for slang in the near future! Also, there are several NV suffixed accounts contributing to the project so I guess Nvidia has a bet on the project.

@0cc4m
Copy link
Collaborator

0cc4m commented Sep 18, 2025

@etasnadi I think it's Khronos, not Nvidia specifically, but yeah. Coopmat should already be there, see shader-slang/slang#7170 (comment), but let's not sidetrack this PR. If you want to look into it, go ahead. If discussion is needed, please open an issue about it.

#endif
float val;
if (CRS_idx_b >= CRS || NPQ_idx >= NPQ
|| int32_t(H_idx) < 0 || H_idx >= p.H || int32_t(W_idx) < 0 || W_idx >= p.W
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The < 0 checks aren't necessary.

Copy link
Contributor Author

@relent95 relent95 Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I missed that. What would you suggest between removing the < 0 expressions and preserving the old code(such as H_idx < 0)?

@jeffbolznv
Copy link
Collaborator

HLSL doesn't support spec constants which IMO is a deal breaker. It also only has coopmat1 level of support for use in Vulkan. slang supports coopmat2, spec constants, and generics, and there are cases where generics would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning testing Everything test related Vulkan Issues specific to the Vulkan backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants