[WIP] Rpc split row #16020

LeaveNhA · 2025-09-16T03:22:50Z

The PR

This work has one goal: supporting row-splitting mode on RPC clusters.

TL;DR

I got bored and wanted to contribute and join you guys on this beautiful journey. I hope you'll welcome me.

Details & Background

This is heavily WIP, including this description. I will keep working on this PR to make sure it fits well with the rest of the project.

For the background:

Metal devices have only one GPU. This makes things a bit tricky, because row splitting is of no use with a single device/backend. The ultimate goal is to have it work over RPC, so that multiple devices can share the inference work and run it faster. To get there, I worked on both sides: I implemented a very early version of row-wise split mode in the Metal backend and then made it work with RPC too.
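To illustrate the general idea of row splitting (not the code in this PR), here is a minimal Python sketch. It assumes each device holds a contiguous block of rows of a weight matrix, computes its part of the matrix-vector product, and the partial outputs are concatenated. The names `split_rows` and `row_split_matvec` are hypothetical and do not exist in ggml.

```python
def matvec(rows, x):
    """Plain matrix-vector product over a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def split_rows(n_rows, n_devices):
    """Return contiguous [start, end) row ranges, one per device.

    Hypothetical helper: rows are spread as evenly as possible.
    """
    base, extra = divmod(n_rows, n_devices)
    ranges, start = [], 0
    for d in range(n_devices):
        end = start + base + (1 if d < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def row_split_matvec(weight, x, n_devices):
    """Compute weight @ x with the rows divided among n_devices.

    In a real cluster each slice would live on a different device;
    here the slices are simply processed one after another.
    """
    partials = []
    for start, end in split_rows(len(weight), n_devices):
        partials.append(matvec(weight[start:end], x))
    # Gather step: concatenate the per-device partial outputs.
    return [y for part in partials for y in part]


weight = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x = [1, -1]
# The split result must match the unsplit product.
assert row_split_matvec(weight, x, 2) == matvec(weight, x)
```

The gather step at the end is where the devices have to exchange data, which is the part that goes over the network in an RPC setup.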

The current PR contains the implementation, but be aware that the performance is unacceptable, and it gets worse with every device you add to the cluster. I will review the PR and read whatever sources I can find to build the domain knowledge I need to solve this.
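One possible explanation, which has not been measured or confirmed in this PR, is that per-token synchronization over the network outweighs the compute saved by splitting. The toy model below is only a sketch under that assumption. All numbers are hypothetical, and `token_time_ms` is an illustrative helper, not part of llama.cpp.

```python
def token_time_ms(n_devices, compute_ms, n_syncs, round_trip_ms):
    """Toy estimate of the time to generate one token.

    Assumptions (hypothetical, not measured):
    - compute divides evenly across devices;
    - every sync point costs one sequential round trip per extra device.
    """
    compute = compute_ms / n_devices
    comm = 0.0 if n_devices == 1 else n_syncs * round_trip_ms * (n_devices - 1)
    return compute + comm


# Hypothetical inputs: 20 ms of compute, 200 sync points, 0.5 ms round trip.
for n in (1, 2, 3):
    print(n, round(token_time_ms(n, 20, 200, 0.5), 2))
```

With these made-up inputs the communication term grows faster than the compute term shrinks, so each added device makes the estimate worse. Whether this matches the real bottleneck would need profiling.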

Tests & Results:

❯ ./build-rpc-split-mode-row-release/bin/llama-bench -m ../llama.cpp.org.new.rpc/hfmodels/models/llama-2-7b.Q4_0.gguf --split-mode row --rpc 127.0.0.1:50052
| model                          |       size |     params | backend        | threads |    sm |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | -------------- | ------: | ----: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Metal,BLAS,RPC |       8 |   row |           pp512 |         54.29 ± 6.55 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Metal,BLAS,RPC |       8 |   row |           tg128 |          0.64 ± 0.01 |

build: 997e3047 (6444)

In any case, all comments, suggestions, and support are very welcome.


jeffbolznv · 2025-09-16T09:14:47Z

Is anybody still working on a backend-agnostic row splitting implementation?

ggerganov · 2025-09-16T09:20:52Z

> Is anybody still working on a backend-agnostic row splitting implementation?

I don't think anyone is working on this atm. Will tag @slaren and @JohannesGaessler in case they are aware of any ongoing efforts.

slaren · 2025-09-16T09:26:19Z

@koush had an initial implementation (#13818 (comment)), but I am not sure if that's still being worked on.

JohannesGaessler · 2025-09-16T10:32:56Z

My current priorities not specific to CUDA are automating how to distribute tensors to GPUs (by reusing the code from #15860) and then I intend to get back to working on backend-agnostic tensor parallelism.

In parallel I'm refactoring and deduplicating the FlashAttention CUDA code and optimizing it for AMD. Since I've already invested the effort to read the AMD ISA documentation, I'll probably buy an RDNA4 GPU and implement better support for the AMD equivalent of tensor cores.

LeaveNhA · 2025-09-17T05:18:34Z

> Is anybody still working on a backend-agnostic row splitting implementation?

A backend-agnostic approach would be much more valuable in the big picture, if you ask me.

On the other hand, if I can get in touch with @koush and sync up on the current situation, I would gladly get on board with another PR to make this feature work in both standalone and cluster mode.

LeaveNhA added 13 commits September 10, 2025 06:40

Implement ggml_backend_metal_split_buffer_type for Metal backend support split-mode row

b53f098

Fixing the build.

e4e068a

checkpoint

c5cc46b

checkpoint

5c161b4

checkpoint

2f25907

checkpoint

77b1864

checkpoint

920b5f4

checkpoint

0c171fb

checkpoint

24d78d8

checkpoint

7ea7fc8

[X] working without rpc

85ea1b8

[X] working with rpc, but slow

997e304

gathering before clean-up

81ef79a

github-actions bot added the examples, ggml, and Apple Metal labels Sep 16, 2025

LeaveNhA changed the title ~~Rpc split row~~ [WIP] Rpc split row Sep 16, 2025
