fix: turbomind backend config in cli serve #3784

PeymanRM · 2025-07-27T14:09:52Z

Motivation

When serving from the CLI, the max_prefill_token_num and num_tokens_per_iter arguments were parsed but never applied to the TurboMind engine config.

This PR passes the parsed max_prefill_token_num and num_tokens_per_iter values to TurbomindEngineConfig in the CLI api_serve command.

As a result, "Dynamic SplitFuse"-like behavior can now be used directly from the CLI.
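The bug follows a common argparse pattern: a flag that is parsed but never forwarded leaves the engine config silently at its defaults. The sketch below illustrates the before/after behavior under stated assumptions. EngineConfigStub is a hypothetical stand-in for TurbomindEngineConfig that models only the two fields discussed here, and the flag spellings and defaults are illustrative. This is not the lmdeploy source.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EngineConfigStub:
    """Hypothetical stand-in for TurbomindEngineConfig (illustrative defaults)."""
    max_prefill_token_num: int = 8192
    num_tokens_per_iter: int = 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serve-sketch")
    # Hypothetical flag spellings; argparse maps dashes to underscores
    # in the resulting namespace (args.max_prefill_token_num, ...).
    parser.add_argument("--max-prefill-token-num", type=int, default=8192)
    parser.add_argument("--num-tokens-per-iter", type=int, default=0)
    return parser


def engine_config_before_fix(args: argparse.Namespace) -> EngineConfigStub:
    # Bug pattern: the flags are parsed but never forwarded,
    # so user-supplied values are silently ignored.
    return EngineConfigStub()


def engine_config_after_fix(args: argparse.Namespace) -> EngineConfigStub:
    # Fix pattern: forward each parsed value into the engine config.
    return EngineConfigStub(
        max_prefill_token_num=args.max_prefill_token_num,
        num_tokens_per_iter=args.num_tokens_per_iter,
    )


if __name__ == "__main__":
    cli = ["--max-prefill-token-num", "4096", "--num-tokens-per-iter", "2048"]
    args = build_parser().parse_args(cli)
    print("before:", engine_config_before_fix(args))
    print("after: ", engine_config_after_fix(args))
```

Forwarding each field explicitly keeps the defaults in one place (the config class) while letting CLI values override them; the "before" path is easy to miss because nothing fails, the values are simply dropped.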

Checklist

- Pre-commit or other linting tools are used to fix the potential lint issues.
- The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
- The documentation has been modified accordingly, like docstring or example tutorials.

num_tokens_per_iter & max_prefill_iters for tm engine fixed

9ccc6e5

PeymanRM marked this pull request as ready for review July 27, 2025 14:10

lvhan028 approved these changes Jul 28, 2025


lvhan028 added the improvement label Jul 28, 2025

lvhan028 merged commit 5f0647f into InternLM:main Jul 28, 2025
5 checks passed