PeymanRM
Contributor

Motivation

When serving through the CLI, the arguments max_prefill_token_num and num_tokens_per_iter were not being set.

Modification

Pass max_prefill_token_num and num_tokens_per_iter to TurbomindEngineConfig in the CLI api_server command once they have been parsed.
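
The shape of the change can be sketched as below. This is a minimal, self-contained illustration and not the actual diff: EngineConfigSketch is a hypothetical stand-in for TurbomindEngineConfig, and the flag spellings, default values, and helper names are assumptions made for the example.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EngineConfigSketch:
    """Hypothetical stand-in for TurbomindEngineConfig (only two fields shown)."""
    max_prefill_token_num: int = 8192  # default is an assumption
    num_tokens_per_iter: int = 0       # default is an assumption


def build_parser() -> argparse.ArgumentParser:
    # Flag spellings are assumptions made for this sketch.
    parser = argparse.ArgumentParser(prog="api_server_sketch")
    parser.add_argument("--max-prefill-token-num", type=int, default=8192)
    parser.add_argument("--num-tokens-per-iter", type=int, default=0)
    return parser


def build_engine_config(args: argparse.Namespace) -> EngineConfigSketch:
    # Forward the parsed values explicitly; if they are omitted here,
    # the config silently keeps its defaults and the CLI flags have no effect.
    return EngineConfigSketch(
        max_prefill_token_num=args.max_prefill_token_num,
        num_tokens_per_iter=args.num_tokens_per_iter,
    )


args = build_parser().parse_args(
    ["--max-prefill-token-num", "4096", "--num-tokens-per-iter", "1024"]
)
config = build_engine_config(args)
print(config)
```

Forwarding each value by name keeps the mapping between CLI flags and config fields visible, so a missing argument is easy to spot in review.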

Use cases

Taking advantage of "Dynamic SplitFuse"-like behavior when serving from the CLI.

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  • The documentation has been modified accordingly, like docstring or example tutorials.

@PeymanRM PeymanRM marked this pull request as ready for review July 27, 2025 14:10
@lvhan028 lvhan028 merged commit 5f0647f into InternLM:main Jul 28, 2025
5 checks passed