## What's Changed
### 🚀 Features
- support offloading weights & kv_cache for turbomind by @irexyc in #3798
- Add PPU backend support by @guozixu2001 in #3807
- Add turbomind metrics by @lvhan028 in #3811
- PytorchEngine supports gpt-oss bf16 by @grimoire in #3820 (see the sketch after this list)
- support sleep/wakeup for pt engine by @irexyc in #3687
- [ascend] run intern-s1 on A3 by @yao-fengchen in #3831
- Initial gpt-oss support for turbomind by @lzhangzz in #3839
- Support GLM-4-0414 and GLM-4.1V by @CUHKSZzxy in #3846
- support internvl3.5 by @lvhan028 in #3886
- Update turbomind communication library by @lzhangzz in #3736
- MXFP4 support for turbomind GEMM library by @lzhangzz in #3927
- Dispatch MXFP4 weight conversion for sm70 & sm75 by @lzhangzz in #3937
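Several of these additions surface through the Python pipeline API. Below is a minimal sketch of loading one of the newly supported models; the model id is a placeholder, and `dtype='bfloat16'` mirrors the gpt-oss bf16 support from #3820, so treat it as an illustration rather than the canonical setup.

```python
# Minimal sketch: running a newly supported model with the PyTorch engine.
# The model id is a placeholder; any supported HF-style checkpoint works.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'openai/gpt-oss-20b',  # placeholder model id
    backend_config=PytorchEngineConfig(dtype='bfloat16'),
)
print(pipe('Hello, world!'))
```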
### 💥 Improvements
- fix: turbomind backend config in cli serve by @PeymanRM in #3784
- remove deprecated code by @lvhan028 in #3759
- Refactor FP8 MoE GEMM by @lzhangzz in #3795
- Fix build rope params by @grimoire in #3760
- Optimize rmsnorm with head_dim=128 by @grimoire in #3814
- Simplify GEMM interface by @lzhangzz in #3818
- Optimize create_model_inputs and schedule_decoding by @grimoire in #3766
- add remote logs; optimize forward lock by @grimoire in #3737
- support deepgemm new api by @grimoire in #3827
- remove serving with gradio by @lvhan028 in #3829
- Deprecate interactive mode from api_server by @lvhan028 in #3830
- build(docker): Try to optimize docker by @windreamer in #3779
- Make a common chat.py to replace each engine's by @lvhan028 in #3836
- Ray mp engine backend by @grimoire in #3790
- [Feat] support using external ray pg with bundles by @CyCle1024 in #3850
- Remove unused code in PT Engine by @grimoire in #3858
- support logprobs by @grimoire in #3852 (exercised in the sketch after this list)
- optimize prefill preprocess by @grimoire in #3869
- fix flash-attn bc by @grimoire in #3873
- Graph warmup by @grimoire in #3851
- Improve turbomind's prefix cache by @lvhan028 in #3835
- Support the OpenAI-compatible parameter max_completion_tokens by @Huarong in #3876 (see the sketch after this list)
- [ascend] add env to set rt visible by ray and disable warmup by @tangzhiyi11 in #3894
- support cache_max_entry_count >= 1 for Turbomind backend by @lh9171338 in #3913
- adjust default values by @lvhan028 in #3921
- [refactor][chat_template][1/N] adopt tokenizer's apply_chat_template by @lvhan028 in #3845
- use FA 2.8.3, which is compatible with torch 2.8.0, by @lvhan028 in #3936
- refactor ascend Dockerfile by @yao-fengchen in #3926
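A minimal sketch of the two serving-side additions above, `max_completion_tokens` (#3876) and `logprobs` (#3852). It assumes an OpenAI-compatible api_server is already running locally and that the served model name matches the placeholder below.

```python
# Minimal sketch, assuming a server was started beforehand, e.g.:
#   lmdeploy serve api_server <model> --server-port 23333
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')
resp = client.chat.completions.create(
    model='gpt-oss-20b',            # placeholder; use the served model's name
    messages=[{'role': 'user', 'content': 'Say hi in one sentence.'}],
    max_completion_tokens=64,       # OpenAI-compatible cap on generated tokens (#3876)
    logprobs=True,                  # per-token log probabilities (#3852)
)
print(resp.choices[0].message.content)
```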
### 🐞 Bug fixes
- fix gemma3 by @grimoire in #3772
- fix head_dim=None by @grimoire in #3793
- fix user-specified max_session_len by @grimoire in #3785
- remove 'lmdeploy convert' from CLI by @lvhan028 in #3813
- Fix EP with large batch size by @grimoire in #3808
- fix internvl disable_vision_encoder by @grimoire in #3800
- Align response behavior across both engines by @lvhan028 in #3821
- fix: set text_config.tie_word_embedding = False in qwen2vl by @zenosai in #3824
- Fix v1 comp protocol by @CUHKSZzxy in #3828
- [dlinfer] fix get_backend err by @yao-fengchen in #3847
- Update internvl.py to fix #3528 by @zodiacg in #3837
- fix partial rotary factor by @CUHKSZzxy in #3861
- fix: duplicated token usage in /chat/completions stream mode by @Huarong in #3859
- fix chatting with VLM model via CLI by @lvhan028 in #3862
- fix inference on windows platform by @irexyc in #3865
- fix prebuild on cuda12.8 by @lvhan028 in #3857
- Fix uninitialized members in cuBLAS wrapper by @lzhangzz in #3874
- fix flashmla build for cuda12.4 by @CUHKSZzxy in #3872
- [Fix] ray mp engine on ascend platform by @CyCle1024 in #3877
- fix bug: leaves empty by @Tsundoku958 in #3868
- Fix side effect brought by gpt-oss support by @lvhan028 in #3880
- fix pytorch metrics in mp engine by @CUHKSZzxy in #3882
- Fix stream assert error when waking up 30+ times by @CyCle1024 in #3883
- fix batched prefill by @grimoire in #3887
- fix side effect brought by #3821 by @lvhan028 in #3888
- check_env in multiprocess by @grimoire in #3879
- fix cli serve --help by @RunningLeon in #3895
- Resolve a crash in the `sleep` endpoint by casting the `level` parameter from string to int by @irexyc in #3897 (see the sketch after this list)
- Fix nccl for docker cu11 by @RunningLeon in #3896
- disable check_env in multiprocess on dlinfer devices by @tangzhiyi11 in #3914
- [dlinfer] fix nn layout typo and scale t by @yuchiwang in #3915
- fix chat and warmup of lora adapter by @grimoire in #3911
- build(ascend): try to fix ascend CI docker build by @windreamer in #3906
- fix internvl3 hf by @CUHKSZzxy in #3932
- build(docker): fix ascend tag name by @windreamer in #3939
- put eot_token to stop_words by @lvhan028 in #3941
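For reference, a minimal sketch of exercising the `sleep`/`wakeup` endpoints touched by #3897. The routes and the integer `level` parameter are assumptions inferred from the fix description, not a documented API.

```python
# Minimal sketch; endpoint paths and parameter shape are assumptions.
import requests

base = 'http://localhost:23333'
requests.post(f'{base}/sleep', params={'level': 1})  # `level` is now cast to int server-side
requests.post(f'{base}/wakeup')
```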
### 📚 Documentations
- update proxy docs by @CUHKSZzxy in #3796
- add missing docs by @CUHKSZzxy in #3871
- fix docs by @CUHKSZzxy in #3885
- update news and citation by @lvhan028 in #3889
### 🌐 Other
- add prometheus client by @CUHKSZzxy in #3792
- fix: add dummy_prefill guard for PD connection operations by @FirwoodLin in #3803
- minor fix about the log level and logs by @lvhan028 in #3758
- assert PytorchEngineConfig block size by @Tsundoku958 in #3826
- [ci] change restful api into openai and add more testcase by @zhulinJulia24 in #3866
- remove ppu backend by @yao-fengchen in #3904
- [ci] remove flash attn installation in ete test workflow by @zhulinJulia24 in #3908
- dlinfer backend support ray by @yao-fengchen in #3903
- style(types): fix return type annotation for get_all_requests by @xiaoajie738 in #3919
- upgrade torch to 2.8.0 and triton 3.4.0 by @lvhan028 in #3930
- Dlinfer readme by @jinminxi104 in #3938
- bump version to v0.10.0 by @lvhan028 in #3933
## New Contributors
- @PeymanRM made their first contribution in #3784
- @FirwoodLin made their first contribution in #3803
- @zenosai made their first contribution in #3824
- @Tsundoku958 made their first contribution in #3826
- @guozixu2001 made their first contribution in #3807
- @zodiacg made their first contribution in #3837
- @Huarong made their first contribution in #3859
- @yuchiwang made their first contribution in #3915
- @lh9171338 made their first contribution in #3913
**Full Changelog**: v0.9.2...v0.10.0