Releases: xorbitsai/inference

v1.10.0

13 Sep 12:21
b018733

What's new in 1.10.0 (2025-09-13)

These are the changes in inference v1.10.0.

Full Changelog: v1.9.1...v1.10.0

v1.9.1

30 Aug 12:07
b2d793d

What's new in 1.9.1 (2025-08-30)

These are the changes in inference v1.9.1.

Enhancements

  • ENH: added zero-shot and voice-cloning abilities for audio models by @qianduoduo0904 in #3968
  • ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
  • ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
  • ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
  • ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
  • BLD: Clean up Docker's leftover legacy caches and images before executing each step by @zwt-1234 in #3963
  • BLD: fix CI failures by @qinxuye in #4002
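The chat_template_kwargs support for llama.cpp (#3988) is driven by a request parameter. As a rough sketch, a chat request against Xinference's OpenAI-compatible endpoint might carry the key like this; the model UID "my-qwen3" and the top-level placement of the key are illustrative assumptions, not confirmed API:

```python
import json

# Hypothetical request payload for the OpenAI-compatible
# /v1/chat/completions endpoint; "my-qwen3" is a placeholder model UID.
payload = {
    "model": "my-qwen3",
    "messages": [{"role": "user", "content": "Hello"}],
    # Assumed to be forwarded to the chat template renderer (v1.9.1+).
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
```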

Bug fixes

  • BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
  • BUG: fix rerank model creation by @qinxuye in #3977
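The compute-capability gate in #3973 reduces to a simple version check; the helper below is an illustrative sketch (the function name is hypothetical), with the 8.0 cutoff taken from the fix:

```python
def flash_attention_supported(major: int, minor: int) -> bool:
    """Return True when the GPU compute capability is at least 8.0
    (Ampere or newer); below that, flash attention is disabled."""
    return (major, minor) >= (8, 0)

# An A100 (capability 8.0) qualifies; a T4 (capability 7.5) does not.
```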


Full Changelog: v1.9.0...v1.9.1

v1.9.0

16 Aug 15:41
6e129a8

What's new in 1.9.0 (2025-08-16)

These are the changes in inference v1.9.0.

Others

  • Replace @torch.no_grad() with @torch.inference_mode() in Qwen3-Reranker by @yasu-oh in #3911

Full Changelog: v1.8.1...v1.9.0

v1.8.1

03 Aug 19:08
0e5b67b

What's new in 1.8.1 (2025-08-03)

These are the changes in inference v1.8.1.

Enhancements

  • ENH: add mlu device check by @nan9126 in #3844
  • ENH: Support for the bge-m3 llama.cpp backend by @codingl2k1 in #3861
  • ENH: Added mlx support for deepseek-v3-0324 by @uebber in #3864
  • ENH: Add context length limits and automatic truncation features to vLLM embedding models. by @amumu96 in #3887
  • BLD: remove sglang from pip install xinference[all] due to dependency conflicts with vllm by @qinxuye in #3865
  • BLD: upgrade base image for dockerfile by @zwt-1234 in #3318
  • BLD: change Docker build timeout to 240 minutes to pass the 12.8 build by @qinxuye in #3892
  • REF: add ui module that includes web and gradio UIs. by @qinxuye in #3819
  • REF: move continuous batching scheduler into model by @qinxuye in #3824
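The automatic truncation added for vLLM embedding models (#3887) amounts to clipping tokenized input to the model's context window. A simplified, tokenizer-agnostic sketch of that step (the function name is hypothetical):

```python
def truncate_to_context(tokens: list[int], max_context: int) -> list[int]:
    """Clip a token sequence so it never exceeds the model's context
    length; inputs already within the limit pass through unchanged."""
    if len(tokens) <= max_context:
        return tokens
    return tokens[:max_context]
```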

Documentation

  • DOC: add experimental feature for virtualenv by @qinxuye in #3818
  • DOC: add doc about model virtual env settings when launching model by @qinxuye in #3885


Full Changelog: v1.8.0...v1.8.1

v1.8.1.rc1

03 Aug 15:41
0e5b67b
Pre-release

What's new in 1.8.1.rc1 (2025-08-03)

These are the changes in inference v1.8.1.rc1.

Enhancements

  • ENH: add mlu device check by @nan9126 in #3844
  • ENH: Support for the bge-m3 llama.cpp backend by @codingl2k1 in #3861
  • ENH: Added mlx support for deepseek-v3-0324 by @uebber in #3864
  • ENH: Add context length limits and automatic truncation features to vLLM embedding models. by @amumu96 in #3887
  • BLD: remove sglang from pip install xinference[all] due to dependency conflicts with vllm by @qinxuye in #3865
  • BLD: upgrade base image for dockerfile by @zwt-1234 in #3318
  • REF: add ui module that includes web and gradio UIs. by @qinxuye in #3819
  • REF: move continuous batching scheduler into model by @qinxuye in #3824

Documentation

  • DOC: add experimental feature for virtualenv by @qinxuye in #3818
  • DOC: add doc about model virtual env settings when launching model by @qinxuye in #3885


Full Changelog: v1.8.0...v1.8.1.rc1

v1.8.0

20 Jul 07:34
abc42ca

What's new in 1.8.0 (2025-07-20)

These are the changes in inference v1.8.0.

Bug fixes

  • BUG: disable flash_attn for qwen3 embedding & rerank when no GPU is available by @qinxuye in #3739
  • BUG: Fix bugs in del async_client by @zhcn000000 in #3753
  • BUG: add message preprocessing to ensure that content is not null by @amumu96 in #3791
  • BUG: pre-check to prevent list-index-out-of-range errors for FunASR family models by @leslie2046 in #3809
  • BUG: resolve issue where AI output was lost when no tool was selected for function call #3767 by @aniya105 in #3768
  • BUG: fix error in content output at reasoning_content, when using enable_thinking in chat_template_kwargs by @amumu96 in #3794
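The null-content fix (#3791) comes down to normalizing messages before they reach the chat template. A minimal sketch of that preprocessing (the function name is hypothetical):

```python
def ensure_content(messages: list[dict]) -> list[dict]:
    """Replace a missing or None "content" field with an empty string
    so downstream chat templates never see null content."""
    return [{**m, "content": m.get("content") or ""} for m in messages]
```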

Full Changelog: v1.7.1...v1.8.0

v1.7.1.post1

30 Jun 11:28
84f10dc

What's new in 1.7.1.post1 (2025-06-30)

These are the changes in inference v1.7.1.post1.

Enhancements

  • BLD: pin transformers version at 4.52.4 to fix "Failed to import module 'SentenceTransformer'" error by @amumu96 in #3743
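The pin above translates directly to an install command; if you hit the "Failed to import module 'SentenceTransformer'" error on this release line, forcing the pinned version looks like:

```shell
# Pin transformers to the version v1.7.1.post1 expects.
pip install "transformers==4.52.4"
```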

Full Changelog: v1.7.1...v1.7.1.post1

v1.7.1

27 Jun 12:17
cf64a86

What's new in 1.7.1 (2025-06-27)

These are the changes in inference v1.7.1.

Enhancements

  • ENH: add enable_flash_attn param for loading qwen3 embedding & rerank by @qinxuye in #3640
  • ENH: add more abilities for builtin model families API by @qinxuye in #3658
  • ENH: improve local cluster startup reliability via child-process readiness signaling by @Checkmate544 in #3642
  • ENH: FishSpeech support pcm by @codingl2k1 in #3680
  • ENH: Add 4-sample micro-batching to Qwen-3 reranker to reduce GPU memory by @yasu-oh in #3666
  • ENH: Limit default n_parallel for llama.cpp backend by @codingl2k1 in #3712
  • BLD: pin flash-attn & flashinfer-python version and limit sgl-kernel version by @amumu96 in #3669
  • BLD: Update Dockerfile by @XiaoXiaoJiangYun in #3695
  • REF: remove unused code by @qinxuye in #3664
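The Qwen-3 reranker micro-batching (#3666) scores candidate documents in fixed-size slices to bound peak GPU memory. A generic sketch of the slicing, with the batch size of 4 taken from the change (the helper name is hypothetical):

```python
def micro_batches(items: list, batch_size: int = 4):
    """Yield fixed-size slices so at most `batch_size` samples are
    scored at once, bounding peak memory use."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Six documents are scored as two slices: four samples, then two.
```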

Bug fixes

  • BUG: fix TTS error: No such file or directory by @robin12jbj in #3625
  • BUG: Fix max_tokens value in Qwen3 Reranker by @yasu-oh in #3665
  • BUG: fix custom embedding by @qinxuye in #3677
  • BUG: [UI] rename the command-line argument from download-hub to download_hub. by @yiboyasss in #3685
  • BUG: fix jina-clip-v2 for text only or image only by @qinxuye in #3690
  • BUG: internvl chat error using vllm engine by @amumu96 in #3722
  • BUG: fix the parsing logic of streaming tool calls by @amumu96 in #3721
  • BUG: fix <think> being wrongly added when chat_template_kwargs is set to {"enable_thinking": False} by @qinxuye in #3718
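The <think> fix (#3718) concerns output that should carry no reasoning block when thinking is disabled. A simplified, regex-based sketch of such post-processing (the helper name is hypothetical and this is not the project's actual implementation):

```python
import re

def strip_think(text: str) -> str:
    """Remove any <think>...</think> block, plus trailing whitespace,
    that should not appear when enable_thinking is False."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
```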


Full Changelog: v1.7.0...v1.7.1

v1.7.0.post1

13 Jun 17:23
da2040e

What's new in 1.7.0.post1 (2025-06-13)

These are the changes in inference v1.7.0.post1.

Full Changelog: v1.7.0...v1.7.0.post1

v1.7.0

13 Jun 10:58
a362dba

What's new in 1.7.0 (2025-06-13)

These are the changes in inference v1.7.0.


Full Changelog: v1.6.1...v1.7.0