Releases: xorbitsai/inference
v1.10.0
What's new in 1.10.0 (2025-09-13)
These are the changes in inference v1.10.0.
New features
- FEAT: [model] Support Kokoro-82M-v1.1-zh by @JavisPeng in #4042
- FEAT: IP restriction by env: XINFERENCE_ALLOWED_IPS by @qxo in #4047
- FEAT: add support for the Anthropic API format by @OliverBryant in #4037
- FEAT: OpenAI API support for vLLM JSON schema output by @OliverBryant in #4061
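With #4061, the OpenAI-compatible endpoint can pass a JSON schema through to vLLM for schema-constrained output. A minimal sketch of the request body, assuming it follows OpenAI's `response_format` / `json_schema` convention; the model name and schema are placeholders, not part of these release notes:

```python
import json

# Hypothetical OpenAI-style chat request asking for schema-constrained output.
payload = {
    "model": "qwen2.5-instruct",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Extract city and temperature from: 'It is 21C in Paris.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "temperature_c": {"type": "number"},
                },
                "required": ["city", "temperature_c"],
            },
        },
    },
}

# This body would be POSTed to the server's /v1/chat/completions route.
body = json.dumps(payload)
```

With a schema attached, the vLLM engine constrains generation so the returned `content` parses as JSON matching the schema.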
Enhancements
- ENH: Update the environment dependencies for GOT-OCR2 by @Gmgge in #4031
- ENH: Clean up memory while running MLX LLM models by @OliverBryant in #4026
- BLD: bump funasr to 1.2.7 by @leslie2046 in #4039
- BLD: cu128 version Dockerfile fix by @zwt-1234 in #4056
- BLD: Update Dockerfile.cu128 by @amumu96 in #4059
- REF: refactor tool calls functionality by @amumu96 in #4025
Bug fixes
- BUG: Fix Kokoro-82M can't run on GPU by @OliverBryant in #4034
- BUG: [embeddings] fix parsing str type hf_overrides for vllm engine by @llyycchhee in #4052
- BUG: missing usage info in jina-embedding-v4 model response by @amumu96 in #4054
- BUG: distributed registration bug by @llyycchhee in #4046
New Contributors
- @JavisPeng made their first contribution in #4042
- @qxo made their first contribution in #4047
Full Changelog: v1.9.1...v1.10.0
v1.9.1
What's new in 1.9.1 (2025-08-30)
These are the changes in inference v1.9.1.
New features
- FEAT: Qwen-Image-Edit by @qinxuye in #3989
- FEAT: Wan 2.2 by @qinxuye in #3996
- FEAT: Update CosyVoice2 to support both streaming and non-streaming speech generation by @Gmgge in #3994
- FEAT: support qwen-image-lightning by @qinxuye in #3995
- FEAT: [UI] support gpu_count configuration in image model. by @yiboyasss in #4016
- FEAT: image2image and inpainting for qwen-image by @qinxuye in #4014
- FEAT: Support Custom vllm embedding dim by @zhcn000000 in #4000
- FEAT: [embedding] support `dimensions` for embedding by @llyycchhee in #3965
- FEAT: [Model] Support DeepSeek-V3.1 Quantization and tool by @Jun-Howie in #4022
- FEAT: Seed-OSS-36B by @Jun-Howie in #4020
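The new `dimensions` support (#3965) lets clients request embeddings truncated to a chosen size. A sketch of the request body, assuming it mirrors OpenAI's embeddings API `dimensions` field; the model name and target size are placeholders:

```python
import json

# Hypothetical OpenAI-style embeddings request with a reduced output dimension.
payload = {
    "model": "jina-embeddings-v4",  # placeholder model name
    "input": ["Xinference makes model serving easy."],
    "dimensions": 512,  # assumed Matryoshka-style truncation target
}

# This body would be POSTed to the server's /v1/embeddings route.
body = json.dumps(payload)
```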
Enhancements
- ENH: added zero-shot and voice-cloning abilities for audio models by @qianduoduo0904 in #3968
- ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
- ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
- ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
- ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
- BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
- BLD: fix CI failures by @qinxuye in #4002
Bug fixes
- BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
- BUG: fix rerank model creation by @qinxuye in #3977
Documentation
- DOC: update models by @qinxuye in #3958
- DOC: add setting limitation of images for multi modal doc by @amumu96 in #4003
- DOC: Update docs about custom models by @OliverBryant in #4019
- DOC: update models & README by @qinxuye in #4023
Others
- FEAT: KAT-V1 by @Jun-Howie in #3998
New Contributors
- @qianduoduo0904 made their first contribution in #3968
- @OliverBryant made their first contribution in #4019
Full Changelog: v1.9.0...v1.9.1
v1.9.0
What's new in 1.9.0 (2025-08-16)
These are the changes in inference v1.9.0.
New features
- FEAT: [UI] running models data display replica. by @yiboyasss in #3897
- FEAT: [model] Qwen-Image by @qinxuye in #3916
- FEAT: [model] gpt-oss by @qinxuye in #3924
- FEAT: function calling support for deepseek-r1-0528 by @qinxuye in #3931
- FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in #3945
- FEAT: sglang support streaming function call by @aniya105 in #3939
- FEAT: parsing harmony format for gpt-oss by @qinxuye in #3948
- FEAT: Add support for switching rerank model engines and support for rerank of vllm engine by @zhcn000000 in #3881
- FEAT: Support GLM-4.5v by @Jun-Howie in #3957
Enhancements
- ENH: Add qwen3 new model to tool call list by @zhcn000000 in #3900
- ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in #3944
- ENH: add flash_attention control params attn_implementation by @amumu96 in #3951
- ENH: support qwen-image gguf by @qinxuye in #3954
- ENH: clean embedding model cache when using vllm engine by @amumu96 in #3956
- BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in #3953
- BLD: Add Openfst source by @zwt-1234 in #3959
Documentation
- DOC: add doc about cu128 docker by @qinxuye in #3899
- DOC: Update xllamacpp doc by @codingl2k1 in #3862
Full Changelog: v1.8.1...v1.9.0
v1.8.1
What's new in 1.8.1 (2025-08-03)
These are the changes in inference v1.8.1.
New features
- FEAT: kokoro mlx support by @qinxuye in #3823
- FEAT: Qwen3-Instruct by @Jun-Howie in #3840
- FEAT: [UI] integrate user favorites into feature model output. by @yiboyasss in #3859
- FEAT: support enabling virtualenv and specifying packages when launching a model by @qinxuye in #3854
- FEAT: [UI] support enabling virtualenv and specifying packages when launching a model. by @yiboyasss in #3867
- FEAT: setting max_tokens to maximum if not specified by @qinxuye in #3872
- FEAT: [model] support GLM-4.5 series by @qinxuye in #3882
- FEAT: Qwen3-30B-A3B-it by @Jun-Howie in #3886
- FEAT: Support Qwen3-Thinking by @Jun-Howie in #3888
- FEAT: Support Qwen3-Coder by @Jun-Howie in #3889
Enhancements
- ENH: add mlu device check by @nan9126 in #3844
- ENH: Support for the bge-m3 llama.cpp backend by @codingl2k1 in #3861
- ENH: Added mlx support for deepseek-v3-0324 by @uebber in #3864
- ENH: Add context length limits and automatic truncation features to vLLM embedding models. by @amumu96 in #3887
- BLD: remove sglang from pip install xinference[all] due to dependency conflicts with vllm by @qinxuye in #3865
- BLD: upgrade base image for dockerfile by @zwt-1234 in #3318
- BLD: increase docker build time to 240 minutes to pass the CUDA 12.8 build by @qinxuye in #3892
- REF: add ui module that includes web and gradio UIs. by @qinxuye in #3819
- REF: move continuous batching scheduler into model by @qinxuye in #3824
Bug fixes
- BUG: Fixed an error when using structured output in sglang #3825 by @aniya105 in #3826
- BUG: fix compatibility for old vllm by @qinxuye in #3838
- BUG: Fix abnormal GPU memory usage in Qwen3 Reranker by @JDanielWu in #3846
- BUG: fix compatibility with vllm 0.10.0 by @qinxuye in #3875
- BUG: fix version checks for vllm by @qinxuye in #3891
Documentation
- DOC: add experimental feature for virtualenv by @qinxuye in #3818
- DOC: add doc about model virtual env settings when launching a model by @qinxuye in #3885
Others
- FIX: GLM4.1V Repository URL by @Jun-Howie in #3839
- BLD: fix docker build for cu128 by @zwt-1234 in #3893
- BLD: fix cu128 build by @zwt-1234 in #3895
- CHORE: THUDM has been renamed to zai-org by @Jun-Howie in #3870
New Contributors
- @JDanielWu made their first contribution in #3846
- @uebber made their first contribution in #3864
- @zwt-1234 made their first contribution in #3318
Full Changelog: v1.8.0...v1.8.1
v1.8.1.rc1
What's new in 1.8.1.rc1 (2025-08-03)
These are the changes in inference v1.8.1.rc1.
New features
- FEAT: kokoro mlx support by @qinxuye in #3823
- FEAT: Qwen3-Instruct by @Jun-Howie in #3840
- FEAT: [UI] integrate user favorites into feature model output. by @yiboyasss in #3859
- FEAT: support enabling virtualenv and specifying packages when launching a model by @qinxuye in #3854
- FEAT: [UI] support enabling virtualenv and specifying packages when launching a model. by @yiboyasss in #3867
- FEAT: setting max_tokens to maximum if not specified by @qinxuye in #3872
- FEAT: [model] support GLM-4.5 series by @qinxuye in #3882
- FEAT: Qwen3-30B-A3B-it by @Jun-Howie in #3886
- FEAT: Support Qwen3-Thinking by @Jun-Howie in #3888
- FEAT: Support Qwen3-Coder by @Jun-Howie in #3889
Enhancements
- ENH: add mlu device check by @nan9126 in #3844
- ENH: Support for the bge-m3 llama.cpp backend by @codingl2k1 in #3861
- ENH: Added mlx support for deepseek-v3-0324 by @uebber in #3864
- ENH: Add context length limits and automatic truncation features to vLLM embedding models. by @amumu96 in #3887
- BLD: remove sglang from pip install xinference[all] due to dependency conflicts with vllm by @qinxuye in #3865
- BLD: upgrade base image for dockerfile by @zwt-1234 in #3318
- REF: add ui module that includes web and gradio UIs. by @qinxuye in #3819
- REF: move continuous batching scheduler into model by @qinxuye in #3824
Bug fixes
- BUG: Fixed an error when using structured output in sglang #3825 by @aniya105 in #3826
- BUG: fix compatibility for old vllm by @qinxuye in #3838
- BUG: Fix abnormal GPU memory usage in Qwen3 Reranker by @JDanielWu in #3846
- BUG: fix compatibility with vllm 0.10.0 by @qinxuye in #3875
- BUG: fix version checks for vllm by @qinxuye in #3891
Documentation
- DOC: add experimental feature for virtualenv by @qinxuye in #3818
- DOC: add doc about model virtual env settings when launching a model by @qinxuye in #3885
Others
- FIX: GLM4.1V Repository URL by @Jun-Howie in #3839
- CHORE: THUDM has been renamed to zai-org by @Jun-Howie in #3870
New Contributors
- @JDanielWu made their first contribution in #3846
- @uebber made their first contribution in #3864
- @zwt-1234 made their first contribution in #3318
Full Changelog: v1.8.0...v1.8.1.rc1
v1.8.0
What's new in 1.8.0 (2025-07-20)
These are the changes in inference v1.8.0.
New features
- FEAT: Embedding support llama.cpp backend by @codingl2k1 in #3730
- FEAT: non-stream tool calling for sglang by @aniya105 in #3760
- FEAT: support migrate from v1 to v2 for custom models by @qinxuye in #3810
- FEAT: FLUX.1-Kontext-dev by @qinxuye in #3728
- FEAT: support ERNIE 4.5 by @qinxuye in #3812
- FEAT: [embedding] add support for jina-embeddings-v4 model by @Minamiyama in #3814
- FEAT: [model] support glm-4.1v-thinking by @llyycchhee in #3756
Enhancements
- ENH: Pin xllamacpp>=0.1.23 by @codingl2k1 in #3780
- ENH: add modelscope for fish speech 1.5 by @qinxuye in #3750
- REF: [V2 BREAK] Merge multiple JSON files into one for different model download sources by @ChengjieLi28 in #3765
Bug fixes
- BUG: disable flash_attn for qwen3 embedding & rerank when no gpu available by @qinxuye in #3739
- BUG: Fix bugs in del async_client by @zhcn000000 in #3753
- BUG: add message preprocessing to ensure that content is not null by @amumu96 in #3791
- BUG: pre-check to prevent list index out of range for FunASR family models by @leslie2046 in #3809
- BUG: resolve issue where AI output was lost when no tool was selected for function call #3767 by @aniya105 in #3768
- BUG: fix error in `content` output at `reasoning_content`, when using `enable_thinking` in `chat_template_kwargs` by @amumu96 in #3794
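Several entries above touch `chat_template_kwargs`, which is passed in the chat request body to control the chat template (e.g. toggling a model's thinking mode). A minimal sketch of such a request; the model name is a placeholder, and engine support varies (these notes add llama.cpp support in #3988):

```python
import json

# Hypothetical chat request disabling thinking mode via chat_template_kwargs.
payload = {
    "model": "qwen3",  # placeholder model name
    "messages": [{"role": "user", "content": "Reply with a single word: ping"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

# This body would be POSTed to the server's /v1/chat/completions route.
body = json.dumps(payload)
```

With `enable_thinking` disabled, the response should contain only `content`, with no `reasoning_content` or stray `<think>` tags.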
Documentation
- DOC: fix links by @qinxuye in #3774
- DOC: update info in docs by @qinxuye in #3779
- DOC: update models by @qinxuye in #3815
Full Changelog: v1.7.1...v1.8.0
v1.7.1.post1
What's new in 1.7.1.post1 (2025-06-30)
These are the changes in inference v1.7.1.post1.
Enhancements
- BLD: pin transformers version at 4.52.4 to fix "Failed to import module 'SentenceTransformer'" error by @amumu96 in #3743
Full Changelog: v1.7.1...v1.7.1.post1
v1.7.1
What's new in 1.7.1 (2025-06-27)
These are the changes in inference v1.7.1.
New features
- FEAT: [UI] enhance audio & rerank model registration params. by @yiboyasss in #3656
- FEAT: support async client by @zhcn000000 in #3645
- FEAT: [UI] add max_tokens display in rerank model. by @yiboyasss in #3671
- FEAT: [UI] add model_ability options for LLM registration. by @yiboyasss in #3663
- FEAT: support QwenLong-L1 by @Jun-Howie in #3691
- FEAT: [UI] model registration supports packages. by @yiboyasss in #3702
- FEAT: support MLU device by @nan9126 in #3693
- FEAT: vllm v1 auto enabling by @qinxuye in #3637
- FEAT: distributed inference for MLX by @qinxuye in #3700
Enhancements
- ENH: add `enable_flash_attn` param for loading qwen3 embedding & rerank by @qinxuye in #3640
- ENH: add more abilities for builtin model families API by @qinxuye in #3658
- ENH: improve local cluster startup reliability via child-process readiness signaling by @Checkmate544 in #3642
- ENH: FishSpeech support pcm by @codingl2k1 in #3680
- ENH: Add 4-sample micro-batching to Qwen-3 reranker to reduce GPU memory by @yasu-oh in #3666
- ENH: Limit default n_parallel for llama.cpp backend by @codingl2k1 in #3712
- BLD: pin flash-attn & flashinfer-python version and limit sgl-kernel version by @amumu96 in #3669
- BLD: Update Dockerfile by @XiaoXiaoJiangYun in #3695
- REF: remove unused code by @qinxuye in #3664
Bug fixes
- BUG: fix TTS error: No such file or directory by @robin12jbj in #3625
- BUG: Fix max_tokens value in Qwen3 Reranker by @yasu-oh in #3665
- BUG: fix custom embedding by @qinxuye in #3677
- BUG: [UI] rename the command-line argument from download-hub to download_hub. by @yiboyasss in #3685
- BUG: fix jina-clip-v2 for text only or image only by @qinxuye in #3690
- BUG: internvl chat error using vllm engine by @amumu96 in #3722
- BUG: fix the parsing logic of streaming tool calls by @amumu96 in #3721
- BUG: fix `<think>` wrongly added when setting `chat_template_kwargs {"enable_thinking": False}` by @qinxuye in #3718
Documentation
- DOC: add doc for paraformer by @leslie2046 in #3631
- DOC: Flexible model (traditional ML models) by @qinxuye in #3714
New Contributors
- @robin12jbj made their first contribution in #3625
- @zhcn000000 made their first contribution in #3645
- @yasu-oh made their first contribution in #3665
- @Checkmate544 made their first contribution in #3642
- @nan9126 made their first contribution in #3693
- @XiaoXiaoJiangYun made their first contribution in #3695
Full Changelog: v1.7.0...v1.7.1
v1.7.0.post1
What's new in 1.7.0.post1 (2025-06-13)
These are the changes in inference v1.7.0.post1.
Bug fixes
- BUG: fix qwen3-rerank to create model on GPU by @qinxuye in #3630
- BUG: fix MiniCPM4 modeling by @Jun-Howie in #3632
Full Changelog: v1.7.0...v1.7.0.post1
v1.7.0
What's new in 1.7.0 (2025-06-13)
These are the changes in inference v1.7.0.
New features
- FEAT: support CogView4 image model by @qinxuye in #3557
- FEAT: [UI] support model_ability filter for image and video models. by @yiboyasss in #3563
- FEAT: [UI] auto-switch to active tab when Running Models page loads. by @yiboyasss in #3568
- FEAT: support first-last-frame to video by @qinxuye in #3555
- FEAT: [UI] add Japanese and Korean language support. by @yiboyasss in #3574
- FEAT: SeACoParaformer model by @leslie2046 in #3587
- FEAT: support verbose_json for funasr family audio2text models by @leslie2046 in #3591
- FEAT: support deepseek-r1-0528 Mixed quantization by @Jun-Howie in #3601
- FEAT: support engines for embedding models by @pengjunfeng11 in #2791
- FEAT: support MiniCPM4 series by @Jun-Howie in #3609
- FEAT: [UI] add model_engine parameter to embedding model. by @yiboyasss in #3617
- FEAT: add kwargs for transcripts client API by @leslie2046 in #3622
- FEAT: support qwen3 embedding by @qinxuye in #3615
- FEAT: support qwen3-reranker by @qinxuye in #3627
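The qwen3-reranker support (#3627) is exposed through the rerank route. A sketch of the request body, assuming the common Cohere-style shape (`model`, `query`, `documents`); the model name and documents are placeholders:

```python
import json

# Hypothetical rerank request for the newly supported qwen3-reranker.
payload = {
    "model": "qwen3-reranker",  # placeholder model name
    "query": "What is vector search?",
    "documents": [
        "Vector search finds nearest neighbours in embedding space.",
        "Paris is the capital of France.",
    ],
}

# This body would be POSTed to the server's /v1/rerank route; the response
# scores each document's relevance to the query.
body = json.dumps(payload)
```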
Enhancements
- ENH: Support pcm response_format by @codingl2k1 in #3606
Bug fixes
- BUG: Fix dependency by @codingl2k1 in #3566
- BUG: Fix cmdline by @codingl2k1 in #3589
- BUG: fix potential hang for sglang by @qinxuye in #3597
- BUG: [UI] fixed the mobile language switching bug. by @yiboyasss in #3608
- BUG: Fix the error when using Qwen function call with Spring AI. by @aniya105 in #3614
Documentation
- DOC: update links by @qinxuye in #3565
- DOC: Update CosyVoice doc by @codingl2k1 in #3605
- DOC: update models by @qinxuye in #3628
Others
- FIX: [UI] fix model_engine parameter bug. by @yiboyasss in #3620
Full Changelog: v1.6.1...v1.7.0