
Conversation

ptitjes
Contributor

@ptitjes ptitjes commented Jul 14, 2025

This PR adds two new attributes to LLModel. cf. KG-134

I also:

  • added links to the model cards everywhere I could, to facilitate future attribute additions.
  • reused the Anthropic and Google model definitions in the OpenRouter and Bedrock model definitions.

Type of the change

  • New feature
  • Bug fix
  • Documentation fix
  • Tests improvement

Checklist for all pull requests

  • The pull request has a description of the proposed change
  • I read the Contributing Guidelines before opening the pull request
  • The pull request uses develop as the base branch
  • Tests for the changes have been added
  • All new and existing tests passed
Additional steps for pull requests adding a new feature
  • An issue describing the proposed change exists
  • The pull request includes a link to the issue
  • The change was discussed and approved in the issue
  • Docs have been added / updated

Collaborator

@Ololoshechkin Ololoshechkin left a comment

Thank you so much for working on this!

Just a few questions:

  1. Why is contextLength nullable? Are there any models that don't have it?
  2. Why do some Ollama models (like LLAMA_3_GROK_TOOL_USE_8B and others) have contextLength but not maxOutputTokens? I would assume they should have some output token limit?

@ptitjes
Contributor Author

ptitjes commented Jul 14, 2025

  1. Why is contextLength nullable? Are there any models that don't have it?

Since the initial push, I found the context length for some of the models that were missing it:

  • The Anthropic Claude Instant model in the BedrockModels
  • The Ollama embedding models

I will add them soon. DONE

But there are still some models for which I wasn't able to find the context length:

  • The OpenAI moderation models
  • The OpenAI embedding models

I might need some help with those, and with the models defined directly in tests.

Now, there is the problem of the models pulled directly from the Ollama registry. I defined OllamaModelCard.contextLength as nullable because it is not clear from the Go code whether it is always present. As I have never stumbled upon a model pulled from Ollama without a context length, I think I will just throw an error if it is not present. Tell me what you think.

If we resolve those two problems (the missing OpenAI context lengths, and the possibly-null context length of models pulled from Ollama), then I will be able to make it non-nullable.

  2. Why do some Ollama models (like LLAMA_3_GROK_TOOL_USE_8B and others) have contextLength but not maxOutputTokens? I would assume they should have some output token limit?

Now, that is a whole other story. From my understanding of LLMs, nothing mandates a maximum number of generated output tokens. I believe this is just something APIs do to limit the output, maybe as some form of safety measure.

I mean, the context length is the size of the context window, and there is nothing preventing you from rolling that window forever.

So, for example, there is no max output token limit for any of the Ollama models.

I propose we leave that field optional.

NOTE: The Ollama API has a num_predict request parameter that lets the user limit the output, and the Anthropic, Google and OpenAI APIs have equivalent ones. So I propose adding a maxOutputTokens to LLMParams to model that, roughly as sketched below. (This would be separate from the ContextWindowStrategy/ContextTruncationStrategy needed for Ollama, which would then only handle num_ctx.)
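
A rough sketch of what that could look like, assuming a simplified LLMParams (the real class has more fields and may be shaped differently):

    // Sketch only: a simplified LLMParams for illustration.
    data class LLMParams(
        val temperature: Double? = null,
        // Upper bound on generated tokens; would map to num_predict (Ollama),
        // max_tokens (Anthropic/OpenAI) or maxOutputTokens (Google).
        val maxOutputTokens: Int? = null,
    )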

@Ololoshechkin
Collaborator

@ptitjes ,

https://cookbook.openai.com/examples/embedding_long_inputs

The text-embedding-3-small model has a context length of 8191 tokens with the cl100k_base encoding, and we can see that going over that limit causes an error.

As for the 3-large, I couldn't find the official answer, but on 3rd party resources they say:

3-large embeds up to 3072 dimensions (compared to ada's 1536) and is about 40% more powerful.
3-large context window is 8191 tokens, same as ada. Approximately 6000 words.

So basically it's also 8191.

@Ololoshechkin
Collaborator

As for the moderation models, it's funny because I couldn't find it in any resources myself, but ChatGPT found this link:

https://danavan.ai/docs/models/?utm_source=chatgpt.com

It's in some other language but if you translate it:

omni‑moderation models : 32,768 tokens

text-moderation models -- also 32,768 tokens

@Ololoshechkin
Collaborator

Ololoshechkin commented Jul 15, 2025

For models defined in tests, please feel free to set everything to some arbitrary number like 1000, since it's for testing purposes only. WDYT?

@Ololoshechkin
Collaborator

As for the Ollama models -- it looks like for 3rd-party models it is indeed not a required metadata field (of course Meta or Mistral would include it, but some 3rd-party providers might omit it).

And then it's recommended to either fall back to some default value (2048/4096) or find the actual base model it derives from.

So we either have to stay with nullable, or we can introduce a parameter for the client that would be either a single fallback size, or even a map LLModel -> Int (the safer option).

Then we can throw an exception at runtime if the value is not provided by the API and not manually defined in this fallback map -- and users would go and define some value consciously.

I still think this would be a rather rare case, only for some very specific models.

WDYT?
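
For illustration only, such a parameter could look roughly like this (the class and parameter names are made up for this discussion, and the map is keyed by model id for simplicity):

    // Illustration only: invented names; map keyed by model id for simplicity.
    class OllamaClientConfig(
        // Either a single fallback size for any model without reported metadata...
        val defaultContextLength: Int? = null,
        // ...or a per-model map, which is the safer option.
        val contextLengthFallbacks: Map<String, Int> = emptyMap(),
    )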

@ptitjes
Contributor Author

ptitjes commented Jul 15, 2025

I still think this would be a rather rare case, only for some very specific models.

Yeah, I agree it would be rare.

I don't think having a map would be a good idea, for maintenance reasons. Also, having a default (map) value in the constructor might be confusing.

I would prefer that we have a default constant value. We would add a warning log line indicating that we fall back to that value for model XYZ. And if the user ever needs to override it, they can always data-copy the LLModel to override it (see the sketch below). Would that be OK?
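
A minimal sketch of that idea, assuming a simplified LLModel and an invented constant name (the real code would use the project's logger rather than println):

    // Sketch only: simplified LLModel and invented names.
    const val DEFAULT_CONTEXT_LENGTH: Long = 4096

    data class LLModel(val id: String, val contextLength: Long)

    fun resolveContextLength(modelId: String, reported: Long?): Long {
        if (reported == null) {
            // The real code would use the project's logging facility.
            println("WARN: no context length reported for $modelId, falling back to $DEFAULT_CONTEXT_LENGTH")
        }
        return reported ?: DEFAULT_CONTEXT_LENGTH
    }

    // A user who needs a different value can always data-copy the model:
    // val adjusted = model.copy(contextLength = 32_768)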

@ptitjes
Contributor Author

ptitjes commented Jul 15, 2025

I will finish this PR this evening.

@ptitjes ptitjes marked this pull request as draft July 15, 2025 10:58
@Ololoshechkin
Collaborator

@ptitjes yes, let's try this approach and see how it goes -- we can always add more API later if we realize that a map/default is required.

@kpavlov
Collaborator

kpavlov commented Jul 16, 2025

Let's not forget to add the OpenTelemetry span attribute gen_ai.request.max_tokens. Could be a separate PR.
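
Roughly, with the OpenTelemetry API this is a single attribute set on the request span (where the value would come from in Koog is an assumption here):

    // Sketch: setting the gen_ai.request.max_tokens attribute on an existing span.
    import io.opentelemetry.api.trace.Span

    fun recordMaxTokens(span: Span, maxOutputTokens: Long) {
        span.setAttribute("gen_ai.request.max_tokens", maxOutputTokens)
    }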

@ptitjes
Contributor Author

ptitjes commented Jul 16, 2025

Let's not forget to add the OpenTelemetry span attribute gen_ai.request.max_tokens. Could be a separate PR.

It's unrelated to this specific PR. This is about model and provider capabilities.

I am not adding a maxOutputTokens to the LLMParams just yet.

@ptitjes ptitjes force-pushed the feature/KG-134-model-context-length branch from 6693727 to 2fe3a16 Compare July 18, 2025 08:58
@ptitjes
Contributor Author

ptitjes commented Jul 18, 2025

  • Added all missing LLModel.contextLength values
  • Made OllamaModelCard.toLLModel() log a warning and use a default value if the context length is undefined
  • Made LLModel.contextLength non-nullable
  • Added a dummy contextLength of 1000 to all LLModels defined in tests
  • Enhanced the contextLength and maxOutputTokens KDocs
  • Rebased on develop.

@ptitjes ptitjes marked this pull request as ready for review July 18, 2025 09:02
Contributor

@EugeneTheDev EugeneTheDev left a comment

Looks good, but I have questions about the dependencies.

@@ -15,6 +15,7 @@ kotlin {
api(project(":agents:agents-tools"))
api(project(":agents:agents-utils"))
api(project(":prompt:prompt-executor:prompt-executor-clients"))
implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Contributor

Why did you add it to commonMain? It should not be needed here, since we provide only a JVM implementation for Bedrock and this dependency is already included in the jvmMain block.

Contributor Author

I answered both of your comments in the main thread, because the answer is the same.

@@ -15,6 +15,8 @@ kotlin {
api(project(":agents:agents-tools"))
api(project(":agents:agents-utils"))
api(project(":prompt:prompt-executor:prompt-executor-clients"))
implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Contributor

Why did you add dependencies on the Anthropic and Google clients? It should be independent of these clients.

Contributor Author

I answered both of your comments in the main thread, because the answer is the same.

@ptitjes
Contributor Author

ptitjes commented Jul 20, 2025

Hey Andrey, thanks a lot for your review.

Yes, I consciously added the dependencies (note they are implementation-only) in order to reuse model definitions from the Anthropic and Google modules. You can see that I .copy(...) most of the Anthropic and Google model definitions in the Bedrock and OpenRouter modules.

I expect we will add more attributes to LLModel in the future, and it would be very difficult to keep them in sync by hand. (You can see from the diff that there were already some discrepancies.)

Some LLModel attributes are model-specific (like contextLength) and some are provider-specific (like maxOutputTokens, or sometimes capabilities, due to deployment differences), but IMO being able to copy the original definitions will let us keep the model definitions in check.

I don't think those additional dependencies are a problem: all the LLMClients are rather lightweight and implemented with Ktor, so adding them will not pull extra libraries into users' builds. Also, I deliberately declared them as implementation dependencies so as not to pollute the user's namespace.

I hope this makes sense.
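
To make the reuse pattern concrete, here is a sketch with a simplified LLModel and illustrative ids (the real definitions carry more attributes and the actual model ids):

    // Sketch only: simplified LLModel and made-up model ids.
    data class LLModel(val provider: String, val id: String, val contextLength: Long)

    object AnthropicModels {
        val Claude3Sonnet = LLModel("anthropic", "claude-3-sonnet", 200_000)
    }

    object BedrockModels {
        // Same underlying model: copy the Anthropic definition and override
        // only what is provider-specific (here, the provider and model id).
        val AnthropicClaude3Sonnet = AnthropicModels.Claude3Sonnet.copy(
            provider = "bedrock",
            id = "bedrock-specific-model-id",
        )
    }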

Contributor

@EugeneTheDev EugeneTheDev left a comment

Ok, makes sense, thank you. I think we can merge now.

@Ololoshechkin
Collaborator

@EugeneTheDev I think you should press the "merge" button by yourself then :)

@EugeneTheDev EugeneTheDev merged commit fde9ac9 into JetBrains:develop Jul 23, 2025
4 of 5 checks passed
@ptitjes ptitjes deleted the feature/KG-134-model-context-length branch July 23, 2025 10:48