
Conversation

ptitjes
Contributor

@ptitjes ptitjes commented Jul 14, 2025

This PR adds two new attributes to LLModel. cf. KG-134

I also:

  • added links to the model cards everywhere I could, to facilitate future attribute additions.
  • reused the Anthropic and Google model definitions in the OpenRouter and Bedrock model definitions.

Type of the change

  • New feature
  • Bug fix
  • Documentation fix
  • Tests improvement

Checklist for all pull requests

  • The pull request has a description of the proposed change
  • I read the Contributing Guidelines before opening the pull request
  • The pull request uses develop as the base branch
  • Tests for the changes have been added
  • All new and existing tests passed
Additional steps for pull requests adding a new feature
  • An issue describing the proposed change exists
  • The pull request includes a link to the issue
  • The change was discussed and approved in the issue
  • Docs have been added / updated

Collaborator

@Ololoshechkin Ololoshechkin left a comment

Thank you so much for working on this!

Just a few questions:

  1. Why is contextLength nullable? Are there any models that don't have it?
  2. Why do some Ollama models (like LLAMA_3_GROK_TOOL_USE_8B and others) have contextLength but not maxOutputTokens? I would assume they should have some output token limit?

@ptitjes
Contributor Author

ptitjes commented Jul 14, 2025

  1. Why is contextLength nullable? Are there any models that don't have it?

Since the initial push, I found the context length for some of the models that were missing it:

  • The Anthropic Claude Instant model in the BedrockModels
  • The Ollama embedding models

I will add them soon. DONE

But there are still some models for which I wasn't able to find the context length:

  • The OpenAI moderation models
  • The OpenAI embedding models

I might need some help with those, and with the models defined directly in tests.

Now, there is the problem of the models pulled directly from the Ollama registry. I defined OllamaModelCard.contextLength as nullable because it is not clear from the Go code whether it is always present. As I have never stumbled upon a model pulled from Ollama without a context length, I think I will just throw an error if it is not present. Tell me what you think.

If we resolve those two problems (the missing OpenAI context lengths, and the possibly-null context length of models pulled from Ollama), then I will be able to make it non-nullable.

  2. Why do some Ollama models (like LLAMA_3_GROK_TOOL_USE_8B and others) have contextLength but not maxOutputTokens? I would assume they should have some output token limit?

Now, that is a whole other story. From my understanding of LLMs, nothing mandates a maximum number of generated output tokens. I believe this is just something APIs do to limit the output, maybe as some form of safety measure.

I mean, the context length is the size of the context window, and there is nothing preventing you from rolling that window forever.

So, for example, there is no max output token limit for any of the Ollama models.

I propose we leave that field optional.

NOTE: The Ollama API has a num_predict request parameter that lets the user limit the output, and the Anthropic, Google and OpenAI APIs have equivalent ones. So I propose adding a maxOutputTokens to LLMParams to model that, roughly as sketched below. (This would be separate from the ContextWindowStrategy/ContextTruncationStrategy needed for Ollama, which would then only handle num_ctx.)
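
A rough sketch of what that could look like, assuming a simplified LLMParams (the real class has more fields and may be shaped differently):

    // Sketch only: a simplified LLMParams for illustration.
    data class LLMParams(
        val temperature: Double? = null,
        // Upper bound on generated tokens; would map to num_predict (Ollama),
        // max_tokens (Anthropic/OpenAI) or maxOutputTokens (Google).
        val maxOutputTokens: Int? = null,
    )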

@Ololoshechkin
Collaborator

@ptitjes ,

https://cookbook.openai.com/examples/embedding_long_inputs

The text-embedding-3-small model has a context length of 8191 tokens with the cl100k_base encoding, and we can see that going over that limit causes an error.

As for the 3-large, I couldn't find the official answer, but on 3rd party resources they say:

3-large embeds up to 3072 dimensions (compared to ada's 1536) and is about 40% more powerful.
3-large context window is 8191 tokens, same as ada. Approximately 6000 words.

So basically it's also 8191.

@Ololoshechkin
Collaborator

As for the moderation models, it's funny because I couldn't find it in any resources myself, but ChatGPT found this link:

https://danavan.ai/docs/models/?utm_source=chatgpt.com

It's in some other language but if you translate it:

omni‑moderation models : 32,768 tokens

text-moderation models -- also 32,768 tokens

@Ololoshechkin
Collaborator

Ololoshechkin commented Jul 15, 2025

For models defined in tests, please feel free to set everything to some arbitrary number like 1000, since it's for testing purposes only. WDYT?

@Ololoshechkin
Collaborator

As for the Ollama models -- it looks like for 3rd-party models it is indeed not a required metadata field (of course Meta or Mistral would include it, but some 3rd-party providers might omit it).

And then it's recommended to either fall back to some default value (2048/4096) or find the actual base model it derives from.

So we either have to stay with nullable, or we can introduce a parameter for the client that would be either a single fallback size, or even a map LLModel -> Int (the safer option).

Then we can throw an exception at runtime if the value is not provided by the API and not manually defined in this fallback map -- and users would go and define some value consciously.

I still think this would be a rather rare case, only for some very specific models.

WDYT?
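
For illustration only, such a parameter could look roughly like this (the class and parameter names are made up for this discussion, and the map is keyed by model id for simplicity):

    // Illustration only: invented names; map keyed by model id for simplicity.
    class OllamaClientConfig(
        // Either a single fallback size for any model without reported metadata...
        val defaultContextLength: Int? = null,
        // ...or a per-model map, which is the safer option.
        val contextLengthFallbacks: Map<String, Int> = emptyMap(),
    )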

@ptitjes
Contributor Author

ptitjes commented Jul 15, 2025

I still think this would be a rather rare case, only for some very specific models.

Yeah, I agree it would be rare.

I don't think having a map would be a good idea, for maintenance reasons. Also, having a default (map) value in the constructor might be confusing.

I would prefer that we have a default constant value. We would add a warning log line indicating that we fall back to that value for model XYZ. And if the user ever needs to override it, they can always data-copy the LLModel to override it (see the sketch below). Would that be OK?
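
A minimal sketch of that idea, assuming a simplified LLModel and an invented constant name (the real code would use the project's logger rather than println):

    // Sketch only: simplified LLModel and invented names.
    const val DEFAULT_CONTEXT_LENGTH: Long = 4096

    data class LLModel(val id: String, val contextLength: Long)

    fun resolveContextLength(modelId: String, reported: Long?): Long {
        if (reported == null) {
            // The real code would use the project's logging facility.
            println("WARN: no context length reported for $modelId, falling back to $DEFAULT_CONTEXT_LENGTH")
        }
        return reported ?: DEFAULT_CONTEXT_LENGTH
    }

    // A user who needs a different value can always data-copy the model:
    // val adjusted = model.copy(contextLength = 32_768)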

@ptitjes
Contributor Author

ptitjes commented Jul 15, 2025

I will finish this PR this evening.

@ptitjes ptitjes marked this pull request as draft July 15, 2025 10:58
@Ololoshechkin
Collaborator

@ptitjes yes, let's try this approach and see how it goes -- we can always add more API later if we realize that a map/default is required.

@kpavlov
Collaborator

kpavlov commented Jul 16, 2025

Let's not forget to add the OpenTelemetry span attribute gen_ai.request.max_tokens. Could be a separate PR.
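
Roughly, with the OpenTelemetry API this is a single attribute set on the request span (where the value would come from in Koog is an assumption here):

    // Sketch: setting the gen_ai.request.max_tokens attribute on an existing span.
    import io.opentelemetry.api.trace.Span

    fun recordMaxTokens(span: Span, maxOutputTokens: Long) {
        span.setAttribute("gen_ai.request.max_tokens", maxOutputTokens)
    }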

@ptitjes
Contributor Author

ptitjes commented Jul 16, 2025

Let's not forget to add the OpenTelemetry span attribute gen_ai.request.max_tokens. Could be a separate PR.

It's unrelated to this specific PR. This is about model and provider capabilities.

I am not adding a maxOutputTokens to the LLMParams just yet.

@ptitjes ptitjes force-pushed the feature/KG-134-model-context-length branch from 6693727 to 2fe3a16 Compare July 18, 2025 08:58
@ptitjes
Contributor Author

ptitjes commented Jul 18, 2025

  • Added all missing LLModel.contextLength values
  • Made OllamaModelCard.toLLModel() log a warning and use a default value if the context length is undefined
  • Made LLModel.contextLength non-nullable
  • Added a dummy contextLength of 1000 to all LLModels defined in tests
  • Enhanced the contextLength and maxOutputTokens KDocs
  • Rebased on develop.

@ptitjes ptitjes marked this pull request as ready for review July 18, 2025 09:02
Contributor

@EugeneTheDev EugeneTheDev left a comment

Looks good, but I have questions about the dependencies.

@@ -15,6 +15,7 @@ kotlin {
api(project(":agents:agents-tools"))
api(project(":agents:agents-utils"))
api(project(":prompt:prompt-executor:prompt-executor-clients"))
implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Contributor

Why did you add it to commonMain? It should not be needed here, since we provide only a JVM implementation for Bedrock and this dependency is already included in the jvmMain block.

Contributor Author

I answered both of your comments in the main thread, because the answer is the same.

@@ -15,6 +15,8 @@ kotlin {
api(project(":agents:agents-tools"))
api(project(":agents:agents-utils"))
api(project(":prompt:prompt-executor:prompt-executor-clients"))
implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Contributor

Why did you add dependencies on the Anthropic and Google clients? It should be independent of these clients.

Contributor Author

I answered both of your comments in the main thread, because the answer is the same.

@ptitjes
Contributor Author

ptitjes commented Jul 20, 2025

Hey Andrey, thanks a lot for your review.

Yes, I consciously added the dependencies (note they are implementation-only) in order to reuse model definitions from the Anthropic and Google modules. You can see that I .copy(...) most of the Anthropic and Google model definitions in the Bedrock and OpenRouter modules.

I expect we will add more attributes to LLModel in the future, and it would be very difficult to keep them in sync by hand. (You can see from the diff that there were already some discrepancies.)

Some LLModel attributes are model-specific (like contextLength) and some are provider-specific (like maxOutputTokens, or sometimes capabilities, due to deployment differences), but IMO being able to copy the original definitions will let us keep the model definitions in check.

I don't think those additional dependencies are a problem: all the LLMClients are rather lightweight and implemented with Ktor, so adding them will not pull extra libraries into users' builds. Also, I deliberately declared them as implementation dependencies so as not to pollute the user's namespace.

I hope this makes sense.
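
To make the reuse pattern concrete, here is a sketch with a simplified LLModel and illustrative ids (the real definitions carry more attributes and the actual model ids):

    // Sketch only: simplified LLModel and made-up model ids.
    data class LLModel(val provider: String, val id: String, val contextLength: Long)

    object AnthropicModels {
        val Claude3Sonnet = LLModel("anthropic", "claude-3-sonnet", 200_000)
    }

    object BedrockModels {
        // Same underlying model: copy the Anthropic definition and override
        // only what is provider-specific (here, the provider and model id).
        val AnthropicClaude3Sonnet = AnthropicModels.Claude3Sonnet.copy(
            provider = "bedrock",
            id = "bedrock-specific-model-id",
        )
    }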

Contributor

@EugeneTheDev EugeneTheDev left a comment

Ok, makes sense, thank you. I think we can merge now.

@Ololoshechkin
Collaborator

@EugeneTheDev I think you should press the "merge" button by yourself then :)

@EugeneTheDev EugeneTheDev merged commit fde9ac9 into JetBrains:develop Jul 23, 2025
4 of 5 checks passed
@ptitjes ptitjes deleted the feature/KG-134-model-context-length branch July 23, 2025 10:48