[KG-134] add LLModel contextLength and maxOutputTokens #438
Conversation
Thank you so much for working on this!
Just a few questions:
- Why is contextLength nullable? Are there any models that don't have it?
- Why do some Ollama models (like LLAMA_3_GROK_TOOL_USE_8B and others) have contextLength but not maxOutputTokens? I would assume they should have some output token limit?
But there are still some models for which I wasn't able to find the context length:
I might need some help with those, and with the models defined directly in tests. Then there is the problem of the models pulled directly from the Ollama registry. I defined the ... If we resolve those two problems (the missing OpenAI context lengths, and the possibly-null context length of pulled Ollama models), then I will be able to make it non-nullable.
Now, that is a whole other story. From my understanding of LLMs, nothing mandates having a maximum number of generated output tokens. I believe this is just something APIs do to limit the output, maybe as some form of security. I mean, the context length is the size of the context window, and nothing prevents rolling that window forever. So, for example, there is no max output tokens value for any of the Ollama models. I propose we leave that field optional. NOTE: There is in the Ollama API a ...
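For reference, a minimal sketch of how the two attributes under discussion could look on a Kotlin data class. This is a simplified stand-in, not the actual LLModel definition from the repository: the field names come from the PR title, while the Long type, the defaults, and the omission of the other fields are assumptions.

```kotlin
// Simplified sketch only: the real LLModel has more fields (provider,
// capabilities, ...); types and defaults here are assumptions.
data class LLModel(
    val id: String,
    // Size of the context window in tokens; nullable while some models
    // (e.g. ones pulled from the Ollama registry) lack published metadata.
    val contextLength: Long? = null,
    // Maximum number of generated tokens; left optional because, as argued
    // above, not every provider enforces an output limit.
    val maxOutputTokens: Long? = null,
)
```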
@ptitjes, https://cookbook.openai.com/examples/embedding_long_inputs
As for the 3-large, I couldn't find the official answer, but on 3rd party resources they say:
So basically it's also 8191.
As for the moderation models, it's funny because I couldn't find it in any resources myself, but ChatGPT found this link: https://danavan.ai/docs/models/?utm_source=chatgpt.com. It's in another language, but if you translate it: omni-moderation models: 32,768 tokens; text-moderation: also the same 32,768.
For models defined in tests, please feel free to set everything to some arbitrary number like 1000, I guess, since it's for testing purposes only. Wdyt?
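If that suggestion is taken, a test fixture could look something like this, reusing the simplified LLModel sketch above; the model id and the 1000 values are arbitrary placeholders, not values from the repository.

```kotlin
// Test-only fixture: the limits are arbitrary placeholders (per the
// suggestion above), since the tests do not depend on real model limits.
val testModel = LLModel(
    id = "test-model",
    contextLength = 1_000L,
    maxOutputTokens = 1_000L,
)
```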
As for the Ollama models -- it looks like for 3rd-party models these metadata fields are indeed not required (of course Meta or Mistral would include them, but some 3rd-party providers might omit this field). Then it's recommended to either fall back to some default value (2048/4096) or find the actual base model it derives from... So we either stay with nullable, or we introduce a parameter for the client that would be either a single fallback size or even a map LLModel -> Int (the safer option). Then we can throw an exception at runtime if the value is not provided by the API and not manually defined in this fallback map -- and users would go and define some value consciously. I still think it would be a rather rare case for some very specific models. WDYT?
Yeah, I agree it would be rare. I don't think having a map would be a good idea, for maintenance reasons. Also, taking a default (map) value in the constructor might be confusing. I would prefer that we have a default constant value, and add a warning log line indicating that we fall back to that value for model XYZ. If the user ever needs to override it, they can always data-copy the LLModel. Would that be OK?
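A rough sketch of that proposal, reusing the simplified LLModel sketch above. The constant value, the function name, and the plain println warning are illustrative assumptions, not the actual Koog API or its logging setup.

```kotlin
// Illustrative only: the constant value and names are assumptions.
const val DEFAULT_CONTEXT_LENGTH = 4_096L

fun resolveContextLength(model: LLModel): Long {
    val length = model.contextLength
    if (length == null) {
        // Warning log so users notice the fallback and can override it consciously.
        println("WARN: no context length known for '${model.id}', falling back to $DEFAULT_CONTEXT_LENGTH")
        return DEFAULT_CONTEXT_LENGTH
    }
    return length
}

fun main() {
    val pulled = LLModel(id = "some-pulled-ollama-model") // no metadata available
    println(resolveContextLength(pulled))                  // 4096, with a warning

    // User-side override via a data-class copy, as suggested above.
    val patched = pulled.copy(contextLength = 8_192L)
    println(resolveContextLength(patched))                  // 8192
}
```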
I will finish this PR this evening.
@ptitjes yes, let's try this approach and see how it goes -- we can always add more API later if we realize that a map/default is required.
Let's not forget to add an OpenTelemetry span attribute gen_ai.request.max_tokens. Could be a separate PR.
It's unrelated to this specific PR; this PR is about model and provider capabilities. I am not adding a ...
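For context, a sketch of what the suggested attribute could look like with the OpenTelemetry Java API. This is deferred to a separate PR per the exchange above; the tracer and span names are placeholders, and this is not how Koog actually wires its telemetry.

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.common.AttributeKey

fun main() {
    // Placeholder instrumentation scope and span names.
    val tracer = GlobalOpenTelemetry.getTracer("example-scope")
    val span = tracer.spanBuilder("llm.chat").startSpan()
    try {
        // The semantic-convention attribute name mentioned in the comment above.
        span.setAttribute(AttributeKey.longKey("gen_ai.request.max_tokens"), 1024L)
    } finally {
        span.end()
    }
}
```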
Looks good, but I have questions about the dependencies.
@@ -15,6 +15,7 @@ kotlin {
        api(project(":agents:agents-tools"))
        api(project(":agents:agents-utils"))
        api(project(":prompt:prompt-executor:prompt-executor-clients"))
+       implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Why did you add it to commonMain? It should not be needed here, since we provide only a JVM implementation for Bedrock, and this dependency is already included in the jvmMain block.
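In other words, a sketch of the reviewer's point, assuming the usual Kotlin Multiplatform source-set layout; the module paths are copied from the diff above, and the surrounding configuration (plugins, targets) is omitted.

```kotlin
// build.gradle.kts sketch: keep the client dependency in jvmMain when the
// Bedrock implementation is JVM-only, instead of declaring it in commonMain.
kotlin {
    sourceSets {
        val commonMain by getting {
            dependencies {
                api(project(":prompt:prompt-executor:prompt-executor-clients"))
            }
        }
        val jvmMain by getting {
            dependencies {
                implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
            }
        }
    }
}
```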
I answered both of your comments in the main thread, because the answer is the same.
@@ -15,6 +15,8 @@ kotlin {
        api(project(":agents:agents-tools"))
        api(project(":agents:agents-utils"))
        api(project(":prompt:prompt-executor:prompt-executor-clients"))
+       implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
Why did you add dependencies on the Anthropic and Google clients? It should be independent of these clients.
I answered both of your comments in the main thread, because the answer is the same.
Hey Andrey, thanks a lot for your review. Yes, I consciously added the dependencies (note it is ...). I expect we will add some more attributes to ..., some ... I don't think those additional dependencies are a problem, because all the LLMClients are rather lightweight and implemented using Ktor, so adding them will not pull in additional libraries for the users. Also, I carefully put them as implementation dependencies ... I hope this makes sense.
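A small sketch of the api vs implementation distinction that argument relies on, shown for illustration in a plain JVM Gradle module rather than this repository's multiplatform layout; the module paths are copied from the diffs above.

```kotlin
// build.gradle.kts (illustration only; in the multiplatform modules the same
// keywords appear inside the per-source-set dependencies blocks).
plugins {
    `java-library` // provides the api/implementation configurations
}

dependencies {
    // api: exposed to consumers of this module, i.e. part of its public ABI.
    api(project(":prompt:prompt-executor:prompt-executor-clients"))
    // implementation: internal to this module; consumers do not see it on
    // their compile classpath, so the client API does not leak to users.
    implementation(project(":prompt:prompt-executor:prompt-executor-clients:prompt-executor-anthropic-client"))
}
```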
Ok, makes sense, thank you. I think we can merge now.
@EugeneTheDev I think you should press the "merge" button by yourself then :)
This PR adds two new attributes, contextLength and maxOutputTokens, to LLModel. Cf. KG-134.
I also:
Type of the change
Checklist for all pull requests
- develop as the base branch
Additional steps for pull requests adding a new feature