Updated LLMEval and VLMEval with the new AsyncStream token generation. #256
Conversation
The `displayEveryNTokens` prevents faster models from overwhelming SwiftUI, which unfortunately happens quite easily. It may not be a bad idea to leave it in to alert developers to this issue.
I agree that doing it on a time basis makes the most sense.
As for doing the periodic display cleanly, I wonder if something like this might work: it could combine tokens for, e.g., 0.25 seconds and then emit a small chunk that we would display. I think it would be nice to use composition like this to describe the effect, and it would fit in nicely with an async sequence.
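Roughly, I imagine a combinator along these lines -- just a sketch that ignores Sendable constraints, and `throttle(for:)` is a made-up name rather than anything already in the library:

```swift
import Foundation

extension AsyncSequence {
    /// Collects elements and emits them as batches roughly once per interval.
    /// Sketch only: a real version would need Sendable annotations and
    /// proper error propagation.
    func throttle(for interval: TimeInterval) -> AsyncStream<[Element]> {
        AsyncStream { continuation in
            let task = Task {
                var batch: [Element] = []
                var lastEmit = Date()
                do {
                    for try await element in self {
                        batch.append(element)
                        if Date().timeIntervalSince(lastEmit) >= interval {
                            continuation.yield(batch)
                            batch.removeAll()
                            lastEmit = Date()
                        }
                    }
                } catch {
                    // Sketch: swallow upstream errors.
                }
                // Flush whatever is left when the upstream finishes.
                if !batch.isEmpty { continuation.yield(batch) }
                continuation.finish()
            }
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```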
I implemented a simple throttle method. I think it works fine:

```swift
for await batch in stream.throttle(for: 0.25) {
    for result in batch {
        switch result {
        case .token(let token):
            tokenCount += 1
            if tokenCount >= maxTokens { generationTask?.cancel() }
            let text = context.tokenizer.decode(tokens: [token])
            Task { @MainActor in
                self.output += text
            }
        case .info(let info):
            Task { @MainActor in
                self.stat = "\(info.tokensPerSecond) tokens/s"
            }
        }
    }
}
```
Update: actually it does seem to batch automatically, I missed that initially, apologies.
Yes, when throttled it returns the results in batches. Even so, I'm not sure adding too many of these "details" is appropriate for this library; it seems more like something each developer should adapt to their needs. I prefer a thin layer, without many abstractions, that each developer can then adapt easily. But that's just me :)

That said, the work done so far by everyone on this library establishes a solid foundation, and it's exciting to see how it can empower developers! Let me know if the changes feel right or need further polishing!
I believe the main issue lies in the blurred lines between the libraries and the example apps in the current repo. The eval app serves as a demo to show developers how to use the libraries. I think throttling should be part of the example app, so that it runs smoothly on all devices and models and demonstrates to developers how to use the library.

Although it's a different discussion, separating the examples from the libraries could prevent a lot of confusion. We could keep them in a single repo, but with a clear distinction between the two: for instance, a top-level source directory for the Swift package sources and an examples directory for the Xcode project, which would contain only the demo targets. I'm not sure whether the CI or other internal tools at Apple would prevent that approach, but I'd be happy to work on it.
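For illustration, a hypothetical manifest under that layout might look like the following; the package name and paths here are made up and don't reflect the repo's actual structure:

```swift
// swift-tools-version:5.9
// Hypothetical layout sketch, not the repo's actual manifest.
import PackageDescription

let package = Package(
    name: "mlx-libraries",
    products: [
        .library(name: "MLXLMCommon", targets: ["MLXLMCommon"])
    ],
    targets: [
        // Library code lives under a top-level Sources/ directory,
        // while LLMEval, VLMEval, etc. stay in an examples Xcode
        // project that only depends on the package products.
        .target(name: "MLXLMCommon", path: "Sources/MLXLMCommon")
    ]
)
```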
From issues & comments it seems we have a mix of desires -- some people want a single method that does everything, and others want a toolkit with everything exposed. The good news is I think we can accommodate both, to a certain extent (the latter more than the former, I think). I agree that the chunking piece is really a UI concern, so it belongs in the integration in the example app. Making sure the right pieces are exposed is part of that.

Anyway, thank you for your efforts in opening things up. We need the community to both say what they want and help build it.
I think the only real limitation is that
Should I reintroduce the `displayEveryNTokens` property so everyone can tailor it to their needs? Or would it be better to incorporate a simple timer into the Evaluator? The stream functionality works seamlessly in the example apps, which was the primary goal of this PR.

I really appreciate the engaging discussion we're having about the library. In my view, it should be a flexible toolkit: something lightweight yet robust, designed to make it easy to explore new ideas and support research.
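If the timer route seems preferable, here is a rough sketch of how it could look inside the evaluator; `Evaluator` and `makeTextStream` are illustrative placeholders, not the app's actual code:

```swift
import Foundation
import Observation

// Sketch only: buffer decoded text and flush it to the UI-facing
// output at most every 0.25 seconds instead of on every token.
@Observable
@MainActor
final class Evaluator {
    var output = ""
    private var buffer = ""

    func generate() {
        Task { [self] in
            var lastFlush = Date()
            for await text in makeTextStream() {
                buffer += text
                // Update the displayed output on a time basis.
                if Date().timeIntervalSince(lastFlush) >= 0.25 {
                    flush()
                    lastFlush = Date()
                }
            }
            flush() // final flush when generation finishes
        }
    }

    private func flush() {
        guard !buffer.isEmpty else { return }
        output += buffer
        buffer = ""
    }

    // Stand-in for the real decoded-text stream.
    private func makeTextStream() -> AsyncStream<String> {
        AsyncStream { $0.finish() }
    }
}
```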
```swift
case .token(let token):
    tokenCount += 1
    if tokenCount >= maxTokens { generationTask?.cancel() }
    let text = context.tokenizer.decode(tokens: [token])
```
With #260 this can be simplified a bit:
```swift
for await item in try MLXLMCommon.generate(
    input: input, parameters: generateParameters, context: context
) {
    switch item {
    case .chunk(let string):
        print(string, terminator: "")
        fflush(stdout)
    case .info(let generateCompletionInfo):
        break
    }
}
```
The conversion to String now happens inside the generator.
Do you want to update these call sites? I am happy to as well!
```
# Conflicts:
#	Applications/LLMEval/ContentView.swift
#	Applications/VLMEval/ContentView.swift
```
Great, thank you for the contribution!
Summary
The LLMEval and VLMEval apps now use the new AsyncStream token generation.
Changes
Evaluators now include a `generationTask` property to manage the generation process. They also track the token count within the loop, though this functionality could potentially be integrated into the generation methods in `Evaluate`, either by passing `maxTokens` as an argument or by incorporating it into `GenerateParameters`.
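A rough sketch of that shape, with placeholder names rather than the exact app code:

```swift
// Sketch: the evaluator owns the task so the UI can cancel generation,
// and maxTokens is enforced inside the loop for now.
final class Evaluator {
    private var generationTask: Task<Void, Never>?
    private let maxTokens = 1_000

    func generate() {
        generationTask = Task { [self] in
            var tokenCount = 0
            for await token in makeTokenStream() {
                tokenCount += 1
                // Could instead move into the generation methods, either
                // via a maxTokens argument or GenerateParameters.
                if tokenCount >= maxTokens { break }
                _ = token // decode and display in the real app
            }
        }
    }

    func cancel() {
        generationTask?.cancel()
    }

    // Stand-in for the real token stream.
    private func makeTokenStream() -> AsyncStream<Int> {
        AsyncStream { $0.finish() }
    }
}
```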
Note
I removed the `displayEveryNTokens` part as I didn't notice any improvement from it; in fact, performance was sometimes better without it. In any case, for a sample application, I believe prioritizing clarity over performance is the better approach.