Conversation

@Alessan-git
Contributor

Summary

The LLMEval and VLMEval apps now use AsyncStream-based token generation.

Changes

  • LLMEval: generation logic in LLMEvaluator.
  • VLMEval: generation logic in VLMEvaluator.
  • GenerationCompletionInfo: made Sendable.

Evaluators now include a generationTask property to manage the generation process. They also track the token count within the loop, though this functionality could potentially be integrated into the generation methods in Evaluate. This could be achieved either by passing maxTokens as an argument or by incorporating it into GenerateParameters.
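A rough sketch of that shape (generationTask, output, and stat match the snippets below; the class body is otherwise illustrative):

import SwiftUI

@MainActor
class LLMEvaluator: ObservableObject {
    @Published var output = ""
    @Published var stat = ""

    // Handle to the in-flight generation so the UI (or a token budget) can cancel it.
    var generationTask: Task<Void, Error>?

    func cancel() {
        generationTask?.cancel()
        generationTask = nil
    }
}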

Note

I removed the displayEveryNTokens part as I didn’t notice any improvements from it; in fact, performance was sometimes better without it. In any case, for a sample application, I believe prioritizing clarity over performance is the better approach.

@ronaldmannak
Contributor

The displayEveryNTokens prevents faster models from overwhelming SwiftUI, which unfortunately happens quite easily. It may not be a bad idea to leave it in to alert developers to this issue.
However, if we want to do it "right," it might make more sense to throttle by time, limiting tokens per second with a TimeInterval so it throttles only when needed.

@davidkoski
Collaborator

> The displayEveryNTokens prevents faster models from overwhelming SwiftUI, which unfortunately happens quite easily. It may not be a bad idea to leave it in to alert developers to this issue. However, if we want to do it "right," it might make more sense to throttle by time, limiting tokens per second with a TimeInterval so it throttles only when needed.

I agree that doing it on a time basis makes the most sense.

@davidkoski
Collaborator

As for doing the periodic display cleanly, I wonder if a small throttling combinator might work: it could combine tokens for e.g. 0.25 seconds and then emit a small chunk that we would display. I think it would be nice to use composition like this to describe the effect, and it would fit in nicely with an async sequence.
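A minimal sketch of what that composition could look like, assuming a hand-rolled AsyncSequence extension (the name, return type, and timing strategy are all assumptions, not an existing API):

import Foundation

extension AsyncSequence where Element: Sendable, Self: Sendable {
    // Buffers upstream elements and emits them as batches at most
    // once per `interval` seconds.
    func throttle(for interval: TimeInterval) -> AsyncStream<[Element]> {
        AsyncStream { continuation in
            let task = Task {
                var buffer: [Element] = []
                var lastFlush = Date.distantPast
                do {
                    for try await element in self {
                        buffer.append(element)
                        // Flush the buffer at most once per interval.
                        if Date().timeIntervalSince(lastFlush) >= interval {
                            continuation.yield(buffer)
                            buffer.removeAll()
                            lastFlush = Date()
                        }
                    }
                } catch {}
                // Flush whatever remains when the upstream finishes.
                if !buffer.isEmpty { continuation.yield(buffer) }
                continuation.finish()
            }
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}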

@Alessan-git
Contributor Author

I implemented a simple throttle method. I think it works fine:

// `stream` yields Generation values; throttled, it delivers them in batches.
for await batch in stream.throttle(for: 0.25) {
    for result in batch {
        switch result {
        case .token(let token):
            tokenCount += 1
            // Cancel once the token budget is reached.
            if tokenCount >= maxTokens { await generationTask?.cancel() }
            let text = context.tokenizer.decode(tokens: [token])
            // Hop to the main actor to update the UI.
            Task { @MainActor in
                self.output += text
            }
        case .info(let info):
            Task { @MainActor in
                self.stat = "\(info.tokensPerSecond) tokens/s"
            }
        }
    }
}

@ronaldmannak
Contributor (commented Mar 27, 2025)

Correct me if I'm wrong, but doesn't this set the minimum throughput to 1 token every 0.25 seconds? What you want is to still receive tokens as quickly as possible, buffer them, and then flush the buffer to the view every 0.25 seconds (or whatever interval you set).

Update: it actually does seem to batch automatically; I missed that initially, apologies.

@Alessan-git
Contributor Author

Yes, when throttled it returns [Generation] instead of Generation.

Even so, I’m not sure if adding too many of these "details" is appropriate for this library, as it seems more like something that each developer should adapt to their needs.

I prefer a thin layer, without many abstractions, that each developer can then adapt easily. But that's just me :)
Ideally, I think most people should write their own TokenIterator (or something similar) and even their own samplers and generation parameters. I find it too restrictive right now. That's why I decided to make some changes in previous PRs to make some models public.
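For instance, a custom greedy sampler could be as small as this (a sketch only; the protocol here is hypothetical and stands in for whatever sampling hook the library exposes):

import MLX

// Hypothetical protocol standing in for the library's sampling hook.
protocol Sampler {
    func sample(logits: MLXArray) -> MLXArray
}

// Greedy decoding: always take the highest-probability token.
struct GreedySampler: Sampler {
    func sample(logits: MLXArray) -> MLXArray {
        argMax(logits, axis: -1)
    }
}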

That being said, the work done so far by everyone in this library establishes a solid foundation, and it’s exciting to see how this can empower developers!

Let me know if the changes feel right or need further polishing!

@ronaldmannak
Contributor

I believe the main issue lies in the blurred lines between the libraries and the example apps in the current repo. The eval app serves as a demo to show developers how to use the libraries. I think throttling should be part of the example app so it runs smoothly on all devices and models, while also demonstrating to developers how to use the library.

Although it's a different discussion, separating the examples from the libraries could prevent a lot of confusion. We could keep them in a single repo, but with a clear distinction between the two. For instance, have a top-level source directory for the Swift package sources and an example directory for the Xcode project, which would contain only the demo targets. I'm not sure whether the CI or other internal tools at Apple would prevent that approach, but I'd be happy to work on it.

@davidkoski
Collaborator

> I prefer a thin layer, without many abstractions, that each developer can then adapt easily. But that's just me :)
> Ideally, I think most people should write their own TokenIterator (or something similar) and even their own samplers and generation parameters. I find it too restrictive right now. That's why I decided to make some changes in previous PRs to make some models public.

From issues & comments it seems like we have a mix of desires -- some people want a single method that does everything and others want a toolkit with everything exposed. The good news is I think we can accommodate both, to a certain extent (the latter more than the former I think).

I agree that the chunking piece is really a UI concern so it belongs in the integration in the example app. Making sure the right pieces are open or public will help a lot with the lower level pieces. Ideally you could implement your own samplers, etc. (right now) and use the pre-built TokenIterator if you wanted, or build the whole thing yourself.

Anyway, thank you for your efforts in opening things up. We need the community both to say what it wants and to help build it.

@davidkoski
Collaborator

> Although it's a different discussion, separating the examples from the libraries could prevent a lot of confusion. We could keep them in a single repo, but with a clear distinction between the two. For instance, have a top-level source directory for the Swift package sources and an example directory for the Xcode project, which would contain only the demo targets. I'm not sure whether the CI or other internal tools at Apple would prevent that approach, but I'd be happy to work on it.

I think the only real limitation is that Package.swift be at the top level -- this is required for swiftpm to use it as an importable library. We can reorganize the examples however we would like. I don't know how most people consume this: as a swiftpm library or as an xcodeproj with examples. Or maybe both.
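For illustration, a split along those lines might look like this (directory names hypothetical, apart from Package.swift):

Package.swift            // must stay at the repo root for swiftpm
Sources/                 // library targets (MLXLMCommon, etc.)
Examples/                // Xcode project and demo targets only
    LLMEval/
    VLMEval/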

@Alessan-git
Contributor Author

Should I reintroduce the displayEveryNTokens property so everyone can tailor it to their needs? Or would it be better to incorporate a simple timer into the Evaluator?

The stream functionality works seamlessly in the example apps, which was the primary goal of the PR.

I really appreciate the engaging discussion we're having about the library. In my view, it should be a flexible toolkit—something lightweight yet robust, designed to make it easy to explore new ideas and support research.

Collaborator

Inline review comment on these lines:

case .token(let token):
    tokenCount += 1
    if tokenCount >= maxTokens { await generationTask?.cancel() }
    let text = context.tokenizer.decode(tokens: [token])

With #260 this can be simplified a bit:

for await item in try MLXLMCommon.generate(input: input, parameters: generateParameters, context: context) {
    switch item {
    case .chunk(let string):
        // Chunks now arrive already decoded as String.
        print(string, terminator: "")
        fflush(stdout)
    case .info(let generateCompletionInfo):
        // Completion metrics (e.g. tokens/s) are available here if needed.
        break
    }
}

The conversion to String now happens inside the generator.

Do you want to update these call sites? I am happy to as well!

@davidkoski
Collaborator
Great, thank you for the contribution!

@davidkoski merged commit 289bb67 into ml-explore:main on Apr 9, 2025
3 checks passed