Replies: 6 comments 22 replies
-
Absolutely - totally happy to put that in; mind opening an issue?
-
Gotcha, no worries - Mistral should fit, maybe that's worth a try?
Also, if you want to make a section in the README with what models work well
for you, that would be awesome (similar to the 72 and 20 GiB sections). I'd
like to give folks with less VRAM some recommended models.
On Sat, Jun 22, 2024, 12:52, Logge wrote:
Do the new features do what you had in mind?
Looks perfect. Will have to test with a proper model, though. Only got 12 GB of
VRAM and have to offload most things, so I'm mainly using llama3:8b.
-
Gotcha, I didn't pick the 20GB models very well... I truthfully just picked
some of the top models on Ollama.
It would be great to see which fine-tunes work best! Although, if needed,
I'm happy to fine-tune some of those 8-14B models, as I can fit those in
VRAM.
Curious to see how the proprietary models compare. If needed, I'm happy to
test one or two of your prompts so you can compare with the proprietary
models.
On Sat, Jun 22, 2024, 18:25, Logge wrote:
Yeah, I'm mainly orienting myself by the LMSYS leaderboard. Llama3 is
insanely good for its size. There are some roleplay fine-tunes which might
work out well; I'll have to test. In my tests it outperforms the small
Mixtral. Ollama also quantizes the models by default. Your 20GB section
should easily fit in cards up to 10GB if you don't load all models at once.
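For anyone with a similar VRAM budget, here's roughly what "not loading all models at once" could look like with the ollama Python client - just a sketch, assuming your Ollama version supports the `keep_alive` option (recent ones do):

```python
# Sketch only: assumes the `ollama` Python package and a server that
# honours `keep_alive` (used here to unload each model right after its call).
import ollama


def run_once(model: str, prompt: str) -> str:
    """Query a model, then ask the server to unload it immediately,
    so several quantized models can share a ~10GB card one at a time."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        keep_alive=0,  # unload now instead of keeping it resident for ~5 minutes
    )
    return response["message"]["content"]


# Example: alternate between two models without holding both in VRAM.
outline = run_once("llama3:8b", "Outline a short mystery story in five bullet points.")
critique = run_once("mistral", "Critique this outline:\n" + outline)
```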
-
Awesome! I just made you a collaborator on the repo.
No worries about code quality - let's do the open-issue-and-new-branch
approach, and we should be good to go.
Regarding cloud models, that sounds good; maybe one of the first things we
do is open an issue on this repo and implement cloud models fully.
On Sun, Jun 23, 2024, 05:55, Logge wrote:
I'm unsure if my code quality is good enough to push directly, but I can
try.
Yeah, LLaMA 400B will be insane but quite costly.
I'm mainly looking at the quality/cost chart:
https://artificialanalysis.ai/models
As you can see, the Pareto front spans llama3 70b, gemini flash, and
llama3 8b, so you already made a pretty good choice of models.
So I'd say:
- small hardware: llama3 8b
- large hardware: llama3 70b
- no hardware: gemini-flash / claude 3.5 sonnet
It might be an idea to implement cloud providers in a proper way in the
future; right now my implementation is really hacky.
I'll keep my eyes open for good fine-tunes.
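A rough sketch of what a less hacky provider abstraction could look like, so local Ollama models and cloud models sit behind one interface - class names are made up, and the Gemini part assumes the `google-generativeai` package with an API key in `GOOGLE_API_KEY`:

```python
# Hypothetical provider interface; not the project's actual code.
from abc import ABC, abstractmethod
import os

import ollama


class ModelProvider(ABC):
    """Common interface so story-generation code never cares where the model runs."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class OllamaProvider(ModelProvider):
    def __init__(self, model: str = "llama3:8b"):
        self.model = model

    def generate(self, prompt: str) -> str:
        response = ollama.chat(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]


class GeminiProvider(ModelProvider):
    def __init__(self, model: str = "gemini-1.5-flash"):
        import google.generativeai as genai  # optional cloud dependency
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        self._model = genai.GenerativeModel(model)

    def generate(self, prompt: str) -> str:
        return self._model.generate_content(prompt).text


def get_provider(name: str) -> ModelProvider:
    """Pick a backend from config so swapping models is a one-line change."""
    return GeminiProvider() if name.startswith("gemini") else OllamaProvider(name)
```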
-
Huh, I'd like to test it with llama70b.
What does your cloud model system look like?
On Sun, Jun 23, 2024, 15:04, Logge wrote:
I did a bit of testing, and it seems like gemini-flash barely needs
corrections (it will succeed in one correction pass). It's more of a
problem with the small models (e.g. llama3:8b).
Will do some more testing, though.
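For context, a correction pass here can be as simple as a bounded retry loop. This is only a sketch, and `passes_checks` is a stand-in for whatever validation the project actually runs:

```python
import ollama

MAX_CORRECTION_PASSES = 3  # small models like llama3:8b tend to need more passes


def passes_checks(text: str) -> tuple[bool, str]:
    # Stand-in validator; a real check would verify plot/format constraints.
    problems = "" if len(text.split()) >= 100 else "The chapter is too short; expand it."
    return problems == "", problems


def generate_with_corrections(model: str, prompt: str) -> str:
    """Generate once, then feed any reported problems back to the model
    for a bounded number of correction passes."""
    def ask(p: str) -> str:
        return ollama.chat(
            model=model,
            messages=[{"role": "user", "content": p}],
        )["message"]["content"]

    text = ask(prompt)
    for _ in range(MAX_CORRECTION_PASSES):
        ok, problems = passes_checks(text)
        if ok:
            break
        text = ask(f"Revise the text below to fix this problem: {problems}\n\n{text}")
    return text
```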
-
In branch 18, I've started creating a proper interface wrapper, but I'm getting too tired tonight to finish integrating it properly. If you're able to look at what I've done and finish it, that would be fantastic - no worries if not. I can try to get to it tomorrow afternoon-ish, but tomorrow is going to be a busy day for me.
I've made an interface class that holds the client and keeps the generation history and such within that class. Not sure if that's a good idea or not, but it might simplify the LangChain chains. The client itself is still an Ollama object, but maybe you can have it be smarter than that.
The rest of the code needs to be updated so it uses the interface or something similar - right now it'll still try to use the old client (which doesn't exist anymore).
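Roughly the shape I mean - a sketch only, not the actual branch-18 code, and the names are made up:

```python
import ollama


class ModelInterface:
    """Holds the client plus the running generation history, so chains
    don't have to thread message lists around themselves."""

    def __init__(self, model: str = "llama3:8b", host: str = "http://localhost:11434"):
        self.model = model
        self.client = ollama.Client(host=host)  # still an Ollama client for now
        self.history: list[dict] = []           # full conversation kept here

    def generate(self, prompt: str) -> str:
        self.history.append({"role": "user", "content": prompt})
        response = self.client.chat(model=self.model, messages=self.history)
        reply = response["message"]["content"]
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def reset(self) -> None:
        """Start a fresh conversation, e.g. between chapters."""
        self.history.clear()
```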
-
It would be quite cool to have a final model translate the story to another language. This could be a smaller model that gets fed each chapter plus some general info about the story and translates it.
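A rough sketch of what that final translation pass could look like with a small local model - the model tag and prompt wording are just placeholders:

```python
import ollama

TRANSLATOR_MODEL = "llama3:8b"  # placeholder; any small multilingual model could go here


def translate_chapter(chapter: str, story_info: str, target_language: str) -> str:
    """Feed one chapter plus general story info to a small model and return the translation."""
    prompt = (
        f"You are translating a novel into {target_language}.\n"
        f"General information about the story (names, tone, setting):\n{story_info}\n\n"
        f"Translate the following chapter, keeping names and style consistent:\n\n{chapter}"
    )
    response = ollama.chat(
        model=TRANSLATOR_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```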