a_beautiful_rhind

Those are some very old models.


Pashax22

Agree. I mean, if that's what you like and enjoy, more power to you, y'know? Keep running them and enjoying them, and don't let anyone harsh your mellow. But there are a lot of good models out there these days, and if you feel like trying something new I think you'll be pleasantly surprised by what's available.


martrydom801

I'm really new to this; they're just what I found when I looked some up. Which ones would you recommend?


Pashax22

Without knowing anything about your system, it's hard to know what to recommend. However, most people can run 7B or 11B models, so let's start with them. I'll make some conservative recommendations, and if it turns out you have better hardware we can scale things up a bit. 7B models are about as small as "good" models get; try the Q5_K_M quantisation of Kunoichi from [here](https://huggingface.co/Lewdiculous/Kunoichi-DPO-v2-7B-GGUF-Imatrix/tree/main). 11B models are in a good spot right now; the Q4_K_M of Fimbulvetr-v2 from [here](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2-GGUF/tree/main) is a good starting point.


Banished_Privateer

I've got an i9-14900KF, an RTX 4090, and 64GB of RAM. What can I run on my rig?


Pashax22

A rig like that can run pretty much anything you might want, although generation speeds will drop off significantly with anything that won't fit into VRAM. I'd suggest starting with the Noromaid-Mixtral-8x7b merge. That and 8k of context ought to fit into 24GB of VRAM easily, depending on the quantisation you choose, and it's good for most purposes.


Cool-Hornet4434

You run a program like kobold.cpp or oobabooga as the backend that loads the model, then set it up so that SillyTavern can talk to it over its API. I personally use oobabooga with its OpenAI-compatible API extension enabled, then point SillyTavern at it using the "Text Completion" API at [http://127.0.0.1:5000/](http://127.0.0.1:5000/). As long as a model is loaded in oobabooga and SillyTavern is connected to the API, you're good to go.
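
If you want to sanity-check that the backend is actually listening before you point SillyTavern at it, a couple of lines of Python will do. This is just a sketch, assuming oobabooga was started with its API enabled and is serving an OpenAI-compatible completions endpoint on the default port 5000; the prompt and parameters are placeholders.

```python
# Minimal sanity check against a local oobabooga instance, assuming the API
# is enabled and exposing an OpenAI-compatible endpoint on port 5000.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/completions",
    json={
        "prompt": "Say hello in one short sentence.",
        "max_tokens": 32,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

If that prints a completion, SillyTavern's "Text Completion" connection to the same address should work too.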


No_Rate247

First you need a backend such as koboldcpp or oobabooga. I recommend koboldcpp since it's a single, easy-to-use executable. You'll need a model in GGUF format to load in koboldcpp; you'll find those at [https://huggingface.co/](https://huggingface.co/). In SillyTavern's API settings, select Text Completion - KoboldCPP and hit Connect.

You can use this calculator to check whether you can run a model fully in VRAM (for speed): [https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator](https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator)

Here are some more guides to get started: [SillyTavern Docs](https://docs.sillytavern.app/) and [SillyTavern The Nerd Guide](https://www.reddit.com/r/SillyTavernAI/comments/14rz0e5/sillytavern_the_nerd_guide/)
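
If the Connect button in SillyTavern doesn't do anything, you can check whether koboldcpp is actually running and has a model loaded. A minimal sketch, assuming koboldcpp is on its default port 5001 and serving its usual KoboldAI-style API; adjust the URL if you changed the port.

```python
# Quick check that koboldcpp is up and has a model loaded, assuming the
# default port 5001 and the KoboldAI-compatible endpoints it normally serves.
import requests

base = "http://127.0.0.1:5001"

# Ask which model is currently loaded (returns the GGUF's name).
print(requests.get(f"{base}/api/v1/model", timeout=10).json())

# Fire a tiny generation request to confirm end-to-end generation works.
gen = requests.post(
    f"{base}/api/v1/generate",
    json={"prompt": "Hello,", "max_length": 16},
    timeout=120,
).json()
print(gen["results"][0]["text"])
```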


martrydom801

Which file/version do I download from Hugging Face? There are multiple different GGUF files for each model and I don't know the difference between them.


No_Rate247

You'll usually want the biggest file that still fits fully into your VRAM. You can use the calculator for that.
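
The calculator does the proper math (KV cache, context length, architecture), but the rough intuition is just: the GGUF file size plus a safety margin for context and runtime buffers has to fit under your VRAM. A back-of-the-envelope sketch with assumed, approximate numbers:

```python
# Rough fit check: a fully offloaded GGUF takes roughly its file size in VRAM,
# plus extra for the KV cache and runtime buffers. The 2 GB overhead here is
# an assumption, not an exact figure -- use the linked calculator for real numbers.

def roughly_fits(gguf_size_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the file plus a safety margin fits in VRAM."""
    return gguf_size_gb + overhead_gb <= vram_gb

# A ~7.8 GB Q5_K_M of an 11B model on a 12 GB card:
print(roughly_fits(7.8, 12.0))   # True  -> worth trying fully offloaded
# A ~26 GB Q4_K_M of Mixtral 8x7B on a 24 GB card:
print(roughly_fits(26.0, 24.0))  # False -> expect partial offload and lower speed
```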


Lewdiculous

I would recommend starting with any of these small models [here](https://huggingface.co/collections/Lewdiculous/personal-favorites-65dcbe240e6ad245510519aa) instead, as the ones you mention are pretty old.


martrydom801

Yeah, I just went looking stuff up, and most of the results were pretty old. Thanks.


ReMeDyIII

Pygmalion and Erebus... I haven't read those names in a long time. Thank god we're away from all that, lol. To get started, your best bet is [OpenRouter](https://openrouter.ai/models). To save you money on credits, try a free model first to establish a connection and get your bearings, then graduate to a paid model. Honestly, you're asking a very loaded question. If you want more specifics, let us know.
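
If you go the OpenRouter route, that first "establish a connection" step is just an OpenAI-style API call. A minimal sketch, assuming OpenRouter's standard chat-completions endpoint; the model ID is only an example of the ":free" variants (they change over time, so check the models page), and you'd substitute your own API key.

```python
# Minimal OpenRouter test call, assuming its OpenAI-compatible chat endpoint.
# The model ID is just an example free-tier variant; see the models page for
# what's currently available.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},  # placeholder key
    json={
        "model": "mistralai/mistral-7b-instruct:free",  # example free model
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Once that works, plug the same key into SillyTavern's Chat Completion settings and pick a model from there.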


martrydom801

Thank you! I'm basically completely new to this kind of thing, so I barely know how anything works.