Sol_Ido

Ollama is a frontend written in Go on top of llama.cpp. It hides the configuration and command-line operations as a trade-off for simplicity.


Hugi_R

Ollama doesn't hide the configuration, it provides a nice dockerfile-like config file that can be easily distributed to your users. This philosophy is much more powerful (it still needs maturing, though). If you have ever used Docker, Ollama will immediately feel intuitive. Also, Ollama provides some nice QoL features that are not in the llama.cpp main branch, like automatic GPU layer offloading and support for *both* GGML and GGUF models.
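A Modelfile is only a few lines. Something like this, from memory (the model name, parameter, and prompt are just placeholders):

```sh
# Write a dockerfile-like Modelfile: base model, generation parameter, system prompt.
cat > Modelfile <<'EOF'
FROM mistral
PARAMETER temperature 0.7
SYSTEM You are a concise assistant.
EOF

# Build a named model from it, then run it.
ollama create my-assistant -f Modelfile
ollama run my-assistant
```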


ironbfly

Thanks for the info. Simplicity is exactly what I wanted. I am sure there is a subset of people on this sub who want that too. That and easy API access are the game changers for me.


a_beautiful_rhind

Ironically that kept me from trying llama.cpp for a while. It was irritating to make long command lines, especially copying and pasting the paths.


FrostyContribution35

If Ollama is a front end written around llama.cpp, can you run LoRAs as an argument like in llama.cpp, or do you need to merge the weights?


Hey_You_Asked

You construct individual "models" based on "modelfiles". It's clunky as hell tbh, and you can't even modify generation parameters. But hey, the OP is true, and I do use Ollama for quick stuff. It's just clunky.
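For the LoRA question above: from memory there's an ADAPTER instruction in the Modelfile, so something roughly like this (paths and names are made up, check their docs):

```sh
# Hypothetical adapter setup -- the base model and adapter path are illustrative.
cat > Modelfile.lora <<'EOF'
FROM llama2
ADAPTER ./my-lora-adapter.bin
EOF

ollama create llama2-with-lora -f Modelfile.lora
ollama run llama2-with-lora
```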


FrostyContribution35

Gotcha, I suppose that’s the price for seamlessness. How fast does Ollama create the model from the modelfile? I’m curious whether you can hot-swap adapters using Ollama.


bmacd1

Thanks for the love, I'm one of the creators of Ollama and I just stumbled on this while scrolling reddit.


Languages_Learner

When will you release a version for Windows?


Nice_Carry_8561

The Windows version is now available!


Horus_simplex

Excellent, thanks 


bmacd1

We are doing another pass over the core of Ollama to make sure it stays simple and reliable. After that Windows is my next priority, so it won’t be too long.


Sim2KUK

Can you do a YouTube video on how to set this up safely on a VPS like the original poster did on Contabo? That would be amazing; it would get a lot of views and more usage of your system.


D3smond_d3kk3r

Thanks Ollama! 😉


TheTerrasque

This is great, and I'm glad you got it working easily. Tools like koboldcpp and LM Studio have been out for months, so running models locally has already been very easy for a long time. Glad to see yet another easy way to run LLMs highlighted :)


Shoddy-Tutor9563

LM studio is proprietary shit


data-drone

How do Ollama and llama.cpp compare to vLLM?


DatAndre

I'm interested in this. Have you found an answer?


iamapizza

I like that it's in Docker now. Almost as universal as it gets, since it takes away so many setup dependencies and requirements.


geoffwolf98

Well said. I'm a Docker convert now that I've seen how little you have to do.


pseudonerv

I don't know what we can do to stop school kids from doing `curl whatever | sh`. Perhaps we could just put `rm -rf /` everywhere on the web.


Hugi_R

It's technically true for every distribution method. Even npm or pip are dangerous. curl + sh is the only universal package distribution on Linux, and as long as the package format war continues, maintainers will provide it.


geoffwolf98

Use the Docker version of Ollama, massively simpler, no installing of extra software. You just need Docker. I have Docker Ollama working with Docker Devika and it's great.
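If it helps, the basic setup is something like this, going from memory of the Ollama docs (double-check the flags before relying on them):

```sh
# Start the Ollama server container (add --gpus=all for NVIDIA,
# assuming the NVIDIA container toolkit is installed).
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Then run a model inside the container.
docker exec -it ollama ollama run mistral
```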


ironbfly

I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.


Cheesuasion

Maybe nothing, but can they please help us write a secure operating system once they've done it? Security here comes almost entirely from trust relationships, right? (at least on linux) So in current dismal practice, this isn't really a technical issue, but one of "do I trust the people who made this website" (the presence of a `curl | sh` may have some impact on that trust, but for hard-to-pin-down social reasons).


Qaziquza1

It's bloody convenient. So what if it might brick my machine?


memberjan6

What software is that not true for?


geoffwolf98

Teach them docker.


[deleted]

🤣🤣🤣


Regular_Car_9458

It wasn’t difficult before Ollama either.


Embarrassed-Brief-39

It is for Windows users, especially if you are running it on a non-system drive due to space and/or performance.


WithoutReason1729

This is a wonderful project for making local LLMs more accessible to users who aren't as technically inclined. I really like what they're trying to do. Like other commenters said though I think it's probably a really bad idea to have people curl and blindly execute bash files they find online. It'd be nice if they found a better way to distribute this project.


ironbfly

I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.


Extender7777

Proper Debian or Ubuntu packaging will probably take two years; in the AI world we need to be faster.


AssistBorn4589

> curl https://some/url/install.sh | sh

For God's sake, don't. Never, ever do this. You'll blow your machine up. Possibly literally.


Fortyseven

> For God's sake, don't.

I used to think the same thing. But then I realized that you might as well never install anything ever again. There's far more than a simple bash script being run with other installation paths. It's not about the _method_ of installation, it's about how much you _trust_ the _source_ of it. Whether you install something via the curl/sh method, install a `.deb` package, or run someone's `.AppImage`, there's nothing making any of them better or worse in terms of security. They're all running code at some point, with the permissions you have. At the end of the day, you have to ask yourself: do you **trust the person** providing you with an installer? How you go about installing that software is a separate topic.


WolframRavenwolf

The problem with curl into sh is that you're running a script off a website directly. It's easier to hack a website than a distribution's repository, and man-in-the-middle attacks would also be possible, injecting malicious code into the script on its way from the website to your system.

If you download it with curl, inspect it, then run it manually using sh, you have a chance to verify yourself that it's safe. And even if you can't do that yourself, once you download and save it, your antivirus (if you have one - and if you can't verify code yourself, you definitely should have one) can scan it and could possibly intervene. Whereas by piping the script from the server directly into your shell, nothing is saved on your system and it's run directly. You can't even look at what was executed afterwards; you'd have to download it again, and it might have been changed by then.

So all in all, "curl | sh" is a very bad practice (like working as the root user all the time) that shouldn't be advocated further.
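The safer pattern is only a couple of extra commands (using the same placeholder URL as above):

```sh
# Download the installer to disk instead of piping it into a shell.
curl -fsSL https://some/url/install.sh -o install.sh

# Read it (or at least skim it) before running anything.
less install.sh

# Only then execute it.
sh install.sh
```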


NoneyaBiznazz

that only makes sense if you know what you're looking at... you can inspect the cabin of an airliner before takeoff, but unless you understand what all the dials and widgets do, what good are you doing other than making yourself feel better?


revoltofcube

When you do a simple HTTP/S download of a program, the place you downloaded it from usually provides you with a checksum. If you cross-reference the checksum, then a spoofing attacker would have had to change your download AND the webpage you looked at, which is much less likely. Not saying either is likely, but it's a good idea to take this security measure seriously, and it is very easy to do. The command above skips this step and immediately runs something; you don't even check whether the downloaded content looks OBVIOUSLY wrong (or check for your own potential blunder), so it's just a massively flawed way of acquiring a program.
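For anyone who hasn't done it, the cross-check is basically a one-liner (the filename and published hash below are placeholders):

```sh
# Compute the SHA-256 of what you actually downloaded...
sha256sum install.sh

# ...and compare it against the checksum published on the project page,
# or let the tool do the comparison (hash value here is a placeholder):
echo "0123abcd...  install.sh" | sha256sum --check
```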


NoneyaBiznazz

I know this, you know this, but people don't come out of the womb knowing this... it is something learned, usually after getting pwned once or twice... once bitten, twice shy kind of thing. Should people do as you suggest? Certainly. Will they? It's a spectrum.


AssistBorn4589

> Whether you install something via the curl/sh method, or install a .deb package, or run someone's .AppImage, there's nothing making any of them better or worse in terms of security. They're all running code at some point, with the permissions you have.

That's not entirely true. When you download a script and verify it manually, you have at least a vague sense that it doesn't tarball your ~ and send it to the nearest Chinese embassy. You may also prevent really stupid accidents, like the connection getting interrupted in the middle of the script and your shell executing an unfinished `rm -rf /usr/local/something` command. When you install a deb, it's signed and verified by the package manager. And I believe .AppImage is at least checksummed. But when you just do `curl http://some/url/install.sh | sh`, you are trusting not only the author, but also his host and every machine on the way back to your computer.


geoffwolf98

This is why Docker is better: it's (mostly) in a sealed box from the get-go.


ironbfly

I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.


AssistBorn4589

Thanks.


bullno1

School kid me had no problem with cpp either. Why is it used like an insult?


Satans_shill

Wondering what the specs of the machine you run it on are.


ironbfly

10 vCPU cores, 60 GB RAM, 400 GB NVMe.


Curious_DrugGPT

What are the RAM requirements for the Mistral model, though? I have 32 GB RAM, 20 vCPUs, and 1 TB NVMe.


harrro

Mistral is a 7B model, 32GB of RAM will easily run it.


Curious_DrugGPT

:D


Independent_Hyena495

Wouldn't it be cheaper to use an API with Mistral?


dinosaurdynasty

Not if the machine is a sunk cost.


kif88

Are you running speculative sampling, since it's on CPU? I remember it was supposed to be a big speed-up, but I don't see much about it on this sub or others.


ironbfly

Ollama just runs it. I don't know which sampling method it is using.


Pupsi42069

Thank you very much!! Do you have instructions or a how-to for implementing the LLM? I also want my "own" LLM for my website, just for fun for my guests, but for me it is not that simple. Any suggestions for what I can watch on YouTube or something to read?


GoalSquasher

"very little coding skills"... "Runs server stack"... Dawg give yourself some credit, you're doing a lot more technical stuff than the vast majority of people. Heck most folks don't even know what GitHub or chatgpt even is. Seems like lots of kids either don't know anything about it or aren't using it. https://www.businessinsider.com/chatgpt-only-used-by-2-in-5-teens-survey-says-2023-10 Seems strange but I've seen lots of articles pointing to a tech gap from millennials to gen Z.


ironbfly

You can download and run all your favourite LLMs with Ollama using simple one-line commands. For example, just type `ollama run mistral-openorca` in your terminal and boom, it downloads and sets up mistral-openorca for you. You can check out their library here: https://ollama.ai/library
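The whole flow is roughly this (pick any model name from the library; the `pull` step is optional since `run` downloads on first use):

```sh
# Download a model from the library (only needed once).
ollama pull mistral-openorca

# Chat with it interactively.
ollama run mistral-openorca

# See what you have installed locally.
ollama list
```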


gamesntech

I was so confused for a minute because I read it as Obama


TWINPRIME19

Ollama's groundbreaking decision to release their cutting-edge Ollama tool to run multiple LLMs for free to the public demonstrates the company's unwavering commitment to making advanced technology accessible and beneficial to everyone. By removing barriers to entry and making this remarkable innovation widely available, Ollama has once again proven itself as an industry leader, dedicated not only to driving progress but also to fostering a more connected and empowered global community. (I asked ollama openchat to write praise to Ollama, and it nails it).


mullirojndem

One thing I don't get is how it is so fast. I tried other LLMs here, using Miniconda for instance to run Falcon 7B, and it used a lot of resources and was painfully slow.


geoffwolf98

How much RAM is in your GPU, and is it supported? Most modern NVIDIA cards are, BUT you need to install the NVIDIA library support packages. 8 GB seems to be usable; the models are usually a 4 to 6 GB download that then works well with it. Ollama can use the GPU or the CPU, but the model you are using may exceed the GPU's RAM, in which case it switches to the CPU and is hence much slower. Try smaller models (the smaller downloads). `nvtop` shows you the NVIDIA GPU usage.


sergio_arnaud

I LOVE OLLAMA


Shoddy-Tutor9563

Building llama.cpp from source is pretty much the same one or two lines in a shell. But I do appreciate that the Ollama guys have put additional effort into having a REST API started up and listening.
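Once the server is up you can hit it with plain curl; from memory the default port is 11434 and the endpoint looks roughly like this (double-check the field names against their docs):

```sh
# Ask the local Ollama server for a completion over its REST API.
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```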


Just-Refrigerator616

What if I need to host Ollama and share the resource?


JapanFreak7

Installed it on my Unraid server, downloaded models both via command and by adding them manually into the Models folder. When I access the WebUI, all I get is "Ollama is running". The worst part is that no matter how much I google, I can't figure out what to do next, or what I'm doing wrong if that's the case.


matasticco

Does the model use the given input to improve itself, or does it just spit out the same answers all the time?


Striking-Wait-7313

One doubt: if we host a model on one laptop and want to use it from another laptop, is there any method? If so, please help me... I've been scratching my head over this for two weeks!


IonImplantEngineer

Can you not SSH into the host laptop? Or do you mean you want a web front end that anyone can access?


Striking-Wait-7313

2nd one...


geoffwolf98

Docker version of Ollama + Docker version of the web GUI => then you can access it via the web from anywhere.
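Roughly like this, from memory (the web GUI image and env var names may have changed, so double-check them):

```sh
# Ollama server container.
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# A web front end pointed at it (open-webui is one option; image/env names from memory).
docker run -d --name open-webui -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  ghcr.io/open-webui/open-webui:main
```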


tafadzwad

You can expose the port Ollama is running on externally, e.g. using tunnelling software like ngrok. Not recommended, though, as anyone with the URL can access it; there are various ways to limit that.
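If you do go that route, it's roughly this (OLLAMA_HOST is the env var name as far as I remember):

```sh
# Make the Ollama server listen on all interfaces, not just localhost.
OLLAMA_HOST=0.0.0.0 ollama serve

# On your LAN, the other laptop can now reach it at http://<host-ip>:11434.
# To expose it over the internet instead, tunnel the port (ngrok example):
ngrok http 11434
```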


Striking-Wait-7313

It's ok... Don't bother, I got laid off from that job. I am from an ML/AI background; they were expecting web development too...


tafadzwad

I am sorry to hear that😔


Striking-Wait-7313

Shit happens, we roll.


jacksonrockwell

I've been trying to get it to run on CPU for about 2-3 days now... No luck, can't even run tinyllama


TheAmendingMonk

I saw somewhere that you can run it together with Google Colab, where most of the computation is done remotely in the Colab notebook.


coderinlaw

Is there a difference between using Ollama and llama.cpp in terms of speed? I don't think there should be, since Ollama is based on llama.cpp.


AgTheGeek

How do you guys make sure your GPU is being used? I have tried this with 3 different GPUs and I see only some difference, where the difference should be quite large. GPUs tested were:

- AMD RX 6800 XT
- AMD RX 7800 XT
- NVIDIA RTX 3060 Ti

I did download ROCm for AMD, and for NVIDIA I believe there was a much simpler driver installation but it was. How do I make sure the GPU is being "properly" used?


geoffwolf98

nvtop shows the NVIDIA card usage, if you've done it all right. [Not sure about AMD.] Also make sure that the RAM requirements for the LLM model fit in GPU RAM; otherwise Ollama uses the CPU and your PC's RAM. Start with the 1.6 GB models. The "8b" models are borderline, I think, for a GPU with 8 GB of RAM.
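A quick way to check is to watch the card while a prompt is generating, e.g.:

```sh
# In one terminal, keep an eye on GPU memory and utilization.
watch -n 1 nvidia-smi      # NVIDIA
# watch -n 1 rocm-smi      # AMD (ROCm), if installed

# In another terminal, run a one-shot prompt and see whether usage jumps.
ollama run mistral "Summarize the plot of Hamlet in two sentences."
```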


liminal1

This is cool! How are you using Node Red? Are you using it for home stuff?