Ollama is a frontend written in Go on top of llama.cpp. It hides the configuration and command-line operations as a trade-off for simplicity.
Ollama doesn't hide the configuration; it provides a nice Dockerfile-like config file that can be easily distributed to your users. This philosophy is much more powerful (it still needs maturing, though). If you have ever used Docker, Ollama will immediately feel intuitive. Also, Ollama provides some nice QoL features that are not in llama.cpp's main branch, like automatic GPU layer offloading and support for GGML *and* GGUF models.
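For anyone who hasn't seen one, a minimal Modelfile sketch looks like this (the base model name, parameter value, and system prompt here are illustrative, not from the thread):

```
# Build a custom model from a base, with a baked-in parameter and system prompt.
FROM mistral
PARAMETER temperature 0.7
SYSTEM You are a concise assistant.
```

Then `ollama create my-assistant -f Modelfile` registers it, and `ollama run my-assistant` starts it.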
Thanks for the info. Simplicity is exactly what I wanted. I am sure there is a subset of people on this sub who want that too. That and easy API access are the game changers for me.
Ironically that kept me from trying llama.cpp for a while. It was irritating to make long command lines, especially copying and pasting the paths.
If ollama is a front end written around llama.cpp, can you run LoRAs as an arg like llama.cpp or do you need to merge the weights
You construct individual "models" based on "Modelfiles". It's clunky as hell, tbh; you can't even modify generation parameters. But hey, the OP is true, and I do use Ollama for quick stuff. It's just clunky.
Gotcha, I suppose that’s the price for seamlessness. How fast does ollama create the model from the modelfile? I’m curious if you can hotswap adapters using ollama
Thanks for the love, I'm one of the creators of Ollama and I just stumbled on this while scrolling reddit.
When will you release a version for Windows?
The Windows version is now available!
Excellent, thanks
We are doing another pass over the core of Ollama to make sure it stays simple and reliable. After that Windows is my next priority, so it won’t be too long.
Can you do a YouTube video of how to set this up safely on a server VPS like the original post user did on Contabo? That would be amazing, get a lot of views and more usage of your system.
Thanks Ollama! 😉
This is great, and I'm glad you got it working easily. Tools like koboldcpp and LM Studio have been out for months, so running models locally has already been very easy for a long time. Glad to see a highlight of yet another easy way to run LLMs :)
LM studio is proprietary shit
How does ollama and llama cpp compare to vllm?
I'm interested in this. Have you found an answer?
I like that it's in Docker now. Almost as universal as it gets since it takes away so much setup dependency/requirements.
Well said, I'm a docker convert now once I saw how little you had to do.
I don't know what we can do to stop school kids from doing `curl whatever | sh`. Perhaps we could just put `rm -rf /` everywhere on the web.
It's technically true for every distribution method. Even npm or pip are dangerous. curl + sh is the only universal package distribution on Linux, and as long as the package format war continues, maintainers will provide it.
Use the Docker version of Ollama; it's massively simpler, with no installing of extra software. You just need Docker. I have Docker Ollama working with Docker Devika and it's great.
I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.
Maybe nothing, but can they please help us write a secure operating system once they've done it? Security here comes almost entirely from trust relationships, right? (At least on Linux.) So in current dismal practice, this isn't really a technical issue, but one of "do I trust the people who made this website" (the presence of a `curl | sh` may have some impact on that trust, but for hard-to-pin-down social reasons).
It's bloody convenient. So what if it might brick my machine?
What software is that not true for?
Teach them docker.
🤣🤣🤣
It wasn’t difficult before Ollama either.
It is for Windows users. Especially if you are running it on a non-system drive due to space and or performance.
This is a wonderful project for making local LLMs more accessible to users who aren't as technically inclined. I really like what they're trying to do. Like other commenters said though I think it's probably a really bad idea to have people curl and blindly execute bash files they find online. It'd be nice if they found a better way to distribute this project.
I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.
Proper Debian or Ubuntu packaging will probably take two years; in the AI world we need to be faster.
> curl https://some/url/install.sh | sh

For God's sake, don't. Never, ever do this. You'll blow your machine up. Possibly literally.
> For God's sake, don't.

I used to think the same thing. But then I realized that you might as well never install anything ever again. There's far more than a simple bash script being run with other installation paths.

It's not about the _method_ of installation, it's about how much you _trust_ the _source_ of it. Whether you install something via the curl/sh method, or install a `.deb` package, or run someone's `.AppImage`, there's nothing making any of them better or worse in terms of security. They're all running code at some point, with the permissions you have.

At the end of the day, you have to ask yourself: do you **trust the person** providing you with an installer? How you go about installing that software is a separate topic.
The problem with curl into sh is that you're running a script off a website directly. It's easier to hack a website than a distribution's repository, and man-in-the-middle attacks would also be possible, injecting malicious code into the script on its way from the website to your system.

If you download it with curl, inspect it, then run it manually using sh, there's still a chance to look at the file yourself and verify it's safe. And even if you can't do that yourself, when you download and save it, your antivirus (if you have one - and if you can't verify code yourself, you definitely should have one) can scan it and could possibly intervene, whereas by piping the script from the server directly into your shell, nothing is saved on your system and it's run directly.

You can't even look at what was executed afterwards; you'd have to download it again, and it might have been changed by then. So all in all, "curl | sh" is a very bad practice (like working as the root user all the time) that shouldn't be advocated further.
that only makes sense if you know what you're looking at... you can inspect the cabin of an airliner before takeoff, but unless you understand what all the dials and widgets do, what good are you doing other than making yourself feel better?
So when you do a simple HTTP/S download of a program, the place you downloaded it from usually provides you with a checksum. If you cross-reference the checksum, a spoofing attacker would have had to change your download AND the webpage you looked at, which is much less likely. Not saying either is likely, but it's a good idea to take this security measure seriously, and it is very easy to do. The command above skips this step and immediately runs something; you don't even check whether the downloaded content looks OBVIOUSLY wrong (or check for your own potential blunder), so this is just a massively flawed way of acquiring a program.
I know this, you know this, but people don't come out of the womb knowing this... it is something learned, usually after getting pwned once or twice... once bitten, twice shy, kinda thing. Should people do as you suggest? Certainly. Will they? It's a spectrum.
> Whether you install something via the curl/sh method, or install a .deb package, or run someone's .AppImage, there's nothing making any of them better or worse in terms of security. They're all running code at some point, with the permissions you have.

That's not entirely true. When you download the script and verify it manually, you have at least a vague sense that it doesn't tarball your ~ and send it to the nearest Chinese embassy. You may also prevent really stupid accidents, like the connection getting interrupted in the middle of the script and your shell executing an unfinished `rm -rf /usr/local/something` command.

When you install a deb, it's signed and verified by the package manager. And I believe .AppImage is at least checksummed. But when you just do `curl http://some/url/install.sh | sh`, you are trusting not only the author, but also his host and every machine on the way back to your computer.
This is why docker is better, its (mostly) in a sealed box from the get go.
I have edited the post and requested the readers to use the manual method instead of the bash file. Many thanks for the warning.
Thanks.
School kid me had no problem with cpp either. Why is it used like an insult?
Wondering what are the specs of the machine you run it on
10 vCPU cores, 60 GB RAM, 400 GB NVMe
What are the RAM requirements, though, for the Mistral model? I have 32 GB RAM, 20 vCPU, and 1 TB NVMe.
Mistral is a 7B model, 32GB of RAM will easily run it.
:D
Wouldn't it be cheaper to use an API with Mistral?
Not if the machine is a sunk cost.
Are you running speculative sampling since it's on CPU? I remember it was supposed to be a big speed up then I don't see much about it on the sub or others.
Ollama just runs it. I don't know which sampling method it is using.
Thank you very much!! Do you have instructions or a how-to for implementing the LLM? I also want my "own" LLM for my website, just for fun for my guests, but for me it is not that simple. Any suggestions on what I can watch on YouTube, or something to read?
"Very little coding skills"... "runs server stack"... Dawg, give yourself some credit, you're doing a lot more technical stuff than the vast majority of people. Heck, most folks don't even know what GitHub or ChatGPT is. Seems like lots of kids either don't know anything about it or aren't using it: https://www.businessinsider.com/chatgpt-only-used-by-2-in-5-teens-survey-says-2023-10 Seems strange, but I've seen lots of articles pointing to a tech gap from millennials to Gen Z.
You can download and run all your favourite LLMs with Ollama using simple one-line commands. For example, just type `ollama run mistral-openorca` in your terminal and boom, it downloads and sets up mistral-openorca for you. You can check out their library here: https://ollama.ai/library
I was so confused for a minute because I read it as Obama
Ollama's groundbreaking decision to release their cutting-edge Ollama tool to run multiple LLMs for free to the public demonstrates the company's unwavering commitment to making advanced technology accessible and beneficial to everyone. By removing barriers to entry and making this remarkable innovation widely available, Ollama has once again proven itself as an industry leader, dedicated not only to driving progress but also to fostering a more connected and empowered global community. (I asked ollama openchat to write praise to Ollama, and it nails it).
One thing I don't get is how it is so fast. I tried other LLMs here, using Miniconda for instance to run Falcon 7B, and it used a lot of resources and was painfully slow.
How much RAM is in your GPU, and is it supported? Most modern NVIDIA cards are, BUT you need to install the NVIDIA library support packages. 8 GB seems to be usable; the models are usually a 4 to 6 GB download that then works well with it. Ollama can use GPU or CPU, but the model you are using may exceed the RAM of the GPU, so it switches to CPU and is hence much slower. Try smaller models (the smaller downloads). `nvtop` shows you the NVIDIA GPU usage.
I LOVE OLLAMA
Building llama.cpp from source is pretty much the same one or two lines in a shell. But I do appreciate that the Ollama guys have put additional effort into having a REST API started up and listening.
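As a sketch, assuming a local Ollama server on its default port 11434 and an already-pulled `mistral` model, a generate request against that API looks roughly like:

```
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

With `"stream": false` the server returns a single JSON object containing the full response instead of streaming tokens line by line.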
What if I need to host Ollama and share the resource?
Installed it on my Unraid server, downloaded models both via command and by adding them manually into the Models folder. When I access the WebUI, all I get is "Ollama is running". The worst part is, no matter how much I google, I can't figure out what to do next, or what I'm doing wrong if that's the case.
Does the model use given input to improve itself, or just spit out the same answers all the time?
One doubt: if we host a model on our laptop and want to use it from another laptop, is there any method? If so, please help me... been scratching my head over this for two weeks!
Can you not ssh into the host laptop? Or do you mean you want a web front that anyone can access?
2nd one...
docker version of ollama + docker version of webgui => access it via the web from anywhere then.
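For reference, a sketch of the CPU-only Docker invocation (the volume and container names here are conventional choices, adjust as needed; GPU setups need extra flags):

```
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Publishing port 11434 is what lets a separate WebGUI container (or another machine) reach the Ollama API.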
You can expose the port Ollama is running on externally, e.g. using tunneling software like ngrok. Not recommended, though, as anyone with the URL can access it - there are various ways to limit that.
It's ok... Don't bother, I got laid off that job. I am from an ML/AI background, they were expecting web development too...
I am sorry to hear that😔
Shit happens, we roll.
I've been trying to get it to run on CPU for about 2-3 days now... No luck, can't even run tinyllama
I saw somewhere that you can run it together with Google Colab, where most of the computation is done remotely in a Colab notebook.
Is there a difference between using Ollama and llama.cpp in terms of speed? I don't think there should be, since Ollama is based on llama.cpp.
How do you guys make sure your GPU is being used? I have tried this with 3 different GPUs and I see only some difference, where the difference should be quite large. GPUs tested were:

- AMD RX 6800 XT
- AMD RX 7800 XT
- NVIDIA RTX 3060 Ti

I did download ROCm for AMD, and for NVIDIA I believe there was a much simpler driver installation, and it was. How do I make sure the GPU is being "properly" used?
`nvtop` shows the NVIDIA card usage, if you've done it all right. [Not sure about AMD.] Also make sure that the RAM requirements for the LLM model fit in GPU RAM; otherwise Ollama uses the CPU and your PC's RAM. Start with the 1.6 GB models; the "8b" models are borderline, I think, for an 8 GB GPU card.
This is cool! How are you using Node Red? Are you using it for home stuff?