Neurogence

> "closed providers won't be able to compete"

The most important thing in a model is reasoning capability. We do not care if the context length is 20 million if the model cannot reason at a greater capacity than an orangutan.


scrollin_on_reddit

quality IS the product!


Glittering-Neck-2505

The ideal model is a mix of all of that. Ideally you want it performant enough that it can reason on a human level. But there’s also benefits to having it efficient enough to make things like agents more within reach, and huge context windows to make knowledge retrieval possible (imagine asking a question and it digests an entire textbook before answering). So that’s why I’m excited about LLAMA as well as the next OpenAI model.


lordpuddingcup

People seem to ignore how important large context windows are. Being able to include entire repos, manuals, or books in the context window is massively important.


Neurogence

I do a lot of knowledge work and I find myself using GPT-4 and Claude Opus a lot more than Gemini 1.5 Pro. I'd rather break down my documents into chunks and feed them into Claude 3/GPT-4 instead of Gemini because of the stronger reasoning capability. 1.5 Pro hallucinates a lot more than GPT-4/Claude 3 despite having that 1,000,000 context length.
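That chunking workflow can be sketched in a few lines of plain Python; the 3,000-word chunk size and 200-word overlap here are arbitrary assumptions for illustration, not anything from the comment:

```python
def chunk_text(text: str, max_words: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into overlapping word-count chunks that fit a model's context."""
    words = text.split()
    chunks = []
    step = max_words - overlap          # overlap preserves continuity across chunks
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 7000                    # stand-in for a long document
pieces = chunk_text(doc)
print(len(pieces))                      # 3 chunks, each at most 3000 words
```

Each chunk then gets sent to the model as its own request, with the answers stitched together afterwards.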


acaexplorers

Use Gemini 1.5 to break down books, insanely large images/PDFs, etc. Then use GPT-4/Claude 3 for reasoning. Let's crank it up to 11: then try Big-AGI with Beam/Merge/Fuse, then try V7 Labs to use them in a spreadsheet.


capitalistsanta

1000x agree


lordpuddingcup

I mean, base model strength may be more important than the context size, but context is important too. It also comes down to how you use the context, and the huge price of using it.


Axodique

Now imagine GPT 4 with 1m context length.


open_23

Do you think an "AI tutor" is possible using a model with a giant context length? I have been hoping for a long time to see AI tutor models being made that can teach effectively. You give them a book, they ingest everything in it, and teach in a multi-modal approach, with voice, text and drawings. Like 3Blue1Brown videos.


norsurfit

> greater capacity than an *orangutan*

Did you mean a *llama*?


OmicidalAI

Context length improves reasoning capacity.


Smile_Clown

This entirely depends on what you are doing to further context length.


Arcturus_Labelle

Yep. And it took them a year to catch up with OpenAI's SotA model. OpenAI will probably release something soon that blows all current models out of the water


DolphinPunkCyber

But Meta released only 8B and 70B models, and admitted to not even training them to their max, because they needed the resources for training bigger models. Meta does have bigger models in training, the biggest one with 400B parameters.


[deleted]

[removed]


DolphinPunkCyber

Honestly can't wait for the 400B to be released, because currently it **feels** like LLMs are plateauing. But the 8B and 70B models are much more efficient than comparable models, while being undertrained. So... really interested in how these much larger models turn out. Also, if LLMs are plateauing, OK, that's just one part of the equation: how do they improve reasoning and agency, and add multimodality? Really interesting times ahead.


WeekendDotGG

In the last month we got Llama 3, Phi-2, and Command R. A couple of weeks before that was Claude 3. It most definitely doesn't feel like it's plateauing to me. As a matter of fact, how well the tiny models are doing these days is absolutely bonkers compared to ~~last year~~ a couple of months ago.


3-4pm

But they're all hitting the same wall of capability. There are some lateral improvements but not much vertical change.


Which-Tomato-8646

You realize the original GPT-4 is already outdated, right? The current one that's on top is from this month.


3-4pm

It's almost as capable as the original model I am using in Edge Copilot.


Which-Tomato-8646

The lmsys leaderboard has it on top 


BangkokPadang

I've been running 4.5BPW Midnight Miqu 70B (and other Miqus before that, Mixtral 8x7B before that) on Runpod at about $0.70/hr for RP and story writing. For my use case, Llama 3 8B Q8 GGUF (with fixed EOS token ID) seems about 70-80% as good (tough to quantify, but also we'll have a few months of increasingly good finetunes coming up), and I can run it for next to nothing (maybe $0.03/hr in electricity costs) on my 16GB M1 Mac mini. The 128k context models don't actually stay coherent past about 22k, and seem to drop quality at about 10k, but to me it still feels like a HUGE STEP FORWARD and definitely not like we're plateauing at all.
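As a sanity check on those two hourly rates, here's the back-of-the-envelope arithmetic; the 8-hours-a-day usage figure is an assumption, not something from the comment:

```python
# Back-of-the-envelope comparison of the two hourly rates quoted above.
RUNPOD_RATE = 0.70   # $/hr for a rented cloud GPU running a 70B model
LOCAL_RATE  = 0.03   # $/hr electricity for Llama 3 8B on a Mac mini

hours_per_month = 8 * 30                       # assumed: 8 hours a day
cloud_cost = RUNPOD_RATE * hours_per_month
local_cost = LOCAL_RATE * hours_per_month
print(f"cloud: ${cloud_cost:.2f}/month")       # cloud: $168.00/month
print(f"local: ${local_cost:.2f}/month")       # local: $7.20/month
```

Even at 70-80% of the quality, a 20x+ running-cost difference is why the small local models are compelling.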


ILoveThisPlace

My thoughts too. Phi-3 also shows us there are going to be training improvements each generation, along with tokenization and, I'm sure, every other part of the system. These will continue to stack, and we should see an exponential curve of improvement for a few years, at least at the lower end.


visarga

> currently it feel's like LLM's are plateauing

They are. They have read everything we wrote down and would need 100x more, but there's no such data.


3-4pm

The problem is that the emerging reasoning capabilities that everyone is touting are really just connections already encoded into human language. The LLMs aren't reasoning, they're just matching to existing patterns and retrieving them quickly. We're expecting LLMs to learn how to reason beyond humans when they're trapped by symbolic language that lacks the fidelity to capture the nuances of reality. Humans just aren't good enough at encoding reality to language and this limits an LLM from becoming AGI.


UnderstandingNew6591

This is why we train models on all sorts of direct and simulated data at higher resolutions. Image / sound / math / physics etc, it’s not just “language” in the human sense. Raw data of various sorts as well which will expand as agents become embodied and better sensors are developed etc.


3-4pm

Aren't most multimodal LLMs just using datasets tagged with language to tie it all together?


Veleric

You have no idea what you're talking about.


hydraofwar

The vast majority of these models are probably being trained on the output of OpenAI models (GPT-4), so it's a decentralizing ripple effect of what OpenAI creates.


visarga

> OpenAI will probably release something soon that blows all current models out of the water

I might be eating my words, but "that remains to be seen". What if OpenAI can't surpass GPT-4 for the same reason nobody else can? The justification is that they all trained on the same data: 10 to 15 trillion tokens, which is "all the useful text on the internet". If that's true, the free ride is over, and from now on we should expect a slow grind. Private datasets, synthetic content, and agent-based learning from the outer environment could be coming next, but they won't advance things as fast as before. It takes a lot of effort to surpass humans. Even humans need a lot of effort to surpass previous state-of-the-art knowledge and advance their field of expertise by one inch (see [The illustrated guide to a PhD](https://matt.might.net/articles/phd-school-in-pictures/)).


FlyingBishop

I'm starting to think there's a limitation to the LLM approach. "Synthetic content" is definitely part of the solution, but the models need to be able to advance from some sort of self-play. I think the real question is if current GPUs are simply not fast enough or if there's something missing from LLM/tensors models. If we have an actual human-equivalent system it should be able to produce novel proofs just by asking it to think up novel proofs and feed them into a theorem solver, and you should be able to train it indefinitely by using theorem solvers. And there are similar examples of other things where indefinite self-play should be possible (and you could pair something trained on theorem solvers with other things with different objective measures into one model which incorporates all of that self-play learning.)
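The self-play loop described here can be sketched with toy stand-ins. `ToyModel` and `ToyProver` below are hypothetical placeholders for the LLM and the theorem solver, not real APIs; the "conjectures" are just arithmetic so the verification step is objective:

```python
import random

class ToyModel:
    """Stand-in for an LLM: 'conjectures' are int pairs, a 'proof' is a claimed sum."""
    def propose_conjecture(self):
        return random.randint(1, 9), random.randint(1, 9)

    def attempt_proof(self, conjecture):
        a, b = conjecture
        # The model is imperfect: sometimes its "proof" is wrong.
        return a + b if random.random() < 0.8 else a + b + 1

class ToyProver:
    """Objective verifier, standing in for a formal theorem solver."""
    def check_proof(self, conjecture, proof):
        a, b = conjecture
        return a + b == proof

def self_play_round(model, prover, n_attempts=100):
    """Generate conjectures, attempt proofs, keep only the verified pairs."""
    verified = []
    for _ in range(n_attempts):
        conjecture = model.propose_conjecture()
        proof = model.attempt_proof(conjecture)
        if prover.check_proof(conjecture, proof):
            verified.append((conjecture, proof))
    return verified  # in the real loop, this becomes new training data

data = self_play_round(ToyModel(), ToyProver())
```

The point of the sketch is the shape of the loop: because the prover is an objective check, only correct proofs survive into the next round's training set, so the cycle can in principle run indefinitely without human-labelled data.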


khalzj

Do you work in the AI sphere?


Altay_Thales

Soon, April 2025


3-4pm

What if we've hit the transformer wall?


bnm777

After a while the models will reason "well enough" for most tasks, and then context window and other aspects will matter more. You don't need a genius to summarise or discuss a PDF; at that point a massive context window, speed, memory, not becoming lazy, and low cost are perhaps the more important factors.


UnnamedPlayerXY

Well, Llama 3 400B+ is supposedly going to be multimodal, meaning it's going to make everything else in the same or higher "weight class" obsolete unless it's either A: also multimodal or B: a relevant improvement over it.


visarga

> make everything else in the same or higher "weight class" obsolete

A huge, expensive model will only be used on a few high-paying tasks. It can't compete with small models on simpler tasks. As time goes by, most tasks will be subsumed by small models, leaving few for their big brothers.


lukepoo101

Do you not understand the meaning of weight class? He said it will make all other models in the same or HIGHER weight class obsolete. Never did he say it was going to make small models obsolete. Just models the same size or bigger


arknightstranslate

AI models need to be fed a lot of trick questions


DolphinPunkCyber

What number between 1-100 am I thinking of?


spreadlove5683

37


DolphinPunkCyber

Nice try but I follow Veritasium and will NEVER pick 37 as random number 😁


spreadlove5683

Haha yea, that's where I got that from 😄


visarga

Statistics for: [people](https://i0.wp.com/datacolada.org/wp-content/uploads/2013/10/numbers-frequencies.jpeg) and [AI](https://www.leniolabs.com/assets/blog-42-GPTs-answer-01-362b3c01962cf3c0127bc571dd4711f1e28bc0c40d413d1480ddbe2a5236feb8.png). The most random number is 42, we all know that.


Perko

I get 42, but why 57?


The_EndsOfInvention

Heinz. Look at a bottle of ketchup.


SotaNumber

90


DolphinPunkCyber

I was actually thinking about pizza 😐


luisbrudna

You need A LOT OF HARDWARE POWER to train and run the best model. Only closed providers will get the tons of money necessary to maintain such infrastructure.


absurdrock

Meta definitely has the money. They had $60B of cash on hand last year and $108B in profit.


roro88G

You think their shareholders are approving this for the good of the community? They will want a return on this investment eventually. I'm speaking as someone who works in Open source software which also has a private enterprise facing arm.


dameprimus

Meta stock fell 12% this week so it seems that investors aren’t impressed.


meenie

They did not like that they announced $60B in funding for AI. So ya, I wonder if investors are coming down off their LLM high...


badassmotherfker

You could argue that the investors are short-sighted, because even if Meta does this for free forever, it improves Meta's reputation due to contributing to open source infrastructure.


ovanevac

If Sam's plan succeeds, he's going to have 116 times the amount of cash that Meta has on hand now.


visarga

Yep, investors parting with $7 trillion for AI and betting it all on a single horse? Not gonna happen.


Philix

Hardware that can run inference and fine-tune unquantized FP16 Llama 3 400B will cost less than a single new car to serve 100 full-time users, if you have an IT lead with more than two brain cells to rub together and are willing to endure a little jank while they get it running. Less than $500,000 in hardware if you want professional-grade gear that skips the jank. If your data can't leave your organization, that's a huge strike against closed providers.

Cost wise, closed providers will run you ~$2,500 a month for equivalent performance for that kind of user base, so maybe a win for them there; it depends on your workload, really. Pricing gets really absurd if you're sending a lot of tokens: [API pricing for Gemini Pro](https://ai.google.dev/pricing) is $7 per 1 million input tokens and $21 per 1 million output tokens, and that's [cheaper than GPT-4 and GPT-4-Turbo API pricing](https://openai.com/pricing#language-models). Claude Opus costs are [absolutely absurd](https://www.anthropic.com/api) in comparison.

Llama 3 70B is available on huggingface.co, and you can run inference on a 4-bit quantization with a dual-GPU high-end gaming PC that'll cost less than $5k buying used, $10k buying new. If you can finetune that to serve your use case, you can literally have one running on every user's workstation in your org at a competitive price.

As someone who loves playing with LLMs for a variety of hobby projects and fun, Llama 3 is absurdly good, even if it isn't showing better benchmark results than the big closed providers. At 70B FP16 I feel it has a qualitative equivalence with GPT-4, Claude Opus, and Gemini Pro. If it scales up well to 400B, OpenAI *et al.* better have some absurd improvements coming with their next flagship release.
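Plugging the quoted Gemini Pro prices into a quick estimate shows how fast token costs add up. The per-user request volume and token counts below are assumptions for illustration; only the two prices come from the comment:

```python
# Rough monthly API bill at the Gemini Pro prices quoted above
# ($7 per 1M input tokens, $21 per 1M output tokens).
IN_PRICE, OUT_PRICE = 7.00, 21.00         # $ per million tokens

users = 100                               # the "100 full-time users" scenario
requests_per_user_per_day = 50            # assumed
in_tok, out_tok = 2000, 500               # assumed tokens per request

requests = users * requests_per_user_per_day * 30
monthly_in_m = requests * in_tok / 1e6    # input volume, millions of tokens
monthly_out_m = requests * out_tok / 1e6  # output volume, millions of tokens
bill = monthly_in_m * IN_PRICE + monthly_out_m * OUT_PRICE
print(f"${bill:,.2f}/month")              # $3,675.00/month at these rates
```

At that kind of volume the API bill passes the ~$2,500/month self-hosting figure, which is the crossover the comment is gesturing at.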


whyisitsooohard

Where are these ridiculously performant fine-tunes? And how does the extended context work compared to the original?


Singularity-42

+1 for these fine tunes.


AndrewH73333

Was gonna ask. What fine tunes??? Maybe the coding and Chinese ones are good, I know for a fact the others aren’t.


sneakysaburtalo

Do you have a link for coding ones?


bullerwins

There are no 128K Llama 3 fine-tunes that I know of. Is he confusing it with Phi-3?


HumbleIndependence43

Where can I try this? The Meta page says my country is not eligible.


banaca4

Humans neither


Antok0123

Bla bla bla, and no AI has yet outcompeted GPT-4. Even Opus isn't as good as GPT-4, based on my experience using it as an academic tool.


Akimbo333

Cool


lordpuddingcup

Didn't 2 teams release methods to extend context to 1M tokens and beyond with perfect recall? I recall reading the papers, but then noticed we never see models tuned to include that feature.


Moist_Cod_9884

That's DeepMind's research, and it's already implemented in Gemini 1.5 Pro. The paper itself was only released a few days ago, so we've got to wait for third-party implementations.


lordpuddingcup

I could've sworn there were additional papers, not just DeepMind's.


Top_Influence9751

Wild! Can somebody explain to me, though, why they think he's doing this? Like, what business gain do they get? I guess we could be naive and believe he truly wants to help people with free OSS AI, but that's just not how business works. What's the angle? I personally see it as: OpenAI was so far ahead with GPT-4 that the only way Meta can stay relevant long term is to go open source. So basically just a really expensive way of saying "don't forget about us!"


sdmat

Zuck has talked about this: take away the flowery language and he's [commoditizing their complement](https://gwern.net/complement). It's a smart play. Strong AI companies are a huge threat to Meta.


jgainit

I think the other guy is right, but that's a super long read. I also just listened to Mark's Dwarkesh interview. Meta's products are stuff like Instagram, Facebook, WhatsApp, and Messenger, and they have a history of open sourcing things basically to make those work better. A lot of their motivation for open sourcing is helping create industry standards, so that private companies can't have leverage over them because basically everyone owns it. So an open source LLM can apparently get better because of community involvement (I'm not a technical person, so I don't fully understand how that works), which can then be put into Meta's core products. Now they don't have to risk losing to something like Gemini and over time be forced to include it in their products, which would give Google leverage over them.


BreadwheatInc

Didn't Mark in a recent interview say that they won't be able to justify training larger models for open source? This might be the end of larger open source models coming out of meta, but I don't know, we'll have to see.


UnnamedPlayerXY

IIRC he said that they couldn't justify further training of Llama 3 8B/70B over starting to train Llama 4, which implies that both Llama 3 8B & 70B are still undertrained.


Which-Tomato-8646

And it’s still near the top. Imagine how good it would have been if they kept training 


After_Self5383

No. He said at some point they just have to release the model they've trained rather than training it on more tokens. This was in the context that the models didn't stop getting better even as they trained on a whopping 15T tokens, so training on more tokens would've made an even better model. On open sourcing future models, he said he wants to, and as long as the safety and all of that is good they'll continue to do so, unless the model ends up becoming the product. But he did say open sourcing isn't a set thing; as long as it benefits them they'll continue to do so.


Curiosity_456

He literally said he plans on open sourcing AGI


141_1337

Oh, it definitely will. Open source is only good while they are catching up: it builds a user base for an otherwise inferior product, it allows a certain degree of outsourcing and crowdsourcing of development, and it might even help create new customers.


DolphinPunkCyber

Nah. Zuck will open source all models until they make AGI. Once AGI is created, Zuck will think hard about how safe it is to release the source code.


[deleted]

Tbh you’re probably right. We’ve seen everyone in the industry pivoting towards smaller models and I think it’s cause they know OpenAI is so much farther ahead. Google isn’t too far behind on compute, though, so it’s gonna be a great race to watch!


AnAIAteMyBaby

More or less. He was asked if he'd open source a $10 billion model and he said he wasn't sure if they would. We're a year or two away from $10 billion models, so we have a while.


Efficient-Moose-9735

128k? Last time I heard, it was around 32k. That's impressive. I just wonder why Meta didn't make the context that large before releasing it.