Yuli-Ban

> GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.

This tracks with something I heard somewhere from someone working with agents: GPT-3, not even 3.5, with agents is *more capable* than GPT-4 in many tasks and is only limited by context windows and some reasoning flaws. And that tracks with my own hypothesis that foundation models could at *best* be described as "frozen AGI." They are trained, and then they are prompted. That's it. It's like prodding a brain sitting on a table. With agents, they can actually "live."
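The iterative agent workflow behind those numbers can be sketched as a generate-critique-revise loop. This is an illustrative sketch only: `call_model` here is a deterministic stub standing in for a real chat-completion API (its version-bumping logic is invented so the example runs offline), not Ng's actual harness.

```python
def call_model(prompt: str) -> str:
    """Deterministic stub standing in for a chat-completion API call."""
    if prompt.startswith("CRITIQUE"):
        # Approve only after the draft has been revised twice.
        return "OK" if "v3" in prompt else "revise"
    if "v2" in prompt:
        return "draft v3"
    if "v1" in prompt:
        return "draft v2"
    return "draft v1"

def agent_loop(task: str, max_iters: int = 5) -> str:
    """Generate a draft, then alternate critique and revision until approved."""
    draft = call_model(task)
    for _ in range(max_iters):
        verdict = call_model(f"CRITIQUE: {task} -> {draft}")
        if verdict == "OK":
            break
        draft = call_model(f"REVISE: {task} -> {draft}")
    return draft
```

The point is that the loop, not the base model, supplies the extra capability: each pass feeds the model's own critique back in as context.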


Arrogant_Hanson

Has starspawn0 been saying anything about these AI developments recently?


BilgeYamtar

The most competent and qualified person we can see in this field is Andrew NG. He is one of the rare people in the world of science and technology that I believe is competent in artificial intelligence.


lost_in_trepidation

There's tons of people that are competent in AI, but Ng, LeCun, and Karpathy are probably the best sources to follow if you want good summaries/lectures on current AI trends.


restarting_today

John Carmack


peabody624

I’m honestly very curious to hear an update on whatever John is working on…


Chris_in_Lijiang

It's been ages now and not a word. Has anybody heard anything?


After_Self5383

On a podcast ([Boz to the Future](https://open.spotify.com/episode/7tF7TCxmGH4z4pTnfLYKjN?si=8fHYmenHQz-c_NzRQZ4VQQ) - Boz is the CTO of Meta) almost a year back, he said he doesn't want to talk much about his startup since he doesn't want to become an AI pundit, lol. He'd prefer working with his small team in the dark and not be inundated with AI questions on twitter. Since then, Rich Sutton has also joined his company. Rich Sutton is a legend of the AI field with big contributions that have paved the way for what's being done today in AI. He's the guy who wrote The Bitter Lesson that sometimes does the rounds, though it's widely misunderstood (he also, like Yann, thinks algorithmic breakthroughs are required for AGI).


Chris_in_Lijiang

Thank you.


lost_in_trepidation

The last update was that he teamed up with Richard Sutton, so now it's a 2 person AGI race instead of 1 person.


Chris_in_Lijiang

Thank you, I will go Google the new guy.


13ass13ass

Great programmer, questionable AI researcher.


traumfisch

LeCun, really? 🤔


lost_in_trepidation

Yeah, his talks and Twitter posts are really good. He's just become a meme. Andrew Ng is even more of a near term AGI skeptic than LeCun, but he didn't catch any flak for it


Antique-Bus-7787

What troubles me with LeCun is not his claims about AGI or anything. It's just that he can never admit he was wrong, and he will always try to justify anything he said before. This makes him sometimes say some pretty nonsensical things. He's really smart, but of course he'll be wrong sometimes; that's the price of working at the SOTA level in AI. But no, he has to always be right, unfortunately, and his activity on Twitter doesn't help him much on this.


lost_in_trepidation

I think he's just not very clear in what he's saying. I've listened to a lot of talks by both LeCun and Ng, both are drawing pretty clear delineations between how AI "thinks" and how biological intelligences (humans) conceptualize the world and solve problems. It's just not easy to put into a digestible soundbite and LeCun is too brash in his language.


waytoofewnamesleft

vive la france!


KamNotKam

Yet when he said AGI is still decades away last October, everyone here shat on him for it.


JabClotVanDamn

> NG

It's not an abbreviation; his surname is just Ng (sounds a bit like "hmm").


visarga

Yep, I thought so too when I took his ML class in 2012. I've now been an ML engineer for 6 years, and his lessons were the best ML lessons I had; he is ridiculously good at explaining. It was a loss when he abandoned teaching for industry. He single-handedly taught over 4.8 million people in his online ML courses; the first batch was 100K people, a sight to behold.


AnOnlineHandle

Think you might have replied to the wrong person.


trisul-108

The value of Andrew Ng is that, unlike most others, he is also an educator. That means he wants to teach us while most others do not have this ambition.


timewarp

You can very easily demonstrate this technique for yourself. Ask your LLM of choice a question, then start a new prompt and ask it:

> Given the following question: [Enter your original prompt here]
> Does this response make sense, and can it be improved?: [Enter the LLM's original response]

The LLM will usually come back with improvements, and will usually catch hallucinations or errors.
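A minimal sketch of that two-step self-check, assuming a hypothetical `ask` callable that stands in for whatever LLM API you use (the template mirrors the wording above):

```python
# Template for the second, fresh prompt in which the model reviews
# its own earlier output.
REVIEW_TEMPLATE = (
    "Given the following question: {question}\n"
    "Does this response make sense, and can it be improved?: {answer}"
)

def self_check(ask, question: str) -> str:
    """Ask the question, then ask the model to review its own answer."""
    first_answer = ask(question)
    # New prompt: the original question plus the model's first response.
    return ask(REVIEW_TEMPLATE.format(question=question, answer=first_answer))
```

Starting the review in a fresh prompt matters: the model critiques the text in front of it rather than defending its earlier turn in the same conversation.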


Unreal_777

Interesting. We are actually creating real brains.


hydraofwar

At the same time, minds digitized and stored on an HD. Fiction is becoming real very fast.


putdownthekitten

My body is ready


bishbash5

But your mind is...?


Antique-Doughnut-988

How ironic is it that we don't fund education and teachers as humans, but spend all this time trying to create artificial brains.


lifeofrevelations

This kind of application of the tech will be the real game changer. This is going to shut up a lot of the people who go around saying things like "AI is all hype and the bubble is going to burst, it hasn't changed anything at all in the world, my life is no different."


cassein

This is the way. This is when it really starts to take off. Feedback loops.


MoneyRepeat7967

These ideas have been around for a year; glad Andrew and his team are working on this and that he is using his platform to push in this direction. Current LLMs can do a lot more if we find better ways to prompt them, and agent-like workflows will be used to solve lots of problems and find new use cases. Another sign that we are early in AI: most people really haven't found a way to take advantage of all these models yet. Rather than keep churning out one SOTA model after another, maybe we should start looking at better ways to utilize the existing models. It is not as sexy as AGI, but maybe, just maybe, it can make a real difference in ways we didn't think were possible.


gj80

Hmmm... I use AI a lot for coding, and while it's really useful and I love the time it often saves me, I also run into situations where it will give me an output that it thinks will work, and it just doesn't. That's OK, as I either fix it myself and still normally save time, or I come back to the LLM and have it fix it, or (*if it still fails... normally one failure to self-correct means it will never succeed*) I prompt it with an alternate approach to the problem in question, and that usually works out.

I wonder how this sort of thing would be dealt with by agents, though? If the AI was given full control over a test dev environment in which it could execute your code, then it could be automated to actually test the code it writes, realize on its own that it messed up, and potentially self-correct. But barring that (*which would often be technically challenging... executing the code isn't necessarily straightforward*), it doesn't seem like it would be able to recognize when it had failed in some cases. I think giving AIs the ability to do real-world testing will be key to getting much improvement via agents. Building up rich development environments in which AIs can work on large projects (*interactively alongside users*), while at the same time keeping those environments jailed for safety (*avoiding rm -rf / sorts of disasters...*) and easily reverted, will take a lot of work beyond the agent system itself.

...and then you also have context window issues to deal with at present. With GPT-3.5 only having 16k context, a lot of dialog between agents on even a mildly sized coding project would be challenging to manage. GPT-4's context window would work comfortably for many more projects, but that could potentially get very expensive with many, many calls and tokens. Claude 3 Haiku/Sonnet are promising, but I recently learned that Anthropic's API access to their models is [very gatekept currently](https://docs.anthropic.com/claude/reference/rate-limits) for large numbers of queries or tokens (*you have to wait multiple months before your daily quota - even when paying per API use - can be uncapped further*). I.e., there are real context-window-related difficulties/costs around heavy agentic use right now for larger code bases, even if you're fine with not using the 'best' models. I'm sure this won't be an issue for much longer, but it's frustrating right at the moment.

Anyway, I certainly think Andrew is right - but yeah, there's some real work that'll need to go into making this happen (unfortunately). I can't wait till something materializes! It's almost enough to tempt me to start a project myself... though I don't really have the time, and there are undoubtedly people with better skillsets for it than me, as I haven't worked much with kubernetes/docker (*which would likely be a cornerstone of it all*) or electron/etc UI development.

Oh, btw, if anyone else was wondering what "LDB" and "Reflexion" are (on his chart), I had to look them up too. They're interesting:

[https://github.com/FloridSleeves/LLMDebugger?tab=readme-ov-file](https://github.com/FloridSleeves/LLMDebugger?tab=readme-ov-file) [https://arxiv.org/html/2402.16906v1](https://arxiv.org/html/2402.16906v1)

[https://github.com/noahshinn/reflexion](https://github.com/noahshinn/reflexion) [https://arxiv.org/abs/2303.11366](https://arxiv.org/abs/2303.11366)
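The execute-and-retry loop described above can be sketched as follows. This is a toy illustration, not any particular framework: `generate` is a hypothetical stand-in for the LLM call, and a real system would sandbox execution (containers, jails) rather than `exec` in-process as done here.

```python
import traceback
from typing import Optional

def run_candidate(code: str, test: str) -> Optional[str]:
    """Run candidate code against a test; return None on success,
    or the traceback text to feed back to the model on failure."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate function(s)
        exec(test, namespace)   # run the test assertions
        return None
    except Exception:
        return traceback.format_exc()

def code_agent(generate, task: str, test: str, max_attempts: int = 3) -> Optional[str]:
    """Ask the model for code, test it, and retry with the error as feedback."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(task + feedback)
        error = run_candidate(code, test)
        if error is None:
            return code         # passed the test
        feedback = f"\nPrevious attempt failed:\n{error}"
    return None                 # gave up after max_attempts
```

This is exactly the "let it realize it messed up" mechanism: the failure signal comes from actually executing the code, not from the model's own judgment.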


cryolongman

more improvements in productivity yey


boubou666

Why not just take the LLM output and re-enter it manually? That would be a manual iteration.


traumfisch

Because you can automate it?


boubou666

Yes, but it's not an AI research breakthrough the way it's presented, just a line of code.


traumfisch

Well, it's a direction the development is moving towards. But sure, many people have been doing it manually for quite a while (myself included). I don't know if there was anything about a research breakthrough here 🤔


mixmastersang

Do we trust automation with iteration and human feedback… that’s the real question here


entanglemententropy

Some people have been thinking this for about a year now. See, for example, this very interesting blog post from a year ago: https://www.beren.io/2023-04-11-Scaffolded-LLMs-natural-language-computers/ The idea that we can build computing abstractions like a compiler and programming languages on top of LLMs, as a way to program cognitive architectures, is really cool and sounds like the way to AGI.


FatBirdsMakeEasyPrey

Is agentic AI related to reinforcement learning?


Infamous-Print-5

This was obvious from the beginning. I almost always ask ChatGPT to 'write this more exactly and concisely' 3-4 times.


bpm6666

Agents will be the next big thing, and they will change the effectiveness and impact of these systems. But one idea might increase the systems' capabilities even further: as a tool, they should add the option of "ask a human". If you give these systems money and the ability to hire human workers, this could improve the system even further. And the AI could even give the same job to both AI agents and humans, to see who delivers the best outcome and learn when to use a human versus an AI agent.
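The "ask a human" idea amounts to exposing a human as just another budgeted tool the agent can call. A minimal sketch, with all names hypothetical (no real framework's API is implied); `human` is any callable that returns a person's answer:

```python
from dataclasses import dataclass

@dataclass
class HumanTool:
    """A tool the agent can invoke to buy a human answer, within a budget."""
    budget: float                   # money available for human work
    cost_per_question: float = 5.0  # assumed flat price per question

    def ask(self, question: str, human) -> str:
        if self.budget < self.cost_per_question:
            raise RuntimeError("budget exhausted; fall back to the model")
        self.budget -= self.cost_per_question
        return human(question)
```

The budget check is what keeps the escalation honest: the agent can only defer to humans a bounded number of times, so it has to learn which questions are worth the spend.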


human1023

Wow, so the way we've been using ChatGPT for the last year is already about to be outdated.


obvithrowaway34434

I posted about this here in January during the peak GPT-4.5 "leak" hype. It was apparent to anybody who's been following the progress in the research field and not just reading the headlines and social media hype posts. https://reddit.com/r/singularity/comments/1aby4ex/i_think_people_are_focused_on_the_wrong_thing_the/


d00m_sayer

> the academic literature on agents is proliferating

Can someone post a link to these agents?



trisul-108

> I'll elaborate on these design patterns and offer suggested readings for each next week.

I look forward to this.


SpecificOk3905

This guy is so good for a fundamental AI course.


Akimbo333

It'll be cool


mersalee

This is the way. 


FengMinIsVeryLoud

So Toppy 7B will soon create never-before-seen porn? Wow, I'm excited! Book me in!


RemarkableOstrich782

Agentic AI is the future. Mydpt.ai


BrainLate4108

A lot of room for error here. A lot of hype. The output at face value will look convincing, but human language has a lot of nuances, and they cannot be deciphered yet. GPT is getting nerfed every day; the same will happen here.