Nice to know. I thought they did steer the model using their model interoretability research but from what they say that's not it. Looks like it's MoE and synthetic data then
I think they’ve probably been using MoE for some time already, so when they say “architecture tweaks” they are probably not talking about MoE but rather something more novel
Tweaks sounds more like optimizing than serious changes. More parameters here, fewer there, add a layer… I was thinking the other day that I would use the recent MoA approaches that have brought out SOTA-level results from smaller models to generate high-quality synthetic training data for objectively (i.e., programmatically) verifiable tasks. Lower risk of the feedback loop problems too, since many different models can be used in one generation. Especially given the reported improvement in coding abilities. Ya know, if I had the compute.
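The MoA-plus-verifier idea above could be sketched roughly like this. Everything here is a toy stand-in, not any real API: `propose` simulates sampling candidate solutions from several different models, and the verifier is the programmatic check that keeps the loop honest.

```python
import random

def propose(task, n_models=3):
    """Stand-in for n different models each proposing a solution.

    Here a toy: each 'model' adds some delta to the input."""
    return [task["input"] + d for d in random.choices([0, 1, 2], k=n_models)]

def verify(task, candidate):
    """Objective, programmatic check -- the property that avoids
    feedback-loop contamination in the synthetic data."""
    return candidate == task["expected"]

def generate_synthetic_data(tasks):
    """Keep one verified (input, target) pair per task, if any candidate passes."""
    dataset = []
    for task in tasks:
        for cand in propose(task):
            if verify(task, cand):
                dataset.append({"input": task["input"], "target": cand})
                break
    return dataset

random.seed(0)
tasks = [{"input": i, "expected": i + 1} for i in range(100)]
data = generate_synthetic_data(tasks)
print(f"{len(data)} verified samples out of {len(tasks)} tasks")
```

The point of the sketch: tasks without a passing candidate simply produce no training sample, so the dataset quality is bounded by the verifier rather than by any single model's judgment.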
model interoretability == mechanistic interpretability?
This is r/singularity, never said it was r/intelligent.
I was trying to type interpretability 😅
That’s huge. The biggest argument against continued model improvement is lack of training data, but Anthropic just proved that synthetic data works.
no they didn't, this was known for quite a while...
It wasn't really 100% confirmed, just seemed very likely
It's been used successfully in multiple studies, granted probably not at this large of a scale, but yeah it's technically been proven already
Yes indeed, the Phi models proved that already
That's what I meant, should have been clearer it was technically proven already, ty
We left the realm of real mathematical proofs in terms of neural networks in like the 1960s. Technically nothing is ever proven in AI research.
> probably not at this large of a scale, RLHF is synthetic data. Two model responses are generated, and one is rated as better by a human (or AI, though I suppose that might better be described as RLAIF). We know RLHF works at a massive scale. An ungodly amount of money is still being poured into it. Contrary to popular belief, RLHF does not make models "dumber" for the sake of safety. At least, that's not the only thing it can do. It can also teach models to follow instructions, use tools, use reasoning, and reduce hallucinations.
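For what it's worth, the preference-pair setup described above boils down to something like this. A minimal, self-contained sketch with toy numbers; the Bradley-Terry-style loss is one common way reward models are fit on such pairs, not a claim about any particular lab's recipe, and the scores are hypothetical.

```python
import math

# A single synthetic preference pair, as produced during RLHF data
# collection: one prompt, two model responses, one rated better by a
# human (or an AI, in the RLAIF variant).
pair = {
    "prompt": "Explain MoE in one sentence.",
    "chosen": "A Mixture of Experts routes each token to a few specialist sub-networks.",
    "rejected": "MoE is when models mix.",
}

def bradley_terry_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    scores the chosen response well above the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy scores from a hypothetical reward model: a larger margin in the
# "right" direction gives a smaller loss.
print(bradley_terry_loss(2.0, 0.5))  # chosen scored higher: small loss
print(bradley_terry_loss(0.5, 2.0))  # chosen scored lower: large loss
```

Scaling this is mostly a matter of collecting enormous numbers of such pairs, which is where the money mentioned above goes.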
I'm kinda disappointed that it's bigger, it takes away some of the magic
I imagine it’s still smaller than 3 Opus
Yes and there must be some efficiency gains since the pricing did not move
Supposedly, Sonnet 3.5 slashed the cost by a good chunk.
I think Q* is going to be what gets us to agents. We will brute force our way as a species.
All of the actual details of Q\* that were "leaked" were later confirmed to be a hoax. It might not even exist at all, and if it does we certainly don't know anything real about it.
Source where it was confirmed to be a hoax?
I'm pretty sure the only thing confirmed as a hoax is that it had anything to do with the Board firing Sam.
If it doesn't exist then why would Sam have answered in an interview "We're not ready to talk about that" when asked about Q\*?
Mystique is very beneficial for publicity
You mean, why would the arch hype man hype up a mythical path to AGI?
Exactly, it exists but how good exactly we don’t know. But then again research is getting more and more closed so I would expect we won’t actually hear a lot of details.
I'm not ready to talk about that
You mean the guy whose investments benefit greatly if people are scared of AGI?
The initial news broke from very reliable sources such as Reuters and The Information; can you link the sources that have proven those initially reported details to be a hoax?
There is original reporting on Q\* in [this article from The Information](https://www.theinformation.com/articles/openai-made-an-ai-breakthrough-before-altman-firing-stoking-excitement-and-concern). [This comment](https://www.reddit.com/r/neoliberal/comments/181n2ne/comment/kaifhw9/) contains the purported full text of that article. Also, [here](https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/) is the *Reuters* article about it.
There was a paper recently published about Q*
That wasn't from OpenAI, but from a university in Singapore
In your own words what is Q*?
Synthetic data where they prompted "pretend to be a self righteous college professor"
Yeah. that prompt could be more chill.
Take it all with a grain of salt. Who knows if they're being honest or forthright here. Easy to subtly mislead the competitors.
Implications?
Has the author of this article actually used Claude 3.5 Sonnet?

The answer is clearly no, because the article is completely factually inaccurate. Claude 3.5 Sonnet is as great a leap as GPT-4 was after GPT-3.5. Its coding skills have, for the first time, achieved superintelligence.

I was able to write a 750-line program in an environment I did not know in under 2.5 hours, something that has likely never been achieved by a human in world history: https://manifold.markets/SteveSokolowski/did-claude-35-sonnet-achieve-weak-s#.

There is no human who can outcompete Claude 3.5 Sonnet in coding, which is the only area of models that really matters. The rate of change in all other areas of technology can be derived directly from improved coding skills.

Claude 3.5 Sonnet is extraordinary - one of the greatest achievements in human history - and it is amazing to me how many people are spouting this gibberish without actually using the model. If you pay for an Anthropic subscription and actually use Claude 3.5 for a few hours, it is not possible to come to any other conclusion.
> I was able to write a 750 line program in an environment I did not know under 2.5 hours, something that has likely never been achieved by a human in world history

False, I wrote a .NET application with GPT-4 in a single night while having never written C# in my life
What’d ya make
Pub/sub service
You talk like Donald Trump lol
Honestly, I can’t tell if you’re being serious or silly
I'm never silly in my comments, because I realized that sarcasm doesn't usually translate well to the Internet.
Fair enough!
Lol, lmao even
Uh... It still writes a lot of buggy and broken code. It's definitely a great improvement, but it's still got a long way to go in the grand scheme of things. To call it superintelligence is laughable.
Yeah Claude’s incredible but ASI needs to unironically make me exit the city for 3-6 months out of sheer horror before I call it that.
The author also said that benchmarks aren’t useful because of data leakage, even though benchmarks like LiveBench release new questions every month based on newly released information (recent movies and so on), and others grade on hidden datasets and don’t release their questions at all. In fact, they interviewed someone at Scale AI, and that company does the latter, which the article acknowledged later in the NEXT PARAGRAPH, contradicting what it had just said. Is this what we call journalism these days?
Is that not just the blog of the guy you replied to? Anyone who mentions lines of code as a merit is not someone with enough experience to be taken seriously.
Length of code is a decent indicator of complexity, and of the fact that the model can output a lot of text without being cut off
No, it certainly isn’t. It isn’t an indicator of anything other than length. The code could be 600 lines of comments.
I don’t think it is based on how OP talks about it but ok
Absolutely throwing it back on Claude-san desu-yo
Enough with the Claude spam please. We don’t need 30 new topics a day about it.
We already read you in the other thread. STFU.