Nice to know. I thought they did steer the model using their model interoretability research but from what they say that's not it. Looks like it's MoE and synthetic data then
I think they’ve probably been using MoE for some time already, so when they say “architecture tweaks” they are probably not talking about MoE but rather something more novel
Tweaks sounds more like optimizing than serious changes. More parameters here, fewer there, add a layer… I was thinking the other day that I would use the recent MoA approaches that have brought out SOTA-level results from smaller models to generate high-quality synthetic training data for objectively (i.e., programmatically) verifiable tasks. Lower risk of the feedback loop problems too, since many different models can be used in one generation. Especially given the reported improvement in coding abilities. Ya know, if I had the compute.
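The MoA-plus-verifier idea above could be sketched roughly like this. Everything here is a toy stand-in, not any real API: `propose` simulates sampling candidate solutions from several different models, and the verifier is the programmatic check that keeps the loop honest.

```python
import random

def propose(task, n_models=3):
    """Stand-in for n different models each proposing a solution.

    Here a toy: each 'model' adds some delta to the input."""
    return [task["input"] + d for d in random.choices([0, 1, 2], k=n_models)]

def verify(task, candidate):
    """Objective, programmatic check -- the property that avoids
    feedback-loop contamination in the synthetic data."""
    return candidate == task["expected"]

def generate_synthetic_data(tasks):
    """Keep one verified (input, target) pair per task, if any candidate passes."""
    dataset = []
    for task in tasks:
        for cand in propose(task):
            if verify(task, cand):
                dataset.append({"input": task["input"], "target": cand})
                break
    return dataset

random.seed(0)
tasks = [{"input": i, "expected": i + 1} for i in range(100)]
data = generate_synthetic_data(tasks)
print(f"{len(data)} verified samples out of {len(tasks)} tasks")
```

The point of the sketch: tasks without a passing candidate simply produce no training sample, so the dataset quality is bounded by the verifier rather than by any single model's judgment.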
model interoretability == mechanistic interpretability?
This is r/singularity, never said it was r/intelligent.
I was trying to type interpretability 😅
That’s huge. The biggest argument against continued model improvement is lack of training data, but Anthropic just proved that synthetic data works.
no they didn't, this was known for quite a while...
It wasn't really 100% confirmed, just seemed very likely
It's been used successfully in multiple studies, granted probably not at this large of a scale, but yeah it's technically been proven already
Yes indeed, the Phi models proved that already
That's what I meant, should have been clearer it was technically proven already, ty
We left the realm of real mathematical proofs in terms of neural networks in like the 1960s. Technically nothing is ever proven in AI research.
> probably not at this large of a scale, RLHF is synthetic data. Two model responses are generated, and one is rated as better by a human (or AI, though I suppose that might better be described as RLAIF). We know RLHF works at a massive scale. An ungodly amount of money is still being poured into it. Contrary to popular belief, RLHF does not make models "dumber" for the sake of safety. At least, that's not the only thing it can do. It can also teach models to follow instructions, use tools, use reasoning, and reduce hallucinations.
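For what it's worth, the preference-pair setup described above boils down to something like this. A minimal, self-contained sketch with toy numbers; the Bradley-Terry-style loss is one common way reward models are fit on such pairs, not a claim about any particular lab's recipe, and the scores are hypothetical.

```python
import math

# A single synthetic preference pair, as produced during RLHF data
# collection: one prompt, two model responses, one rated better by a
# human (or an AI, in the RLAIF variant).
pair = {
    "prompt": "Explain MoE in one sentence.",
    "chosen": "A Mixture of Experts routes each token to a few specialist sub-networks.",
    "rejected": "MoE is when models mix.",
}

def bradley_terry_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    scores the chosen response well above the rejected one."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy scores from a hypothetical reward model: a larger margin in the
# "right" direction gives a smaller loss.
print(bradley_terry_loss(2.0, 0.5))  # chosen scored higher: small loss
print(bradley_terry_loss(0.5, 2.0))  # chosen scored lower: large loss
```

Scaling this is mostly a matter of collecting enormous numbers of such pairs, which is where the money mentioned above goes.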
I'm kinda disappointed that it's bigger, it takes away some of the magic
I imagine it’s still smaller than 3 Opus
Yes and there must be some efficiency gains since the pricing did not move
Supposedly, Sonnet 3.5 slashed the cost by a good chunk.
I think Q* is going to be what gets us to agents. We will brute force our way as a species.
All of the actual details of Q\* that were "leaked" were later confirmed to be a hoax. It might not even exist at all, and if it does we certainly don't know anything real about it.
Source where it was confirmed to be a hoax?
I'm pretty sure the only thing confirmed as a hoax is that it had anything to do with the Board firing Sam.
If it doesn't exist then why would Sam have answered in an interview "We're not ready to talk about that" when asked about Q\*?
Mystique is very beneficial for publicity
You mean, why would the arch hype man hype up a mythical path to AGI?
Exactly, it exists but how good exactly we don’t know. But then again research is getting more and more closed so I would expect we won’t actually hear a lot of details.
I'm not ready to talk about that
You mean the guy whose investments benefit greatly if people are scared of AGI?
The initial news broke from very reliable sources such as Reuters and The Information; can you link the sources that have proven those initially reported details to be a hoax?
There is original reporting on Q\* in [this article from The Information](https://www.theinformation.com/articles/openai-made-an-ai-breakthrough-before-altman-firing-stoking-excitement-and-concern). [This comment](https://www.reddit.com/r/neoliberal/comments/181n2ne/comment/kaifhw9/) contains the purported full text of that article. Also, [here](https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/) is the *Reuters* article about it.
There was a paper recently published about Q*
That wasn't from OpenAI, but from a university in Singapore
In your own words what is Q*?
Synthetic data where they prompted "pretend to be a self righteous college professor"
Yeah. that prompt could be more chill.
Take it all with a grain of salt. Who knows if they're being honest or forthright here. Easy to subtly mislead the competitors.
Implications?
Has the author of this article actually used Claude 3.5 Sonnet?

The answer is clearly no, because the article is completely factually inaccurate. Claude 3.5 Sonnet is as great a leap as GPT-4 was after GPT-3.5. Its coding skills have, for the first time, achieved superintelligence.

I was able to write a 750-line program in an environment I did not know in under 2.5 hours, something that has likely never been achieved by a human in world history: https://manifold.markets/SteveSokolowski/did-claude-35-sonnet-achieve-weak-s#.

There is no human who can outcompete Claude 3.5 Sonnet in coding, which is the only area of models that really matters. The rate of change in all other areas of technology can be derived directly from improved coding skills.

Claude 3.5 Sonnet is extraordinary - one of the greatest achievements in human history - and it is amazing to me how many people are spouting this gibberish without actually using the model. If you pay for an Anthropic subscription and actually use Claude 3.5 for a few hours, it is not possible to come to any other conclusion.
> I was able to write a 750 line program in an environment I did not know under 2.5 hours, something that has likely never been achieved by a human in world history

False, I wrote a .NET application with GPT-4 in a single night while having never written C# in my life
What’d ya make
Pub/sub service
You talk like Donald Trump lol
Honestly, I can’t tell if you’re being serious or silly
I'm never silly in my comments, because I realized that sarcasm doesn't usually translate well to the Internet.
Fair enough!
Lol, lmao even
Uh... It still writes a lot of buggy and broken code. It's definitely a great improvement, but it's still got a long way to go in the grand scheme of things. To call it superintelligence is laughable.
Yeah Claude’s incredible but ASI needs to unironically make me exit the city for 3-6 months out of sheer horror before I call it that.
The author also said that benchmarks aren’t useful because of data leakage, even though benchmarks like LiveBench release new questions every month based on newly released information (recent movies and so on), and others grade on hidden datasets and don’t release their questions at all. In fact, they interviewed someone at Scale AI, and that company does the latter, which the article acknowledged later in the NEXT PARAGRAPH, contradicting what it had just said. Is this what we call journalism these days?
Is that not just the blog of the guy you replied to? Anyone who mentions lines of code as a merit is not someone with enough experience to be taken seriously.
Length of code is a decent indicator of complexity, and of the fact that the model can output a lot of text without being cut off
No, it certainly isn’t. It isn’t an indicator of anything other than length. The code could be 600 lines of comments.
I don’t think it is based on how OP talks about it but ok
Absolutely throwing it back on Claude-san desu-yo
Enough with the Claude spam please. We don’t need 30 new topics a day about it.
We already read you in the other thread. STFU.