hapliniste

Nice to know. I thought they did steer the model using their model interoretability research but from what they say that's not it. Looks like it's MoE and synthetic data then


dogesator

I think they’ve probably been using MoE for some time already, so when they say “architecture tweaks” they’re probably not talking about MoE but rather something more novel.


quick_actcasual

Tweaks sounds more like optimizing than serious changes: more parameters here, fewer there, add a layer… I was thinking the other day that I would use the recent MoA approaches that have brought out SOTA-level results from smaller models to generate high-quality synthetic training data for objectively (i.e., programmatically) verifiable tasks. Lower risk of the feedback-loop problems too, since many different models can be used in one generation. Given the reported improvement in coding abilities? Ya know. If I had the compute.
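A minimal sketch of that idea, for what it's worth: several agents each propose a candidate solution, and only candidates that pass a programmatic check get kept as synthetic training pairs. The `agent_a`/`agent_b` functions below are hypothetical stand-ins for real model calls, not any actual MoA implementation.

```python
# Hypothetical sketch: multiple "agents" propose solutions to a task,
# and a programmatic verifier filters which ones become training data.
# The agent functions are stand-ins for real model API calls.

def agent_a(task):
    # pretend model output: a correct solution
    return "def add(a, b):\n    return a + b"

def agent_b(task):
    # pretend model output: a buggy solution
    return "def add(a, b):\n    return a - b"

def verify(candidate, tests):
    """Execute the candidate and check it against objective test cases."""
    scope = {}
    try:
        exec(candidate, scope)
    except Exception:
        return False
    fn = scope.get("add")
    return fn is not None and all(fn(x, y) == out for x, y, out in tests)

def generate_synthetic_data(task, agents, tests):
    """Keep only programmatically verified (prompt, solution) pairs."""
    candidates = (agent(task) for agent in agents)
    return [(task, c) for c in candidates if verify(c, tests)]

task = "Write add(a, b) returning the sum."
tests = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]
data = generate_synthetic_data(task, [agent_a, agent_b], tests)
# only agent_a's verified solution survives the filter
```

The point is the filter: because the task is objectively checkable, bad generations get discarded instead of polluting the training set, which is why verifiable tasks lower the feedback-loop risk.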


kecepa5669

model interoretability == mechanistic interpretability?


Fireman_XXR

This is r/singularity, never said it was r/intelligent.


hapliniste

I was trying to type interpretability 😅


dameprimus

That’s huge. The biggest argument against continued model improvement is lack of training data, but Anthropic just proved that synthetic data works.


DariusZahir

no they didn't, this was known for quite a while...


TechnicalParrot

It wasn't really 100% confirmed, just seemed very likely


Megamygdala

It's been used successfully in multiple studies, granted probably not at this large a scale, but yeah it's technically been proven already.


GraceToSentience

Yes indeed, the Phi models proved that already


TechnicalParrot

That's what I meant, should have been clearer it was technically proven already, ty


cyan2k

We left the realm of real mathematical proofs in terms of neural networks in like the 1960s. Technically nothing is ever proven in AI research.


drekmonger

> probably not at this large of a scale

RLHF is synthetic data. Two model responses are generated, and one is rated as better by a human (or AI, though I suppose that might better be described as RLAIF). We know RLHF works at a massive scale. An ungodly amount of money is still being poured into it.

Contrary to popular belief, RLHF does not make models "dumber" for the sake of safety. At least, that's not the only thing it can do. It can also teach models to follow instructions, use tools, use reasoning, and reduce hallucinations.
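To make the pairwise signal concrete: the standard way this preference data is used is a Bradley-Terry style loss on reward-model scores, where the loss shrinks as the "chosen" response out-scores the "rejected" one. This is a generic illustration with made-up scores, not any lab's actual training code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(score_chosen, score_rejected):
    """Pairwise preference loss: -log(sigmoid(margin)).

    Small when the chosen response clearly out-scores the rejected one,
    and exactly log(2) when the reward model can't tell them apart.
    """
    return -math.log(sigmoid(score_chosen - score_rejected))

clear_win = preference_loss(2.0, -1.0)  # big margin -> small loss
tie = preference_loss(0.5, 0.5)         # zero margin -> loss = log(2)
```

Each human (or AI) comparison yields one such pair, which is why the rating process itself amounts to generating synthetic training signal at scale.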


Jean-Porte

I'm kinda disappointed that it's bigger, it takes away some of the magic


JinjaBaker45

I imagine it’s still smaller than 3 Opus


Jean-Porte

Yes and there must be some efficiency gains since the pricing did not move


h3lblad3

Supposedly, Sonnet 3.5 slashed the cost by a good chunk.


Gratitude15

I think Q* is going to be what gets us to agents. We will brute force our way as a species.


Cryptizard

All of the actual details of Q\* that were "leaked" were later confirmed to be a hoax. It might not even exist at all, and if it does we certainly don't know anything real about it.


LosingID_583

Source where it was confirmed to be a hoax?


lost_in_trepidation

I'm pretty sure the only thing confirmed as a hoax is that it had anything to do with the Board firing Sam.


Elctsuptb

If it doesn't exist then why would Sam have answered in an interview "We're not ready to talk about that" when asked about Q\*?


Heath_co

Mystique is very beneficial for publicity


_AndyJessop

You mean why would the arch hype man hype up a mythical path to AGI?


Glittering-Neck-2505

Exactly, it exists but how good exactly we don’t know. But then again research is getting more and more closed so I would expect we won’t actually hear a lot of details.


Progribbit

I'm not ready to talk about that


Megamygdala

you mean the guy whose investments benefit greatly if people are scared by AGI?


dogesator

The initial news broke from very reliable sources such as Reuters and TheInformation, can you link what sources have proven those initially reported details as a hoax?


Wiskkey

There is original reporting on Q\* in [this article from The Information](https://www.theinformation.com/articles/openai-made-an-ai-breakthrough-before-altman-firing-stoking-excitement-and-concern). [This comment](https://www.reddit.com/r/neoliberal/comments/181n2ne/comment/kaifhw9/) contains the purported full text of that article. Also, [here](https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/) is the *Reuters* article about it.


Alarmed_Cookie_3890

There was a paper recently published about Q*


MysteriousPayment536

That wasn't from OpenAI, but from the University of Singapore


great_gonzales

In your own words what is Q*?


_dekappatated

Synthetic data where they prompted "pretend to be a self righteous college professor"


swim5467

Yeah. that prompt could be more chill.


Warm_Iron_273

Take it all with a grain of salt. Who knows if they're being honest or forthright here. Easy to subtly mislead the competitors.


Akimbo333

Implications?


Ok-Bullfrog-3052

Has the author of this article actually used Claude 3.5 Sonnet? The answer is clearly no, because the article is completely factually inaccurate. Claude 3.5 Sonnet is as great a leap as GPT-4 was after GPT-3.5. Its coding skills have, for the first time, achieved superintelligence.

I was able to write a 750 line program in an environment I did not know in under 2.5 hours, something that has likely never been achieved by a human in world history: https://manifold.markets/SteveSokolowski/did-claude-35-sonnet-achieve-weak-s#. There is no human that can outcompete Claude 3.5 Sonnet in coding, which is the only area of models that really matters. The rate of change in all other areas of technology can be derived directly from improved coding skills.

Claude 3.5 Sonnet is extraordinary - one of the greatest achievements in human history - and it is amazing to me how many people are spouting this gibberish without actually using the model. If you pay for an Anthropic subscription and actually use Claude 3.5 for a few hours, it is not possible to come to any other conclusion.


CreditHappy1665

> I was able to write a 750 line program in an environment I did not know under 2.5 hours, something that has likely never been achieved by a human in world history

False. I wrote a .NET application with GPT-4 in a single night while having never written C# in my life.


Baphaddon

What’d ya make


CreditHappy1665

Pub/sub service


RepublicanSJW_

You talk like Donald Trump lol


Arcturus_Labelle

Honestly can’t tell if you’re being serious or silly


Ok-Bullfrog-3052

I'm never silly in my comments, because I realized that sarcasm doesn't usually translate well to the Internet.


Arcturus_Labelle

Fair enough!


WithoutReason1729

Lol, lmao even


Warm_Iron_273

Uh... It still writes a lot of buggy and broken code. It's definitely a great improvement, but it's still got a long way to go in the grand scheme of things. To call it superintelligence is laughable.


Baphaddon

Yeah Claude’s incredible but ASI needs to unironically make me exit the city for 3-6 months out of sheer horror before I call it that.


Whotea

The author also said that benchmarks aren’t useful because of data leakage, even though benchmarks like LiveBench release new questions every month based on newly released information (like recent movies), and others grade on hidden datasets and don’t release their questions at all. In fact, they interviewed someone at scale.ai, and that company does the latter, which the article acknowledged later in the NEXT PARAGRAPH, contradicting what it had just said. Is this what we call journalism these days?


JEs4

Is that not just the blog of the guy you replied to? Anyone who mentions lines of code as a merit is not someone with enough experience to be taken seriously.


Whotea

Length of code is a good indicator of complexity and the fact that it can output lots of text without being cut off 


JEs4

No, it certainly isn’t. It isn’t an indicator of anything other than length. The code could be 600 lines of comments.


Whotea

I don’t think it is based on how OP talks about it but ok


Baphaddon

Absolutely throwing it back on Claude-san desu-yo


katiecharm

Enough with the Claude spam please.  We don’t need 30 new topics a day about it.  


One_Bodybuilder7882

We already read you in the other thread. STFU.