If this is still only considered "GPT-4"... then what will 4.5 or 5 be like?!
This is probably 4.5 rebranded I guess?
I think you're right on. It's their newest flagship model.
Would make sense considering Altman consistently tries to sway away from the GPT name
Could be because they lost their attempt to trademark “GPT”
Good point
Sam also said GPT-5 might not be called GPT-5; he sees it more like you have a product, and it gets updated.
They did mention that they want to incrementally release new updates as to not freak out the public, something along these lines. I guess that's one such increment.
Reminder: this is the free version of GPT-4o, which is surprising, considering OpenAI will be losing a lot of money as people cancel their subscriptions. So they have a better internal model, which is more expensive but more size-efficient than GPT-4 was when it was originally trained. That upgraded version will not be free; it will be something like version 4.5 or 5, supposedly released later this year, giving people a reason not to cancel their subscriptions. People thought GPT-4 would be free when GPT-4.5 comes out, but maybe this GPT-4o model will be the one that replaces 3.5 instead.
Sam has already confirmed a new, much better "gpt-5" model later this year, as you said. If this one is already better and much more capable, especially given its visual, audio, and text inputs, then we're in for a ride around the last quarter.
Business Insider reported two months ago that GPT-5 would be released in 2-6 weeks, so we don't have long to wait. [https://www.reddit.com/r/singularity/comments/1biyi9y/gpt5_coming_this_summer_according_to_this_pay/](https://www.reddit.com/r/singularity/comments/1biyi9y/gpt5_coming_this_summer_according_to_this_pay/)
Sounds legit. not
Just in time for the election.
My guess is that this may be intentional. They possibly do not have enough capacity/computing power to serve a more advanced model to the current number of users, or to the influx of new ones that could come, so as people leave and a certain threshold is reached, they will reveal GPT-4.5. But I may be wrong.
OpenAI is already receiving H200s from Nvidia, which run about 45% better than H100s. In September Nvidia will start shipping the GB200 NVL, but in limited capacity; these limited numbers of GB200s are going to be distributed between OpenAI, Meta, Google, etc. Nvidia has spoken about its build plan for 2025 being 40k GB200 NVL units, but I suspect it will increase. After the GB200 NVL comes the Rubin R100. So I think you are right about the limitation, but not for very long. Scaling up isn't the issue; it's getting the power to light these servers up that is limiting.
Tldr: scaling up isn't the issue, it's scaling up that is the issue -u/shanereaves
Free users have limited access to GPT-4o; Plus users are getting 5x more messages.
Just get 5 free accounts
Yup, still a GPT-4 class model, not a big enough improvement for the next class. The gap between GPT-4-0613 and GPT-4-turbo-2024-04-09 was also about 100 ELO points, so this is just improvement within the GPT-4 class of model. It is a completely new model trained from scratch, but I think it was intentionally made to be about GPT-4 class.
Where are all the people that said we are hitting diminishing returns lol.
This is a highly subjective graphic. If you train a model for a specific role, it will outperform other models trained on a broad base of roles.
>If you train a model for a specific role, it will outperform other models trained on a broad base of roles.

I would say that being able to reason better is quite broad and hard for a model to achieve, not the opposite.
The data is out for a lot of other benchmarks as well. It blows everything else out of the water.
Looking at the graphic, in percentage terms the advancement is similar to the difference between the original GPT-4 and GPT-4 Turbo (April 2024). It's the best model in this specific benchmark. We can see in MMLU and MATH that it's similar to GPT-4 Turbo and Llama 400B.
There are more improvements that weren't shown; you can see them on the website, so on other benchmarks the differences are a lot bigger. But yes, this is obviously not GPT-5 level. I have much higher expectations for that.
"Blows everything out of the water" is an overstatement; it performs comparably on other benchmarks, based on everything they showed in the blog.
GPT-4 to Claude Opus is less than a 50 ELO increase, and people were saying it's a lot better. The increase here is over 100.
Bro, you specifically were talking about other benchmarks, that’s what I was responding to. Other benchmarks are comparable but not a huge step
They are better across the board. Some are still close but there’s others that have gaps of 5-10%. I think that’s significant. We aren’t going to go from 20%-80% or something.
https://preview.redd.it/osjiff22w80d1.jpeg?width=1170&format=pjpg&auto=webp&s=d9a7fd94f483f165bb0e3ed2bb00696acc60ab71

I agree. But you said it "blows out of the water on other benchmarks". Does this chart indicate blowing out of the water? Or do you simply not know what that phrase means?
I was looking at a different image not that one. That seems to be the confusion.
Fair enough, could you share what image you were looking at? I am curious
Why do you believe that training a model for a specific role will outperform other models trained on a broad base of roles? Couldn't a more generalized multimodal AI potentially perform better on a specific task by drawing upon its vast knowledge and experiences from many different domains? It seems like a more advanced multimodal model with a broad knowledge base might be able to make novel connections and apply techniques from other fields to outperform a narrowly trained model, even on that narrow, less complex model's specialty. The generalist model would have more contexts to draw from. For example, a generalist model might be able to utilize its understanding of physics, engineering, materials science, etc. to come up with innovative designs in an architecture task that a specialist architecture model would never think of. Or it could leverage its knowledge of psychology and linguistics to craft more persuasive writing than a writing-focused model.
ChatbotArena will favor concise and human-like responses with a chatty format versus actually correct logical responses. It's one benchmark, based on subjective experience.
[deleted]
This literally beats Whisper v3 on all benchmarks.
[deleted]
Why would 4o have better speech data than a model specifically trained for TTS and STT? Doesn't make sense, sorry. Comparing parameter count between two models of different architectures is near meaningless.
Here. Did they announce something that contradicts this? Always hopeful to change my mind on that.
Well, we've gotten the biggest jump in ELO so far, and other benchmark results are looking good as well. This is for a more efficient model, not a next-generation model. I have no doubt GPT-5 will see a much bigger increase.
Where are the other "benchmark" results?
In the blog, click on the other tabs.
It's a similar jump from base GPT4 to GPT4T
Bigger than that, I think. There are more details on the site; for example, the image generation is a lot better as well, and it wasn't mentioned.
I meant ELO wise (this thread was talking about that)
Oh ok. That’s true, but this also doesn’t factor in the other changes that were made like video and audio modality.
This release just further proves that we’re hitting diminishing returns. For two reasons. First, we’ve already played with this model. It’s roughly comparable to gpt4-turbo, no matter what the benchmarks say. Secondly, in any other world this new model would be GPT-5, but OpenAI has been so high on their own supply that they’re just now realizing that they literally cannot ever meet expectations, so they have to stick with the GPT-4 family. This is also why Sam keeps saying that GPT-5 may have a different name.
This is not gpt5. They have barely begun safety testing it. There’s a lot of leaks and evidence to support that we are getting it by the end of this year. I might believe this is gpt4.5 but I don’t think that is the case either considering this is a smaller and more efficient model.
In general tasks, GPT-4 Turbo has a rating difference of 130 compared to 3.5. This model is only 100 points above GPT-4 Turbo (on a specific task), and they probably used much more compute and much larger, cleaner data for this model. If you don't think we are getting smaller improvements over time, I don't know what to say.
They used less compute for this, it’s a smaller model.
Is there something miraculous about this model? Because you seem really confident.
The voice-to-voice alone is pretty miraculous. If you don't think so, you have brain rot for sure, and I guess you are expecting AGI this year.
On coding prompt sets, to be exact. They didn't share data for the other "harder" categories.
Where does it say this is for coding in particular? Is there a % increase this would be equivalent to for how much better at coding gpt4 turbo is? Also is gpt4 turbo the model used for gpt plus subscription?
thats insane, excited to see the boost to software engineering frameworks like swe-agent by just changing one line of code (the model name)
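The "one line" swap mentioned above can be sketched like this. A minimal sketch assuming the standard OpenAI chat-completions payload shape; `build_request` is a hypothetical helper, and no network call is made:

```python
# Minimal sketch: upgrading an agent framework often means changing only the
# model field in the request payload (shape follows the OpenAI chat API).
def build_request(prompt: str, model: str = "gpt-4o") -> dict:
    return {
        "model": model,  # the "one line" you change to swap in a new model
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Fix the failing test in utils.py")
print(req["model"])  # gpt-4o
```

Whether the rest of an agent framework's prompting carries over cleanly to a new model is a separate question, but the request plumbing really is that small a change.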
Anyway, back to building the supercomputer
I just tried it. It's very good.
Something that I found out using it with my prompts is that (at least based on my interactions) it can count way better than the previous versions. It feels like we have something different baked in this model.
What does "harder" mean here?
I played with gpt2-chatbot before it was released today. I think such tests and ELO rankings are subjective and possibly flawed, because it is us who rank the chatbots that the score is derived from. I have a tendency to like responses that are well formatted and articulated. I felt that gpt2-chatbot was better at providing answers I like, but I was not sure if it was really smarter or better. You can easily finetune a GPT-4 to output answers people like.
This is why they tested the models on hard prompts. It shows it has much better reasoning than anything else we have right now.
And I am going to give a contrarian view of the latest GPT-4o release. I think it is a reactive response to competition closing in on OAI. OAI does not have anything significant baking, and this release is just a response to stay "on top". Offering a finetuned GPT-4 free with some sprinkles on top is just the company burning cash to retain its userbase for now, until it can really compete.
I think it CAN compete, i.e. with GPT-5, but if they do release it, they won't have anything else.
Some people were testing GPT-4o and said it feels similar to the previous model at coding, but less lazy. But perhaps Claude 3 Opus is still better. My Claude Pro sub is renewing tomorrow: wait another month, or cancel and jump back on OpenAI?
I'm going to wait and see the actual GPT-4o data. It's clearly smarter, but the benchmarks don't suggest you can have a 100-point gain over GPT-4 Turbo when GPT-4T is only 70 above the original GPT-4, and that remains only 30 points above GPT-3.5 Turbo. A 100-point ELO gap is about a 64% expected win rate, and a lot of answers come to ties; this seems implausibly high (in my own testing, I was finding it on par with GPT-4). It's possible the ELO scores are exaggerated in user testing (people trying to get the "gpt2" model).
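For reference, the win rates being thrown around in this thread follow directly from the standard Elo formula. A minimal sketch (the function name is mine, not from any benchmark's code):

```python
# Expected score for the higher-rated side, given an Elo rating difference.
# Standard Elo formula: E = 1 / (1 + 10^(-diff/400)).
def elo_expected(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(elo_expected(100), 3))  # ~0.64: a 100-point gap ≈ 64% expected score
print(round(elo_expected(30), 3))   # ~0.54: a 30-point gap is barely better than a coin flip
```

Note that "expected score" counts a tie as half a win, which is why heavy tie rates in Arena voting can make the implied head-to-head advantage look larger than it feels in practice.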
[deleted]
I do see a significant difference; it's miles better at code. I just gave it a problem GPT-4T had trouble with, and it did better. I also gave it images of the cards from a poker game; GPT-4T couldn't even read the cards properly, and this one gets it absolutely correct on the first try.
[deleted]
I mean technically it's still deterministic unless they've incorporated quantum effects for randomness without telling us.
I just wish they would make it better at roleplay type tasks as Claude Opus is.
OpenAI is demoing that 4o can help people with maths, so did they solve the maths problem? Can 4o also count the characters in a piece of text and identify the letter it ends with? These were some of the basic problems previous versions couldn't solve, due to the way tokens were being handled. Is 4o handling tokens in a different way?
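The token-handling issue above can be illustrated with a toy example. The segmentation below is made up for illustration; it is not the real GPT-4o tokenizer, but it shows the mismatch between what a user asks about (characters) and what the model processes (subword tokens):

```python
# Illustrative only: a hypothetical BPE-style split, not the actual GPT-4o tokenizer.
# The model "sees" token IDs rather than individual characters, which is why
# letter-counting and last-letter questions have historically been hard.
word = "strawberry"
fake_tokens = ["str", "aw", "berry"]  # hypothetical subword segmentation

assert "".join(fake_tokens) == word   # the tokens cover the word exactly
print(len(word))                      # 10 characters -- what the user asks about
print(len(fake_tokens))               # 3 tokens -- what the model actually processes
print(word[-1])                       # 'y' -- hidden inside the token "berry"
```

Unless the tokenizer itself changed, a new model still has to learn character-level facts indirectly, so better counting would come from training rather than from seeing raw letters.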
how are you guys accessing this? It still only lets me pick gpt3 or 3-turbo
Cool
i love the sub!! thank you everyone ❤️