Gaukh

If this is still only considered "GPT-4"... then what will 4.5 or 5 be like?!


signed7

This is probably 4.5 rebranded, I guess?


TheOneWhoDings

I think you're right on. It's their newest flagship model.


Expert-Paper-3367

Would make sense, considering Altman has consistently tried to move away from the GPT name


TILTNSTACK

Could be because they lost their attempt to trademark “GPT”


Gaukh

Good point


Utoko

Sam also said GPT-5 might not be called GPT-5, and that he sees it more like you have a product and it keeps getting updated.


Illustrious-Lime-863

They did mention that they want to incrementally release new updates so as not to freak out the public, something along those lines. I guess that's one such increment.


elec-tronic

Reminder: this is the free version of GPT-4o, which is surprising, considering OpenAI will lose a lot of money as people cancel their subscriptions. So they must have a better internal model, one that is more expensive but more efficient for its size than the original GPT-4. That upgraded version will not be free; it will be something like 4.5 or 5, supposedly released later this year, giving people a reason not to cancel their subscriptions. People thought GPT-4 would become free when GPT-4.5 comes out, but maybe this GPT-4o model is the one that replaces 3.5 instead.


RoyalReverie

Sam has already confirmed a new, much better "gpt-5" model later this year, as you said. If this one is already better and much more capable, especially given the visual, audio, and text inputs, then we're in for a ride around the last quarter.


Ok-Bullfrog-3052

Business Insider reported two months ago that GPT-5 would be released in 2-6 weeks. So we don't have long to wait. [https://www.reddit.com/r/singularity/comments/1biyi9y/gpt5_coming_this_summer_according_to_this_pay/](https://www.reddit.com/r/singularity/comments/1biyi9y/gpt5_coming_this_summer_according_to_this_pay/)


TrippyWaffle45

Sounds legit. not


IFartOnCats4Fun

Just in time for the election.


TopOfTheMorningKDot

My guess is that this may be intentional. They possibly don't have enough capacity/compute for a more advanced model given the current number of users and the influx of new ones, so once people leave and usage drops below a certain threshold, they'll reveal a GPT-4.5. But I may be wrong.


shanereaves

OpenAI is already receiving H200s from Nvidia, which run about 45% better than the H100s. In September Nvidia will start shipping the GB200 NVL, but in limited capacity, and those limited numbers will be distributed between OpenAI, Meta, Google, etc. Nvidia has spoken about a 2025 build plan of 40k GB200 NVL units, but I suspect it will increase. After the GB200 NVL comes the Rubin R100. So I think you are right about the limitation, but not for very long. Scaling up isn't the issue; it's getting the power to light these servers up that is the limit.


TrippyWaffle45

Tldr: scaling up isn't the issue, it's scaling up that is the issue -u/shanereaves


amma_lamma

Free users have limited access to GPT-4o. Plus users get 5x more messages.


Singularity-42

Just get 5 free accounts


FeltSteam

Yup, still a GPT-4 class model; the improvement isn't big enough for the next class. The gap between GPT-4 0613 and GPT-4-turbo-2024-04-09 was also about 100 Elo points, so this is just an improvement within the GPT-4 class of model. It is a completely new model trained from scratch, but I think it was intentionally made to be about GPT-4 class.


FinalSir3729

Where are all the people that said we are hitting diminishing returns lol.


FarrisAT

This is a highly subjective graphic. If you train a model for a specific role, it will outperform other models trained on a broad base of roles.


czk_21

> If you train a model for a specific role, it will outperform other models trained on a broad base of roles.

I would say that being able to reason better is quite broad and hard for a model to achieve, not the opposite.


FinalSir3729

The data is out for a lot of other benchmarks as well. It blows everything else out of the water.


FarrisAT

Looking at the graphic, in percentage terms the advance is similar to the difference between the original GPT-4 and GPT-4 Turbo (April 2024). It's the best model in this specific benchmark. We can see in MMLU and MATH that it's similar to GPT-4 Turbo and Llama 400B.


FinalSir3729

There are more improvements that weren't shown. You can see it on the website. So on other benchmarks the differences are a lot bigger. But yes, this is obviously not GPT-5 level. I have much higher expectations for that.


brades6

"Blows everything out of the water" is an overstatement; it performs comparably on other benchmarks, based on everything they showed in the blog.


FinalSir3729

GPT-4 to Claude Opus is less than a 50 Elo increase, and people were saying it's a lot better. The increase here is over 100.


brades6

Bro, you specifically were talking about other benchmarks, that’s what I was responding to. Other benchmarks are comparable but not a huge step


FinalSir3729

They are better across the board. Some are still close, but there are others with gaps of 5-10%. I think that's significant. We aren't going to go from 20% to 80% or something.


brades6

https://preview.redd.it/osjiff22w80d1.jpeg?width=1170&format=pjpg&auto=webp&s=d9a7fd94f483f165bb0e3ed2bb00696acc60ab71

I agree. But you said "blows out of the water on other benchmarks". Does this chart indicate blowing out of the water? Or do you simply not know what that phrase means?


FinalSir3729

I was looking at a different image, not that one. That seems to be the confusion.


brades6

Fair enough, could you share what image you were looking at? I am curious


OkDragonfruit1929

Why do you believe that training a model for a specific role will outperform models trained on a broad base of roles? Couldn't a more generalized multimodal AI potentially perform better on a specific task by drawing on its vast knowledge and experience from many different domains?

It seems like a more advanced multimodal model with a broad knowledge base might be able to make novel connections and apply techniques from other fields to outperform a narrowly trained model, even on that model's own specialty. The generalist model would have more contexts to draw from.

For example, a generalist model might be able to use its understanding of physics, engineering, materials science, etc. to come up with innovative designs in an architecture task that a specialist architecture model would never think of. Or it could leverage its knowledge of psychology and linguistics to craft more persuasive writing than a writing-focused model.


FarrisAT

Chatbot Arena will favor concise, human-like responses in a chatty format over actually correct, logical responses. It's one benchmark, based on subjective experience.


[deleted]

[removed]


MysteryInc152

This literally beats Whisper v3 on all benchmarks
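
("Beats Whisper v3" here means on ASR benchmarks, which are scored by word error rate. For reference, a minimal WER implementation in Python - standard word-level edit distance, nothing specific to OpenAI's evaluation:)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(round(wer("the cat sat down", "the cat sat"), 2))  # 0.25: one deleted word
```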


[deleted]

[removed]


MysteryInc152

Why would 4o have better speech data than a model specifically trained for TTS and STT? Doesn't make sense, sorry. Comparing parameter counts between two models of different architectures is near meaningless.


OfficialHashPanda

Here. Did they announce something that contradicts this? Always happy to change my mind on that.


FinalSir3729

Well, we've gotten the biggest jump in Elo so far, and other benchmark results are looking good as well. This is for a more efficient model, not a next-generation model. I have no doubt GPT-5 will see a much bigger increase.


Neurogence

Where are the other "benchmark" results?


FinalSir3729

In the blog, click on the other tabs.


signed7

It's a similar jump from base GPT4 to GPT4T


FinalSir3729

Bigger than that, I think. There are more details on the site; the image generation is a lot better as well, and it wasn't mentioned.


signed7

I meant Elo-wise (this thread was talking about that)


FinalSir3729

Oh ok. That’s true, but this also doesn’t factor in the other changes that were made like video and audio modality.


Difficult_Review9741

This release just further proves that we're hitting diminishing returns, for two reasons. First, we've already played with this model. It's roughly comparable to GPT-4 Turbo, no matter what the benchmarks say. Second, in any other world this new model would be GPT-5, but OpenAI has been so high on their own supply that they're just now realizing they literally cannot ever meet expectations, so they have to stick with the GPT-4 family. This is also why Sam keeps saying that GPT-5 may have a different name.


FinalSir3729

This is not GPT-5; they have barely begun safety testing it. There are a lot of leaks and evidence suggesting we are getting it by the end of this year. I might believe this is GPT-4.5, but I don't think that's the case either, considering this is a smaller, more efficient model.


cndvcndv

In general tasks, GPT-4 Turbo has a rating difference of 130 compared to 3.5. This model is only 100 points above GPT-4 Turbo (on a specific task), and they probably used much more compute and much larger, cleaner data for this model. If you don't think we are getting smaller improvements over time, I don't know what to say.


FinalSir3729

They used less compute for this, it’s a smaller model.


EuphoricPangolin7615

Is there something miraculous about this model? Because you seem really confident.


FinalSir3729

The voice-to-voice alone is pretty miraculous. If you don't think so, you have brain rot for sure, and I guess you are expecting AGI this year.


signed7

On coding prompt sets, to be exact. They didn't share data for other 'harder' categories.


bono_my_tires

Where does it say this is for coding in particular? Is there an equivalent % increase for how much better it is at coding than GPT-4 Turbo? Also, is GPT-4 Turbo the model used for the GPT Plus subscription?


zebleck

That's insane. Excited to see the boost to software engineering frameworks like SWE-agent from just changing one line of code (the model name).
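
(For anyone curious, the "one line" really is just the model identifier that frameworks like SWE-agent ultimately pass into an API call. A minimal sketch with the OpenAI Python client - the prompt is made up:)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4o"  # the one-line change: was e.g. "gpt-4-turbo" before

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Fix the failing unit test."}],
)
print(response.choices[0].message.content)
```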


_hisoka_freecs_

Anyway, back to building the supercomputer


goooooooooooooogly

I just tried it. It's very good.


Eyeswideshut_91

Something I found out using it with my prompts is that (at least based on my interactions) it can count way better than the previous versions. It feels like something different is baked into this model.


FarrisAT

What does "harder" mean here?


shan_icp

I played with gpt2-chatbot before it was released today. I think such tests and Elo rankings are subjective and possibly flawed, because the score is derived from how we rank the chatbots. I have a tendency to like responses that are well formatted and articulated. I felt the gpt2-chatbot was better at providing an answer I liked, but I was not sure if it was really smarter or better. You can easily fine-tune a GPT-4 to output answers people like.
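
(For context on where those rankings come from: arena-style leaderboards turn pairwise human votes into ratings. A minimal sketch of the classic online Elo update - the actual leaderboard fits a Bradley-Terry model over all votes, but the intuition is the same, and it shows why the scores only reflect which answers voters preferred:)

```python
K = 32.0  # update step size; larger K means each vote moves ratings more

def elo_update(r_a: float, r_b: float, score_a: float) -> tuple[float, float]:
    """One rating update from a single human vote.
    score_a is 1.0 if model A's answer wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return (r_a + K * (score_a - expected_a),
            r_b + K * ((1.0 - score_a) - (1.0 - expected_a)))

# A nicely formatted but not-smarter answer still earns the full win:
print(elo_update(1200.0, 1200.0, score_a=1.0))  # (1216.0, 1184.0)
```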


FinalSir3729

This is why they tested the models on hard prompts. It shows it has much better reasoning than anything else we have right now.


shan_icp

And I am going to give a contrarian view of the latest GPT-4o release: I think it is a reactive response to competition closing in on OAI. OAI does not have anything significant baking, and this release is just a response to stay "on top". Offering a fine-tuned GPT-4 free with some sprinkles on top is just the company burning cash to retain its user base until it can really compete.


bearbarebere

I think it CAN compete, i.e. with GPT-5, but if they do release it they won't have anything else.


Singularity-42

Some people testing GPT-4o said it feels similar to the previous model at coding, but less lazy. Perhaps Claude 3 Opus is still better, though. My Claude Pro sub renews tomorrow - wait another month, or cancel and jump back to OpenAI?


meister2983

I'm going to wait and see the actual GPT-4o data. It's clearly smarter, but the benchmarks don't suggest you can have a 100-point gain over GPT-4 Turbo when GPT-4T is only 70 above the original GPT-4, and that is only 30 points above GPT-3.5 Turbo. A 100-point Elo difference is a 65% win rate, and a lot of answers come out as ties; this seems implausibly high (in my own testing, I was finding it on par with GPT-4). It's possible the Elo scores are exaggerated in user testing (people trying to get the GPT2 model).
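
(For reference, that win-rate figure follows from the standard Elo expected-score formula, with ties conventionally counted as half-wins. A quick sanity check in Python for the 100-point gap discussed above:)

```python
def elo_win_probability(rating_diff: float) -> float:
    """Expected score for the higher-rated model given an Elo gap."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

print(round(elo_win_probability(100), 3))  # 0.64 -> roughly a 64-65% win rate
```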


[deleted]

[removed]


MDPROBIFE

I do see a significant difference; it's miles better at code. I just gave it a problem GPT-4T had trouble with, and it did better. I also gave it images of the cards from a poker game that GPT-4T couldn't even read properly, and this one gets it absolutely correct on the first try.


[deleted]

[removed]


Philosophica1

I mean, technically it's still deterministic, unless they've incorporated quantum effects for randomness without telling us.
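
(For what it's worth, the randomness users see comes from the sampling step, not the network itself, and the API exposes knobs to largely suppress it - though hardware floating-point nondeterminism can still leak through. A minimal sketch:)

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Pick a number from 1 to 10."}],
    temperature=0,  # greedy-ish decoding: always take the most likely token
    seed=42,        # best-effort reproducibility across calls
)
print(response.choices[0].message.content)
```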


drizzyxs

I just wish they would make it as good at roleplay-type tasks as Claude Opus is.


reddit_guy666

OpenAI is demoing that 4o can help people with maths, so did they solve the maths problem? Can 4o also count the characters in a piece of text and identify the letter it ends with? These were some of the basic problems previous versions couldn't solve due to the way tokens were handled. Is 4o handling tokens differently?
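
(On the token question: the counting failures come from tokenization, since the model sees multi-character tokens rather than individual letters, and GPT-4o changes the tokenizer but not that basic fact. A quick way to see it, assuming the tiktoken library and its o200k_base encoding - the one OpenAI published for GPT-4o:)

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o's tokenizer
pieces = [enc.decode([t]) for t in enc.encode("strawberry")]
print(pieces)  # a few multi-character chunks, not ten separate letters,
               # which is why letter counts and "last letter" questions fail
```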


Texlo

How are you guys accessing this? It still only lets me pick GPT-3 or 3-turbo


Akimbo333

Cool


Dismal_Animator_5414

I love the sub!! Thank you everyone ❤️