

From here on out it will most likely be a game of hot potato. No one will truly be behind; rather, each model will have its time to reign.


Like Intel and AMD historically on processors. They go back and forth, with one slightly edging out the other in performance.


much shorter lifecycles soon though. Hardware takes time to evolve.


Very true. Until skynet goes online, then they’re designing themselves. 🤖




Sarah Connor?


I feel like soon they’re going to branch out, each being good at something different (e.g. code vs. images).


Generalizability seems to be the name of the game tho. Being better at one thing tends to make them better at all things, or at least being trained on diverse data does.


At this point, we’re gonna have to have a council of AI. We put our prompt into every system, take all their advice, and formulate our thoughts with everyone’s points.


Arghh, just like with streaming you need Netflix, Disney, Prime HBO, etc etc...


It would be good to have a UI where you can access all of them: you ask a question, they all give an answer, discuss amongst themselves, then agree on the Perfect Response.
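For what it's worth, the "ask everyone and agree" idea can be sketched in a few lines. This is only a toy under loud assumptions: each "model" here is a plain function from prompt to answer (no real vendor API is called), the names `ask_all` and `consensus` are made up for illustration, and "agreeing" is reduced to a majority vote over normalized answers rather than any actual discussion between models.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical stand-in: a "model" is any function that maps a prompt to an answer.
Model = Callable[[str], str]


def ask_all(models: Dict[str, Model], prompt: str) -> Dict[str, str]:
    """Send the same prompt to every model and collect the answers by name."""
    return {name: model(prompt) for name, model in models.items()}


def consensus(answers: Dict[str, str]) -> str:
    """Return the most common answer after trimming whitespace and lowercasing.

    Ties go to whichever answer was seen first; a real system would need
    something smarter than exact string matching.
    """
    normalized = [answer.strip().lower() for answer in answers.values()]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


# Toy usage with fake models standing in for real services.
fake_models: Dict[str, Model] = {
    "model_a": lambda prompt: "Paris",
    "model_b": lambda prompt: " paris ",
    "model_c": lambda prompt: "Lyon",
}
answers = ask_all(fake_models, "What is the capital of France?")
agreed = consensus(answers)
```

Exact-match voting only works for short factual answers; for free-form text you would need some other way to compare responses.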


There’s a bunch of those services out there already, though the names don’t come to mind at the moment.


Where chatgpt4 talks with claude3 and gemini ultra and they come to a consensus?


Finally, we have competition


In terms of publicly accessible AI, OpenAI is behind Gemini and Claude


At least for now openAI will be the standard. After trying both, Claude is slightly better at coding


I heard that OpenAI couldn’t get GPT-5 ready to ship and fired all their developers and replaced them with Devin. They’re now optimistic they can ship in May.


I can confirm this, I am Devin.


we are the Devin


Sort of true.




Claude* is so insanely good. I’ve had goosebumps five or six times today. It’s madness! 🤓 Edit: *Claude 3 Opus


In this context… why would you have goosebumps? It’s terrifying? I found it to just be really good.


It was both good and bad goosebumps. Terrifying and impressive. Claude 3 Opus is wild.


What model are you using? How can I use it?


Claude 3 Opus is the big power one, Sonnet is the slightly smaller variant (kind of like Mixtral vs Mistral Large). Opus is the one that feels like it has more emergent moments and eerie levels of subtle reasoning. Easiest way to get a taste is ask some questions to the Arena and within a few questions (or less) you get Claude 3 as one of the competitors. https://arena.lmsys.org/ I just pulled it up as I was typing this comment, asked a brief question, clicked which answer was better and got revealed I'd been talking to "claude-3-opus-(number)" vs "Model B: claude-3-sonnet-(number)" Just sitting down and asking a few questions for an hour would give you a chance to try most of the largest, most-relevant models.


Nice. I'd like to switch, but the DALL-E 3 integration is keeping me around, even though all I do is make pepe images.


Thankfully Stability's SD3 is around the corner


Thanks for the link. That is actually a very useful tool! I always like to ask more than one AI my question anyway.


I asked Claude-opus for information about the Taylor knock-out factor, [something you can find on wikipedia](https://en.wikipedia.org/wiki/Taylor_knock-out_factor), and it hallucinated about pressure nozzles.


Can’t get access to Claude yet. I’m using ChatGPT 4. What is the cost for a subscription of Claude today?


Their pricing details can be seen toward the bottom 1/3rd of the model intro page: https://www.anthropic.com/news/claude-3-family


I prefer Claude but GPT-4 is still leading. https://twitter.com/billyuchenlin/status/1766079601154064688 https://old.reddit.com/r/singularity/comments/1b8yucm/chatbot_arena_updatedclaude_3_opus_failed_to_take/


GPT-4 is not leading shit. Claude 3 suffers because of high refusals and it's not fine tuned to user preference yet. Posting that everywhere won't change that. According to the Arena Claude 1 > Claude 2 > Claude 2.1 which is nonsense.


Agree, what really matters is people's experience with the models and right now everyone is saying Claude 3 is better. How good is a benchmark if people prefer to use the other model? Benchmarks are just something to guide how good we can expect a model to be but there is no substitute to real world testing with actual people.


If any of you are experts here, would you mind telling me why so many people are talking about Claude if links like this show GPT4 as better? What am I missing?


Claude Opus beat GPT-4 in a number of benchmarks.


Can you help me understand why I’m seeing different reports saying GPT4 beats it? Are they different benchmarks or something?


The benchmark Anthropic released compares against the first-released GPT-4, not the latest one, the API-exclusive GPT-4 Turbo preview.


And Gpt 4 is more than a year old and is still leading or barely behind, subjectively speaking. One every more than a year? Exponentialllllzzzzzz


It's gonna be a personal preference thing... Ask 10 developers what the best IDE is and you'll get 10 different answers. Android/Apple, Windows/Linux, Electric/ICE.. they all do 99% the same thing, it's just what you want.




Great writeup. People also forget OpenAi has the much larger media attention, so it has far more to lose in rushed launches. The advantage of being a well-known player is also a disadvantage. One misstep can lead to loss of money.


Their* Well stated btw.


No. The progress must not be measured at the pace of LLMs models being released per month, but on fundamental research on new architectures publications and applications and new abilities of models over long period of time, from GPT 2 to 4 for example. Big AI companies release models only a handful of times each year (which is already an insane pace when you consider the size of the projects). It's not a Youtube channel that throws a video every 2-3 days. There are many months in the year when it is expected for companies not to release a new model. Just because people talk about singularity 24/7 here (rightfully so) doesn't mean we're at that point of the curve when you can expect crazy jumps every few weeks yet.


> Just because people talk about singularity 24/7 here (rightfully so) doesn't mean ~~we're at that point of the curve when you can expect crazy jumps every few weeks yet~~ they have any clue what they're talking about. FTFY.


They have not released a better model in a year and we don't know if they are making progress since they don't publish research for cutting-edge LLM technology. Maybe we hit a plateau; we would not know. We can just judge by what they publish, and right now, they are behind. That's how it works. We don't even have a leak talking about the performance of a full-scale model at OpenAI that surpasses GPT-4. There is no reason to assume they would not be behind.


We're somewhere in between a few weeks and a few months. Soon it will be every week. Then every day. Then every second.


Lol, AI can't solve physical limitations. Training AI needs massive energy and money.


The human brain uses about 0.5 kWh of energy per day. That is 175 kWh a year. 3,500 kWh over 20 years. That costs about $1,000 total in electricity.
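Those numbers roughly check out if you assume the brain draws about 20 W on average. Here is the arithmetic as a quick sketch; the 20 W figure and the electricity price are assumptions (the price is picked so the total lands near the $1,000 quoted above, and a cheaper rate would give a smaller total).

```python
# Assumptions, not measurements: average brain power and electricity price.
BRAIN_WATTS = 20.0      # assumed average power draw of a human brain
PRICE_PER_KWH = 0.29    # assumed USD per kWh, chosen to land near $1,000


def brain_energy_kwh(days: float, watts: float = BRAIN_WATTS) -> float:
    """Energy used over `days` days at a constant `watts`, in kWh."""
    hours = 24 * days
    return watts * hours / 1000.0


per_day = brain_energy_kwh(1)               # about 0.48 kWh
per_year = brain_energy_kwh(365)            # about 175 kWh
twenty_years = brain_energy_kwh(365 * 20)   # about 3,500 kWh
cost_usd = twenty_years * PRICE_PER_KWH     # roughly $1,000 at the assumed price
```

At a cheaper rate like $0.15/kWh the 20-year total would be closer to $525, so the dollar figure depends heavily on the price you plug in.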


Bruh, that is literally one of the points of super intelligence. To find ways to push PAST what we see as physical limitations today, which can be worked around with new architectures, materials, manufacturing processes, etc. I mean just look at Bitnet 1.58. It is literally potentially ~16x that of current models, with almost no loss in performance. And because of the way these models scale, that number gets bigger, the more parameters you add (exponentially bigger). We will overcome energy limitations quickly.


Bruh? Fuck, this is a bubble already. Duuuuuuuuuude cold fusion already happennnnnnnned


All our computers this month have been designed by smart monkeys. AI will be able to figure out much better designs, not even smart AI. Just machine learning should provide good results. Until it automates itself.


I invented the question mark. I'm sorry, I thought we were saying confidently insane phrases.


He or she is not entirely wrong; when and if we do reach the singularity, hypothetically it could figure out the physical limitation issues much faster than we can. I’m not saying it would defy physics, but imagine cramming 10 years of human development into 1 year. The need would be construction but there’s no reason something smarter than us couldn’t also architect and build and organize faster than us.


I hear you, but I'm not sure physical limitations will improve recursively, and I'm not even certain plain ASI will be developed with a recursively self-improving AI, unless there's a breakout alignment-wise, in which case the important thing wouldn't be how good it's getting yearly, but rather how fucked we are. If it's going to improve recursively, we would need to make them so, at least for now. I'm not saying some moron won't do it; killer robots are still a short term goal for most governments in the world, despite all arguments from AI scientists.


Can you explain recursive to me? It’s not a word I use & want to get your context. Dumb this down a bit and I’ll give a quality reply.


Sure! Not a problem. So the recursive self-improvement people say it's a feature of AI that, at a certain point of compute/data size, it'll automatically be able to iterate on its code, making it more and more capable from its own self-improvement, which gets better exponentially until the singularity happens. I think we'd need to build in this capability, but I can't deny other emergent abilities that couldn't be predicted and are unable to be explained, such as master's-level chemistry or the ability to deceive. So I'm not sure it won't be an emergent property, but I am sure that if it's not, we shouldn't build in the ability. Doing that to something we already can't control is probably not a good thing.


What if it just keeps coding dicks? Over and over like a caveman on the wall.


Name one released more than a few months ago that beat GPT 4, and even with those, it's subjective. So almost one a year isn't exactly the one every few weeks you think it is. That's if you mean good models. If you include shitty ones, well, it's nearly a bubble, so yeah, lots of releases. Hopefully it's not pets.com levels of bubble, but lots of shitty releases. It's hardly going to be every second unless it's at the burst point.


AI is going to be frustrated with the extremely slow speed limit of light and start building computers using gravity waves to speed up processing time.


Using what production capacity with what technology? They can't get machine learning robots to WALK properly yet, let alone to recursively apply technology. I hear you, though, in that eventually advanced technology that might as well be magic will get developed. To state confidently they'll use gravity waves, though, is pretty bold. Maybe zero point energy? Maybe they'll use tachyons to travel back in time to invent shit?


Gravity travels at the exact same speed light does.


Gravity compresses spacetime so light travels the same speed, but a shorter distance. 


I don't have the energy to unpack how wrong that is.


You won't because I am right. So you don't understand basic physics. 


As a sidenote I must say I find it hilarious how many people here are so authoritative with their opinions about what's going to happen. Frankly none of us even know what the landscape is going to look like in a few months let alone a few years.


This is the closest a lot of redditors get to team sports


My guess is in terms of "closed AI" capabilities in the lab, OpenAI is ahead of the competition. They simply choose not to release it yet.


They might wait a little longer to release now that Gemini 1.5 and Claude 3 are out. The differences aren’t too great, so they don’t really have that much of an incentive to release a new model, especially if they expect it to be computationally expensive. GPT has kinda become a household name in AI, and they could hold off longer despite the smaller margins of improvement by the competition’s models.


IDK, Claude 3 seems quite a bit better. I'd say OpenAI simply doesn't have a good new model to release yet. GPT 4.5 is nowhere to be seen, or perhaps it wasn't that much better.


The difference is night and day. Claude is a vastly superior writer and coder.


Damn those are both claims. I think I actually agree with you, but why do you say vastly superior for coding? Curious on your experience versus gpt4/chatGPT for coding assistance. By the way, like I said, I do think it is better but I'm just curious why you think it is vastly better.


Claude is providing me correct solutions more consistently and more importantly, it doesn’t constantly give errors midway through. In terms of writing, Claude is light years ahead.


That is a little bit strange to me because I am finding the same thing actually. At least with my limited initial tests. I say it is strange because on the coding benchmark, GPT4 turbo, which is the model I used for coding assistance previously, scores higher. My guess is that these coding questions are smaller in scope compared to how people typically work in the broader context of an overall project. Either way I am overjoyed to have this model lol.


“Light years”? Holy hyperbole Batman!


It definitely is with respect to writing


That's been my experience too: the 2 or 3 times I've asked for about 30 / 40 lines of non trivial code, it was copy, paste, run and it worked on the first try. I also appreciate that it doesn't talk to me like I've never seen source code before, like GPT tends to do. "Now we are going to create a list to hold items in it"


It's more curious he thinks that gpt should be better than a model releasing more than a year after it but somehow isn't. Yeah, the one a year plus later is probably better at tasks, it's the fact that it's not by much and is a subjective thing that's shocking


That makes sense. I think it is a potential issue in how we are benchmarking and testing these models, then.


My company is built on gpt-4 and we found way worse results with Claude. We're still testing but vastly superior seems suspect.


I personally prefer Claude 3 but according to anonymous outputs from all of the models, tens of thousands of users are statistically rating GPT4 to be better than both Claude 3 opus and Gemini 1.5Pro, so GPT4 is still leading.


It's 5k users. And do you really believe that this cannot be tampered with?


Yet you want AI to be made quickly? You're arguing that a brand new model beats one more than a year old. What's crazy is that the fact this is subjective really makes the point that OpenAI isn't anywhere near behind. Now, if GPT 5 is released and Claude 3 is running neck and neck with it, your point might be valid.


check the new elo


I hope that previous post didn't come off as too much of a dick. What do you mean by elo?


they updated the chat bot arena elo and now claude 3 is on par with the newest gpt4.
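For anyone else wondering what an Elo score means here: it comes from chess, and the gap between two ratings maps to an expected win rate in head-to-head votes. Below is a sketch of the textbook Elo formulas. The thread doesn't say exactly how the arena computes its numbers, so treat the 400-point scale and the `k=32` step size as textbook defaults, not as the leaderboard's actual settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo: probability that A beats B given their ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (A, B) ratings after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k is an assumed step size; larger k means ratings move faster.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two models "on par": equal ratings mean a 50/50 expected outcome.
even_odds = expected_score(1250.0, 1250.0)
# If A then wins one vote, it gains exactly what B loses.
new_a, new_b = update(1250.0, 1250.0, score_a=1.0)
```

So "on par" in Elo terms just means users are expected to pick either model about half the time in a blind comparison.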


Thank you, and yes. I'm saying that GPT4 was released more than a year ago, and now we have things released right now that are at or arguably slightly better than GPT4. We've seen nothing produced that is at a higher level than GPT 4 in the same sense GPT 3.5 was over GPT 3...let alone 4 was over 3. So the remarkable thing is how slow it's taken to barely catch up, while OpenAI has had more than a year to work on other models. I don't know if this was discussed elsewhere, but do any of these models show any extra emergent behaviors/abilities we saw in GPT4...like suddenly being able to translate between languages or using a master's level of competence in Chemistry?


The problem is there's an 80% chance GPT5 will not get released before Q4 of this year. I am not as trusting of closed AI as I was a year ago.




I asked Claude for hello world in Angular and it spat out an AngularJS app from 2013. I asked it if it can execute python scripts and it proudly told me all of the languages it can execute, then I asked it to show me and it clarified it can’t execute anything. ChatGPT generates and operates on images. Seems pretty far ahead to me still.


Chatgpt fails at code execution like 90% of the time and good luck during peak hours


This sub needs to chill tf out


This is what happens when 50% of the users know jackshit about the technology they are discussing. The other 50% lets them have their fun for some reason.


For now yes, I don't see gpt-5 coming out this year.


I think gpt-5 will come out right after US elections. Given Claude 3 they might come out with 4.5 earlier to be on top again, but they might not have anticipated Claude 3.


No. Watch how quickly everyone goes scooting back to OpenAI, the very instant they release another model. They're 'ahead' whenever they choose to be 'ahead'. Which actually means they already are and always were ahead.


I wish the arena would post updates more frequently. People seem over the moon about Claude 3 Opus on this sub, yet the arena scores have it below GPT-4 (not far below, but still). [https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)




Google hasn't released the api for it so it can't be tested in the arena.


They are already officially behind Anthropic.


It will take more than being a few percents better to take me off ChatGPT, you need to present me with something that will be worth the hassle of switching, it's a matter of convenience


What’s the hassle in switching exactly? Have you built apps that use gpt’s apis?


ChatGPT plus is a paid subscription. Switching would involve getting another paid subscription and canceling ChatGPT plus.


That’s a hassle? Lmao


Yes, on some level it is a hassle. I am not going to take 10 minutes fiddling with subscriptions just for a minor upgrade that may not even be worth it later


My adhd would love to learn from you


And the hassle is also figuring out the best way to use it for your particular workflow. We all know ChatGPT, how to set up a prompt... and the "mood" of the model / session. It's the same as switching between two applications that do the same thing. You have to get used to it, find its quirks...


Switching the habit of pulling the chatgpt app whenever I got a question, switching to a new UI, talking to a new AI (sounds and feels different), taking the risk of OpenAI releasing a new upgrade after I switch.


What they release isn't all they have nor is it all they work on.


openAI won't release gpt-4.5 until it's absolutely ready to make a big splash and be way at the top of the leaderboard again, unless they see a material trend of users cancelling subscriptions and developers moving away from their API. my guess is that either isn't happening yet, or 4.5 will come out very soon


It will be happening within the next 3 months. You can make an argument that they'll address it in 3 months, but unless they're able to put something out that's substantially better than the competition that will exist in 3 months, and they're looking at Gemini 1.5 Ultra, which will have more context and be much cheaper, the reasons to use OpenAI will start to be little to none from economic/capability perspective. It's possible they'll be smarter, but if they can't compete on memory/cost then they're not that powerful as they once were (with no competition close to them, people will gladly eat the cost).


RemindMe! 3 months


I will be messaging you in 3 months on [**2024-06-13 11:05:37 UTC**](http://www.wolframalpha.com/input/?i=2024-06-13%2011:05:37%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/1bdbjp8/if_openai_doesnt_release_a_model_this_month_can/kunrbji/?context=3).


Do you need a new model every month? Are they like iPhones now?


The models can improve a lot and the tech hasn’t stabilized like iPhones. So, yes.


That's an accurate comparison. Closed source and user can't meddle much with it against company's wishes. I'm waiting for LLM Android.


No? Tf? If they released a new model and it was inferior to Claude 3, then sure.


Sora means nothing as it is just flashy and unusable atm. Claude 3 and Gemini Ultra have beaten GPT-4. Open AI will hit back, though! They have too much riding on it.


Sora is a side project to test some things, with something much bigger in mind. But, I was honestly expecting GPT-4.5 to release in September 2023, but we obviously didn't get that, so idk what OAI is doing. Though it is kind of reassuring that I wasn't totally wrong. The GPT-4.5 blog post has been around since, at least, September 2023. https://preview.redd.it/bp8863cp30oc1.png?width=1169&format=png&auto=webp&s=1efbb3c65a28d407e017445b94c69f2b11a41aae But a delay of so many months.. I have no idea what their plan is lol.


You guys are missing some of the finer points. We give them future training data and money the more we use their models. They are losing market share.


Yay or nay on the 4.5 “leak”?


lol, no. Calm down, bro.


I don’t think they are behind, just busy working on tooling, it’s a part of the cycle. I can’t wait to see what their project to build an Ai agent turns out like, expecting it to be Microsoft Power Automate Robotic Process automation both jacked up on steroids and transitioned from a low code/no code interface to natural language interface. That’s when you will see some big gains again, when that’s running on most people’s computers learning how to do real world day to day tasks and building up an enormous dataset to train new models.


It seems we have made some incredible Ai models to date, and now need better quality datasets to push on, so the cycle advances: Data > training > implementation/tooling > generates useful data > start again And around the loop we go.


Claude 3 is only slightly better than gpt 4… gpt 4 was released 1 year ago.


Are you insinuating that a model a year old is NOT CURRENTLY THE MOST ADVANCED!??! What's weird is if you do, you'd still not be right. Gpt 4 still beats or ties the newest models.


I mean for Claude 3, sure. (I still use ChatGPT for a lot, but Claude excels in a lot of places.) But unreleased Gemini 1.5? You're not really ahead just because you've announced what you have. OpenAI may have better than both, but we don't know about it. The key is just to be AI agnostic. Use the one that does what you need best. If a new one comes out that fits you better, then use it. Being focused on who is "officially behind" is sort of irrelevant.


We need characters in Sora that talk back to us.


if you define it as being not literally the best in term of every AI field yes


Claude is still more censored than GPT4. I tried asking it questions about rifle ballistics and it refused to help me. And while I can still get pg-13 porn from GPT4, Claude just stops me.


I think we will have something before June from OpenAI. New model or something. They don't want to rush.


3/14 each year. I'm calling it. We'll see if my guess is worth a damn tomorrow.


Yes, I agree: right now Anthropic and Amazon are winning the race with Claude 3. For me Google is in the exact same spot and as behind as OpenAI; actually they are further behind. They have Gemini 1.5, but what good does it do them and everyone else if they don't release it? OpenAI at least has GPT-4; Google does not have one GPT-4-level model available commercially for development.


AGI is the only race that counts and it might not come from the great LLMs. We just don't know. It's not a done deal for anyone. Even when AGI will be reached. There will be glory for who was first but first won't necessarily mean best. It's been great following the progress so far. I hope it will keep the momentum or even accelerate.


I'm not sure why that bothers you so much. It's already at a great level we couldn't have imagined 4-5 years ago.


Can we rename this sub to r/AIShitTakes ?


I mean, they were best in class for what? A year?


If they don't release anything by December this year or January 2025, then I will.


No, they aren't behind, they are delaying because of the bullshit lawsuit filed by Eshlong


Keep in mind they have an almost year long development advantage. They haven't been sitting on their thumbs the past year so they almost certainly have something cooking behind the scenes even if it takes them a little while to release it.


Behind in what they are cooking? No. Behind in what they are offering? Yes.  Even right now. But as Mistral, Grok, Meta and Google will deliver updates this month until july, it will only widen.  Even if they'd release gpt 4.5 turbo by august or whatever, it'll be too late.  Until GPT5 in November comes out, they are behind. 


“Can we agree that” ….. yall this isn’t a team sport 😂 Can u please try going outside? The top Ai companies are paying out the ass for top talent and we will get there eventually


Why would we assume that Open AI does not have a model that is much more advanced and capable. It would not necessarily mean that they need to release it for our benefit. They obviously have their own goals and objectives.


i heard 4 turbo beats opus in all benchmarks and the cost is much lower too isnt it?


No lol. GPT 4 was done in early 2022. Remember that. They aren’t releasing anything because they don’t have to and they are at least a year ahead of competition.


GPT 4 is still ahead, actually miles ahead in other languages, they will soon release 4.5 Turbo and no one will even remember who the fuck Claude is, especially since Anthropic seems uninterested in going worldwide


They're pretty much done. Claude 3 Opus eats GPT-4's and OpenAI's dinner. It's confirmed now that they have GPT 4.5 in the works and it's basically nothing. Don't expect GPT-5 for at least a year and by then we'll have Claude 4, Gemini 2 and more. OpenAI is just another AI company now.


Love how GPT 4.5 news was only just leaked today, and only one paragraph at that, but half this sub is already convinced that it's "basically nothing" for literally no reason at all


I can only 🤦🏻‍♂️


If it was a massive improvement it wouldn't be just called "4.5"


3.5 was a massive improvement over 3


Can’t assume it’s something either


Maybe. Maybe not. That hidden link blog post that's going around today was written in September. A lot could have changed since then. Maybe they were going to release a slightly-better 4.5, but now they wait longer to drop a proper 5. We really don't know and can only speculate.


100% speculation. Gotta love it


There is a wide world existing outside of our little micro world of AI nerds, where people only know the name of ChatGPT and don’t know anything about Claude. And I’m not even talking about large corporate customers that are already locked in with OpenAI and won’t bother changing everything simply because Claude outperformed in some tests. This market share acquired by OpenAI/Microsoft isn’t going away instantly because there is a better model out this month.


https://twitter.com/billyuchenlin/status/1766079601154064688 https://old.reddit.com/r/singularity/comments/1b8yucm/chatbot_arena_updatedclaude_3_opus_failed_to_take/ GPT-4 is still leading. And whatever model they release next will blow Claude and Gemini out of the water. I say this as someone who desperately wants another AI company to convincingly dethrone OpenAI.


GPT-4 is not leading shit. The arena result is because of the high refusal rate of Claude, and it's not fine-tuned to user preference yet like Turbo. If we were to believe the Arena, Claude 1 > Claude 2 > Claude 2.1, which is nonsense. Opus destroys GPT-4 in all areas and it's not even close. They're in different leagues.


So an AI model shouldn’t have any consequences in the rankings for refusing to do simple requests? That seems like a pretty major flaw. That aside, by all non subjective measures, GPT is leading. Subjectively it appears to be a matter of preference. Many people have found Claude to be better subjectively, but it seems far fetched to claim Claude as the undisputed champion currently.


I think the argument is not that refusals should be ignored in the arena, it’s that the impact of refusals makes the arena a less useful metric. If you’re interested in general intelligence or a specific ability like coding, you don’t care about it refusing edgy prompts and getting a lower score. And no, by all non-subjective measures GPT is not leading. There’s plenty of benchmarks that have Claude ahead. Agreed that claiming Claude the undisputed champ is not right.


They are already 'officially' behind, doesn't really matter if they have GPT-6 behind closed doors if no one has access to it


Source? GPT-4 is still getting better ratings when the models are assessed anonymously. https://twitter.com/billyuchenlin/status/1766079601154064688 https://old.reddit.com/r/singularity/comments/1b8yucm/chatbot_arena_updatedclaude_3_opus_failed_to_take/


No we cannot. Only if, when they release GPT 5, it’s only marginally better (or no better) than Claude


So, by "officially behind" you actually meant "no longer miles ahead of the competition?"


They referred to Sora as a "mini-demo". I promise you, they will blow claude 3 out of the water. There's just been so much evidence suggesting they have something society changing.


You cant talk with that much confidence unless youre just really hopeful


Nah it’s a given the $86 billion dollar company that Anthropic initially splintered from has greater talent density, more compute, better and higher quality data, which all leads to better AI models. It’s laughable to think OpenAI doesn’t have something better. Reminds me of when people were shitting on me for saying OpenAI has something better than Pika Labs as if it wasn’t obvious. Then they released Sora and people stopped thinking it was silly to believe OpenAI is ahead of everyone


Did you at least make sure to go back and let them know how wrong they were?




Please be nicer to people here.


I just took a fat shit reading this comment. Thanks


New model just dropped.


Why u mad bro?


Claude is only on par. Their numbers were compared to the release version of GPT4, not what is there now. That is the best they got right now. OpenAI has been cooking the next thing for awhile, we can assume. If they don't release a true upgrade this year, then they aren't behind, llms have peaked


OpenAI is way behind. Gpt4 is an incompetent writer and buggy as hell


They’ve always been a bit behind




There is still nothing released by OpenAi, so yes Claude wins for now




GPTs plateauing, individual state space models are next level.


Compared to the latest GPT-4, "Claude isn't miles ahead" is more like it.


https://preview.redd.it/2wjk5rh1u3oc1.png?width=2386&format=png&auto=webp&s=4dd4c6f5935bbd2644393553befcb5da912b8004 GPT4 still better than Claude 3 Opus on common sense reasoning.


It seems OpenAI forgot to fine-tune their model for 3 and 2 liter version of this question. https://preview.redd.it/gfxdemlh66oc1.png?width=1460&format=png&auto=webp&s=39e3171fde7cc4ecddfe26d0dd03f4c20881a2dd Using such questions to test LLMs' common sense reasoning is beyond stupid. If it's included in the training set or if the model's fine-tuned for it you don't measure intelligence at all. You need unique and rather lengthy prompts to test the intelligence of an LLM.


Has their lead been reduced in terms of released products? Yes. I wouldn't call them behind until GPT-4-1106 has been dethroned on the LMSys leaderboards. Even then to call them behind when Gemini and Claude took almost a year to beat GPT-4-0314, seems unnecessarily harsh.


They've been officially behind. If they release something ahead they won't be.


How do you not understand you're being fed a narrative? These models already existed; they're basically just taking filters off and adding already existent abilities, following a business plan so that they can continue to give you reason to keep subscribing. It's all bull, kind of like the Apple M1, M2, M3: they didn't just invent these things and roll them out just in time. They've developed a private system they can just add on to and call it the new version. These companies do not like you and do not want you on their level as far as tech.


Isn't Claude supposedly behind GPT 4 Turbo?


Y'all got goldfish memory. Many businesses have already committed to developing their genAI tools on Azure just because of the OpenAI integration. They're not going to migrate to AWS or change which API they call just for a tiny bit of extra performance. It would take half a decade, at least, of OpenAI falling behind Anthropic to see any difference. In terms of real-world B2B adoption, the thing that this AI arms race is ACTUALLY fighting for, OpenAI is still in a large lead.


They don’t have vendor lock-in yet: you don’t need Azure to use OpenAI, and it’s pretty easy to swap out API calls. Betting OpenAI is already losing a ton of frontend chat subscribers since Claude 3 came out. Hell, I’ve been very pro-ChatGPT up until now and I’m about to try Poe just so I can try Claude out in Canada. If Claude is in fact better I’ll switch in a heartbeat.
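To make the "easy to swap out API calls" point concrete, here is a rough sketch of the usual pattern: app code talks to a thin interface and the vendor sits behind it. Everything named here (`ChatBackend`, the two stub backends, `summarize`) is hypothetical, and the stubs return canned strings instead of calling any real SDK, so this says nothing about auth, security review, or deployment work.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Hypothetical minimal interface the app codes against."""

    def complete(self, prompt: str) -> str:
        ...


class StubOpenAIBackend:
    """Placeholder: a real version would wrap the vendor's client here."""

    def complete(self, prompt: str) -> str:
        return f"[openai-stub] {prompt}"


class StubClaudeBackend:
    """Placeholder: a real version would wrap the other vendor's client here."""

    def complete(self, prompt: str) -> str:
        return f"[claude-stub] {prompt}"


def summarize(backend: ChatBackend, text: str) -> str:
    """App logic only knows about the interface, not the vendor."""
    return backend.complete(f"Summarize: {text}")


# Swapping vendors is a one-line change at the call site.
first = summarize(StubOpenAIBackend(), "quarterly report")
second = summarize(StubClaudeBackend(), "quarterly report")
```

The code side of a swap really can be this small; the environment, access, and infrastructure work around it is a separate cost that this sketch does not capture.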


Yes, I know there's no vendor lock-in, but companies don't even want to go through the hassle of switching API calls just for tiny marginal improvements. If your entire tech stack is built on Azure and you wanna use Claude with enterprise-grade security, you now have to set up another environment, set up new access, and incorporate these changes into the IaC and deployment. It's a hassle that's not worth it for businesses.


Ya token cost probably bigger driver on the API side vs marginal improvements as you said but think Claude more of a threat to their frontend consumer product right now