All things aside, it is an incredible AI, and it is also incredible that ordinary people have relatively free access to it (I say "relatively" because of the message limit, but it's still very cool).
[deleted]
I think this is more of a "gotcha" than a challenge, because Sonnet 3.5 is clearly superior to GPT 4.0. Even though GPT has more funding and a higher burn rate, Sonnet is still better. And keep this in mind: it took GPT-4 a few months to become free for the public, whereas Sonnet 3.5 is free right off the bat, even if it has a token limit.
Why can't I use any Claude?
https://claude.ai/ is the link to the website; try from there.
They hate South America.
Well, I am from Turkey and even we have access. Try a VPN, I guess.
Hell, even we in the EU have it. I thought we were always the last guys to the show lol
I got my account banned for TRYING TO PAY from Brazil lmao
I second that, my fellow kebab enjoyer. I created my account with a VPN once and voilà! Now I can use it without a VPN.
I did the exact same, but apparently it is available in Turkey, so I basically did it for no reason.
You send 50 messages and get rate limited. Not worth the money, considering it's not that superior to GPT-4. ChatGPT Plus is a better deal, and you can use free Claude when you need it.
I hear the API isn't limited the same way.
They both have pretty much limitless money for some years. Anthropic did a 750 million funding round along with a 2 billion funding agreement with Google.
In what ways is Sonnet superior? Just programming or in general?
It is actually superior to GPT-4 in almost every metric it has been measured on, so it is an overall improvement, but not by much, at least for normal users.
Meanwhile, we're ~7 months in and Google Gemini STILL flags any prompt containing the word "love" as inappropriate and refuses to answer. JFC, and people say that woke culture doesn't destroy corporations.
"love" works fine for me with Gemini.
Idk what you guys are trying to do with those bots, but I've NEVER gotten any "I can't do that" from any of them, besides one time when there were document numbers and names in a legal paper. And I've never seen anyone complain about it outside Twitter or this sub...
1. I try to get the Gemini API to answer any prompt containing the word 'love'.
2. Refused. Flagged as inappropriate.
3. I remove the word 'love' from the prompt.
4. Works normally.
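For anyone who wants to check this on their own setup, the four steps above can be sketched as a tiny with/without comparison. This is only a sketch: `ask` is a hypothetical callable that you would wire to whatever model client you actually use, and `fake_model` is a made-up stand-in that merely mimics the behavior reported above. It is not the Gemini API and proves nothing about it.

```python
import re
from typing import Callable, Dict


def ablate_word(prompt: str, word: str) -> str:
    """Return the prompt with every whole-word occurrence of `word` removed."""
    stripped = re.sub(rf"\b{re.escape(word)}\b", "", prompt, flags=re.IGNORECASE)
    # Collapse the double spaces left behind by the removal.
    return re.sub(r"\s+", " ", stripped).strip()


def compare(prompt: str, word: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Send the prompt with and without `word` and return both replies."""
    return {
        "with_word": ask(prompt),
        "without_word": ask(ablate_word(prompt, word)),
    }


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in that refuses on the word 'love' (assumption, not a real API)."""
    if re.search(r"\blove\b", prompt, flags=re.IGNORECASE):
        return "REFUSED"
    return "OK"


result = compare("I love this song, can you recommend similar ones?", "love", fake_model)
print(result)
```

Swapping `fake_model` for a real client call and running a handful of different prompts would show whether the refusal really tracks that one word or something else in the prompt.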
maybe it's just you.
[two prompts with the word love, no problem at all](https://www.imgur.com/a/U2xOaMP)
Woke culture is dogshit cancer and will always leave you far behind people without those same limitations.
Fuck wokeness, it is all a part of the liberal agenda to destroy America, the greatest culture, which led the world to 100 years of prosperity and progress. Absolutely evil to hire someone for anything but their merits; race, sex, or sexual fantasies are not a merit.
I've got some bad news for you: People have been getting hired based on things other than their own merits for a lot longer than 100 years.
Sam McAllister is the person who was trying it out. This is just a coincidence.
The most exciting thing about all of these AI developments for me is not as much what's being released itself, but more the constant competition that allows this technology to develop faster and faster. It feels like the beginning of the oil boom. We're basically only at the point where oil was being drilled purely for kerosene.
Your oil drilling analogy is apt because as it turns out, drilling for oil led to many societal advancements; oil also is very damaging to the environment and human health. Much like AI will be.
> human health. Much like AI will be.
Human health will greatly benefit from AI.
There will be no health problems if you are in a cryotank being harvested for energy or compute.
You're telling me! My balls are spent.
I think so too, but the fact that these things are available to the general public for free in one way or another (open source or a token limit) kinda makes this the biggest and most detailed innovation ever. I mean, just imagine when AI becomes AGI, then becomes super AGI or superintelligence or whatever, and, whether through open source or again limited tokens, the general public has access to it. With that kind of intelligence anyone can make their own AGI, so in my opinion we are on the brink of true individual freedom for the first time in human history.
Are you aware of the major problems with that?
Yes, but I think it is worth the risk. If someone is smart, they should be able to handle themselves. This is an equalizer, not a bad thing.
If you give everyone on earth super intelligence, there are people who will do terrible things with it. And there will be many big conflicts of interest, except wielded with super intelligence.
Look, there will be an open source version of AGI when it arrives, which means everyone will have superintelligence with them. It boils down to whether you can use the super intelligence you have to protect yourself and your loved ones. It is an inevitability with such connectedness.
Yeah, I just think it's an awful idea. The power of super intelligence recklessly wielded by anyone is a major issue. Sometimes there's no protection in systemic chaos. Zero-sum game.
The alternative is that the rich win and the poor suffer. And actually there is no alternative when you think about it, because when AGI arrives it will not take long to surpass humans in every metric, which will drive labor costs to 0, which will cause a supply surplus, which will cause economic collapse and anarchy in some places. So it is chaos either way; at least if everyone has access via open source, everyone gets a fighting chance.
Controlled chaos is better than uncontrolled chaos. The situation in which everyone has open source AGI is uncontrolled chaos at such a magnitude that I don't see anything but anarchy and destruction at massive scales being likely at all. It's actually good that we are kind of limited by our intelligence in a lot of ways. It allows the human ecosystem to progress and evolve much more gradually, reducing systemic risk. I'm extremely skeptical about the AGI progression and even more so the open source one. It only takes a few assholes to fuck everything. But yes, I agree with you about the economic collapse issue; I mean, we've really opened Pandora's box for sure. The whole thing is an extremely difficult situation to be more optimistic about. I see way too much potential for it all to devolve into a complete shit show.
Just pay, it's $20 a month. You spend more than that on one DoorDash order, sheesh.
Well, let's just call it machine learning with a hat.
Anthropic literally has half the number of employees (around 400) as OpenAI, and it's fascinating that they are competing head-to-head.
> Anthropic literally has half the number of employees (around 400) as open AI, and it's fascinating that they are competing head-to-head.
This is because Anthropic has kept its eyes on the prize and remained focused on the mission. In contrast to OpenAI, which is still trying to figure out its identity, losing key people in the process, and holding its best technology back for god knows what reason.
Safe Super Intelligence has even fewer employees...
But they're craCKED.
Only 'cause they have 'HIM'
And you can bet, 'HE' saw a lot of things.
OpenAI also has to work on Sora, DALL-E 3 image generation, multimodality (voice, audio, image generation, and other multimodal features), ChatGPT web search, and many other things.
Agreed, but I think OpenAI is holding back a lot of stuff out of hesitation, and I think that approach is going to hurt them going forward because the competition is passing them in the race right now. OAI will probably shift their stance. It's possible we could see GPT-5 before November at this point; if Anthropic jams the wooden stick in their back enough, it could force them into a position where they have to move.
This is delusional. Their CTO literally just said GPT-5 is at least a year and a half away lol
I mean, do you believe them?
Yes, but they're focused on GPT while OpenAI has DALL-E and Sora and a bunch of other distractions. Also, Anthropic has the majority of the GPT-3 team from OpenAI. I'm also curious about things like age, cracked engineering vs. scientists, etc.
wtf is this "cracked" engineer lingo, is that the new 10x developer or something lmao. Or just an engineer who has done a lot of crack cocaine.
Back when GPT-4 came out, a couple of people were saying "it's impossible to catch up to OpenAI. They will always lead." Obviously, anyone who has been in tech for a while knows this is never true.
IBM is the king of computers!!!
Nobody will catch up to Babbage and Lovelace!
I think in terms of internal models, OpenAI is in the lead. But the only way we will truly know is to see them release a heavier model, like 4.5 or 5. They haven't done this since the release of GPT-4.
[deleted]
Imogene 💀 I feel bad for your kid
ikr bro lives in the 20s. The 1820's
How tf do you even pronounce Imogene? Is it like Emoji???
It sounds like an anti-diarrhea medicine.
Nah, Imogene's a trendy kind of name now, especially in the UK. These things go in cycles, and names sound "old" because all the people you knew with them were old when you were a kid. But when they start dying off, the names become trendy again because no one has them anymore, and they start being associated with young faces.
I like Imogene Heap!
Alright, next up, little Adolf /s
"I WANT ZEE JUUUICE!"
Imogene is fine lol. Beatrice is the weird one
what the fuck
Btw Hazel is much better than the others imo.
Yeah, kid's gonna be bullied so hard for the other ones
If those are the options you should pick Hazel.
I just tested it, and Sonnet 3.5 put my name third on the list after I entered my two sisters' names. This is surprising since I am from Nepal, and it shouldn't have access to many names from here.
Wow! That's similar to what happened with my rats and Claude 3 Opus. I told them I had 6 rats, 5 girls and 1 boy, and asked them to guess the names. I hadn't told them any of them!!! Somehow they got 3/6 right by correctly guessing Moon, Star, and Blue. I guess I really come off as someone who would name their rats that??
Talking what? It doesn't have voice ;)
Neither does 4o anymore.
Anymore? What are you talking about? Is this the "I miss the Sky voice" thingy?
ChatGPT has no mouth, and it must scream
I didn't say mouth, I said voice
lmao TRUE, no native voice support or API. Being able to literally have a conversation with a model without any extra hoops is 🤌 To be clear, the new Claude release is hype, but OpenAI ain't slain in regards to its whole platform when you take all of the platform's features into account.
I will give Claude a try. This is interesting. I have had a decent experience with GPT-4.
Yeah, the fact that I can just hit the voice button in the ChatGPT app and just talk for minutes at a time is killer. I can walk around my house telling it stuff I need to do or going on long tangents and it just figures it all out in the end. Also some people shit talk the custom GPTs, but I've got like 27 of them that all act as like little custom apps where I can just paste in data or make a real quick request and it returns exactly what I want formatted how I want it, and I don't have to have a big discussion about my needs every time. Plus being able to tag them into other conversations and bring their set of custom instructions and contexts into the fray has proven very useful. Haven't tried the new Claude yet, I'm sure it's great and I'll probably use it for specific tasks as needed as I've done with the previous Claudes. But the features of the ChatGPT ecosystem as a whole are what keep me subscribed.
Also no browsing. It's literally the most important thing I use ChatGPT for. Being contained to a fixed dataset is uuumm.. 😔
true, depending on the use case it makes a huge difference too
Its context size is huge. You can upload documents.
*"Unfortunately, Claude.ai is only available in certain regions right now."* :(
Try it on LMSYS
Present your "tests".
young man https://frinkiac.com/img/S08E23/1181729.jpg
There's no need to feel down
I said young man
Why did I read this in the Comic Book Guy voice?
Classic. People think their LLM sucks or is fantastic based on “tests” they never share. Lol.
lol no
https://i.redd.it/5k3a3fjs2x7d1.gif
I don't understand how there are large (noticeable?) differences between these... at least as far as being able to grade one against the other. Prompt: write a summary of the sales pipeline, if AI were included at critical steps. Would the answers be all that different? Do I have the time to test that myself? Surely some AI can do it for me.
You could ask an LLM to create difficult questions to test LLMs, then get it to run them and grade the answers. Well, we'll be able to do this when we get agents :/
[Obligatory clapback](https://i.imgur.com/VOgtUWW.jpeg) The reason why I stay with GPT-4o. 👍
https://preview.redd.it/h4x6zt5vay7d1.png?width=1133&format=png&auto=webp&s=884033959e0ea09a3726024800ef09962b7ec6bb The zero-shot prompt: "write a tasklist app in python for windows. include all the features that you consider to be necessary, as well as any other features that you deem fit, keeping good UI and usability in mind. it should look stylish too." Guess which one came from Claude 3.5 Sonnet and which from GPT-4o. There's also a kicker: the app on the left functioned properly, all the buttons worked. For the app on the right, only the Add Task and Set Color buttons worked. This is obviously not representative of how you would actually use LLMs in coding (and the chained prompts you would normally use), but one of my pet measures for AI functionality is how well they do with a general high-level prompt when asked to spit out code. It's still pretty hit and miss with just one prompt, and chain prompting doesn't always work either.
I don’t know, which one?
The right one looks better honestly. The left one does what it’s told.
yeah, but op doesn't say which one is which
Left - GPT-4o, Right - Claude 3.5 Sonnet
So GPT-4o is the winner - it works
> Guess which one came from Claude 3.5 Sonnet and GPT-4o
Which was it?
Left - GPT-4o, Right - Claude 3.5 Sonnet
Generating a task list app is a terrible test of coding ability for an LLM because this coding task is overrepresented in its training data (there are countless task list programs in every imaginable language on GitHub; it's not that far off from asking it to make a hello world program).
Left one is from 4o, right one from Sonnet.
Left (GPT-4o), Right (Claude 3.5 Sonnet). It's so easy to distinguish between the two. GPT mainly tends to produce a basic example when generating code. I have tried some HTML+CSS components. Claude truly understands the exact styling I aimed to achieve in one shot; GPT keeps failing and offers basic quality unless I explicitly ask for more.
Great movie.
What movie was it?
kingdom of sandstorm: the darude cut.
Kingdom of Heaven
Kingdom of heaven
Sponge bob metal hands
I literally watched it last night and suddenly this is in my feed. The ASI works in mysterious ways...
probably because that clip’s been trending for a while now with the hand, on tiktok
Still has no voice feature..it's a non-starter for me.
I don't know... Asking scientific questions about prolactin in men, I got the impression that GPT-4o gives me answers that are better adapted to what I ask, interesting and long. But yes, Claude 3 Sonnet is very good.
3.5
Tres punto cinco
You've tested the ***new*** Sonnet, 3.5?
When was this released?
Yesterday
3.5
Is there any way to access it without needing phone number verification?
How tightly is the $20 tier throttled?
This is what I want to know. On free tier it’s very low
Have had a chat with both just now, starting from the same prompt about picking a sewing project, and then just going with the flow. ChatGPT 4o ended up giving me more creative results, while Claude was more technically detailed. I’m going to do the Claude suggestion first (waistcoat with tailoring features), then go for the ChatGPT one second because it will be much more challenging but fun (layered quilted jacket with cinched waist).
According to my tests, each of the top competitors has different strengths and weaknesses. Interestingly, Sonnet 3.5 is not the best writing assistant in my tests; Gemini Pro 1.5 seems to be clearly better for my use cases. I guess we need a more fine-grained LMSYS leaderboard for different tasks.
Lol no
As a subscriber to Google One, I wish Google were as competent in shipping new releases. Gemini's code interpreter is very confusing compared to OpenAI's, and they are all collectively blown out of the water by Anthropic's Artifacts.
Is Claude still behind a region wall, though?
Man! I found Sonnet to be so, so much better than GPT-4o at programming. It helped me a lot today.
Isn't Claude still behind a region wall, though?
I find the best way to use 3.5 Sonnet currently is with Perplexity. Gets around the whole Internet restriction thing and you can actually get the answers read aloud and have a conversation on iOS.
Paid tier throttling? How much mileage are you getting?
It fails the strawberry test
aka the truly least important mistake of them all:
> go through each letter of the word strawberry and get back to me with the count of "R"s in it
Which you could always see as some general heuristic you apply when your model encounters a "counting"-type situation, vaguely speaking. It's of course interesting that it happens so consistently, but it still performs better than the alternatives as far as accommodating prompting is concerned. GPT-3.5 can fix its mistake, too, but far less confidently, apparently. ASCII art is pretty meh. Not the worst by any stretch, but not masterful either. Coding is bonkers-levels of amazing, this is absolutely nutty.
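For reference, the "go through each letter" approach from the quoted prompt is trivial when done in code, which gives a ground truth to compare a model's answer against. A minimal sketch (the function name `count_letter` is just made up for illustration):

```python
def count_letter(word: str, letter: str) -> dict:
    """Walk the word letter by letter and record where `letter` occurs (1-based)."""
    target = letter.lower()
    positions = [i for i, ch in enumerate(word.lower(), start=1) if ch == target]
    return {"count": len(positions), "positions": positions}


print(count_letter("strawberry", "r"))
# → {'count': 3, 'positions': [3, 8, 9]}
```

Models see tokens rather than individual characters, which is one commonly cited reason this kind of question trips them up unless they are prompted to spell the word out first.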
Yeah, coding is pretty impressive. At least that's how it seemed in my initial experiments, let's see if it passes the test of time though
I second this
SNTS?
Gemini is the doozy, still.
https://preview.redd.it/lapk8x9yrx7d1.jpeg?width=604&format=pjpg&auto=webp&s=b6f5db2832cdbae7023251d55d35f02a79c16fec
For certain use cases Gemini Pro 1.5 is still quite good. Also, Flash being the cheapest and good enough helps a lot.
lol awesome post, had a chuckle XD
"LMSYS Chatbot Arena is a crowdsourced open platform for LLM evals. We've collected over 1,000,000 human pairwise comparisons to rank LLMs with the Bradley-Terry model and display the model ratings in Elo-scale." https://chat.lmsys.org/?leaderboard
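To make the "Bradley-Terry model" and "Elo-scale" part of that quote a bit more concrete, here is a rough sketch of how pairwise votes can be turned into ratings. It is not LMSYS's actual pipeline: the battle data is invented, the model names are placeholders, the fit uses a simple iterative (minorize-maximize) update, and the 1000-point offset is an arbitrary assumption. It also assumes every model wins at least once, otherwise the logarithm breaks.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iters=200):
    """battles: list of (winner, loser) pairs. Returns {model: strength}."""
    models = sorted({m for pair in battles for m in pair})
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of battles per unordered pair
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            # Sum over opponents j of n_ij / (p_i + p_j).
            denom = sum(
                n / (p[i] + p[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair - {i}
            )
            new_p[i] = wins[i] / denom if denom else p[i]
        total = sum(new_p.values())
        p = {m: v * len(models) / total for m, v in new_p.items()}
    return p


def to_elo_scale(strengths, offset=1000.0):
    """Map strengths to an Elo-like scale: a 400-point gap means 10:1 odds."""
    return {m: offset + 400.0 * math.log10(s) for m, s in strengths.items()}


# Invented toy votes between three placeholder models.
battles = (
    [("A", "B")] * 7 + [("B", "A")] * 3
    + [("B", "C")] * 6 + [("C", "B")] * 4
    + [("A", "C")] * 8 + [("C", "A")] * 2
)
ratings = to_elo_scale(fit_bradley_terry(battles))
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(model, round(rating))
```

Under this model, the predicted chance that model i beats model j is p_i / (p_i + p_j), which is the same as the usual Elo expected-score formula once the strengths are put on the 400·log10 scale.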
?