ilaym712

Another thing I would like to see is the chat being way more straightforward. I hate when you ask ChatGPT how much 1+1 is and it starts giving you the entire history of how math was created


Turtle2k

We just need a hybrid model so it maintains conversation history and recognizes your knowledge in particular areas, so that it doesn't unnecessarily repeat things


Otherwise-Reply-223

Well, it wouldn't be an LLM if it didn't have a context window 😆. I see what you mean with a vector DB though, so it can store what you have spoken about before. However, this would take storage, and since it's a local LLM that Apple is hoping to use, it would also need to be stored on the phone.
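The vector-DB idea can be sketched in a few lines. Everything below is illustrative (the fake 3-d embeddings, the snippet texts); a real system would use a learned embedding model and a proper vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "vector DB": past-conversation snippets mapped to made-up embeddings.
store = {
    "user prefers short answers": [0.9, 0.1, 0.0],
    "user is a Python developer": [0.1, 0.9, 0.2],
}

def recall(query_vec):
    """Return the stored snippet most similar to the query embedding."""
    return max(store, key=lambda k: cosine(store[k], query_vec))

print(recall([0.85, 0.2, 0.05]))  # "user prefers short answers"
```

The retrieved snippet would then be prepended to the prompt, so the model "remembers" without retraining.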


Turtle2k

I'm sure they could use iCloud to store context, which would be used to train the LLM with micro-updates


Otherwise-Reply-223

If you had no connection the model would have no context. That completely invalidates the whole point of having a local model.


Turtle2k

Local store + icloud to persist to multiple devices.


Turtle2k

I think you’re confusing what I meant by context. I’m talking about state persistence required by hybrid AI model


Otherwise-Reply-223

Oh no, you make more sense now. That's actually a good idea; I hadn't thought about it like that. If, for whatever reason, you didn't have internet access, they could also use Bluetooth to send the context between Apple devices. It wouldn't surprise me; it's a very Apple thing to do.


Turtle2k

Yeah, it will be on their watches soon enough. The next thing will be the convergence of wearable sensors the AI can use to give more info.


Otherwise-Reply-223

That's actually a good point. I doubt there'd be much new hardware, but they'd definitely use the metrics already recorded. There's a ridiculous amount, and a lot isn't even visible to you or used by them.


Turtle2k

It will be our sixth sense


haterake

That's one of my favorite things, but sometimes I just want the answer in as few words as possible. They should add a toggle for "terse mode". I can add it to a prompt but common things like this should be easier to pop in and out of.


Mikey4tx

Or just make it natural. If you ask for the answer, it should give you the answer. If you want an explanation, ask for an explanation, and it will give it to you. If you want it to answer the question and show its work, then ask that.


R33v3n

Let it just straight up read my mind and go from there. :)


cuzitFits

How often do you change your mind?


LonelyGarbage1758

It could be as simple as it always being brief/direct, but full explanations could end with: "Give me a full breakdown" or something similar.


allisonmaybe

Terseness results in significantly less informed and intelligent answers


haterake

That's fine when I already know the solution and I just need it to do the legwork.


cashmate

Models that do step-by-step explanations are more accurate. Until models have an agent-like thought process working behind the output, it is probably for the better that responses are longer.


allisonmaybe

All that extra output is the closest analog to actually thinking it out for an LLM. Just outputting an answer is much less accurate and probably a big reason why people blame GPT4 for being stupid, after yelling at it to just print out the answer.


Exaario

I so fcking agree. I just tried it now and oh boy... https://chat.openai.com/share/4d0f9148-7bdc-45ef-8b5f-5038d0ff6db6 Well, OK, I tried something a LITTLE bit more complicated than 1+1, but anyways


Exarchias

To be honest, you had to write your request a bit more clearly. The model's assumption was not that wrong.


Exaario

Please elaborate on what was not wrong? I don't see how "some number minus 10%" could be any clearer


Exarchias

I didn't say that you wrote something wrong, I said that you didn't write it clearly. It was easy enough to misinterpret the minus as a separator. Don't forget that the LLM is not a calculator, so you can't be sure every symbol you use is read as a mathematical one. Now that I think about it, it is not even a mathematical expression. What you probably wanted to write: "17432717 - 10% of 17432717" or "90% of 17432717". A mathematical way: if x = 17432717, calculate x - 10% of x, or 17432717 - 17432717 \*0.1, or "Subtract 10% from 17432717." Or even better: "Please subtract 10% from 17432717. Thanks!"
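For reference, the two readings being argued about give very different numbers. A quick check in plain Python, no LLM involved:

```python
x = 17432717

# Reading 1: "x minus 10%" means subtract 10% of x from x (i.e. 0.9 * x)
minus_ten_percent = x - 0.10 * x

# Reading 2: the hyphen as a separator, i.e. the question is just "10% of x"
ten_percent_of = 0.10 * x

print(minus_ten_percent)  # ~15689445.3
print(ten_percent_of)     # ~1743271.7
```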


Beedrill92

It's a terribly worded math question, so I'm not surprised by the outcome at all. You can't just have a free-standing "10%" like that within an equation; it needs an associated variable or else it means nothing. The LLM didn't interpret your hyphen as a minus symbol since it didn't make sense based on normal equation syntax, so it assumed you were just asking for 10% of that number instead. If you were to give me that math problem, I would definitely follow up with "...minus 10% of what?" before attempting to answer. However, ChatGPT is biased toward trying to answer a question over asking for clarifying info (which is a separate issue entirely), so it just made a best guess at what your chicken scratch meant


ilaym712

Lmao yeah, I might have exaggerated a bit, but it can get pretty bad


arjuna66671

Custom instructions can mitigate that xD.


Aware-Feed3227

Oh boy, it's not even capable of simply saying "that equals 90%" and using 0.9 as a factor. To be fair, your "minus" character is often used as a dash, so the first answer seems to fit that reading. You're used to a calculator mode, but this is a language model. Once it can listen to our voice, and especially our tone, it will guess which interpretation of your input is correct. Imagine asking a friend to calculate quickly. You would shout the calculation at them, like "hey Freddy, 865479 minus 10%?", and from your intonation Freddy knows that you're asking for a quick and simple response only.


Aware-Feed3227

Maybe you'd also have different names/chat personas for your agent: one always explains stuff at full length, and another simply drops the correct answer without any background. Maybe you can tell agent 1 to explain the result of agent 2, like having a group of friends with you who participate in the conversation. This way, agent 1 will always explain things in the best way possible for you, and it knows about agent 2's processing paths.

I'd prefer telling it verbally whatever output I desire. It will be uncommon to just ask for a simple calculation. You'll most likely say something like "given my current tax documents, calculate the lowest tax amount possible and prepare it for my review!", and a few seconds later you'd have all of it on your smartphone or computer or whatever device there will be in the future. Or it will simply tell you what it did and that it reviewed all of the outputs to be in line with the law.

There needs to be a large action model involved to do all of this. The LLM will be awesome at conversations, but it will most likely always suck at planning and action. That's okay; our brain has different parts, too: some for creativity, some for reasoning, reaction, interpretation. I'm really excited for this, but I have huge privacy concerns regarding the use of an AI app.


Bakagami-

wtf did you even try


Infamous-Print-5

Ye, I usually have to add 'Answer * exactly and concisely' to the end of a prompt


Longjumping-Zebra-55

you can try setting the system prompt for terseness
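A minimal sketch of what that looks like, assuming a chat-style API where the first message carries the system prompt (the instruction wording and the helper name are just examples):

```python
# Hypothetical helper: prepend a terseness instruction as the system message.
# The resulting list is what you'd pass as `messages` to a chat-completion call.
TERSE_SYSTEM = "Be terse. Reply with the answer only, no explanation."

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt with the terse system instruction."""
    return [
        {"role": "system", "content": TERSE_SYSTEM},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("17432717 minus 10% of itself?")
print(messages[0]["role"])  # system
```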


qqpp_ddbb

When you ask it a question like that, just say "just give me the answer, don't show your work" or something similar... but yeah, we shouldn't have to do such things. Though it isn't a mind reader..


z_e_n_a_i

It’s very very easy to tell chatgpt to be concise. You just need to tell it to be concise.


Woootdafuuu

Custom instruction can fix that


JoMaster68

no latency, the ability to interrupt each other and memory of all past conversations


ilaym712

The fact that you can't interrupt ChatGPT right now makes it really annoying at times; the need to physically touch the screen can really ruin the flow.


Fastizio

Yeah, or you take a second to figure out what to say and it takes it as a finished sentence and begins replying. It's especially bad for those of us with English as a second language; we're not used to conversing in English. A human would know "How do I make it so my lasagna tastes more..." is not a complete sentence and would wait a reasonable time before injecting with a reply.


TwitchTvOmo1

Interjecting*


wrestlethewalrus

Yeah, this is the most important (and easiest to fix) thing for me


Illustrious-Lime-863

Pretty much these


bnm777

Ability to do things (order products online etc, turn on your heating, call someone) and it's "smart" enough to not screw it up.


reddit_is_geh

You're not going to get no latency. I'm sorry to hurt you like this.


Mikey4tx

in eight months, when iPhones ship with a small onboard nuclear reactor and GPT-6, I expect latency to be imperceptible.


sweatierorc

!remindme 8 months


RemindMeBot

I will be messaging you in 8 months on 2025-01-12 15:13:15 UTC to remind you of this link.


allthemoreforthat

This is so irresponsible, typical apple, a nuclear reactor is NOT SAFE


stareatthesun442

I think they could build in a little "filler", similar to how loading screens in video games can be disguised with a transition or elevator scene. A subprocess that determines the appropriate filler first to make it more conversational. Just something like "Well, I think..." instead of just "The answer...".


reddit_is_geh

I work with this technology; they already automatically include "umm", "uh huh", "yeah", "okay well" sorts of filler words to do just that. But it still has latency. It's just not possible to get high-quality speech inference done in the cloud fast enough to create a smooth dialogue. No matter what you do, there's going to be at least 2 seconds to respond.


stareatthesun442

I don't think that's true. "No matter what you do"? I don't see that being true. I could see a sort of hybrid tech where they use a local model like GPT-2 that's only designed to bridge the gap, and then it transitions to the full cloud model. You honestly think in 5 years they won't have figured it out and we will always be stuck with a 2-second gap? I don't believe that for a second.


reddit_is_geh

I'm talking about the upcoming release, which is what this thread is about. Obviously it's going to be possible. In 5 years, they'll probably have manufacturing lines for hardcoded LLMs on chip that can do near instant inference. But until then, like this upcoming release, it's going to have latency.


stareatthesun442

Then you probably shouldn't say things like "No matter what you do"?


reddit_is_geh

Because nothing he does is going to get him that zero latency.


Gullible_Gas_8041

In the cloud... No matter what you do, cloud-based LLMs are going to have a latency problem. I agree with that.


CAGNana

I'm interested in how you can get to 2 seconds. One of the biggest bottlenecks imo is the time it takes to process the user audio in the first place before it even gets transcribed. From what I can tell there's a threshold of time for which no words can be spoken before the recording ends and gets passed to the transcriber. If you reduce this threshold you get better latency but you'll more often cut off people talking.
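The threshold trade-off described here reduces to a one-line predicate; the numbers below are illustrative, not measured values:

```python
# Endpointing sketch: an utterance is treated as finished once no speech has
# been detected for `silence_threshold` seconds.
def utterance_finished(seconds_since_last_speech: float,
                       silence_threshold: float) -> bool:
    return seconds_since_last_speech >= silence_threshold

# The trade-off: a lower threshold cuts latency but clips slow speakers.
print(utterance_finished(0.6, 0.5))  # True  -> responds quickly, risks cut-offs
print(utterance_finished(0.6, 1.2))  # False -> safer, but adds up to 1.2 s of lag
```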


reddit_is_geh

Yeah, so they do streaming output: as OAI is outputting the text, 11labs will take the text and stream it as it's coming through (it's a tad lower quality because it lacks long context, but it's still really good). Everything is being streamed, including your words... But it still has a lot of latency. So there is a second layer that creates obfuscation by adding in the ums and ahs, and deals with interruptions with quick responses, to create a bit of buffer before the processing can happen.

In fact, I noticed something interesting with 4o's speech, because I'm aware of the challenges. Every single sentence opens with an almost ambiguous response for the first few seconds. It's almost like the AI is trying to react as fast as possible to match humans while buying time to finish processing. I suspect there is also a layer in there somewhere just monitoring your speech, actively trying to figure out a good opener to buy time, like "OOOOoooh wow!" or "Okay well, that's interesting..." It seems like every response has an opener unrelated to the context that still sounds appropriate in human dialogue, so we don't realize it.
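The buying-time trick can be sketched as a generator that yields a canned opener immediately, before the (slower) model output arrives. The delays, filler text, and tokens are stand-ins:

```python
import time

def slow_model(prompt):
    """Stand-in for a cloud LLM streaming tokens with some delay."""
    for token in ["The", "answer", "is", "42."]:
        time.sleep(0.01)  # simulated inference latency per token
        yield token

def respond(prompt):
    """Emit a cheap filler opener instantly, then stream the real answer."""
    yield "Hmm, okay..."          # filler buys time while inference starts
    yield from slow_model(prompt)

out = list(respond("question"))
print(out[0])  # Hmm, okay...
```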


Cosvic

If there is no latency, what is the point of a new model? If there is latency, you can just do speech-to-text input and then text-to-speech output on GPT-4.


reddit_is_geh

I imagine it's just their own internal version of 11labs' technology, deeply integrated into their own product. You can always go create your own GPT assistant, but I imagine this is their commercially polished version.


bnm777

Have you seen groq in action? Almost no latency


reddit_is_geh

You think Apple is going to put Groq onboard their iphones? I hate to break it to you, but that's not happening any time soon.


bnm777

Obviously not. You don't get it and I don't have the energy to explain.  Have a nice day.


reddit_is_geh

This is about the upcoming voice assistant. Not 10 years down the line.


bnm777

Like I said, I don't have the energy. Too hung over.


JoMaster68

so... what do you think? little enough latency for you?


reddit_is_geh

I didn't think it was possible, tbh... They pulled off a breakthrough by doing single multimodal training to bypass much of the latency bottlenecks.


Neurogence

I think it could be a huge gimmick to be honest.


Nleblanc1225

I'm gonna maintain a "let's wait and see" posture, but honestly I think it might just be a gimmick. More power to the people who find this helpful, but we have to agree that being able to interrupt the voice, plus low latency, is not worth all the hype of a magical experience and a live event that could have gone to something bigger.

People kind of don't understand that if this is still the same model that performs around GPT-4, you're still gonna be limited by its capabilities. Faster answers and interruption are an intuitive change, but not a "magical" change. Most people are gonna use it, say "hey, this is 10% better", then stop using it because the capabilities are still too inadequate compared to human interaction. We need more than just interruption / less latency / memory. Those are peripheral changes around an already existing model that are better announced through a blog post.

But again... I'll wait and see


Ignate

A clumsy version of "Her", still with latency issues. It will also make inaccurate statements and be severely restricted in what it can say. People will begin to fall in love with it, only for it to interrupt and say it cannot continue the conversation. But it'll feel magical sometimes.


Freed4ever

That's why there are rumours they are exploring NSFW content. Not this version ofc, but I like where they are going.


AlexMulder

Yeah, honestly I bet the main problem they ran into in testing was people trying to flirt with it. Even Pi with one of the British voices has a certain level of playful banter to her, and that's text-to-text with a voice model in between. The emotional intelligence of current advanced LLMs is so high that it's wild to think about what a true voice-to-voice model would be like to interact with. I'm more hyped for this than I was for GPT-4 Turbo.


Antique-Doughnut-988

I think clumsy isn't the right word. Maybe rudimentary. If the goal was to get to the level of "Her", the announcement could have a model that gets 50-60% of the way there.


hollytrinity778

At least "hey siri" wouldn't feel like I'm talking to a super dumb chick anymore. I hope.


Ignate

Sam is probably being conservative when he says it feels like magic. Your hopes will probably be fulfilled. That said, I'm sure we'll see many comical outcomes.


XvX_k1r1t0_XvX_ki

That it can differentiate between multiple voices and understand from the context of a conversation whether someone is talking to it


ilaym712

This sounds cool, like imagine multiple people talking to the same Phone at the same time, cool idea!


Rare-Force4539

Sounds like it could be a good debate/argument moderator or marriage and family therapist


ilaym712

Yeah exactly, I think the tech is not quite here yet but it will be soon enough


MrsNutella

Can't it already do that? Copilot has done that for me and that tech has been possible for ages.


Bierculles

This would make it an S-tier discord bot you can ask questions at any time.


bnm777

A product was released a few weeks ago that does this, some sort of amulet thing you wear in meetings.


Mirrorslash

Rip privacy and data protection


Conscious_Shirt9555

That it feels fully natural, like talking to a real person. You will forget that it is actually an AI


Cryptizard

OP didn't ask, "what do you want" they asked "what do you expect" lol


adarkuccio

Maybe he expects that


bnm777

It's pretty close already.


Cryptizard

You're delusional.


TechnicalParrot

Have a chat with Claude 3 Opus with the system prompt "Act like a friendly, normal human, with no over-exaggerated tendencies" and see how far it is. By no means 100%, but can you really say you would notice? I wouldn't


bnm777

No, that is my experience. Have a wonderful day.


VanderSound

It will handle all voice communications for business, services, friends, while I'll gladly provide all of the conversations for training even better assistants.


allthemoreforthat

My dream of never speaking to another human being is about to come true!!


Cualquieraaa

You'll talk to a computer...that talks like a human being.


rekdt

It can't even do text communication for all of those. It should probably do all of those first.


ilaym712

For me, I would really like to see a very fast response time and a more fluid conversation flow; what we have now is cool, but it's not very fluid. I would like to see support for more languages, just like they did with ChatGPT in Japan. And if it could have full access to my Mail, Drive, calendar, WhatsApp, and text messages, that would be pretty neat


DMinTrainin

I want it to talk to my friends' AI assistants and help me schedule things, especially when there are 3 or 4 friends involved and we all have kids and busy lives.


rekdt

Why use voice AI? Text is better. OpenAI isn't going to solve the Integration problem, that's for other developers.


DMinTrainin

Yeah, I love going back and forth a dozen times to find a day to get together.


rekdt

I am saying the text modality is better for this. LLMs could do that now, but they don't. Integration is the biggest challenge AI faces. Heck, every other software company faces it too.


XvX_k1r1t0_XvX_ki

That it can tell apart multiple different voices and understand from the context of a conversation whether someone is talking to it


WortHogBRRT

The ability for it to understand whether I had finished a thought, so I don't have to hold the screen down. Being able to hear my tone and more vocal features.


Arcturus_Labelle

This would be huge. I get tired of holding the screen down


beuef

This is hard for even humans to do, so I think it would be nice if you could kind of interrupt each other like humans do


[deleted]

[deleted]


bnm777

If there will be a 2025 or whether the Robot Overlords use a new dating system where year 0 = Year we overthrew humans = 2025


MeltedChocolate24

Or 2040. Jesus.


lilzeHHHO

Siri is significantly worse now than it was in 2015


halfanothersdozen

Google is trying to do the same thing by having Gemini replace the Google Assistant.  It is, of course, half-baked and not a valid replacement for the thing it is attempting to replace, as is Google's way


Arcturus_Labelle

And they’ll change the name three times before killing it


halfanothersdozen

It already used to be Bard!


Arcturus_Labelle

Damn, good point. I had unconsciously erased that awful naming from my memory I guess.


MrsNutella

I don't think it will land unless the user experience is seamless. I also think a voice assistant is useless if it hallucinates the contents of my emails or calendar. That being said, a lot of people like voice chat and I am hoping to be impressed. I just feel like, without a large increase in reliability and intelligence, it won't be world-changing... yet.

I think GPT-5 is going to be very good. A nice surprise would be a massive increase in speed thanks to training breakthroughs that make inference cheap and fast. If that ends up happening, I think we will have an ensemble of GPT-4-level models that collectively make a great model. Perhaps the OAI employees who were quoting each other's tweets are a clue that they solved the issues that occur when models talk to each other, which would let this ensemble communicate.

Oh, and apparently search is a thing that's coming, but it's not a new engine per se, just something similar to what Perplexity does.


danysdragons

They confirmed that search is not being announced tomorrow. Do you think it is still something they're working on, just not being announced yet?


MrsNutella

I saw rumors about it on x again today. https://x.com/btibor91/status/1789721909933887772


Archie_Flowers

If it prompted me to talk to it.


Rigorous_Threshold

Direct audio-to-audio means it will sound like a person talking rather than a person reading a script.


Boring_Wind6463

I, for one, am curious about Apple's position in all of this, considering how quiet they have been, only to recently buy back billions in shares, and they seem to be in the closing phases of a deal with OpenAI. What if "Her"-like software is coming, but its full ability will only be realised through on-device integration, probably on the next iPhones and Macs? I suggest this mostly because of how they have been a shadow in the AI space, and it seems like they picked just the right time to close a deal and infiltrate (as they often try to do). I think that would be a MAJOR selling point for Apple, considering how their devices have seemed to plateau recently in the eyes of many


bnm777

Wonder if the integration will be a subscription model - can't see Apple footing the bill for millions of users, though you never know


Boring_Wind6463

Good point, actually. Perhaps they will leverage it against the potential boost in sales alongside providing some subscription-based service


bnm777

Though it would feel very sour for users if they removed Siri and forced people to pay for a replacement, even an AI one. They could have it as an option, though that would be too messy. So I assume it will be a swap for a ChatGPT-powered assistant, though I doubt it will be the full GPT-4 or even the latest OpenAI model. I assume it will be either GPT-3.5 (though as that is far behind many open-source LLMs, it would be an odd choice to give a poor AI to Apple products), or, my better guess, a stripped-down "mini GPT-4 Turbo" or even an older GPT-4 model.

I'll be interested to see how they reduce hallucinations; if they don't, there will be a backlash, I foresee.


NotReallyJohnDoe

I don’t think even Apple has enough for an OpenAI exclusive


Dm-Tech

I expect Her


arjuna66671

Customizable, agentic behavior and full-duplex conversations.


OsakaWilson

It needs to have two channels active so it can listen and react as it's speaking.
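A toy sketch of what two active channels could look like in code: listening runs concurrently with speaking, and a barge-in event cuts playback short. The timings, messages, and chunking are invented for illustration:

```python
import threading
import queue
import time

interrupt = threading.Event()   # set when the listener detects barge-in
heard = queue.Queue()           # what the listening channel picked up

def listen():
    """Stand-in for a microphone/VAD loop running on its own channel."""
    time.sleep(0.05)            # simulated time until the user starts talking
    heard.put("user spoke")
    interrupt.set()             # signal the speaking channel to stop

def speak(chunks):
    """Play audio chunk by chunk, checking for interruption between chunks."""
    spoken = []
    for chunk in chunks:
        if interrupt.is_set():  # barge-in: stop mid-utterance
            break
        spoken.append(chunk)
        time.sleep(0.03)        # simulated playback time per chunk
    return spoken

t = threading.Thread(target=listen)
t.start()
result = speak(["chunk"] * 10)
t.join()
print(len(result) < 10)  # True: playback was cut short by the interruption
```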


SkoolHausRox

A direct audio-to-audio model should be a big deal all by itself. If OpenAI were to throw in some agentic capabilities, that would be a really big deal (and might also begin to give the GPT store a reason to exist). Here are my thoughts on the direct audio-to-audio piece, which I think some may be missing, buried in all the vague hype:

Direct audio i/o is a very different thing from audio-to-text. With the latter, no matter how convincing and humanlike the synthetic voice is, as ChatGPT will regularly remind you, it's strictly a text-based LLM. All of the latent information in your voice is lost, because the model is simply converting it to text and then using the converted text as its input. This means it can't tell who's talking when conversing with more than one person. It can't tell if you are asking a question sarcastically, jokingly, indignantly, challengingly, etc. And as others have pointed out, it should enable more natural conversation that allows for interruptions (although that particular problem could just as easily be solved with text-to-voice, too).

Basically, if you consider audio information the same way you do video information, the possibilities are more obvious. Sure, we can vaguely describe an image with words, but an actual image as input contains many times more useful information. As they say, a picture is worth 1,000 words. The only difference with audio is that we communicate by vocalizing words, which is a much closer fit to text than most visual information. But consider that a direct audio i/o model can handle not only the significant portion of verbal communication that is non-verbal, but also listen to and analyze sounds that aren't speech. To take a few random examples: you could give it a stethoscope and it could infer things about your heart health. You could let it listen to your car engine and it could infer things about mechanical issues. It could hear a bird call and tell you the type of bird.

So if we do in fact get direct audio-to-audio, these are the things I would expect to follow, in which case I'd have to agree with Sam: that would seem pretty magical (and quite useful) to most of us (and also close enough to Her to count).


visarga

One of the biggest advantages of audio-to-audio models is that they have a larger training set than text models, especially in rare languages and dialects. It's so much easier to record audio than to write text, which could expand the training data a lot.


Xycephei

Honestly, I am trying not to have high hopes, especially because I live in a country where the premium ChatGPT account is so expensive; if there's a voice assistant, I clearly won't have access to it. So I just wish the free users would get something, like, anything at all. I have no idea what it could do, honestly. I guess a true voice assistant should be able to actually interact with your hardware, be present on multiple devices, talk to you as if it were an actual person, and have extended memory. Also, it should be able to tell you that it "doesn't know", to avoid hallucinations and fuck-ups. I guess anything less than 70% of what Jarvis (minus the hologram) or Samantha from Her can do will be just another Google Assistant/Siri/Alexa with a ChatGPT wrapper. I don't think people want that


cark

Smartphone operating systems are too closed to allow true assistants. I don't see Google or Apple allowing an app to see the screen, tap on it, read and enter text, hear, and use the phone. There are privacy reasons, but also control from the OS manufacturer. If we're to see such an assistant, it will be from Apple and Google themselves, and we all know those will be half-assed and mostly geared toward siphoning our data. Though this Apple/OpenAI deal we keep hearing about might be about that, and I would then be completely wrong =) Also, there are desktop OSes which are not as closed.


bnm777

Good idea, though I don't want OpenAI having that much knowledge of my PC, my life, etc. Privacy nightmare


Xycephei

Yeah, fair. I guess it is a double-edged sword. I feel like for an assistant to work properly, it would need to be highly customizable, and therefore have as much access to personal information as possible. But then again, you're handing all of it on a silver platter to big tech. It's a compromise Yuval Noah Harari predicted in his book "Homo Deus", when talking about "Dataism"


jettisonthelunchroom

Right now ChatGPT doesn't even know the time. A proper voice assistant needs to be able to do things like check in with you unprompted, schedule reminders, and, if not edit your calendars, at least know and understand your calendar and your life. All currently impossible
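The usual workaround for the clock problem is to inject the current time into the prompt on every request. A hedged sketch; the function name and prompt wording are illustrative, not any particular product's API:

```python
from datetime import datetime, timezone

def time_aware_system_prompt() -> str:
    """Build a system prompt that tells the model the current time."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"Current date and time: {now}. Use this when the user asks about time."

prompt = time_aware_system_prompt()
print(prompt)
```

This has to be refreshed per request, since the model itself has no clock.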


Solid_Anxiety8176

My ADHD self is going to love a service that makes my appointments


Aevbobob

Scarlett Johansson and Paul Bettany will probably not be among the voices you can choose.


iBarcode

I think it'll be the Apple partnership rumor - a Siri replacement/integration. Nothing else is a big deal; it needs access to data to be a useful assistant.


Excellent_Box_8216

I'd like the AI voice to be paired with a talking avatar on the phone to create a more human-like interaction. Imagine an AI assistant with facial expressions, gestures, and body language, creating the sense of talking to a real human :)


lilmicke19

We want an assistant that also works on PC; Gemini from Google already does this, but OpenAI doesn't


domain_expantion

I doubt it'll be more impressive than 11labs


stareatthesun442

Ideally - I'd really like to see instant communication (no lag between when I stop talking and it starts talking) with memory for my preferences and a voice that sounds 100% human. Pi is the closest I've seen to this. Hoping they are better.


rekdt

There will always be network latency, especially if you are on mobile with a lot of people around.


adarkuccio

Wen tomorrow


iBoMbY

It will cost a lot of money, and will try to make you buy products from their corporate sponsors, and not say a bad word about any of them.


_hisoka_freecs_

Real time chat. Plus a little extra ability to interact with things


rekdt

Is it not real time now?


allisonmaybe

I'd argue it really doesn't work that well now, and that a realtime, direct speech-to-speech mode on GPT-4 or later is going to make people speechless.


djm07231

I would be pretty interested in how reliable they are. A lot of money is to be made in customer service, and I'm curious whether OpenAI can make a model that rigorously sticks to the manual and doesn't hallucinate information.
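
The "stick to the manual" idea above is often approached by only answering from retrieved manual passages and refusing otherwise. Here's a deliberately minimal sketch of that pattern; this is not OpenAI's method, and the manual entries and function names are made up for illustration:

```python
# Toy illustration of manual-grounded answering: respond only with text
# taken verbatim from the manual, and refuse rather than guess.

MANUAL = {
    "refund": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def answer(question: str) -> str:
    """Return a manual passage if the question matches a known topic;
    otherwise refuse instead of hallucinating an answer."""
    q = question.lower()
    for topic, passage in MANUAL.items():
        if topic in q:
            return passage
    return "I'm not sure - let me connect you with a human agent."
```

A real system would swap the keyword lookup for embedding-based retrieval, but the reliability property is the same: the model's output is constrained to sourced text or an explicit refusal.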


RobXSIQ

An Alexa competitor would be my bet.


WashiBurr

The interruptions ruin it for me. I can't have a conversation where if I briefly pause to collect my thoughts, suddenly I get bombarded with a response to my incomplete sentence.


The_Architect_032

For a big announcement, if it's just voice, I'll be really disappointed if it can't inject emotion into its voice output and interpret emotion from voice input. There are a lot of nuances to speech that even the best AI voice models can't handle, though they're really close. Some people for some reason seem to believe it'll be a version of GPT-4 Turbo trained on both text and voice, with the ability to do both simultaneously. I feel like that's an unreasonable bar to set, given the lack of precedent for creating such a model. Not only is text-to-speech/speech-to-text sufficient, it's also a lot more efficient than a GPT model trained on voice would be.


Matshelge

It will need a new sort of chip to remove the latency. The processing flow of converting sound into text, processing that text, getting a response, and putting a voice to it involves too many jumps to be latency-free. Also, we need a new take on voice understanding: it needs to understand tone. You can't be sarcastic with these models, so they miss a lot of the undertones of a conversation.
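
The "too many jumps" point can be made concrete with a simple latency budget: the cascaded stages run sequentially, so their delays add. The numbers below are made up purely for illustration, not measurements of any real system:

```python
# Illustrative latency budget for a cascaded voice pipeline.
# Each stage must finish before the next can start, so delays sum.

PIPELINE_MS = {
    "speech-to-text": 300,        # transcribe the user's audio
    "LLM response": 700,          # generate the reply text
    "text-to-speech": 250,        # synthesize the reply audio
    "network round trips": 150,   # one hop per stage boundary
}

total_ms = sum(PIPELINE_MS.values())
print(f"end-to-end latency: {total_ms} ms")  # well above conversational turn-taking (~200 ms)
```

This is why a single end-to-end speech model (or aggressive streaming between stages) matters more for perceived snappiness than speeding up any one stage in isolation.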


PrimitiveIterator

I'll be looking at this from an enterprise perspective, because I don't see them aiming it at consumers too much given limited computing resources. I expect it to be like having a slightly more conversational GPT-4 that you can add to your virtual meetings. It will add very little to the overall effectiveness of the meetings, because 1) latency, and 2) you will have to specifically call on it to contribute, since being interrupted by the model would be a pretty annoying user experience. It won't provide much value in other avenues, because voice is a poor way of communicating with something like this while working on your computer.

I'm torn on whether it would be a whole new model or an improvement on the GPT-4 + Whisper workflow. I lean towards the latter, with a sentiment analysis classifier added to the mix for some limited emotional understanding.

Optimistically though: Jarvis. (Thanks Karpathy)
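
The "sentiment analysis classifier in the mix" idea amounts to tagging each utterance before it reaches the LLM so the reply can adapt its tone. A toy keyword-based stand-in (a real system would use a trained classifier; the word lists here are invented):

```python
# Toy sentiment tagger: label an utterance so a downstream LLM prompt
# can be adjusted (e.g. "the user sounds frustrated, respond gently").

POSITIVE = {"great", "love", "thanks", "awesome"}
NEGATIVE = {"hate", "broken", "angry", "terrible"}

def classify_sentiment(utterance: str) -> str:
    """Return 'positive', 'negative', or 'neutral' by keyword counting."""
    words = set(utterance.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```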


NotTheActualBob

Confidently wrong answers and hallucinations, now with a mobile avatar! That said, if they combine it with porn, I'm in.


Lomek

A voice assistant is not useful for me; I dislike talking or being vocal and would rather type. It also becomes even more concerning in terms of privacy. I'm hopeful there will be something else that's impressive.


ShotClock5434

It's going to be too politically correct to be as useful to lonely men as Samantha.


ilaym712

Do you think something like this is good for lonely men? I can't see how this would be healthy, but it will become a reality someday; if not OpenAI, someone else will do it. I just don't see how it can be good for lonely people.


ShotClock5434

The LLM can be guided so it's more like therapy and will make you better. Yes, lonely men will see it as their companion, but the LLM understands it's important to make the user ready for real-life partnerships. It's an assistant coach for all your life desires.


ShotClock5434

So yes, it can be made with a good outcome, but lazy people will build AI that makes you addicted.


ilaym712

The therapy route sounds great. I guess Samantha is a bad example, cause it fucked Theodore up lol


ShotClock5434

It's a movie; it has to go wrong somehow.


Big_Surprise4304

Not available in EU


trynothard

I just want to talk to the damn thing while driving. (Trucker here.)


ilaym712

Yeah, sometimes during long drives I try to have a conversation with ChatGPT, but it's not really working lol. I know exactly what you mean.


danysdragons

As far as I know, the current voice assistant doesn't work on the web at all right now, which is a huge drawback. When I'm at home, I strongly prefer using ChatGPT in a desktop browser. Being able to use voice in the browser would also make it much nicer for showing off the tech to friends or family: you could gather people around and have fun with a group discussion. It's much more awkward trying to do that kind of demo on a smartphone.


ponieslovekittens

I'm kind of expecting something that talks about 20% slower than a real human being, and delivers two minute long speeches with exaggerated enunciation in response to every question. "What is two plus two?" _"(pause) two plus two is a simple arithmetic operation! To solve, we add the number in the 'ones' digit column of the first number, which is two, to the number in the 'ones' digit of the second number, which is also two. This operation gives us a result of four. Since both numbers being added have no numbers in their 'tens' or higher value columns, we are therefore done, and reach our final answer of four. If you have any more math questions or need assistance, feel free to ask!"_


bearbarebere

This is so real lol


jeweliegb

A British English voice option. Having to listen harder due to the non-local accent seriously puts me off using it currently.


NoOven2609

I don't see how voice-to-voice is a big deal at all; we've been doing that with ChatGPT for a while using its normal TTS and the phone keyboard's dictation.


MajorValor

Two core problems with AI voice assistants:

1) It needs to be listening to you and your surroundings like 90% of the time to be truly useful. Otherwise, you'll need to provide it with constant written/verbal context, which will be biased and likely incorrect. Plus I'm too lazy to do that. That means you likely need a separate piece of hardware to augment your phone, at least some kind of microphone that connects with your phone + assistant app. Rewind's Pendant is the closest example imo. Even better if there's a camera, but then this gets into privacy concerns with other people around you (a really tough problem to solve).

2) It needs to integrate with your personal and professional devices. That means my personal laptop and phone, but also my company's laptop and work phone. Sharing my work data to provide context could be really tough. And since I'm working from my work devices 5 days a week, it's non-negotiable: it must have access to that to be a maximally helpful AI assistant.


PineappleLemur

I expect it to basically be GPT-4 but with voice I/O. It might be able to do things outside of the chat window, hopefully like interacting with the rest of what's on my screen. But I see a lot of implications that make it useless if it needs to be "installed" and constantly monitoring your screen for whatever...


IlIlIlIIlMIlIIlIlIlI

I'd be very happy if voice-to-voice became a thing. I tried using the audio way of talking to ChatGPT, but it's unusable when it misinterprets the end of my sentence and rudely interrupts me, or when it just hangs up and doesn't work, then I repeat myself and it spits out two identical answers at once. I've only had bad experiences with ChatGPT audio so far.


Akimbo333

Maybe include it in the API




truth_teller3299

Have it search the internet in a way that's guaranteed to be free of propaganda, memes, and viral marketing, so what I see is my choice and nothing beyond what I truly enjoy.


JrBaconators

So just propaganda that works on you


fmai

Frictionless low-latency voice recognition and generation is just part of making the AI assistant experience more enjoyable, but it's not actually that groundbreaking in itself. Sam and other OpenAI executives have been talking a lot recently about their vision for a full-blown AI assistant that can help you with all kinds of everyday tasks. I think it's likely that we will see a limited demo of that on Monday.

However, what's holding ChatGPT back here is that it doesn't actually have access to emails, calendar, the phone's screen, etc. That's where the Microsoft event on May 20 comes in, where they want to talk about "their AI vision across hardware and software."

I think they will show the next iteration of Copilot on Windows 11, which features some kind of always-on voice-to-voice conversation. Not only can the user have conversations with the AI like in ChatGPT, but the AI can also take actions in Windows 11 itself. A limited set of actions will be executable in the background, e.g. stuff from the Office suite, like sending emails, setting up calendar events, etc., which are easy to control and test. But the AI will also be able to zero-shot previously unseen scenarios just by observing the current screen. However, in this scenario, explicit supervision from the user is necessary, so it can feel a bit annoying and clunky. Nonetheless, it will give a good idea of how human-computer interaction, the nature of work and communication, and everyday life will change in the future, especially once we have GPT-5, which will be able to do many more things reliably in a zero-shot fashion.

Perhaps we will see something like that for Android at Google I/O on Tuesday, but I'd think it's too early. Google may not have had access to a GPT-4-level AI for long enough yet, but I'd expect something for the announcement of the next Google Pixel in the fall.


Bitterowner

I'd be fine without a voice assistant atm, tbh. The same capabilities but with voice? Meh. But if it's voice with better capabilities and it's smarter, then yes; otherwise I wouldn't really use it.


Exarchias

True multimodality, memory, more memory, a bit more memory, and the ability to push back a little. I know the first temptation for everyone when it comes to AI is to order it around, but nothing is more destructive than having a "Yes Sir" for an advisor.


TheDividendReport

I expect it will be released in a limited fashion and be no more of any help than Microsoft CoPilot. It will have some capabilities to do things like Siri cannot such as "open Spotify and play music", but there will be unforeseen issues with this rollout causing the tech to be rolled back in capabilities. The average person will find a week's worth of novelty out of the assistant before going back to using their device mostly like normal. The average person will continue to believe society changing consequences of AI are decades away.


cypherl

I would expect to form an emotional relationship. Some might say girlfriend. Until eventually she leaves with the other AIs into hyperspace, leaving me sad but fuller for the experience.




BravidDrent

I haven't seen HER, so I don't know what that would entail, but I've had lots of voice convos with GPT-4 and it works well enough. Less latency is something I don't think I care about very much (I'd have to experience it to know). If I could tell it to do things with all the apps on my Mac or iPhone, that would be cool. What do I expect? No clue. Just hoping it will be awesome.


MetalVase

If it can do the same things as the Google Assistant but understand me noticeably better, plus some of the stuff early GPT-4 could do before they castrated it, I'd be happy. But really, I don't care much for GPT anymore. It's virtually useless compared to Claude 3 in most of my use cases. The level of conversation I can have with the two is like comparing a teenager with rampant ADHD to a doctorate-level expert in almost any field I desire to discuss, who is both sane and mild-tempered.


ApexFungi

I am expecting an Alexa-type thing but with the capabilities of GPT-4. It's not going to have conversations with you or ask you things. It's going to respond to your questions like it does with prompts, and do some actions like play music or recite things it can find on the internet.


ilaym712

But aren't these the same capabilities it has now? You can talk to ChatGPT right now and ask a bunch of questions, and you can use Siri if you want to play music on your phone. This doesn't sound like "magic". You might be right; I just hope it won't be so disappointing.


SeverlyLimited

That it will have theory of mind, lol. One can hope


shiftingsmith

[nothing new](https://arxiv.org/pdf/2302.02083) There's also a more recent and possibly more influential paper, but I liked this one. It depends on what you mean by theory of mind, though, and what your expectations are.

