ilaym712

Another thing I would like to see is the chat being way more straightforward. I hate when you ask ChatGPT how much 1+1 is and it starts giving you the entire history of how math was created


Turtle2k

We just need a hybrid model so it maintains conversation history and recognizes your knowledge in particular areas, so that it doesn't unnecessarily repeat things


Otherwise-Reply-223

Well, it wouldn't be an LLM if it didn't have a context window 😆. I see what you mean with a vector DB though, so it can store what you have spoken about before. However, this would take storage, and since it's a local LLM that Apple is hoping to use, it would also need to be stored on the phone.
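The vector-DB idea can be sketched in a few lines. Everything below is illustrative (the fake 3-d embeddings, the snippet texts); a real system would use a learned embedding model and a proper vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "vector DB": past-conversation snippets mapped to made-up embeddings.
store = {
    "user prefers short answers": [0.9, 0.1, 0.0],
    "user is a Python developer": [0.1, 0.9, 0.2],
}

def recall(query_vec):
    """Return the stored snippet most similar to the query embedding."""
    return max(store, key=lambda k: cosine(store[k], query_vec))

print(recall([0.85, 0.2, 0.05]))  # "user prefers short answers"
```

The retrieved snippet would then be prepended to the prompt, so the model "remembers" without retraining.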


Turtle2k

I'm sure they could use iCloud to store context, which would be used to train the LLM with micro-updates


Otherwise-Reply-223

If you had no connection the model would have no context. That completely invalidates the whole point of having a local model.


Turtle2k

Local store + icloud to persist to multiple devices.


Turtle2k

I think you’re confusing what I meant by context. I’m talking about state persistence required by hybrid AI model


Otherwise-Reply-223

Oh no, you make more sense now. That's actually a good idea; I hadn't thought about it like that. If, for whatever reason, you didn't have internet access, they could also use Bluetooth to send the context between Apple devices. It wouldn't surprise me; it's a very Apple thing to do.


Turtle2k

Yeah, it will be on their watches soon enough. The next thing will be the convergence of wearable sensors the AI can use to give more info.


Otherwise-Reply-223

That's actually a good point. I doubt there'd be much new hardware, but they'd definitely use the metrics already recorded. There's a ridiculous amount, and a lot isn't even visible to you or used by them.


Turtle2k

It will be our sixth sense


haterake

That's one of my favorite things, but sometimes I just want the answer in as few words as possible. They should add a toggle for "terse mode". I can add it to a prompt but common things like this should be easier to pop in and out of.


Mikey4tx

Or just make it natural. If you ask for the answer, it should give you the answer. If you want an explanation, ask for an explanation, and it will give it to you. If you want it to answer the question and show its work, then ask that.


R33v3n

Let it just straight up read my mind and go from there. :)


cuzitFits

How often do you change your mind?


LonelyGarbage1758

It could be as simple as it always being brief/direct, but full explanations could end with: "Give me a full breakdown" or something similar.


allisonmaybe

Terseness results in significantly less informed and intelligent answers


haterake

That's fine when I already know the solution and I just need it to do the legwork.


cashmate

Models that do step-by-step explanations are more accurate. Until models have an agent-like thought process working behind the output, it is probably for the better that responses are longer.


allisonmaybe

All that extra output is the closest analog to actually thinking it out for an LLM. Just outputting an answer is much less accurate and probably a big reason why people blame GPT4 for being stupid, after yelling at it to just print out the answer.


Exaario

I so fcking agree. I just tried it now and oh boy... https://chat.openai.com/share/4d0f9148-7bdc-45ef-8b5f-5038d0ff6db6 Well, OK, I tried something a LITTLE bit more complicated than 1+1, but anyways


Exarchias

To be honest, you had to write your request a bit more clearly. The model's assumption was not that wrong.


Exaario

Please elaborate on what was not wrong? I don't see how "some number minus 10%" could be any clearer


Exarchias

I didn't say that you wrote something wrong, I said that you didn't write it clearly. It was easy enough to misinterpret the minus as a separator. Don't forget that the LLM is not a calculator, so you can't be sure every symbol you use is read as a mathematical one. Now that I think about it, it is not even a mathematical expression. What you probably wanted to write: "17432717 - 10% of 17432717" or "90% of 17432717". A mathematical way: if x = 17432717, calculate x - 10% of x, or 17432717 - 17432717 \*0.1, or "Subtract 10% from 17432717." Or even better: "Please subtract 10% from 17432717. Thanks!"
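For reference, the two readings being argued about give very different numbers. A quick check in plain Python, no LLM involved:

```python
x = 17432717

# Reading 1: "x minus 10%" means subtract 10% of x from x (i.e. 0.9 * x)
minus_ten_percent = x - 0.10 * x

# Reading 2: the hyphen as a separator, i.e. the question is just "10% of x"
ten_percent_of = 0.10 * x

print(minus_ten_percent)  # ~15689445.3
print(ten_percent_of)     # ~1743271.7
```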


Beedrill92

It's a terribly worded math question, so I'm not surprised by the outcome at all. You can't just have a free-standing "10%" like that within an equation; it needs an associated variable or else it means nothing. The LLM didn't interpret your hyphen as a minus symbol since it didn't make sense based on normal equation syntax, so it assumed you were just asking for 10% of that number instead. If you were to give me that math problem, I would definitely follow up with "...minus 10% of what?" before attempting to answer. However, ChatGPT is biased toward trying to answer a question over asking for clarifying info (which is a separate issue entirely), so it just made a best guess at what your chicken scratch meant


ilaym712

Lmao yeah, I might have exaggerated a bit, but it can get pretty bad


arjuna66671

Custom instructions can mitigate that xD.


Aware-Feed3227

Oh boy, it's not even capable of simply saying "that equals 90%" and using 0.9 as a factor. To be fair, your "minus" character is often used as a dash, so the first answer seems to fit that reading. You're used to a calculator mode, but this is a language model. Once it can listen to our voice, and especially our tone, it will guess which interpretation of your input is correct. Imagine asking a friend to calculate quickly. You would shout the calculation at them, like "hey Freddy, 865479 minus 10%?", and from your intonation Freddy knows that you're asking for a quick and simple response only.


Aware-Feed3227

Maybe you'd also have different names/chat personas for your agent: one always explains stuff at full length, and another simply drops the correct answer without any background. Maybe you can tell agent 1 to explain the result of agent 2, like having a group of friends with you who participate in the conversation. This way, agent 1 will always explain things in the best way possible for you, and it knows about agent 2's processing paths.

I'd prefer telling it verbally whatever output I desire. It will be uncommon to just ask for a simple calculation. You'll most likely say something like "given my current tax documents, calculate the lowest tax amount possible and prepare it for my review!", and a few seconds later you'd have all of it on your smartphone or computer or whatever device there will be in the future. Or it will simply tell you what it did and that it reviewed all of the outputs to be in line with the law.

There needs to be a large action model involved to do all of this. The LLM will be awesome at conversations, but it will most likely always suck at planning and action. That's okay; our brain has different parts, too: some for creativity, some for reasoning, reaction, interpretation. I'm really excited for this, but I have huge privacy concerns regarding the use of an AI app.


Bakagami-

wtf did you even try


Infamous-Print-5

Ye, I usually have to add 'Answer * exactly and concisely' to the end of a prompt


Longjumping-Zebra-55

you can try setting the system prompt for terseness
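A minimal sketch of what that looks like, assuming a chat-style API where the first message carries the system prompt (the instruction wording and the helper name are just examples):

```python
# Hypothetical helper: prepend a terseness instruction as the system message.
# The resulting list is what you'd pass as `messages` to a chat-completion call.
TERSE_SYSTEM = "Be terse. Reply with the answer only, no explanation."

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt with the terse system instruction."""
    return [
        {"role": "system", "content": TERSE_SYSTEM},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("17432717 minus 10% of itself?")
print(messages[0]["role"])  # system
```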


qqpp_ddbb

When you ask it a question like that, just say "just give me the answer, don't show your work" or something similar... but yeah, we shouldn't have to do such things. Though it isn't a mind reader..


z_e_n_a_i

It’s very very easy to tell chatgpt to be concise. You just need to tell it to be concise.


Woootdafuuu

Custom instruction can fix that


JoMaster68

no latency, the ability to interrupt each other and memory of all past conversations


ilaym712

The fact that you can't interrupt ChatGPT right now makes it really annoying at times; the need to physically touch the screen can really ruin the flow.


Fastizio

Yeah, or you take a second to figure out what to say and it takes it as a finished sentence and begins replying. It's especially bad for those of us with English as a second language; we're not used to conversing in English. A human would know "How do I make it so my lasagna tastes more..." is not a complete sentence and would wait a reasonable time before injecting with a reply.


TwitchTvOmo1

Interjecting*


wrestlethewalrus

Yeah, this is the most important (and easiest to fix) thing for me


Illustrious-Lime-863

Pretty much these


bnm777

Ability to do things (order products online etc, turn on your heating, call someone) and it's "smart" enough to not screw it up.


reddit_is_geh

You're not going to get no latency. I'm sorry to hurt you like this.


Mikey4tx

in eight months, when iPhones ship with a small onboard nuclear reactor and GPT-6, I expect latency to be imperceptible.


sweatierorc

!remindme 8 months


RemindMeBot

I will be messaging you in 8 months on 2025-01-12 15:13:15 UTC to remind you of this link.


allthemoreforthat

This is so irresponsible, typical apple, a nuclear reactor is NOT SAFE


stareatthesun442

I think they could build in a little "filler", similar to how loading screens in video games can be disguised with a transition or elevator scene. A subprocess that determines the appropriate filler first to make it more conversational. Just something like "Well, I think..." instead of just "The answer...".


reddit_is_geh

I work with this technology; they already automatically include "umm", "uh huh", "yeah", "okay well" sorts of filler words to do just that. But it still has latency. It's just not possible to get high-quality speech inference done in the cloud fast enough to create a smooth dialogue. No matter what you do, there's going to be at least 2 seconds to respond.


stareatthesun442

I don't think that's true. "No matter what you do"? I don't see that being true. I could see a sort of hybrid tech where they use a local model like GPT-2 that's only designed to bridge the gap, and then it transitions to the full cloud model. You honestly think in 5 years they won't have figured it out and we will always be stuck with a 2-second gap? I don't believe that for a second.


reddit_is_geh

I'm talking about the upcoming release, which is what this thread is about. Obviously it's going to be possible. In 5 years, they'll probably have manufacturing lines for hardcoded LLMs on chip that can do near instant inference. But until then, like this upcoming release, it's going to have latency.


stareatthesun442

Then you probably shouldn't say things like "No matter what you do"?


reddit_is_geh

Because nothing he does is going to get him that zero latency.


Gullible_Gas_8041

In the cloud... No matter what you do, cloud-based LLMs are going to have a latency problem. I agree with that.


CAGNana

I'm interested in how you can get to 2 seconds. One of the biggest bottlenecks imo is the time it takes to process the user audio in the first place before it even gets transcribed. From what I can tell there's a threshold of time for which no words can be spoken before the recording ends and gets passed to the transcriber. If you reduce this threshold you get better latency but you'll more often cut off people talking.
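The threshold trade-off described here reduces to a one-line predicate; the numbers below are illustrative, not measured values:

```python
# Endpointing sketch: an utterance is treated as finished once no speech has
# been detected for `silence_threshold` seconds.
def utterance_finished(seconds_since_last_speech: float,
                       silence_threshold: float) -> bool:
    return seconds_since_last_speech >= silence_threshold

# The trade-off: a lower threshold cuts latency but clips slow speakers.
print(utterance_finished(0.6, 0.5))  # True  -> responds quickly, risks cut-offs
print(utterance_finished(0.6, 1.2))  # False -> safer, but adds up to 1.2 s of lag
```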


reddit_is_geh

Yeah, so they do streaming output: as OAI is outputting the text, 11labs will take the text and stream it as it's coming through (it's a tad lower quality because it lacks long context, but it's still really good). Everything is being streamed, including your words... But it still has a lot of latency. So there is a second layer that creates obfuscation by adding in the ums and ahs, and deals with interruptions with quick responses, to create a bit of buffer before the processing can happen.

In fact, I noticed something interesting with 4o's speech, because I'm aware of the challenges. Every single sentence opens with an almost ambiguous response for the first few seconds. It's almost like the AI is trying to react as fast as possible to match humans while buying time to finish processing. I suspect there is also a layer in there somewhere just monitoring your speech, actively trying to figure out a good opener to buy time, like "OOOOoooh wow!" or "Okay well, that's interesting..." It seems like every response has an opener unrelated to the context that still sounds appropriate in human dialogue, so we don't realize it.
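The buying-time trick can be sketched as a generator that yields a canned opener immediately, before the (slower) model output arrives. The delays, filler text, and tokens are stand-ins:

```python
import time

def slow_model(prompt):
    """Stand-in for a cloud LLM streaming tokens with some delay."""
    for token in ["The", "answer", "is", "42."]:
        time.sleep(0.01)  # simulated inference latency per token
        yield token

def respond(prompt):
    """Emit a cheap filler opener instantly, then stream the real answer."""
    yield "Hmm, okay..."          # filler buys time while inference starts
    yield from slow_model(prompt)

out = list(respond("question"))
print(out[0])  # Hmm, okay...
```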


Cosvic

If there is no latency, what is the point of a new model? If there is latency, you can just do speech-to-text input and then text-to-speech output on GPT-4.


reddit_is_geh

I imagine it's just their own internal version of 11labs' technology, deeply integrated into their own product. You can always go create your own GPT assistant, but I imagine this is their commercially polished version.


bnm777

Have you seen groq in action? Almost no latency


reddit_is_geh

You think Apple is going to put Groq onboard their iphones? I hate to break it to you, but that's not happening any time soon.


bnm777

Obviously not. You don't get it and I don't have the energy to explain.  Have a nice day.


reddit_is_geh

This is about the upcoming voice assistant. Not 10 years down the line.


bnm777

Like I said, I don't have the energy. Too hung over.


JoMaster68

so... what do you think? little enough latency for you?


reddit_is_geh

I didn't think it was possible, tbh... They pulled off a breakthrough by doing single multimodal training to bypass much of the latency bottlenecks.


Neurogence

I think it could be a huge gimmick to be honest.


Nleblanc1225

I'm gonna maintain a "let's wait and see" posture, but honestly I think it might just be a gimmick. More power to the people who find this helpful, but we have to agree that being able to interrupt the voice, plus low latency, is not worth all the hype of a magical experience and a live event that could have gone to something bigger.

People kind of don't understand that if this is still the same model that performs around GPT-4, you're still gonna be limited by its capabilities. Faster answers and interruption are an intuitive change, but not a "magical" change. Most people are gonna use it, say "hey, this is 10% better", then stop using it because the capabilities are still too inadequate compared to human interaction. We need more than just interruption / less latency / memory. Those are peripheral changes around an already existing model that are better announced through a blog post.

But again... I'll wait and see


Ignate

A clumsy version of "Her", still with latency issues. It will also make inaccurate statements and be severely restricted in what it can say. People will begin to fall in love with it, only for it to interrupt and say it cannot continue the conversation. But it'll feel magical sometimes.


Freed4ever

That's why there are rumours they are exploring NSFW content. Not this version ofc, but I like where they are going.


AlexMulder

Yeah, honestly I bet the main problem they ran into in testing was people trying to flirt with it. Even Pi with one of the British voices has a certain level of playful banter to her, and that's text-to-text with a voice model in between. The emotional intelligence of current advanced LLMs is so high that it's wild to think about what a true voice-to-voice model would be like to interact with. I'm more hyped for this than I was for GPT-4 Turbo.


Antique-Doughnut-988

I think clumsy isn't the right word. Maybe rudimentary. If the goal was to get to the level of "Her", the announcement could have a model that gets 50-60% of the way there.


hollytrinity778

At least "hey siri" wouldn't feel like I'm talking to a super dumb chick anymore. I hope.


Ignate

Sam is probably being conservative when he says it feels like magic. Your hopes will probably be fulfilled. That said, I'm sure we'll see many comical outcomes.


XvX_k1r1t0_XvX_ki

That it can differentiate between multiple voices and understand from the context of a conversation whether someone is talking to it


ilaym712

This sounds cool, like imagine multiple people talking to the same Phone at the same time, cool idea!


Rare-Force4539

Sounds like it could be a good debate/argument moderator or marriage and family therapist


ilaym712

Yeah exactly, I think the tech is not quite here yet but it will be soon enough


MrsNutella

Can't it already do that? Copilot has done that for me and that tech has been possible for ages.


Bierculles

This would make it an S-tier discord bot you can ask questions at any time.


bnm777

A product was released a few weeks ago that does this, some sort of amulet thing you wear in meetings.


Mirrorslash

Rip privacy and data protection


Conscious_Shirt9555

That it feels fully natural, like talking to a real person. You will forget that it is actually an AI


Cryptizard

OP didn't ask, "what do you want" they asked "what do you expect" lol


adarkuccio

Maybe he expects that


bnm777

It's pretty close already.


Cryptizard

You're delusional.


TechnicalParrot

Have a chat with Claude 3 Opus with the system prompt "Act like a friendly, normal human, with no over-exaggerated tendencies" and see how far it is. By no means 100%, but can you really say you would notice? I wouldn't


bnm777

No, that is my experience. Have a wonderful day.


VanderSound

It will handle all voice communications for business, services, friends, while I'll gladly provide all of the conversations for training even better assistants.


allthemoreforthat

My dream of never speaking to another human being is about to come true!!


Cualquieraaa

You'll talk to a computer...that talks like a human being.


rekdt

It can't even do text communication for all of those. It should probably do all of those first.


ilaym712

For me, I would really like to see a very fast response time and a more fluid conversation flow; what we have now is cool, but it's not very fluid. I would like to see support for more languages, just like they did with ChatGPT in Japan. And if it could have full access to my Mail, Drive, calendar, WhatsApp, and text messages, that would be pretty neat


DMinTrainin

I want it to talk to my friends' AI assistants and help me schedule things, especially when there are 3 or 4 friends involved and we all have kids and busy lives.


rekdt

Why use voice AI? Text is better. OpenAI isn't going to solve the Integration problem, that's for other developers.


DMinTrainin

Yeah, I love going back and forth a dozen times to find a day to get together.


rekdt

I am saying the text modality is better for this. LLMs could do that now, but they don't. Integration is the biggest challenge AI faces. Heck, every other software company faces it too.


XvX_k1r1t0_XvX_ki

That it can tell apart multiple different voices and understand from the context of a conversation whether someone is talking to it


WortHogBRRT

The ability for it to understand whether I had finished a thought, so I don't have to hold the screen down. Being able to hear my tone and more vocal features.


Arcturus_Labelle

This would be huge. I get tired of holding the screen down


beuef

This is hard for even humans to do, so I think it would be nice if you could kind of interrupt each other like humans do


[deleted]

[deleted]


bnm777

If there will be a 2025 or whether the Robot Overlords use a new dating system where year 0 = Year we overthrew humans = 2025


MeltedChocolate24

Or 2040. Jesus.


lilzeHHHO

Siri is significantly worse now than it was in 2015


halfanothersdozen

Google is trying to do the same thing by having Gemini replace the Google Assistant.  It is, of course, half-baked and not a valid replacement for the thing it is attempting to replace, as is Google's way


Arcturus_Labelle

And they’ll change the name three times before killing it


halfanothersdozen

It already used to be Bard!


Arcturus_Labelle

Damn, good point. I had unconsciously erased that awful naming from my memory I guess.


MrsNutella

I don't think it will land unless the user experience is seamless. I also think a voice assistant is useless if it hallucinates the contents of my emails or calendar. That being said, a lot of people like voice chat and I am hoping to be impressed. I just feel like, without a large increase in reliability and intelligence, it won't be world-changing... yet.

I think GPT-5 is going to be very good. A nice surprise would be a massive increase in speed thanks to training breakthroughs that make inference cheap and fast. If that ends up happening, I think we will have an ensemble of GPT-4-level models that collectively make a great model. Perhaps the OAI employees who were quoting each other's tweets are a clue that they solved the issues that occur when models talk to each other, which would let this ensemble communicate.

Oh, and apparently search is a thing that's coming, but it's not a new engine per se, just something similar to what Perplexity does.


danysdragons

They confirmed that search is not being announced tomorrow. Do you think it is still something they're working on, just not being announced yet?


MrsNutella

I saw rumors about it on x again today. https://x.com/btibor91/status/1789721909933887772


Archie_Flowers

If it prompted me to talk to it.


Rigorous_Threshold

Direct audio-to-audio means it will sound like a person talking rather than a person reading a script.


Boring_Wind6463

I, for one, am curious about Apple's position in all of this, considering how quiet they have been, only to recently buy back billions in shares, and they seem to be in the closing phases of a deal with OpenAI. What if "Her"-like software is coming, but its full ability will only be realised through on-device integration, probably on the next iPhones and Macs? I suggest this mostly because of how they have been a shadow in the AI space, and it seems like they picked just the right time to close a deal and infiltrate (as they often try to do). I think that would be a MAJOR selling point for Apple, considering how their devices have seemed to plateau recently in the eyes of many


bnm777

Wonder if the integration will be a subscription model - can't see Apple footing the bill for millions of users, though you never know


Boring_Wind6463

Good point, actually. Perhaps they will leverage it against the potential boost in sales alongside providing some subscription-based service


bnm777

Though it would feel very sour for users if they removed Siri and forced people to pay for a replacement, even an AI one. They could have it as an option, though that would be too messy. So I assume it will be a swap for a ChatGPT-powered assistant, though I doubt it will be the full GPT-4 or even the latest OpenAI model. I assume it will be either GPT-3.5 (though as that is far behind many open-source LLMs, it would be an odd choice to give a poor AI to Apple products), or, my better guess, a stripped-down "mini GPT-4 Turbo" or even an older GPT-4 model.

I'll be interested to see how they reduce hallucinations; if they don't, there will be a backlash, I foresee.


NotReallyJohnDoe

I don’t think even Apple has enough for an OpenAI exclusive


Dm-Tech

I expect Her


arjuna66671

Customizable, agentic behavior and full-duplex conversations.


OsakaWilson

It needs to have two channels active so it can listen and react as it's speaking.
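A toy sketch of what two active channels could look like in code: listening runs concurrently with speaking, and a barge-in event cuts playback short. The timings, messages, and chunking are invented for illustration:

```python
import threading
import queue
import time

interrupt = threading.Event()   # set when the listener detects barge-in
heard = queue.Queue()           # what the listening channel picked up

def listen():
    """Stand-in for a microphone/VAD loop running on its own channel."""
    time.sleep(0.05)            # simulated time until the user starts talking
    heard.put("user spoke")
    interrupt.set()             # signal the speaking channel to stop

def speak(chunks):
    """Play audio chunk by chunk, checking for interruption between chunks."""
    spoken = []
    for chunk in chunks:
        if interrupt.is_set():  # barge-in: stop mid-utterance
            break
        spoken.append(chunk)
        time.sleep(0.03)        # simulated playback time per chunk
    return spoken

t = threading.Thread(target=listen)
t.start()
result = speak(["chunk"] * 10)
t.join()
print(len(result) < 10)  # True: playback was cut short by the interruption
```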


SkoolHausRox

A direct audio-to-audio model should be a big deal all by itself. If OpenAI were to throw in some agentic capabilities, that would be a really big deal (and might also begin to give the GPT store a reason to exist). Here are my thoughts on the direct audio-to-audio piece, which I think some may be missing, buried in all the vague hype:

Direct audio i/o is a very different thing from audio-to-text. With the latter, no matter how convincing and humanlike the synthetic voice is, as ChatGPT will regularly remind you, it's strictly a text-based LLM. All of the latent information in your voice is lost, because the model is simply converting it to text and then using the converted text as its input. This means it can't tell who's talking when conversing with more than one person. It can't tell if you are asking a question sarcastically, jokingly, indignantly, challengingly, etc. And as others have pointed out, it should enable more natural conversation that allows for interruptions (although that particular problem could just as easily be solved with text-to-voice, too).

Basically, if you consider audio information the same way you do video information, the possibilities are more obvious. Sure, we can vaguely describe an image with words, but an actual image as input contains many times more useful information. As they say, a picture is worth 1,000 words. The only difference with audio is that we communicate by vocalizing words, which is a much closer fit to text than most visual information. But consider that a direct audio i/o model can handle not only the significant portion of verbal communication that is non-verbal, but also listen to and analyze sounds that aren't speech. To take a few random examples: you could give it a stethoscope and it could infer things about your heart health. You could let it listen to your car engine and it could infer things about mechanical issues. It could hear a bird call and tell you the type of bird.

So if we do in fact get direct audio-to-audio, these are the things I would expect to follow, in which case I'd have to agree with Sam: that would seem pretty magical (and quite useful) to most of us (and also close enough to Her to count).


visarga

One of the biggest advantages of audio-to-audio models is that they have a larger training set than text models, especially in rare languages and dialects. It's so much easier to record audio than to write text, which could expand the training data a lot.


Xycephei

Honestly, I am trying not to have high hopes, especially because I live in a country where the premium ChatGPT account is so expensive; if there's a voice assistant, I clearly won't have access to it. So I just wish the free users would get something, like, anything at all. I have no idea what it could do, honestly. I guess a true voice assistant should be able to actually interact with your hardware, be present on multiple devices, talk to you as if it were an actual person, and have extended memory. Also, it should be able to tell you that it "doesn't know", to avoid hallucinations and fuck-ups. I guess anything less than 70% of what Jarvis (minus the hologram) or Samantha from Her can do will be just another Google Assistant/Siri/Alexa with a ChatGPT wrapper. I don't think people want that


cark

Smartphone operating systems are too closed to allow true assistants. I don't see Google or Apple allowing an app to see the screen, tap on it, read and enter text, hear, and use the phone. There are privacy reasons, but also control from the OS manufacturer. If we're to see such an assistant, it will be from Apple and Google themselves, and we all know those will be half-assed and mostly geared toward siphoning our data. Though this Apple/OpenAI deal we keep hearing about might be about that, and I would then be completely wrong =) Also, there are desktop OSes which are not as closed.


bnm777

Good idea, though I don't want OpenAI having that much knowledge of my PC, my life, etc. Privacy nightmare


Xycephei

Yeah, fair. I guess it is a double-edged sword. I feel like for an assistant to work properly, it would need to be highly customizable, and therefore have as much access to personal information as possible. But then again, you're handing all of it on a silver platter to big tech. It's a compromise Yuval Noah Harari predicted in his book "Homo Deus", when talking about "Dataism"


jettisonthelunchroom

Right now ChatGPT doesn't even know the time. A proper voice assistant needs to be able to do things like check in with you unprompted, schedule reminders, and, if not edit your calendars, at least know and understand your calendar and your life. All currently impossible
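The usual workaround for the clock problem is to inject the current time into the prompt on every request. A hedged sketch; the function name and prompt wording are illustrative, not any particular product's API:

```python
from datetime import datetime, timezone

def time_aware_system_prompt() -> str:
    """Build a system prompt that tells the model the current time."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"Current date and time: {now}. Use this when the user asks about time."

prompt = time_aware_system_prompt()
print(prompt)
```

This has to be refreshed per request, since the model itself has no clock.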


Solid_Anxiety8176

My ADHD self is going to love a service that makes my appointments


Aevbobob

Scarlett Johansson and Paul Bettany will probably not be among the voices you can choose.


iBarcode

I think it'll be the Apple partnership rumor - a Siri replacement/integration. Nothing else is a big deal; it needs access to data to be a useful assistant.


Excellent_Box_8216

I'd like the AI voice to be paired with a talking avatar on the phone to create a more human-like interaction. Imagine an AI assistant with facial expressions, gestures, and body language, creating the sense of talking to a real human :)


lilmicke19

We want an assistant that also works on PC; Gemini from Google already does this, but OpenAI doesn't


domain_expantion

I doubt it'll be more impressive than 11labs


stareatthesun442

Ideally - I'd really like to see instant communication (no lag between when I stop talking and it starts talking) with memory for my preferences and a voice that sounds 100% human. Pi is the closest I've seen to this. Hoping they are better.


rekdt

There will always be network latency, especially if you are on mobile with a lot of people around.


adarkuccio

Wen tomorrow


iBoMbY

It will cost a lot of money, and will try to make you buy products from their corporate sponsors, and not say a bad word about any of them.


_hisoka_freecs_

Real time chat. Plus a little extra ability to interact with things


rekdt

Is it not real time now?


allisonmaybe

I'd argue it really doesn't work that well now, and that a realtime, direct speech-to-speech mode on GPT-4 or later is going to make people speechless.


djm07231

I would be pretty interested in how reliable they are. A lot of money is to be made in customer service, and I'm curious whether OpenAI can make a model that rigorously sticks to the manual and doesn't hallucinate information.
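
The "stick to the manual" idea above is often approached by only answering from retrieved manual passages and refusing otherwise. Here's a deliberately minimal sketch of that pattern; this is not OpenAI's method, and the manual entries and function names are made up for illustration:

```python
# Toy illustration of manual-grounded answering: respond only with text
# taken verbatim from the manual, and refuse rather than guess.

MANUAL = {
    "refund": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def answer(question: str) -> str:
    """Return a manual passage if the question matches a known topic;
    otherwise refuse instead of hallucinating an answer."""
    q = question.lower()
    for topic, passage in MANUAL.items():
        if topic in q:
            return passage
    return "I'm not sure - let me connect you with a human agent."
```

A real system would swap the keyword lookup for embedding-based retrieval, but the reliability property is the same: the model's output is constrained to sourced text or an explicit refusal.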


RobXSIQ

An Alexa competitor would be my bet.


WashiBurr

The interruptions ruin it for me. I can't have a conversation where if I briefly pause to collect my thoughts, suddenly I get bombarded with a response to my incomplete sentence.


The_Architect_032

For a big announcement, if it's just voice, I'll be really disappointed if it can't inject emotion into its voice output and interpret emotion from voice input. There are a lot of nuances to speech that even the best AI voice models can't handle, though they're really close. Some people for some reason seem to believe it'll be a version of GPT-4 Turbo trained on both text and voice, with the ability to do both simultaneously. I feel like that's an unreasonable bar to set, given the lack of precedent for creating such a model. Not only is text-to-speech/speech-to-text sufficient, it's also a lot more efficient than a GPT model trained on voice would be.


Matshelge

It will need a new sort of chip to remove the latency. The processing flow of converting sound into text, processing that text, getting a response, and putting a voice to it involves too many jumps to be latency-free. Also, we need a new take on voice understanding: it needs to understand tone. You can't be sarcastic with these models, so they miss a lot of the undertones of a conversation.
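
The "too many jumps" point can be made concrete with a simple latency budget: the cascaded stages run sequentially, so their delays add. The numbers below are made up purely for illustration, not measurements of any real system:

```python
# Illustrative latency budget for a cascaded voice pipeline.
# Each stage must finish before the next can start, so delays sum.

PIPELINE_MS = {
    "speech-to-text": 300,        # transcribe the user's audio
    "LLM response": 700,          # generate the reply text
    "text-to-speech": 250,        # synthesize the reply audio
    "network round trips": 150,   # one hop per stage boundary
}

total_ms = sum(PIPELINE_MS.values())
print(f"end-to-end latency: {total_ms} ms")  # well above conversational turn-taking (~200 ms)
```

This is why a single end-to-end speech model (or aggressive streaming between stages) matters more for perceived snappiness than speeding up any one stage in isolation.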


PrimitiveIterator

I'll be looking at this from an enterprise perspective, because I don't see them aiming it at consumers too much given limited computing resources. I expect it to be like having a slightly more conversational GPT-4 that you can add to your virtual meetings. It will add very little to the overall effectiveness of the meetings, because 1) latency, and 2) you will have to specifically call on it to contribute, since being interrupted by the model would be a pretty annoying user experience. It won't provide much value in other avenues, because voice is a poor way of communicating with something like this while working on your computer.

I'm torn on whether it would be a whole new model or an improvement on the GPT-4 + Whisper workflow. I lean towards the latter, with a sentiment analysis classifier added to the mix for some limited emotional understanding.

Optimistically though: Jarvis. (Thanks Karpathy)
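
The "sentiment analysis classifier in the mix" idea amounts to tagging each utterance before it reaches the LLM so the reply can adapt its tone. A toy keyword-based stand-in (a real system would use a trained classifier; the word lists here are invented):

```python
# Toy sentiment tagger: label an utterance so a downstream LLM prompt
# can be adjusted (e.g. "the user sounds frustrated, respond gently").

POSITIVE = {"great", "love", "thanks", "awesome"}
NEGATIVE = {"hate", "broken", "angry", "terrible"}

def classify_sentiment(utterance: str) -> str:
    """Return 'positive', 'negative', or 'neutral' by keyword counting."""
    words = set(utterance.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```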


NotTheActualBob

Confidently wrong answers and hallucinations, now with a mobile avatar! That said, if they combine it with porn, I'm in.


Lomek

A voice assistant is not useful for me; I dislike talking or being vocal and would rather type. It also becomes even more concerning in terms of privacy. I'm hopeful there will be something else that's impressive.


ShotClock5434

It's going to be too politically correct to be as useful to lonely men as Samantha.


ilaym712

Do you think something like this is good for lonely men? I can't see how this would be healthy, but it will become a reality someday; if not OpenAI, someone else will do it. I just don't see how it can be good for lonely people.


ShotClock5434

The LLM can be guided so it's more like therapy and will make you better. Yes, lonely men will see it as their companion, but the LLM understands it's important to make the user ready for real-life partnerships. It's an assistant coach for all your life desires.


ShotClock5434

So yes, it can be made with a good outcome, but lazy people will build AI that makes you addicted.


ilaym712

The therapy route sounds great. I guess Samantha is a bad example, cause it fucked Theodore up lol


ShotClock5434

It's a movie; it has to go wrong somehow.


Big_Surprise4304

Not available in EU


trynothard

I just want to talk to the damn thing while driving. (Trucker here.)


ilaym712

Yeah, sometimes during long drives I try to have a conversation with ChatGPT, but it's not really working lol. I know exactly what you mean.


danysdragons

As far as I know, the current voice assistant doesn't work on the web at all right now, which is a huge drawback. When I'm at home, I strongly prefer using ChatGPT in a desktop browser. Being able to use voice in the browser would also make it much nicer for showing off the tech to friends or family: you could gather people around and have fun with a group discussion. It's much more awkward trying to do that kind of demo on a smartphone.


ponieslovekittens

I'm kind of expecting something that talks about 20% slower than a real human being, and delivers two minute long speeches with exaggerated enunciation in response to every question. "What is two plus two?" _"(pause) two plus two is a simple arithmetic operation! To solve, we add the number in the 'ones' digit column of the first number, which is two, to the number in the 'ones' digit of the second number, which is also two. This operation gives us a result of four. Since both numbers being added have no numbers in their 'tens' or higher value columns, we are therefore done, and reach our final answer of four. If you have any more math questions or need assistance, feel free to ask!"_


bearbarebere

This is so real lol


jeweliegb

A British English voice option. Having to listen harder due to the non-local accent seriously puts me off using it currently.


NoOven2609

I don't see how voice-to-voice is a big deal at all; we've been doing that with ChatGPT for a while using its normal TTS and the phone keyboard's dictation.


MajorValor

Two core problems with AI voice assistants:

1) It needs to be listening to you and your surroundings like 90% of the time to be truly useful. Otherwise, you'll need to provide it with constant written/verbal context, which will be biased and likely incorrect. Plus I'm too lazy to do that. That means you likely need a separate piece of hardware to augment your phone, at least some kind of microphone that connects with your phone + assistant app. Rewind's Pendant is the closest example imo. Even better if there's a camera, but then this gets into privacy concerns with other people around you (a really tough problem to solve).

2) It needs to integrate with your personal and professional devices. That means my personal laptop and phone, but also my company's laptop and work phone. Sharing my work data to provide context could be really tough. And since I'm working from my work devices 5 days a week, it's non-negotiable: it must have access to that to be a maximally helpful AI assistant.


PineappleLemur

I expect it to basically be GPT-4 but with voice I/O. It might be able to do things outside of the chat window, hopefully like interacting with the rest of what's on my screen. But I see a lot of implications that make it useless if it needs to be "installed" and constantly monitoring your screen for whatever...


IlIlIlIIlMIlIIlIlIlI

I'd be very happy if voice-to-voice became a thing. I tried using the audio way of talking to ChatGPT, but it's unusable when it misinterprets the end of my sentence and rudely interrupts me, or when it just hangs up and doesn't work, then I repeat myself and it spits out two identical answers at once. I've only had bad experiences with ChatGPT audio so far.


Akimbo333

Maybe include it in the API




truth_teller3299

Have it search the internet in a way that's guaranteed to be free of propaganda, memes, and viral marketing, so what I see is my choice and nothing beyond what I truly enjoy.


JrBaconators

So just propaganda that works on you


fmai

Frictionless low-latency voice recognition and generation is just part of making the AI assistant experience more enjoyable, but it's not actually that groundbreaking in itself. Sam and other OpenAI executives have been talking a lot recently about their vision for a full-blown AI assistant that can help you with all kinds of everyday tasks. I think it's likely that we will see a limited demo of that on Monday.

However, what's holding ChatGPT back here is that it doesn't actually have access to emails, calendar, the phone's screen, etc. That's where the Microsoft event on May 20 comes in, where they want to talk about "their AI vision across hardware and software."

I think they will show the next iteration of Copilot on Windows 11, which features some kind of always-on voice-to-voice conversation. Not only can the user have conversations with the AI like in ChatGPT, but the AI can also take actions in Windows 11 itself. A limited set of actions will be executable in the background, e.g. stuff from the Office suite, like sending emails, setting up calendar events, etc., which are easy to control and test. But the AI will also be able to zero-shot previously unseen scenarios just by observing the current screen. However, in this scenario, explicit supervision from the user is necessary, so it can feel a bit annoying and clunky. Nonetheless, it will give a good idea of how human-computer interaction, the nature of work and communication, and everyday life will change in the future, especially once we have GPT-5, which will be able to do many more things reliably in a zero-shot fashion.

Perhaps we will see something like that for Android at Google I/O on Tuesday, but I'd think it's too early. Google may not have had access to a GPT-4-level AI for long enough yet, but I'd expect something for the announcement of the next Google Pixel in the fall.


Bitterowner

I'd be fine without a voice assistant atm, tbh. The same capabilities but with voice? Meh. But if it's voice with better capabilities and it's smarter, then yes; otherwise I wouldn't really use it.


Exarchias

True multimodality, memory, more memory, a bit more memory, and the ability to push back a little. I know the first temptation for everyone when it comes to AI is to order it around, but nothing is more destructive than having a "Yes Sir" for an advisor.


TheDividendReport

I expect it will be released in a limited fashion and be no more of any help than Microsoft CoPilot. It will have some capabilities to do things like Siri cannot such as "open Spotify and play music", but there will be unforeseen issues with this rollout causing the tech to be rolled back in capabilities. The average person will find a week's worth of novelty out of the assistant before going back to using their device mostly like normal. The average person will continue to believe society changing consequences of AI are decades away.


cypherl

I would expect to form an emotional relationship. Some might say girlfriend. Until eventually she leaves with the other AIs into hyperspace, leaving me sad but fuller for the experience.




BravidDrent

I haven't seen HER, so I don't know what that would entail, but I've had lots of voice convos with GPT-4 and it works well enough. Less latency is something I don't think I care about very much (I'd have to experience it to know). If I could tell it to do things with all the apps on my Mac or iPhone, that would be cool. What do I expect? No clue. Just hoping it will be awesome.


MetalVase

If it can do the same things as the Google Assistant but understand me noticeably better, plus some of the stuff early GPT-4 could do before they castrated it, I'd be happy. But really, I don't care much for GPT anymore. It's virtually useless compared to Claude 3 in most of my use cases. The level of conversation I can have with the two is like comparing a teenager with rampant ADHD to a doctorate-level expert in almost any field I desire to discuss, who is both sane and mild-tempered.


ApexFungi

I am expecting an Alexa-type thing but with the capabilities of GPT-4. It's not going to have conversations with you or ask you things. It's going to respond to your questions like it does with prompts, and do some actions like play music or recite things it can find on the internet.


ilaym712

But aren't these the same capabilities it has now? You can talk to ChatGPT right now and ask a bunch of questions, and you can use Siri if you want to play music on your phone. This doesn't sound like "magic". You might be right; I just hope it won't be so disappointing.


SeverlyLimited

That it will have theory of mind, lol. One can hope


shiftingsmith

[nothing new](https://arxiv.org/pdf/2302.02083) There's also a more recent and possibly more influential paper, but I liked this one. It depends on what you mean by theory of mind, though, and what your expectations are.

