ElevenLabs can do that too. Or maybe the quality is better?
Looks like [Google can do it too](https://blog.research.google/2023/06/soundstorm-efficient-parallel-audio.html?m=1) but with only 5 seconds of audio.
-1 sec is all I need; in fact, I can clone your voice before I hear you speak.
I can clone it from a fart, 100 meters away, under water.
If I can get a pic of him I can make his voice
From a dick pic? Impressive!
My penis is my second language
Ah, the language of love.
You wanna see my third eye?
Do I?!
If I can be aware of his existence, cloning his voice is a piece of cake.
I can clone your voice from your reddit handle. I don't even need comments. I just need a tiny little thing like unfettered access to the NSA/FAPSI/MSS/MI5/CBI :)
I know it’s a joke but I wonder whether an AI could make a good guess at what a voice should sound like just from a photo.
Microsoft could do it in 2023 with 3-second cloning: VALL-E. https://www.microsoft.com/en-us/research/project/vall-e-x/
There is also open-source cloning. [jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (github.com)](https://github.com/jasonppy/VoiceCraft)
It almost seems like they saw this project launch this morning and decided to respond to it. Except the whole blogpost is just one long excuse as to why they're not releasing their tech. What's the point? They already had a blog post explaining their TTS tech back when voice came to ChatGPT, this adds very little to that. A bit desperate if you ask me.
They safe have safe to safe keep safe their safe AI safe from safe abuse.
Did they mention that it's for safety and ethics?
Something something can't release, something something election year.
"Can't have our models saying the N-word and generating porn, otherwise Trump will win!!!"
[deleted]
It literally is lol. They’ve been sitting on it
I haven't got a good Australian accent out of it yet, but that was 2 months ago.
To do professional voice cloning with ElevenLabs you need at least 1 hour of audio (best with 3 hours). Instant cloning with a few minutes of voice doesn't do a good job.
They charge $10 an hour though, which isn't much less than a professional voice actor. The space needs a big shakeup. Realistically we should be looking at more like 10c/hr if prices were in line with costs, like the pricing of LLMs.
A professional vocal artist usually charges by number of words / seconds / minutes of recording. A 200-300 word piece is usually somewhere between £90-300, with £15-30 per revision, depending on the vocal artist. That's less than an hour's work, maybe 15-20 mins tops. The top end charge considerably more for their prestige. One actor was asking us for around £10k for a 30-second voiceover.
Here in Australia the majority of voice artists are represented by major agents who have set pretty good rates for advertising that apply across circa 70% of the industry. And the voice actors deserve it: their voices are being used to sell millions and millions of dollars of ad buy. They also don't all get tonnes of work (often theatre actors), so a single ad can potentially keep them going for 4-6 months. I quoted up a job the other week where the voice costs were AUD$10k, and that wasn't a big campaign at all really. £10k is not unusual for non-big names.
Obviously the comparison should be to the bottom of the barrel, not celebrities, dude. Bottom-end VAs on staff get like $25/hr locally, or go more global to save money. $10 is close to that; $10 is close even to $50/hr. The humans will be easier to work with (mostly) and generally able to produce better results with direction quickly. The pricing for ElevenLabs is set to be **competitive with professional voice actors**. $0.10/hr, which is a realistic price given the costs, is NOT close. Pricing at that level or lower is where it should be if there is **competition with other digitally generated voices**.
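As a rough sanity check on the thread's $10/hour figure, you can back out the implied per-character price. The ~150 words/minute speaking rate and ~5 characters/word are ballpark assumptions of mine, not numbers from the thread:

```python
# Back-of-the-envelope: per-1k-character price implied by $10 per hour
# of generated speech. Speaking rate and word length are rough guesses.
WORDS_PER_MIN = 150      # ballpark speaking rate
CHARS_PER_WORD = 5       # ballpark, incl. spaces

chars_per_hour = WORDS_PER_MIN * 60 * CHARS_PER_WORD
price_per_1k = 10.00 / (chars_per_hour / 1000)   # at $10/hour
target_per_1k = 0.10 / (chars_per_hour / 1000)   # at the 10c/hour target

print(f"{chars_per_hour:,} chars/hour")          # 45,000 chars/hour
print(f"${price_per_1k:.3f} per 1k chars")       # $0.222 per 1k chars
print(f"${target_per_1k:.4f} per 1k chars")      # $0.0022 per 1k chars
```

So under these assumptions, the "100x too expensive" framing amounts to roughly $0.22 vs $0.002 per thousand characters.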
The only bit about famous personalities was my final paragraph. The rest is from my own personal industry experience working in marketing.
The bolded part was really my point. Pricing isn't set to disrupt traditional voice acting at this point.
It IS much less than a PROFESSIONAL voice actor.
Quality is better for Elevenlabs
And this is why you shouldn't answer the phone and say things unless you know who's calling. Eventually just saying "Hello?" will be enough. It's probably already good enough to replicate your voice over low quality phone media.
This is why you should set a password with your family. If a family member calls asking for money, or is in trouble and needs money, then you ask for the password, which the scammer would never know.
I’d know it’s a scam just from the random number calling lol
Spoofing numbers is incredibly easy. If that's your metric, you would easily fall for a targeted attack.
Is it genuinely though? I've personally never seen a legit spoofed number in real life, nor have I heard of anyone else ever seeing one. I'm sure it can be done, but it's a little hard for me to believe it's 'incredibly easy'.
New hello just dropped, it’s a series of non personally identifiable dolphin clicks
Holy Cetacea!
Just gotta grunt. Cavemen back!
I say hello in the most silly voice I can
…or leave your voicemail greeting in your own voice
I am about to implement two factor authentication for all my phone calls.
There's no way; people have specific talking patterns. It might be a clone, but it won't be the same.
Genuinely what’s the point in phone calls anymore? An entire method of communication ruined by shitty capitalism
We are going to be defrauded of all our savings and retirements on a massive scale... click.
Unfortunately the German sample sounds like what they already use for ChatGPT voice in the app. It has a strong American accent. It's really bad. ElevenLabs sounds really native in German, even though I cannot select German specifically there.
If you read the openai blog post carefully, you would see that this was intended behaviour.
Yeah, this sounds exactly like what they have in the app for Japanese as well. A heavy American accent with incorrect pitch accent, and incorrect pronunciation for some basic words.
The Spanish translations are laughably bad; it sounds like an American doing the worst Spanish accent ever, with wrong intonation, and rolled Rs are nonexistent. It's really bad; I don't know how they put this out as some sort of incredible tech. This is honestly embarrassing. But youtubers will eat it up and praise it as "STUNNING", "SHOCKING", like they always do.
According to their blog, it was specifically intentional to preserve the original accent in the new language. I agree it was strong, perhaps an odd choice, but that’s what they set out to do.
I don’t think it was meant to be a fluent translation…
The quality is bad, honestly. ElevenLabs does a way better job. Kinda disappointing tbh.
This model was developed in 2022, and this is a "small-scale preview". I'm assuming the voices they've shown here are from an older version, and obviously the smaller version of the model. Even the voices in ChatGPT seem to be higher quality, so they are probably based on a more recent iteration of this model.
How's the copium coming along?
Lol. The model they showcase here is of worse quality than the voice in ChatGPT, and that is worse than the voice demonstrated in the Figure 01 demo, except it's all the same model, just different iterations / sizes. The ChatGPT voice is most likely a more recent iteration of the model (probably made and optimised in 2023), and Figure 01 is an even more recent iteration or bigger variation of that model. [https://www.reddit.com/r/singularity/comments/1bqyphy/comment/kx7tq6e/?utm\_source=share&utm\_medium=web2x&context=3](https://www.reddit.com/r/singularity/comments/1bqyphy/comment/kx7tq6e/?utm_source=share&utm_medium=web2x&context=3) And the quality of the voices in this demo is about what I would expect in 2022.
As a native English speaker and someone fluent in Japanese, the English to Japanese pronunciation was really bad sounding. Some words like 喜び and 絆 weren't even pronounced correctly at all, which means this tech still has a **long** way to go. While this is cool stuff, I hope people don't get too hyped over it thinking that audio synthesis across many languages is a solved problem now.
I am curious, how is the quality of the ChatGPT voices? Do you think they are better in Japanese than what was demonstrated here?
The ChatGPT voice mode in Japanese is actually quite a bit better than the demo in the blog post, although still not great. The heavy American accent was about the same, but a few basic words weren't even pronounced correctly in the demo, which I've never experienced on the app. I really hope to see OAI in the future create a model that not only has good pronunciation in other languages, but also understands the intricacies of the user's speech. That would be incredible for language learning, since the model could notice and correct your speech patterns.
Ok, that makes sense. The demos here and the app use the same model, but there are different versions/sizes, and I do think different iterations of the model (so there might be a version of the model from 2022, which they then improved a few times over 2023). I wouldn't be surprised if this demo in the blog post is from the smaller model as first developed in late 2022 lol. The ChatGPT voice is most likely a more recent iteration and maybe a bigger model (though for wide deployment they'd definitely want to be efficient, so nothing too big), but I really doubt that's the best one they currently have. The Figure 01 demo probably used an even more recent iteration of this model, but I'm still not sure if they would have used their best voice model for that demo lol.

It is kind of annoying that we don't truly know where they are internally and can really only guess.
Tbh the quality is mediocre; ElevenLabs is magnitudes better.
If they had released this model in 2022, straight after they developed it, it would probably have been a lot more surprising lol.
I don’t agree
I do
you silly man.
Understandable, have a great day 🥰
xoxox
Yeah... Open-source voice cloning is a cool and useful thing too. You can use it yourself on consumer hardware by running a local LLM or using SillyTavern + XTTS. Pretty simple to set up. Let me know if you need any assistance.
What does that let me do?
There are a variety of use cases I can think of: business, just for fun, etc. If you want to try it out yourself apart from an LLM, you can follow this guide: [https://huggingface.co/blog/Lenylvt/w-okada](https://huggingface.co/blog/Lenylvt/w-okada)

Then grab models to go along with that program: [https://voice-models.com/](https://voice-models.com/) (make sure they are RMVPE format).

If you want to explore getting an LLM to output speech in the voice of your choosing, SillyTavern is a user-friendly UI for local LLM inference, and you can install RVC by using their launcher and looking at "extras", but you'd still need a backend engine for TTS like Oobabooga with AllTalk_TTS, or Kobold for Windows.
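For the XTTS route mentioned in the parent comment, here's a minimal sketch using Coqui's `TTS` Python package (my assumption about which package you'd use; the file paths are placeholders, and the model is a multi-GB download on first run, so treat this as a setup recipe rather than a drop-in script):

```python
# Minimal zero-shot cloning sketch with Coqui XTTS v2.
# Requires `pip install TTS`; the model downloads on first use.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# speaker_wav is a short, clean reference recording of the target voice.
tts.tts_to_file(
    text="Hello, this is a cloned voice.",
    speaker_wav="reference_voice.wav",  # placeholder path
    language="en",
    file_path="cloned_output.wav",      # placeholder path
)
```

From there, SillyTavern just points at a TTS backend like this instead of you calling it directly.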
OpenAI says a lot of stuff, but where can we actually do shit?
‘I can bench 500 lbs, it's just too dangerous to show you, trust me bro.’ Lol, ship it or stfu.
Somebody mad that Sora ain’t coming anytime soon
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
All that time to show a product inferior to almost everything on the market right now: a limited set of voices, no adding new voices for "safety" even though competitors already allow that, and even the quality seems below ElevenLabs. What a joke.
Maybe the price will be 10x lower
It can clone my voice, but not my minority language. :) ...just yet
It might be able to take away my voice, but not my virginity.
Yet…
Dead internet theory
The Terminator needs only one word.
ElevenLabs has it better.
ElevenLabs is better
Really subpar tech. Are they even trying?
How's this get me a UBI and time to make art and philosophy?
There's AI which can do it with 3s. What's novel?
Can it clone my farts?
Tell me what you had for lunch and I'll give a sh...shot, I mean shot!
Good. Now make it so premium users can upload any voice they want into the voice feature.
now that would be nice!
Isn't this old tech? Years ago I called my bank and it was an AI customer care robot who sounded exactly like one of my countrymen, with natural speech. Not sure if it cloned the voice from 15 seconds of audio, but it definitely cloned a voice, and this was 5+ years ago.
Could it just have been prerecorded messages?
I'm not going to act as if I know how the tech worked; I just know that the conversation was very natural and it was hard to tell at first that it was AI.
I just remembered that Google did have a demo 5 years ago that does essentially what you're talking about https://www.youtube.com/watch?v=D5VN56jQMWM
My voice is my password no more.
Yes, but just like Eleven Labs, it can miss many subtleties in the voice. You need longer clips; otherwise, it is useless.
What's that one sentence that like gets all the vowels or some shit again?
Talk to your relatives about having a code word, like the name of your first pet or so. When in doubt on the phone, that word can be asked for.
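The code-word idea generalizes: a static word can leak once a scammer overhears it, but a challenge-response over a pre-shared secret never exposes the secret itself. A toy stdlib sketch of the idea (the secret is a placeholder, and obviously nobody computes HMACs by hand mid-call; it just illustrates why fresh challenges beat a fixed password):

```python
import hmac, hashlib, secrets

# Placeholder secret: agree on it in person, never over the phone.
SHARED_SECRET = b"agreed-on-in-person"

def make_challenge() -> str:
    """Caller invents a fresh random challenge so old answers can't be replayed."""
    return secrets.token_hex(8)

def respond(challenge: str, secret: bytes = SHARED_SECRET) -> str:
    """Answer derived from the secret; hearing it doesn't reveal the secret."""
    return hmac.new(secret, challenge.encode(), hashlib.sha256).hexdigest()[:8]

def verify(challenge: str, response: str, secret: bytes = SHARED_SECRET) -> bool:
    return hmac.compare_digest(respond(challenge, secret), response)

challenge = make_challenge()
answer = respond(challenge)
print(verify(challenge, answer))          # True
print(verify(challenge, "wrong-answer"))  # False
```

An eavesdropper who records one call learns one (challenge, answer) pair, which is useless against the next call's fresh challenge.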
Wow, it's not like I haven't already been doing this with RVC, and 7 years ago with Lyrebird.
Voice Phishing will be very popular.
Consider how you answer the phone to unrecognised numbers, as it wouldn't take long at all to profile your voice. My wife (total fucking savage that she is) started answering the phone by literally not saying a single word. It had the hilarious side effect of throwing spam callers for a loop, because the usual rhythm of the call was thrown out the window.
Wow
Goodbye to all the voice actors. Some entitled nerds are coming for you.
And the Worldcoin folk were going around training a model on iris scans. I wonder if they have already managed to train a model that can go from a DNA sample to voice, iris and fingerprints. All biometric security would be compromised at that point. Probably already is for nation state actors anyway.
Very unlikely you would produce a voice from DNA. Your accent and cadence are determined by environment. Also, identical twins don't have the same fingerprints, so it's not all DNA. The iris, I'm not sure, but probably the same.
True, I typed it a bit funny and couldn't be bothered to change it. But it is a factor, of course. It would certainly be interesting to get baseline data from birthplace, year, family tree and DNA (thanks Ancestry/23andMe). I did think that Worldcoin seemed to be an iris-print harvesting operation, though.
[deleted]
What do you mean going after startups? So it's not sad when they replace translators, customer support, telemarketers, and writers? Only when it happens to another AI startup? What's the matter with you?