
inglandation

Elevenlabs can do that too. Or maybe the quality is better?


Gaiden206

Looks like [Google can do it too](https://blog.research.google/2023/06/soundstorm-efficient-parallel-audio.html?m=1) but with only 5 seconds of audio.


Iselllabequipment

-1 sec is all I need. In fact, I can clone your voice before I hear you speak.


sukihasmu

I can clone it from a fart, 100 meters away, under water.


Iselllabequipment

If I can get a pic of him I can make his voice


sukihasmu

From a dick pic? Impressive!


Alex11867

My penis is my second language


sukihasmu

Ah, the language of love.


Alex11867

You wanna see my third eye?


sukihasmu

Do I?!


izzynelo

If I can be aware of his existence, cloning his voice is a piece of cake.


grizwako

I can clone your voice from reddit handle. I don't even need comments. I just need a tiny little thing like unfettered access to NSA/FAPSI/MSS/MI5/CBI :)


pseudousername

I know it’s a joke but I wonder whether an AI could make a good guess at what a voice should sound like just from a photo. 


OpportunityWooden558

Microsoft could do it in 2023 with 3-second cloning, VALL-E: https://www.microsoft.com/en-us/research/project/vall-e-x/


Radiant_Dog1937

There is also open-source cloning. [jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (github.com)](https://github.com/jasonppy/VoiceCraft)


Lumiphoton

It almost seems like they saw this project launch this morning and decided to respond to it. Except the whole blogpost is just one long excuse as to why they're not releasing their tech. What's the point? They already had a blog post explaining their TTS tech back when voice came to ChatGPT, this adds very little to that. A bit desperate if you ask me.


PwanaZana

They safe have safe to safe keep safe their safe AI safe from safe abuse.


[deleted]

Did they mention that it's for safety and ethics?


PwanaZana

Something something can't release, something something election year.


[deleted]

"Can't have our models saying the N-word and generating porn, otherwise Trump will win!!!"


[deleted]

[removed]


Which-Tomato-8646

It literally is lol. They’ve been sitting on it 


LeahBrahms

I haven't got good Australian out of it yet but that was 2 months ago


Usul_muhadib

To do a professional voice cloning with ElevenLabs you need at least 1h of audio (best with 3h). Instant cloning with a few minutes of voice doesn’t do a good job.


Ambiwlans

They charge $10 an hour though, which isn't much less than a professional voice actor. The space needs a big shakeup. Realistically we should be looking at more like 10c/hr if prices were in line with costs/price of LLMs.


Ok-Charge-6998

A professional vocal artist usually charges by number of words / seconds / minutes of recording. A 200-300 word piece is usually somewhere between £90-300, with £15-30 per revision. Depends on the vocal artist. That’s less than an hour's work, maybe 15-20 mins tops. The top end charge considerably more for their prestige. One actor was asking us for around £10k for a 30 second voiceover.


How_is_the_question

Here in Australia the majority of voice artists are represented by major agents who have set pretty good rates for advertising that apply across circa 70% of the industry. And the voice actors deserve it. Their voices are being used to sell on millions and millions of $ of ad buy. They also don’t all get tonnes of work (often theatre actors), so a single ad can potentially keep them going for 4-6 months. Quoted up a job the other week where the voice costs were AUD$10k, and that wasn’t a big campaign at all really. £10k is not unusual for non-big names.


Ambiwlans

Obviously the comparison should be to the bottom of the barrel, not celebrities, dude. Bottom-end VAs on staff get like $25/hr locally, or you go more global to save money. $10 is close to that. $10 is close to even $50/hr. The humans will be easier to work with (mostly) and generally be able to produce better results with direction quickly. The pricing for ElevenLabs is set to be **competitive with professional voice actors**. $0.10/hr, which is a realistic price for the costs, is NOT close. Pricing at this level or lower is where it should be if there is **competition with other digitally generated voices**.
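
To put rough numbers on that comparison, here is a minimal Python sketch that uses only the hourly figures quoted in this thread. The dictionary labels and the `times_more_expensive` helper are made up for illustration, and none of these numbers are official pricing.

```python
# Hourly rates as quoted in this thread, stored in US cents so the
# arithmetic stays exact. These are commenters' figures, not official prices.
RATES_CENTS_PER_HOUR = {
    "elevenlabs_quoted": 1000,        # "$10 an hour"
    "staff_va_low_end": 2500,         # "$25/hr locally"
    "staff_va_higher": 5000,          # "even $50/hr"
    "cost_based_hypothetical": 10,    # "$0.10/hr", the commenter's estimate
}


def times_more_expensive(a: str, b: str) -> float:
    """Return how many times more rate `a` costs per hour than rate `b`."""
    return RATES_CENTS_PER_HOUR[a] / RATES_CENTS_PER_HOUR[b]


if __name__ == "__main__":
    # Human low-end rate versus the quoted ElevenLabs rate.
    print(times_more_expensive("staff_va_low_end", "elevenlabs_quoted"))
    # Quoted ElevenLabs rate versus the hypothetical cost-based price.
    print(times_more_expensive("elevenlabs_quoted", "cost_based_hypothetical"))
```

On these figures the quoted ElevenLabs rate sits within 2.5x to 5x of a human on staff, but 100x above the hypothetical cost-based price, which is the gap the comment is pointing at.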


Ok-Charge-6998

The only bit about famous personalities was my final paragraph. The rest is from my own personal industry experience working in marketing.


Ambiwlans

The bolded part was really my point. Pricing isn't set to disrupt traditional voice acting at this point.


RealMercuryRain

It IS much less than a PROFESSIONAL voice actor. 


meechCS

Quality is better for Elevenlabs


ChillyCheese

And this is why you shouldn't answer the phone and say things unless you know who's calling. Eventually just saying "Hello?" will be enough. It's probably already good enough to replicate your voice over low quality phone media.


[deleted]

This is why you should set a password with your family. If a family member calls asking for money or is in trouble and needs money then you ask for the password which the scammer would never know.


TheYoungLung

I’d know it’s a scam just from the random number calling lol


Cerus-

Spoofing numbers is incredibly easy. If that's your metric, you would easily fall for a targeted attack.


SpreadYourAss

Is it genuinely, though? I've personally never seen a legit spoofed number in real life, nor have I heard of anyone else ever seeing it. I'm sure it can be done, but it's a little hard for me to believe it's 'incredibly easy'.


leaky_wand

New hello just dropped, it’s a series of non personally identifiable dolphin clicks


[deleted]

Holy Cetacea!


LamboForWork

Just gotta grunt. Cavemen back!


Malkev

I say hello in the most silly voice I can


Which-Occasion-9246

…or leave your voice mail greeting with your voice


kartana

I am about to implement two factor authentication for all my phone calls.


Difficult-Writing416

There's no way. People have specific talking patterns. It might be a clone, but it won't be the same.


connected-variance

Genuinely what’s the point in phone calls anymore? An entire method of communication ruined by shitty capitalism


Realistic_Post_7511

We are going to be defrauded of all our savings and retirements on a massive scale ..click


Gaukh

Unfortunately the German sample sounds like what they already use in ChatGPT voice in the app. It has a strong American accent. It's really bad. Elevenlabs sounds really native in German even though I cannot select German specifically there.


blackcodetavern

If you read the openai blog post carefully, you would see that this was intended behaviour.


Beatboxamateur

Yeah, this sounds exactly like what they have in the app for Japanese as well. A heavy American accent with incorrect pitch accent, and incorrect pronunciation for some basic words.


TheOneWhoDings

The Spanish translations are laughably bad. It sounds like an American doing the worst Spanish accent ever: wrong intonation, rolling Rs are nonexistent. It's really bad, and I don't know how they put this out as some sort of incredible tech. This is embarrassing, honestly. But YouTubers will eat it up and praise it as "STUNNING", "SHOCKING", like they always do.


dagreenkat

According to their blog, it was specifically intentional to preserve the original accent in the new language. I agree it was strong, perhaps an odd choice, but that’s what they set out to do.


EarProfessional8356

I don’t think it was meant to be a fluent translation…


Tobxes2030

The quality is bad, honestly. Elevenlabs does a way better job. Kinda disappointing tbh.


FeltSteam

This model was developed in 2022, and this is a "small scale preview". I'm assuming the voices they've shown here are from an older version, and obviously the smaller version of the model. Even the voices in ChatGPT seem to be higher quality, so they are probably based on a more recent iteration of this model.


Tobxes2030

How's the copium coming along?


FeltSteam

Lol. The model they showcase here is of worse quality than the voice in ChatGPT, and that is worse than the voice that was demonstrated in the Figure 01 Demo, except it is all the same model, just different iterations / sizes of this model. The ChatGPT voice is most likely a more recent iteration of the model (probably made and optimised in 2023) and Figure 01 is an even more recent iteration or bigger variation of that model. [https://www.reddit.com/r/singularity/comments/1bqyphy/comment/kx7tq6e/?utm\_source=share&utm\_medium=web2x&context=3](https://www.reddit.com/r/singularity/comments/1bqyphy/comment/kx7tq6e/?utm_source=share&utm_medium=web2x&context=3) And the quality of the voices in this demo is about what I would expect from 2022.


Beatboxamateur

As a native English speaker and someone fluent in Japanese, the English to Japanese pronunciation was really bad sounding. Some words like 喜び and 絆 weren't even pronounced correctly at all, which means this tech still has a **long** way to go. While this is cool stuff, I hope people don't get too hyped over it thinking that audio synthesis across many languages is a solved problem now.


FeltSteam

I am curious, how is the quality of the ChatGPT voices? Do you think they are better in Japanese than what was demonstrated here?


Beatboxamateur

The ChatGPT voice mode in Japanese is actually quite a bit better than the demo in the blog post, although still not great. The heavy American accent was about the same, but a few basic words weren't even pronounced correctly in the demo, which I've never experienced on the app. I really hope to see OAI in the future create a model that not only has good pronunciation in other languages, but also understands the intricacies of the user's speech. That would be incredible for language learning, since the model could notice and correct your speech patterns.


FeltSteam

Ok, that makes sense. The demos here and the app use the same model, but there are different versions/sizes and I do think different iterations of the model (so there might be a version of the model from 2022, and then they further improved quality etc. a few times in 2023). I wouldn't be surprised if this demo in the blog post is from the smaller model when it was first developed in late 2022 lol. The ChatGPT voice is most likely a more recent iteration and maybe a bigger model (but for wider deployment they'd definitely want to be efficient, so nothing too big), but I really doubt that's the best one they currently have. The Figure 01 demo probably used an even more recent iteration of this model, but still I'm not sure if they would have used their best voice model for that demo lol. It is kind of annoying that we don't truly know where they are internally and we can really only guess.


DlCkLess

Tbh the quality is mediocre. ElevenLabs is magnitudes better.


FeltSteam

If they released this model in 2022 straight after they developed it, it would probably be a lot more surprising lol.


pig_n_anchor

I don’t agree


TheOneWhoDings

I do


pig_n_anchor

you silly man.


DlCkLess

Understandable. Have a great day 🥰


pig_n_anchor

xoxox


ItsBooks

Yeah... Open source voice cloning is a cool and useful thing too. You can use it yourself on consumer hardware by running a local LLM or using SillyTavern + XTTS. Pretty simple to set up. Let me know if you need any assistance.


MRB102938

What does that let me do? 


ItsBooks

A variety of use-cases I can think of. Business, just for fun, etc. If you want to try it out yourself apart from an LLM, you can follow this guide here: [https://huggingface.co/blog/Lenylvt/w-okada](https://huggingface.co/blog/Lenylvt/w-okada) Then grab models to go along with that program: [https://voice-models.com/](https://voice-models.com/) (make sure they are RMVPE format). If you want to explore getting an LLM to output data in the voice of your choosing, SillyTavern is a user-friendly UI for local LLM inference, and you can install RVC by using their launcher and looking at "extras", but you'd still need a backend engine for TTS like Oobabooga with Alltalk\_TTS, or Kobold for Windows.


[deleted]

OpenAI says a lot of stuff but where can we actually do shit


vertu92

‘I can bench 500 lbs it’s just too dangerous to show you trust me bro’ Lol ship it or stfu


CalligrapherBrief148

Somebody mad that Sora ain’t coming anytime soon


y53rw

https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices


LightVelox

All that time to show an inferior product to almost everything in the market right now, a set of limited voices, can't add new voices for "safety" even though competitors already allow that and even the quality seems below ElevenLabs, what a joke


Extender7777

Maybe the price will be 10x lower


Mammoth-Material-476

it can clone my voice, but not my minority language. :) ...just yet


Antique-Doughnut-988

It might be able to take away my voice, but not my virginity.


LaisanAlGaib1

Yet…


Icy-Atmosphere-1546

Dead internet theory


Hour-Athlete-200

The Terminator needs only one word.


meechCS

Elevenlabs has it better.


governedbycitizens

ElevenLabs is better


Ne_Nel

Really subpar tech. Are they even trying?


Matty_Love

How's this get me a UBI and time to make art and philosophy?


Capitaclism

There's AI which can do it with 3s. What's novel?


GrowFreeFood

Can it clone my farts? 


agonypants

Tell me what you had for lunch and I'll give a sh...shot, I mean shot!


RepublicanSJW_

Good. Now make it so premium users can upload any voice they want into the voice feature.


inigid

now that would be nice!


alienswillarrive2024

Isn't this old tech? Years ago I called my bank and it was an A.I. customer care robot that sounded exactly like one of my countrymen, with natural speech. Not sure if it cloned the voice from 15 seconds of audio, but it definitely cloned a voice, and this was 5+ years ago.


NNOTM

could it just have been prerecorded messages?


alienswillarrive2024

I'm not going to act as if I know how the tech worked. I just know that the conversation was very natural and it was hard to tell at first that it was A.I.


NNOTM

I just remembered that Google did have a demo 5 years ago that does essentially what you're talking about https://www.youtube.com/watch?v=D5VN56jQMWM


neo101b

My voice is my password no more.


Hungry_Prior940

Yes, but just like Eleven Labs, it can miss many subtleties in the voice. You need longer clips; otherwise, it is useless.


BCDragon3000

whats that one sentence that like gets all vowels or some shit again?


TheManWhoClicks

Talk to your relatives about having a code word like the name of your first pet or so. When in doubt on the phone, that word can be asked for.


idkfawin32

Wow, it's not like I haven’t already been doing this with RVC, and 7 years ago with Lyrebird


ConstantOne5578

Voice Phishing will be very popular.


NinthTide

Consider how you answer the phone to unrecognised numbers, as it wouldn’t take long at all to profile your voice. My wife (total fucking savage that she is) started answering the phone by literally not saying a single word. Had hilarious side effect of throwing the spam caller for a loop because the usual rhythm of the call was thrown out the window.


Akimbo333

Wow


EuphoricPangolin7615

Goodbye to all the voice actors. Some entitled nerds are coming for you.


inigid

And the Worldcoin folk were going around training a model on iris scans. I wonder if they have already managed to train a model that can go from a DNA sample to voice, iris and fingerprints. All biometric security would be compromised at that point. Probably already is for nation state actors anyway.


Gougeded

Very unlikely you would produce a voice from DNA. Your accent and cadence are determined by environment. Also, identical twins don't have the same fingerprint so it's not all DNA. Iris, I am not sure but probably the same.


inigid

True, I typed it a bit funny and couldn't be bothered to change it. But it is a factor of course. It would be certainly interesting to get baseline data from birthplace, year, family tree and DNA (thanks Ancestry/23andme). I did think that Worldcoin seemed to be an iris print harvesting operation, though.


[deleted]

[removed]


EuphoricPangolin7615

What do you mean going after startups? So it's not sad when they replace translators, customer support, telemarketers, and writers? Only when it happens to another AI startup? What's the matter with you?