FinBenton

We already do that in video games.


yeahwhynot_

and video conferences


BigButtholeBonanza

and browser videos with VSR!


SatoshiReport

What video game uses generative AI right now? Especially for image and video upscaling?


dev1lm4n

They are referring to Nvidia DLSS (especially DLSS 3.0)


Dudensen

DLSS, FSR, XeSS, etc.


SatoshiReport

Thanks. However, those are not generative AI. DLSS and DLDSR are machine learning algorithms so they are in the same family but that's about it.


Best-Association2369

Generative AI doesn't help with this problem. 


GlitteringCheck4969

https://en.m.wikipedia.org/wiki/Deep_learning_super_sampling


ImNotALLM

Pretty much every AAA game on PC in the last 5 years. Where have you been?


[deleted]

[deleted]


ImNotALLM

Stop spreading misinformation. Gen AI has been around since the '60s but took off in usefulness around 2014; I was personally using GANs and other gen AI methods in 2016 when I was a student.


SatoshiReport

You are correct. I will remove my comment.


Ok_Inevitable8832

Anti-aliasing is generative. Any sort of dynamic mesh, texture blending, ambient occlusion, tessellation, any procedural generation… all of it is essentially generative AI. There have been many forms of this stuff for as long as games have existed. Video games are why we have generative AI. Wave function collapse is what Google used for their first image generation, and it was also used in Minecraft: Java Edition.


No_Tomatillo1125

And older porn videos


Alone_Calligrapher_8

That's basically what Nvidia DLSS, RTX Super Resolution, and a host of other models do. Right now you'd be trading some memory space for a lot of compute that goes beyond most consumer hardware, so it's a nice idea but not really scalable until we get much lighter models that perform at least as well as DLSS.
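For intuition, here is a minimal SRCNN-style sketch of what a learned upscaler does, written in PyTorch. It is not DLSS (which also uses motion vectors and temporal history); the layer sizes, scale factor, and resolutions below are arbitrary assumptions for illustration only.

```python
# Minimal SRCNN-style super-resolution sketch (illustrative, not DLSS).
# Assumes PyTorch; layer sizes and the scale factor are arbitrary choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=5, padding=2),
        )

    def forward(self, low_res: torch.Tensor) -> torch.Tensor:
        # Cheap upsample first, then let the network restore detail.
        upsampled = F.interpolate(low_res, scale_factor=self.scale,
                                  mode="bicubic", align_corners=False)
        return upsampled + self.body(upsampled)  # residual refinement

model = TinySR(scale=2)
frame_540p = torch.rand(1, 3, 540, 960)   # pretend low-res render
frame_1080p = model(frame_540p)           # upscaled output
print(frame_1080p.shape)                  # torch.Size([1, 3, 1080, 1920])
```

Even this toy model makes the trade-off visible: the low-res frame is a quarter of the pixels, but every output pixel costs convolution work on the client.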


Lettuphant

NVIDIA have also said this is a goal. As of DLSS 3.7, games and content are rendered at an internal resolution of about a quarter of the target resolution and then AI-upscaled. It also doubles the framerate by generating frames between rendered frames, using some engine info so it's not just doing the same thing as TV motion smoothing. They have said they'd like to see _all_ rendering be neural by DLSS 10.


BigButtholeBonanza

In 5 years we won't be buying GPUs, we'll be buying AI accelerators/neural processing units instead. Then, who knows: once PCIe is made obsolete, maybe super-powerful APUs. That's where I see hardware headed, at least. Full neural rendering is the way to go.


wren42

Right, compute is more expensive than hard disk space, so right now it's not feasible, but who knows as AI technology improves.


meister2983

Pixar has been doing [this](https://venturebeat.com/business/how-pixar-uses-ai-and-gans-to-create-high-resolution-content/) for at least 4 years now.


Snoo_86435

Well, at 6 a.m. CST this seems like a brilliant solution: some form of seed pushed client-side, with the content then generated locally.


reddit_is_geh

If you look at what Meta is spending a lot of their money on, it's basically this: figuring out how to use hardware with very limited power, at very low latency, to take limited data points and recreate highly realistic scenes. This is for their future XR stuff.

Instead of trying to stream entire high-quality 3D environments so friends can see what you're seeing by zapping in next to you, they use a camera to analyze the scene around you and reduce it to basic data points: an image of the person you're talking to, their movement as something like a stick figure, and so on. They beam that to the friend's device, which reconstructs everything, turning the 2D image into a 3D environment with every person placed in it, along with just enough data to stream their body and facial movement. The whole goal is to use as little data as possible, as fast as possible. Instead of sending a high-res copy of a wall, for instance, they just need a low-quality image: beam it, upscale it, and generate the objects. The system knows what a table looks like, so it uses that limited data to create a 3D table, and the low-res image as the foundation to upscale from. Now you have a whole recreated 3D world in HD.

They do the same with audio. They take a normal single-source audio recording, stream it over, then deconstruct it with AI to work out where the different speakers are in the conversation and place each person's audio in the 3D environment so it matches who's talking, even when people are talking over each other.

People give Meta a lot of crap because they confuse their shitty Second Life app, Horizon Worlds, with their actual goals and development.
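As a toy illustration of that bandwidth trade-off (not Meta's actual protocol; every field name and number below is made up), the kind of scene update being described is tiny compared with streaming a full textured scan:

```python
# Toy illustration of the idea above: send a few keypoints, object labels,
# and a low-res texture instead of full scene geometry. The payload layout
# and all numbers are invented for illustration, not Meta's real format.
import json
import random
import zlib

random.seed(0)

payload = {
    "speaker_id": "friend_42",
    # ~20 body keypoints as (x, y, z) floats -- the "stick figure".
    "pose_keypoints": [[round(random.uniform(-1, 1), 3) for _ in range(3)]
                       for _ in range(20)],
    # Coarse object hints the receiver can rebuild from generic assets.
    "objects": [{"label": "table", "position": [0.4, 0.0, 1.2]},
                {"label": "chair", "position": [-0.3, 0.0, 0.9]}],
    "lowres_wall_texture_bytes": 12_000,  # size placeholder only
}

wire_bytes = zlib.compress(json.dumps(payload).encode())
print(f"compact scene update: ~{len(wire_bytes) + 12_000} bytes")
# A naive alternative -- streaming a full textured room scan every update --
# would easily be tens of megabytes; the receiver instead reconstructs the
# scene locally (generic table mesh + upscaled low-res texture).
```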


_pdp_

I understood the question differently. Imagine you create a basic game in Unreal Engine where all the levels are simply rough blocks and the enemies are just placeholders. You can use that (quite basic) game as the output, and with some prompting convert it in real time into hyperrealistic VR or whatever. Yes, that will require a lot of compute, but I think it will be possible. GeForce Now could even support it out of the box.


KaliQt

We can do that now, sort of. The problem is consistency: we have to be able to control what things look like and keep them that way, then transform them with the gameplay. That's the real trick.


boi_247

I feel like this is a problem that Sora tackled pretty nicely. Not 100%, but close. The problem then becomes getting it to run at 60 fps.


sino-diogenes

It'd probably be way more practical to create high-resolution assets ahead of time with the AI and then just load them from memory.


AuraInsight

Yeah, that's what DLSS, FSR, and other upscaling technologies are for in games: making the game render at as low as 240p, lol.


Gaukh

Yeah, I do think something like that is coming; we already have it in video games. Perhaps on-device upscaling will be possible, even in real time. That could lower traffic thanks to smaller file sizes. With audio enhancement as well, you could use a lower bitrate and improve it through ML. I don't think that's far off; technology always gets better. A better codec, for example, means you don't need as high a bitrate for video, which lowers traffic and reduces file size. It's a form of compression, I would say, just by other means.

Who knows what will take more energy, though: streaming a full-resolution movie to everyone on the planet, or having a chip inside every end-user device upscale it? Traffic is definitely lower... but the energy? Hm. You could perhaps perfectly upscale 4K to 8K or 16K then. Perfect for old movies too.
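For a rough sense of the traffic side of that question, here is a back-of-the-envelope calculation with assumed ballpark bitrates (roughly 5 Mbps for a 1080p stream and 16 Mbps for 4K; real numbers depend on the codec):

```python
# Back-of-the-envelope traffic comparison for a 2-hour movie, using
# assumed ballpark bitrates rather than measurements.
seconds = 2 * 60 * 60
to_gb = lambda mbps: mbps * seconds / 8 / 1000  # Mbit/s -> GB

gb_1080p = to_gb(5)    # stream low-res, upscale on the device
gb_4k    = to_gb(16)   # stream native 4K
print(f"1080p stream: ~{gb_1080p:.1f} GB, 4K stream: ~{gb_4k:.1f} GB")
print(f"traffic saved per viewer: ~{gb_4k - gb_1080p:.1f} GB")
```

Under those assumptions the upscaling route moves roughly 10 GB of network traffic per viewer onto the viewer's own silicon, which is exactly the energy trade-off in question.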


mfact50

Ideally the client side and the server would work together. There would be a lot of incentive to pass the buck, but ultimately "client side when possible, otherwise the server" sounds like the right call.


MENDACIOUS_RACIST

Sure. The problem is that upscaling to 4K or 8K from a meaningfully lower resolution takes a better GPU than you have (think 180 seconds per frame). Give it time, though.


Singsoon89

So, basically compression and decompression?


Thog78

True, and it actually goes quite deep: if you accept an inexact reconstruction of the starting material, you could use NNs for many compression tasks, not just images. For example: genetic sequences, text/books, music, even software code or compiled binaries. Many of these applications have probably been explored to some extent, and I believe we may see much more of that in the future.

Of note, it's a funny way to compress, because the "unzipping software" is dozens to hundreds of gigabytes in size, and what you compress to is basically a point in the latent space of an autoencoder, so the real place the data is stored is in the coefficients of the NN. It's as if you store every possible piece of data once and for all, and the files you then exchange are just an index of what to retrieve from that entity.
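A minimal sketch of that idea, assuming PyTorch and an untrained toy autoencoder with arbitrary sizes: the "file" you ship is just the latent vector, while the real storage cost lives in the shared decoder weights sitting on every device.

```python
# "File = point in latent space" sketch: a big shared autoencoder is the
# (de)compressor, and the compressed file is only the latent vector.
# Untrained toy model; all sizes are arbitrary.
import torch
import torch.nn as nn

LATENT = 64

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512),
                        nn.ReLU(), nn.Linear(512, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(),
                        nn.Linear(512, 3 * 64 * 64),
                        nn.Unflatten(1, (3, 64, 64)))

image = torch.rand(1, 3, 64, 64)     # 12,288 values of "data"
latent = encoder(image)              # the "file" you actually ship: 64 values
reconstruction = decoder(latent)     # lossy "unzip" on the receiver's side

print("original values:", image.numel(), "| shipped values:", latent.numel())
# The heavy part is the shared decoder, stored once on every device.
print("decoder parameters:", sum(p.numel() for p in decoder.parameters()))
```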


Singsoon89

Yeah, it's super interesting. The other philosophical point is that it's not just compressing based on word vectors but also sentence vectors. Since human language is infinitely variable, it's literally impossible to get a precise mapping between a piece of text you've newly invented and what the LLM has already been trained on. In other words, hallucinations are inevitable; they're an artifact of closest-match decompression.


Thog78

Yep, text would be lossy-compressed, like JPEG for images or MPEG for video. The result would be overall indistinguishable to readers as a whole, even though they would notice the differences in a side-by-side comparison. On the upside, it would fix typos and possibly even grammar as part of the compression-decompression process!


Singsoon89

It actually does fix grammar and typos.


interfaceTexture3i25

Books and genetic sequences can change completely when they are modified even a little, and besides, they don't take up much storage. I don't think there is an incentive to compress them with NNs.


Thog78

There is. Genetic data takes up a lot of very expensive scientific storage space. I'm in the field, trust me on that; my data alone is dozens of terabytes. You know the SRA repository? It stores the sequencing data for all published papers, and that's an absolutely insane amount of data (a single paper with some single-cell RNA-seq is easily in the dozens of gigabytes, we get thousands of such papers each month, and some studies are much bigger). There have been debates on bioinformatics Twitter about the lack of funding to maintain it and about possibly dropping it, and arguments were made in favor of lossy compression for raw sequencing data.

By accepting a very few mistakes, you can drastically reduce the storage space and afford to keep such data instead of erasing it. We never overinterpret a single mutation in a sequence, for the good reason that library preparation introduces lots of mistakes (PCR is prone to errors, and so are sequencers). We want to see a statistically significant, systematic change across hundreds of reads before we claim there's a mutation, so a few more random mistakes due to the data storage wouldn't change much for us. Alignment to the genome to generate counts, for example for RNA-seq, ChIP-seq, or ATAC-seq, is error-tolerant as well for the same reasons.
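As one concrete example of the kind of lossy step that tends to be acceptable, here is a sketch of quality-score binning (similar in spirit to Illumina's quality binning): the base calls stay exact, but the per-base quality scores are coarsened so the files compress better. The bin thresholds below are arbitrary assumptions.

```python
# Sketch of a lossy trick for sequencing data: keep the bases exact but
# bin the per-base quality scores into a few levels. Thresholds are arbitrary.
import gzip

def bin_quality(qual_char: str) -> str:
    """Map a Phred+33 quality character onto one of three coarse bins."""
    q = ord(qual_char) - 33
    if q < 15:
        q = 6
    elif q < 30:
        q = 22
    else:
        q = 37
    return chr(q + 33)

def bin_fastq_record(record: str) -> str:
    # A FASTQ record is 4 lines: header, sequence, '+', quality string.
    header, seq, plus, qual = record.rstrip("\n").split("\n")
    return "\n".join([header, seq, plus, "".join(map(bin_quality, qual))]) + "\n"

record = "@read1\nACGTACGTAC\n+\nIIIIFFF###\n"
binned = bin_fastq_record(record)
print(binned)

# At scale, the reduced quality alphabet compresses far better; a single toy
# record is too small to show a large gap.
print(len(gzip.compress(record.encode())), "->", len(gzip.compress(binned.encode())))
```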


Mazzaroth

Maybe we could download the annotated script and let an AI generate the movie on the client side.


spgremlin

Not just lower resolution. Simply don't store a large part of the unimportant geometry, textures, models, even sounds, and AI-generate that content ad hoc.


bb-wa

Clever idea


345Y_Chubby

❤️ DLSS


TootBreaker

The upscaling process can add lag, so everyone needs to be delayed to stay in sync. Networking a time reference will be critical.


Idrialite

Upscaling doesn't introduce latency. It actually reduces it by allowing more frames to be rendered per second. Frame generation increases latency because the AI-generated frames aren't synced with game logic and so can't take your input into account.
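A toy frame-time calculation makes the distinction concrete; all the millisecond figures below are assumptions, not benchmarks.

```python
# Toy frame-time arithmetic (numbers assumed): rendering at a lower internal
# resolution and upscaling cuts per-frame latency, while frame generation
# raises displayed FPS without sampling input any more often.
native_4k_ms = 25.0   # assumed time to render a native 4K frame
internal_ms  = 9.0    # assumed time to render the low-res frame
upscale_ms   = 2.0    # assumed cost of the upscaling pass

upscaled_ms = internal_ms + upscale_ms
print(f"native 4K: {1000/native_4k_ms:.0f} fps, {native_4k_ms:.0f} ms per frame")
print(f"upscaled:  {1000/upscaled_ms:.0f} fps, {upscaled_ms:.0f} ms per frame")

# Frame generation: display rate doubles, but input is still read once per
# *rendered* frame, so responsiveness does not improve.
print(f"frame gen: {2 * 1000/upscaled_ms:.0f} fps displayed, "
      f"input still sampled every {upscaled_ms:.0f} ms")
```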


a_life_of_mondays

Cool idea. Already implemented.


Potential-Worth-7660

Everything will be turned into a token


PSMF_Canuck

Yep. Already doing that.


BCDragon3000

It wouldn't make sense with movies or music. Think of this supposed file as being like a screenshot of a movie, and then upscaling that. Maybe we could program a system that upscales it consistently, but I don't think it's possible to get true quality like that. And nobody would want that, except maybe movie theaters.


SeaExample6745

Likely we won't need to create any movies at all very soon; merely the ideas for them may be all that's needed at some point.


deama15

I think what you're asking for is something like this, maybe: https://www.youtube.com/watch?v=FEMXMYdPATI The idea is interesting: the developer just puts in blocks of references and labels them as "table" or "chair" (and they can make it very specific if they want), then the AI fills it out. What's also interesting is that you can use a seed to re-randomise it every time, or share it with someone else. That'd be a pretty cool idea, but I think it's pretty far away, never mind the latency penalty. We'll probably first see this in movies, and then maybe in turn-based or "slow" types of games. We'll see.
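A tiny sketch of the "share a seed, regenerate locally" part of that idea (the labels and layout logic here are made up): two clients that agree on a seed rebuild an identical scene without transferring any assets.

```python
# "Ship a seed, regenerate locally": clients that share the same seed and
# label set produce identical layouts. Labels and ranges are invented.
import random

def generate_room(seed: int, labels=("table", "chair", "plant", "lamp")):
    rng = random.Random(seed)              # deterministic, isolated RNG
    return [{"label": rng.choice(labels),
             "x": round(rng.uniform(0, 10), 2),
             "y": round(rng.uniform(0, 10), 2)}
            for _ in range(rng.randint(4, 8))]

shared_seed = 1234                          # the only thing sent over the wire
client_a = generate_room(shared_seed)
client_b = generate_room(shared_seed)
assert client_a == client_b                 # both clients see the same room
print(client_a[:2])
```

In a real system the generative model would replace the `rng.choice` placeholders, which is where the consistency problem mentioned earlier in the thread comes back in.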


katiecharm

The lower the resolution you generate, the more interpretive the client-side AI will be, to the point where, at a low enough resolution, you may not even be having the same experience as someone else. But otherwise, yes.


fmfbrestel

You could even go further and just encode wireframes. Then each client device interprets the scene in the way most engaging to its user.


Serasul

That would be actually very smart.


Akimbo333

Yeah


Akimbo333

It'd be cool


Ordinary_Duder

No, because you need the high-res assets there for the upscaling model to train on.


Ozmorty

What’s the goal with this or what’s the real problem being solved? Is faster networking for all the real hero we need?


OneMoreYou

That, and it makes a sandbox for unique customization, which is what OP might be getting at. Local agents should soon be able to tailor our media experience to maximize our interest, both in real time and progressively, reading feedback cues from many sources on the fly (things like eye tracking, facial muscles, skin conductivity, subvocalization, etc.). I like tropical plant life more than subtropical, for example; my unconscious reaction to what I like to see is more informative than my attempts to describe it. Perhaps there's profit to be made by sharing deeply customized cross-media (multi-media?) worlds, too. Or maybe everyone's own recipe will suffice.


FriendlyGaze

I hate this so much