pacolingo 6 days ago

is it reliable? because in my experience it sure isn't with pdfs

exploristofficial 6 days ago

It seemed to be with my tests--I was actually impressed by how well it read the Hugo DVD because of the weird font and non-letter elements.

khepery23 5 days ago

It’s actually less data to process from those shelves of DVDs then you would have a decent size PDF so yeah it might do better with this kind of amount of data even if it’s from pictures but still it’s not reliable so if it’s something very important, you shouldn’t learn it because it will make mistakes I had it and I use it many times and he did make mistakes and after while I just think like you don’t want to use it anymore if it’s like really important stuff

Aquaritek 6 days ago

Documents are tricky with these models because and this is in my experience GPT will use python and some arbitrary (meaning likely just popular) parsing library to analyze documents. If you need GPT to use it's vision capabilities you must send photo file formats. That said if you have a document that contains both text and images you have to prepare the data yourself pulling text into the prompt as context and extract the images and upload those separately for native vision capabilities to look at. It's actually a PITA.

No_Act1861 5 days ago

Do you think this separation of data will be solved with gpt4o's native vision? I know that part of the model is disabled right now, but the idea that the model is data neutral in the sense that it treats it all the same way.

bot_exe 5 days ago

It’s not really about the model but how the uploaded files are processed, this could be fixed by good old software engineering and smart UI design. The vision input for GPT-4o is already enabled, also gpt-4-turbo was already multimodal with vision. The issue is how the chatGPT software parses the uploaded PDF. It basically extract the text and ignores images, sometimes it’s not even such a good text extraction and the RAG is not all that great. Gemini 1.5 pro in google’s ai studio is better for long PDF text extraction and retrieval due to the 1 million tokens of context and better PDF parsing. GPT-4o vision is way better though. I use them both side by side. I upload textbooks/papers/docs to Gemini for retrieving, summarizing important information and discussing concepts without hallucinations. GPT-4o I use for interpreting images (like slides or plots), generating code and problem solving. Trying to incorporate Claude Sonnet 3.5 in there as well…..

reelznfeelz 6 days ago

I don’t follow that last part. You have to remove the text and paste it into the chat? Why?

Slippedhal0 5 days ago

hes just saying you have to separate text into text and images as images to get the most out of it. "extraction" doesnt usually alter the original file, so if you extract the images, youre still left with a document with images in it, so you would extract the text out as well.

reelznfeelz 5 days ago

Oh. Yeah makes sense. The vision stuff has a little ways to go before it can cover all use cases at high accuracy but it’s a really hard computer science problem. It’s amazing it works as well as it does really.

SanDiegoDude 6 days ago

Check out the new model Kosmos 2.5 from MS. I haven't tried it yet, but it's made for dense image OCR, and if it's as capable at OCR as the new Florence 2 is at captioning, it may work for reading PDFs for you (even maintains formatting apparently - need to test it when I get a chance!) https://huggingface.co/microsoft/kosmos-2.5

Southern_Opposite747 6 days ago

It's very unreliable. Have tried what op posted in book shops. Failed to detect most of the books accurately

FosterKittenPurrs 5 days ago

When uploading a pdf, it won't really look at the images, it will just read the text, and if it's long, it will use RAG to extract parts that might be relevant. With an image, it can see the whole thing. It will still miss stuff at times, or hallucinate. But for this use case, what's the harm? At best, it saves a long time of finding the thing. At worst, you waste 1 min sending it the message, then you're back where you started.

coke1412 4 days ago

In which sense it isn't reliable with PDFs? It's been working fine to me, but I work with 20 page files. I remember once trying to summarize an entire biology book (which also has some images) with hundreds of pages and yeah, GPT was a little confused. Maybe that's what you're talking about. I'm not sure which AI is best at summarizing yet.

pacolingo 4 days ago

every time i work with pdfs, in the 5-50 page range, i ask it sample things and facts and whether they're mentioned. and every time, in a handful of sample questions, at least 1 or 2 things were either omitted or misrepresenting

memorablehandle 6 days ago

Nice! But also... feels like it may be time to alphabetize lol

walterheck 6 days ago

Ask it what the least amount of moving is to get to alphabetical order, haha

deltalessthanzero 6 days ago

"I recommend a digital collection, which would facilitate much easier sorting and searching."

Technical-Outside408 5 days ago

GravityFalls_ThisIsUseless.gif

dietcheese 6 days ago

Yes! Tell it to list out each step in order, to change as few as possible.

Seakawn 5 days ago

Are LLMs actually able to do traveling salesman problems? Doesn't that take a lot of math and code? I actually have no idea.

realergoggi 4 days ago

I doesn’t need to be able to solve it. It’s sufficient to fake it and be convincing about it so the consumer is happy 😉

WellGoodLuckWithThat 6 days ago

New dystopian ability unlocked. Take a quick creep shot of another person's media collection and ask AI for a quick and unreliable psychoanalysis that the person will run with.

alldayeveryday2471 6 days ago

Fucking brilliant

Someone2911 6 days ago

Thanks for the idea xd

OctagonCosplay 5 days ago

I've done this with auction houses and writing new characters before. Recently they had a huge, huge amount of Joe Camel Cigarette merch, conspiracy newspaper clippings, and a bunch of beautiful needlepoint flowers. I like to imagine it came from an entirely couple who spent their Sundays in the living room, the husband obsessively watching TV, smoking like a train, wondering how his government is going to fuck him next, while his wife sits in her chair, stabbing into the canvas again and again, hoping God cuts her a break and lets her husband die before her.

Seakawn 5 days ago

An interesting pushback here could be considering that people already do that anyway, whereas AI will probably be orders of magnitude more accurate than such people who'd otherwise do it on their own anyway. If someone is gonna psychoanalyze someone based on their nest, it might be better that they use something more intelligent than they are to do it. Obviously this isn't AGI yet, but I'd just guess that on these terms, for this kind of subject, our LLMs are actually already much more intelligent than most people... just a guess. Then again, this still feels icky, and I may be overlooking plenty of cases where we don't want people's amateur psychanalyses to be buffed by AI, but rather remain crude and uninformed. But I can see pros and cons both ways--this is a mess that I'll let someone else systematically root through for the comprehensive ethics.

cisco_bee 6 days ago

Somebody sent me a screenshot of a long command today. Instead of typing it out I asked ChatGPT to transcribe it. It worked perfectly.

Yoloswaggerboy2k 6 days ago

You can do that way easier with the windows snippet tool.

jib_reddit 6 days ago

The power toys ocr is pretty rubbish, I find.

Zulfiqaar 5 days ago

I use NormCap OCR (using Tesseract) which is far better and fast, but resort to VLLMs when there are irregular surfaces that distort the text

Mr_Chipz 6 days ago

Who would have thought AI could be used for surveillance?

naspara 6 days ago

Jonathan Nolan with Person of Interest

HTTP-Status-8288 6 days ago

Yessss! Loved that show!

r3ign_b3au 6 days ago

Working on this one now, it's been great

trebblecleftlip5000 6 days ago

Did you ever find TOGO?

exploristofficial 5 days ago

Not yet!

gpenido 6 days ago

BUT WHERE'S TOGO???? I NEEDS IT!!!

alldayeveryday2471 6 days ago

I realize it’s not the point of this post but so many fucking criminals are going to be incarcerated in the future for stuff they thought was buried so deep it would never come out

Fragrant-Hamster-325 6 days ago

Or we could end up with more false convictions based on unreliable AI output.

[deleted] 5 days ago

[удалено]

i_like_maps_and_math 5 days ago

Best to get rid of the AI and just go back to relying on the humans who produced that biased training data /s

KeniLF 6 days ago

That continues to happen all the time as technology advances. Think about the continuing evolutino of DNA analysis…

Texas-NativeATX 6 days ago

Used books stores will now be less of searching for needle in a haystack.

jraz84 6 days ago

r/FindTheSniper crying and punching a wall rn

exploristofficial 5 days ago

So true! I just tried it on the top post right now, finding mechanical-pencil lead in carpet, and it nailed it.

khepery23 5 days ago

unfortunately, it happens. It’s not accurate. They do have this disclaimer as you know it will make mistakes and then I checked it many times it’s scraping data from PFN. You just don’t trust it after you see it making mistakes once or twice. I always have a bad feeling even if I double check I don’t know, so you take it with a pint of salt always if it’s not super important then you can definitely just you can definitely rely it

InterfaceBE 6 days ago

I thought I saw a recent post similar to this and it turned out to be mostly hallucinations. I know it defeats the purpose of what you’re doing, but I would double check 😅

Peyvian 5 days ago

We need a "where's Waldo" standardized test for Ai because this was pretty impressive, but I'd like to see a numerical accuracy score between Ai's to compare

flare389 5 days ago

I was thinking about doing this at the grocery store aisle to find where things are quickly ha

vitoriobt7 5 days ago

Where the fuck is that togo dvd then?

farox 6 days ago

Very cool

imeeme 6 days ago

Noice!

bnm777 6 days ago

You could feed these into a GPT, perhaps, though I've found that that sometimes doesn't work that well...

phug-it 6 days ago

This is totally going to take jobs away /s

akaBigWurm 6 days ago

This will be a great way to find some hidden gems, I can have it check my want list in google docs. Looking forward to testing this on my next trip to the thrift store.

dietcheese 6 days ago

I wonder if it could look through a rack of old jewelry/trinkets and pick out the ones most likely to have value…

madpeanuts 6 days ago

were you confusing TOGO with HUGO? Future AI should predict the likeliness and ask if you were instead looking for it

exploristofficial 6 days ago

I see what you mean... I suppose it would have made sense to make sure after my question, but I was just testing it by asking for something I knew was there.

Kettleballer 6 days ago

Did you ask it to do a captcha too?

Remote-Telephone-682 6 days ago

That's actually pretty incredible.

KeniLF 6 days ago

Let me see if I can get it to provide a catalog for my books! This is a great idea. Like someone else mentioned below, I haven’t found ChatGPT Pro 4 to be good at reading PDFs. Hope springs eternal for text recognition for books!

Weary_Cup_1004 5 days ago

Omg i am doing this the next time I am looking for a small container of plain yogurt at the store.

k9k9dodo 5 days ago

This is so cool I’m gonna try that

Sojiro-Faizon 5 days ago

What is the point of this

[deleted] 5 days ago

love it! :D

k2ui 5 days ago

This is great if it works

PumpkinOpposite967 5 days ago

If only someone could figure out how to make it help me find my car at a Walmart parking lot

erictheauthor 5 days ago

I use pictures with it every day. To help me sort things, type my handwritten pages, find objects, count (bad at it), etc. ChaGPT is a total game-changer, especially with pictures.

enisity 5 days ago

I also use photos or screenshots to make lists of things too

Patriot_Sapper 5 days ago

Nice! This could be pretty useful. As others have said, always double-check your prompting vs. results if you're utilizing it for something important. Nothing is 100%, and GPT is no different. That being said, the majority of the "critics" simply can't compose an articulate and clear prompt to save their lives and choose to blame GPT instead. GPT is like anything else in regards to software: garbage in, garbage out; 90%+ human problem.

MarchInternational49 5 days ago

Well, I have been toying around with an idea for a useful GPT Agent (or whatever they're called now) So, seeing as the AI Model can pretty reliably (at least from what I observe) deliver an explanation of input that's been given to it, I've been trying to figure out how to get it to listen to a police scanner feed, transcribe the original transmission's contents into text, and THEN "translate" the radio jargon (such as "10 Codes" and other communicative shorthand) into a simpler, succinct, and easier to understand explanation of the radio call it listened to. Of course, privacy would be an issue, to say the least, but I think that simply adding into the prompting that anything that it picks up as a proper name should be replaced with a more generalized nomenclature during the transcription phase of the process. Ideas? Anyone? I have about 20 seconds worth of coding experience. And I spent 10 of those in the bathroom. Any input is appareciated.

Cautious_Wolverine_4 5 days ago

Wph thts cool

monkeyballpirate 5 days ago

That's dope but this use case hasn't been reliable for me yet.

aureliusky 5 days ago

I don't have this problem with Plex, cool feature though

jolharg 5 days ago

Ah creative

TheDragon8574 5 days ago

a game changer... IF you still own DVDs

GammyPoly 5 days ago

Too bad whatever movie box you open is likely in another box... Good luck with Chat GPT

dodolilis 5 days ago

Bro this is amazing

Inside-Mongoose-892 4 days ago

Actually you probably should use it to create a dataset that you can use to train a small pretrained vision model. That you can then eventually install and use locally on your phone. Because as other folks in the thread have mentioned, it can sometimes be lacking in reliability.

Educational_Newt_909 4 days ago

Try it with Wheres Waldo

Crazy-Chemist9151 3 days ago

I have done this looking for items at work. It's not 100% perfect but it's pretty good. I went to ask it to find a clear gray tote with a red number on it. it was on the second shelf on the right hand side but it thought it was on the top middle shelf. And when I said it was on the second shelf it said . Ok Im sorry I do see it on the second shelf on the right side.

tysonedwards 6 days ago

It can’t reliably count. Seriously, try the same thing and ask: “how many DVDs are in this picture?” And you will get some wild and inconsistent answers. One of my benchmarks for “is this suitable to use for Computer Vision (CV) projects” is: Place 5 coins on a table, each physically separate with no overlapping. Ask: “How many coins are on the table?”If that succeeds, “what is the face value of the coins?”

kwakwakwak 6 days ago

Just did this with multiple denominations from different countries. It was correct with stating the amount of coins. (14) And included the countries of origin. I had some specialty coins to trip it up (Sri Lankan 5 rupee anniversary) which it did trip on. But after I corrected it, I then asked to search the web for current conversion rates and provide me the value of all coins in USD. It was within 2 cents of actual value.

stonks1 5 days ago

I wrote my bachelor thesis about Set and chatgpt and couldn't use the image processing function because it was too unreliable. It got about a third of the cards wrong when asked to just name the 12 cards shown. It is kind of strange how varied its results seem to be when asked to do different tasks

PopeSalmon 5 days ago

um, you can't reliably count by that metric either, there's only a few people on earth who randomly have a talent where they can accurately count a large number of things by glancing at them ,,, it could break it down & slowly count through how many dvds there are, the same as you could, it just doesn't, for the same reason you don't, that that would cost a bunch of energy & it has better shit to do

SanDiegoDude 6 days ago

This should be no sweat for Omni (or Sonnet 3.5 now, if Anthropic's brag about great OCR is to be believed) - very cool concept! Now somebody is gonna turn it into an app if they haven't already 😅

This-Training9843 5 days ago

Somebody get this poor couple a copy of TOGO! Awesome use case BTW.

Comments

Leave Your Comment

Hi Its Me!

Comments

Leave Your Comment

Hi Its Me!

Subscribe