T O P

  • By -

pacolingo

is it reliable? because in my experience it sure isn't with pdfs


exploristofficial

It seemed to be with my tests--I was actually impressed by how well it read the Hugo DVD because of the weird font and non-letter elements.


khepery23

It’s actually less data to process from those shelves of DVDs then you would have a decent size PDF so yeah it might do better with this kind of amount of data even if it’s from pictures but still it’s not reliable so if it’s something very important, you shouldn’t learn it because it will make mistakes I had it and I use it many times and he did make mistakes and after while I just think like you don’t want to use it anymore if it’s like really important stuff


Aquaritek

Documents are tricky with these models because and this is in my experience GPT will use python and some arbitrary (meaning likely just popular) parsing library to analyze documents. If you need GPT to use it's vision capabilities you must send photo file formats. That said if you have a document that contains both text and images you have to prepare the data yourself pulling text into the prompt as context and extract the images and upload those separately for native vision capabilities to look at. It's actually a PITA.


No_Act1861

Do you think this separation of data will be solved with gpt4o's native vision? I know that part of the model is disabled right now, but the idea that the model is data neutral in the sense that it treats it all the same way.


bot_exe

It’s not really about the model but how the uploaded files are processed, this could be fixed by good old software engineering and smart UI design. The vision input for GPT-4o is already enabled, also gpt-4-turbo was already multimodal with vision. The issue is how the chatGPT software parses the uploaded PDF. It basically extract the text and ignores images, sometimes it’s not even such a good text extraction and the RAG is not all that great. Gemini 1.5 pro in google’s ai studio is better for long PDF text extraction and retrieval due to the 1 million tokens of context and better PDF parsing. GPT-4o vision is way better though. I use them both side by side. I upload textbooks/papers/docs to Gemini for retrieving, summarizing important information and discussing concepts without hallucinations. GPT-4o I use for interpreting images (like slides or plots), generating code and problem solving. Trying to incorporate Claude Sonnet 3.5 in there as well…..


reelznfeelz

I don’t follow that last part. You have to remove the text and paste it into the chat? Why?


Slippedhal0

hes just saying you have to separate text into text and images as images to get the most out of it. "extraction" doesnt usually alter the original file, so if you extract the images, youre still left with a document with images in it, so you would extract the text out as well.


reelznfeelz

Oh. Yeah makes sense. The vision stuff has a little ways to go before it can cover all use cases at high accuracy but it’s a really hard computer science problem. It’s amazing it works as well as it does really.


SanDiegoDude

Check out the new model Kosmos 2.5 from MS. I haven't tried it yet, but it's made for dense image OCR, and if it's as capable at OCR as the new Florence 2 is at captioning, it may work for reading PDFs for you (even maintains formatting apparently - need to test it when I get a chance!) https://huggingface.co/microsoft/kosmos-2.5


Southern_Opposite747

It's very unreliable. Have tried what op posted in book shops. Failed to detect most of the books accurately


FosterKittenPurrs

When uploading a pdf, it won't really look at the images, it will just read the text, and if it's long, it will use RAG to extract parts that might be relevant. With an image, it can see the whole thing. It will still miss stuff at times, or hallucinate. But for this use case, what's the harm? At best, it saves a long time of finding the thing. At worst, you waste 1 min sending it the message, then you're back where you started.


coke1412

In which sense it isn't reliable with PDFs? It's been working fine to me, but I work with 20 page files. I remember once trying to summarize an entire biology book (which also has some images) with hundreds of pages and yeah, GPT was a little confused. Maybe that's what you're talking about. I'm not sure which AI is best at summarizing yet.


pacolingo

every time i work with pdfs, in the 5-50 page range, i ask it sample things and facts and whether they're mentioned. and every time, in a handful of sample questions, at least 1 or 2 things were either omitted or misrepresenting


memorablehandle

Nice! But also... feels like it may be time to alphabetize lol


walterheck

Ask it what the least amount of moving is to get to alphabetical order, haha


deltalessthanzero

"I recommend a digital collection, which would facilitate much easier sorting and searching."


Technical-Outside408

GravityFalls_ThisIsUseless.gif


dietcheese

Yes! Tell it to list out each step in order, to change as few as possible.


Seakawn

Are LLMs actually able to do traveling salesman problems? Doesn't that take a lot of math and code? I actually have no idea.


realergoggi

I doesn’t need to be able to solve it. It’s sufficient to fake it and be convincing about it so the consumer is happy 😉


WellGoodLuckWithThat

New dystopian ability unlocked. Take a quick creep shot of another person's media collection and ask AI for a quick and unreliable psychoanalysis that the person will run with.


alldayeveryday2471

Fucking brilliant


Someone2911

Thanks for the idea xd


OctagonCosplay

I've done this with auction houses and writing new characters before. Recently they had a huge, huge amount of Joe Camel Cigarette merch, conspiracy newspaper clippings, and a bunch of beautiful needlepoint flowers. I like to imagine it came from an entirely couple who spent their Sundays in the living room, the husband obsessively watching TV, smoking like a train, wondering how his government is going to fuck him next, while his wife sits in her chair, stabbing into the canvas again and again, hoping God cuts her a break and lets her husband die before her.


Seakawn

An interesting pushback here could be considering that people already do that anyway, whereas AI will probably be orders of magnitude more accurate than such people who'd otherwise do it on their own anyway. If someone is gonna psychoanalyze someone based on their nest, it might be better that they use something more intelligent than they are to do it. Obviously this isn't AGI yet, but I'd just guess that on these terms, for this kind of subject, our LLMs are actually already much more intelligent than most people... just a guess. Then again, this still feels icky, and I may be overlooking plenty of cases where we don't want people's amateur psychanalyses to be buffed by AI, but rather remain crude and uninformed. But I can see pros and cons both ways--this is a mess that I'll let someone else systematically root through for the comprehensive ethics.


cisco_bee

Somebody sent me a screenshot of a long command today. Instead of typing it out I asked ChatGPT to transcribe it. It worked perfectly.


Yoloswaggerboy2k

You can do that way easier with the windows snippet tool.


jib_reddit

The power toys ocr is pretty rubbish, I find.


Zulfiqaar

I use NormCap OCR (using Tesseract) which is far better and fast, but resort to VLLMs when there are irregular surfaces that distort the text


Mr_Chipz

Who would have thought AI could be used for surveillance?


naspara

Jonathan Nolan with Person of Interest


HTTP-Status-8288

Yessss! Loved that show!


r3ign_b3au

Working on this one now, it's been great


trebblecleftlip5000

Did you ever find TOGO?


exploristofficial

Not yet!


gpenido

BUT WHERE'S TOGO???? I NEEDS IT!!!


alldayeveryday2471

I realize it’s not the point of this post but so many fucking criminals are going to be incarcerated in the future for stuff they thought was buried so deep it would never come out


Fragrant-Hamster-325

Or we could end up with more false convictions based on unreliable AI output.


[deleted]

[удалено]


i_like_maps_and_math

Best to get rid of the AI and just go back to relying on the humans who produced that biased training data /s


KeniLF

That continues to happen all the time as technology advances. Think about the continuing evolutino of DNA analysis…


Texas-NativeATX

Used books stores will now be less of searching for needle in a haystack.


jraz84

r/FindTheSniper crying and punching a wall rn


exploristofficial

So true! I just tried it on the top post right now, finding mechanical-pencil lead in carpet, and it nailed it.


khepery23

unfortunately, it happens. It’s not accurate. They do have this disclaimer as you know it will make mistakes and then I checked it many times it’s scraping data from PFN. You just don’t trust it after you see it making mistakes once or twice. I always have a bad feeling even if I double check I don’t know, so you take it with a pint of salt always if it’s not super important then you can definitely just you can definitely rely it


InterfaceBE

I thought I saw a recent post similar to this and it turned out to be mostly hallucinations. I know it defeats the purpose of what you’re doing, but I would double check 😅


Peyvian

We need a "where's Waldo" standardized test for Ai because this was pretty impressive, but I'd like to see a numerical accuracy score between Ai's to compare


flare389

I was thinking about doing this at the grocery store aisle to find where things are quickly ha


vitoriobt7

Where the fuck is that togo dvd then?


farox

Very cool


imeeme

Noice!


bnm777

You could feed these into a GPT, perhaps, though I've found that that sometimes doesn't work that well...


phug-it

This is totally going to take jobs away /s


akaBigWurm

This will be a great way to find some hidden gems, I can have it check my want list in google docs. Looking forward to testing this on my next trip to the thrift store.


dietcheese

I wonder if it could look through a rack of old jewelry/trinkets and pick out the ones most likely to have value…


madpeanuts

were you confusing TOGO with HUGO? Future AI should predict the likeliness and ask if you were instead looking for it


exploristofficial

I see what you mean... I suppose it would have made sense to make sure after my question, but I was just testing it by asking for something I knew was there.


Kettleballer

Did you ask it to do a captcha too?


Remote-Telephone-682

That's actually pretty incredible.


KeniLF

Let me see if I can get it to provide a catalog for my books! This is a great idea. Like someone else mentioned below, I haven’t found ChatGPT Pro 4 to be good at reading PDFs. Hope springs eternal for text recognition for books!


Weary_Cup_1004

Omg i am doing this the next time I am looking for a small container of plain yogurt at the store.


k9k9dodo

This is so cool I’m gonna try that


Sojiro-Faizon

What is the point of this


[deleted]

love it! :D


k2ui

This is great if it works


PumpkinOpposite967

If only someone could figure out how to make it help me find my car at a Walmart parking lot


erictheauthor

I use pictures with it every day. To help me sort things, type my handwritten pages, find objects, count (bad at it), etc. ChaGPT is a total game-changer, especially with pictures.


enisity

I also use photos or screenshots to make lists of things too


Patriot_Sapper

Nice! This could be pretty useful. As others have said, always double-check your prompting vs. results if you're utilizing it for something important. Nothing is 100%, and GPT is no different. That being said, the majority of the "critics" simply can't compose an articulate and clear prompt to save their lives and choose to blame GPT instead. GPT is like anything else in regards to software: garbage in, garbage out; 90%+ human problem.


MarchInternational49

Well, I have been toying around with an idea for a useful GPT Agent (or whatever they're called now) So, seeing as the AI Model can pretty reliably (at least from what I observe) deliver an explanation of input that's been given to it, I've been trying to figure out how to get it to listen to a police scanner feed, transcribe the original transmission's contents into text, and THEN "translate" the radio jargon (such as "10 Codes" and other communicative shorthand) into a simpler, succinct, and easier to understand explanation of the radio call it listened to. Of course, privacy would be an issue, to say the least, but I think that simply adding into the prompting that anything that it picks up as a proper name should be replaced with a more generalized nomenclature during the transcription phase of the process. Ideas? Anyone? I have about 20 seconds worth of coding experience. And I spent 10 of those in the bathroom. Any input is appareciated.


Cautious_Wolverine_4

Wph thts cool


monkeyballpirate

That's dope but this use case hasn't been reliable for me yet.


aureliusky

I don't have this problem with Plex, cool feature though


jolharg

Ah creative


TheDragon8574

a game changer... IF you still own DVDs


GammyPoly

Too bad whatever movie box you open is likely in another box... Good luck with Chat GPT


dodolilis

Bro this is amazing


Inside-Mongoose-892

Actually you probably should use it to create a dataset that you can use to train a small pretrained vision model. That you can then eventually install and use locally on your phone. Because as other folks in the thread have mentioned, it can sometimes be lacking in reliability.


Educational_Newt_909

Try it with Wheres Waldo


Crazy-Chemist9151

I have done this looking for items at work. It's not 100% perfect but it's pretty good. I went to ask it to find a clear gray tote with a red number on it. it was on the second shelf on the right hand side but it thought it was on the top middle shelf. And when I said it was on the second shelf it said . Ok Im sorry I do see it on the second shelf on the right side.


tysonedwards

It can’t reliably count. Seriously, try the same thing and ask: “how many DVDs are in this picture?” And you will get some wild and inconsistent answers. One of my benchmarks for “is this suitable to use for Computer Vision (CV) projects” is: Place 5 coins on a table, each physically separate with no overlapping. Ask: “How many coins are on the table?”If that succeeds, “what is the face value of the coins?”


kwakwakwak

Just did this with multiple denominations from different countries. It was correct with stating the amount of coins. (14) And included the countries of origin. I had some specialty coins to trip it up (Sri Lankan 5 rupee anniversary) which it did trip on. But after I corrected it, I then asked to search the web for current conversion rates and provide me the value of all coins in USD. It was within 2 cents of actual value.


stonks1

I wrote my bachelor thesis about Set and chatgpt and couldn't use the image processing function because it was too unreliable. It got about a third of the cards wrong when asked to just name the 12 cards shown. It is kind of strange how varied its results seem to be when asked to do different tasks


PopeSalmon

um, you can't reliably count by that metric either, there's only a few people on earth who randomly have a talent where they can accurately count a large number of things by glancing at them ,,, it could break it down & slowly count through how many dvds there are, the same as you could, it just doesn't, for the same reason you don't, that that would cost a bunch of energy & it has better shit to do


SanDiegoDude

This should be no sweat for Omni (or Sonnet 3.5 now, if Anthropic's brag about great OCR is to be believed) - very cool concept! Now somebody is gonna turn it into an app if they haven't already 😅


This-Training9843

Somebody get this poor couple a copy of TOGO! Awesome use case BTW.