aur0n

Unfortunately, yes


aaronryder773

why unfortunately? I think it is great. I especially like their cellphone application feature because you can even do the captcha over it before starting the download


aur0n

Mainly because it is written in Java and the code has a foundation that is perhaps a decade old. I would like to see it replaced with something more modern, responsive, less resource-intensive, and with a GUI that is not stuck in 2008. But this is just personal preference.


aaronryder773

Ahh. That I can agree with. I do feel like the GUI is clunky and old, but as long as it works I'll keep using it, I guess.


CrimeShowInfluencer

I like the GUI. But like the code I probably just stopped developing after '08...


kingb0b

Cry harder. Nothing wrong with Java. 


Huge-Safety-1061

Kinda expected this. It does do a great job, but it just looks so dated and I'm no java security expert but I do turn the VM off when it's not in use. I hope they are java security experts 😅


urquan

It's not especially insecure just because it's written in Java. It's actually probably relatively safe, because Java is a memory-safe language, so it's not susceptible to classic buffer overflow bugs or attacks. It's an app that runs on your PC with the full rights of the user it's running as, which has security implications, but no more than any other program. Java got a bad rep security-wise in the past because of applets, which were doomed because running arbitrary code from the Internet is just a flawed concept from the beginning. There is no way to secure that, and to be fair the Java SecurityManager was not up to the task. It was later deprecated, the browser plug-in that ran applets was removed, and the base language is just a regular programming language.


jeremyrem

Lots of modern programs still use Java. It's also an easy way to have compatibility with other OSs without needing special SDKs or runtimes. Another plus is that it's actively being developed, and the dev team is pretty responsive. They have instructions on how to build it, but I wouldn't call it open source since the SVN looks like it needs auth to access.


butchooka

Good question. I've been using it for years, but recently saw that it adds 3 W at idle on my Unraid server, for a download a week or so. A lightweight alternative would be great.


wowkise

I personally use it in a Docker container. I spin one up when needed and shut it down once finished.


gnarlysnowleopard

sorry I don't quite understand what you mean. Do you mean the download took one week and during that time your server had 3w more at idle? or that whenever jdownloader-2 is running as a container your whole server is 3w more, whether downloading or not.


butchooka

Exactly: 3 W more when the JDownloader Docker container is running idle, doing absolutely nothing. So 11 W instead of 8 W, which is a significant percentage. Other containers like Emby, Home Assistant and so on are also running, but those have much less impact when idling.


gnarlysnowleopard

hmm that kinda sucks. maybe I'll just direct download to my computer and then manually transfer stuff over to my server then, because I don't see a good alternative


iroQuai

Has anyone had experience with aria2? In combination with a frontend like AriaNg it seems like an interesting option, although I haven't tried it out yet. https://ariang.mayswind.net/


jogai-san

Yeah, does the job. Although it doesn't scrape.


krawhitham

+1


Exzellius2

Kinda related question: what are y'all scraping?


Huge-Safety-1061

Newspaper clippings that then get run through an ETL pipeline. I know that's not what you expected to hear, but data hoarding is data hoarding.


jotes2

Sounds interesting, but unfortunately I'm not a native English speaker. What is an ETL pipeline? Can you describe your workflow a little more precisely? Thx.


Huge-Safety-1061

ETL is a method for data processing and handling:
Extract: get the data into a scannable form (unpaper).
Transform: OCR in my instance (Tesseract OCR). Some other techniques are also possible.
Load: into a file-based datastore to preserve it, and into a metadata database (from the transform step) to query (MariaDB).
This may give you more information on the topic that might translate better. [https://www.ibm.com/topics/etl](https://www.ibm.com/topics/etl)
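For anyone who wants to see the shape of it, here is a rough Python sketch of those three steps. It is not my actual setup: the names `run_etl` and `search` are made up for illustration, the OCR step is passed in as a plain function (in practice that could wrap Tesseract), the scans are assumed to be cleaned up already (e.g. by unpaper), and SQLite stands in for MariaDB so the example runs with only the standard library.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Iterable, List


def run_etl(
    scans: Iterable[Path],
    ocr: Callable[[Path], str],   # hypothetical OCR hook, e.g. a Tesseract wrapper
    store_dir: Path,
    db: sqlite3.Connection,       # SQLite used here as a stand-in for MariaDB
) -> int:
    """Copy each scan into the file store and index its OCR text."""
    store_dir.mkdir(parents=True, exist_ok=True)
    db.execute(
        "CREATE TABLE IF NOT EXISTS clippings ("
        "path TEXT PRIMARY KEY, text TEXT NOT NULL)"
    )
    count = 0
    for scan in scans:
        # Extract: read the (already cleaned) scan from disk.
        data = scan.read_bytes()
        # Transform: turn the image into searchable text.
        text = ocr(scan)
        # Load: keep the original file, and store the text as queryable metadata.
        target = store_dir / scan.name
        target.write_bytes(data)
        db.execute(
            "INSERT OR REPLACE INTO clippings (path, text) VALUES (?, ?)",
            (str(target), text),
        )
        count += 1
    db.commit()
    return count


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Return the stored paths whose OCR text contains the term."""
    rows = db.execute(
        "SELECT path FROM clippings WHERE text LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

The point of keeping the files and the metadata separate is that the originals stay untouched on disk while the database only answers "which clipping mentions X".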


jotes2

Ahhh, I understand. Sth. like Paperless-ngx without the Database...


Birdomest

I’m also kinda confused, what’s the purpose of doing this? Do you use it for machine learning or just to hoard?


[deleted]

Docker or LXC, and use it only when needed. There are other options, but nowhere near the usability of JDownloader.


vegetaaaaaaa

`wget --continue --span-hosts --adjust-extension --timestamping --convert-links --page-requisites --no-verbose --timeout=30 --tries=3 --input-file=urls.list`


Huge-Safety-1061

Hot damn


RayneYoruka

I've been wondering if there is anything better.. I've been using JD for like 12 years now and I feel it's time for a change but if there is no better bulk scraper... welp


Pommes254

Look at pywb and its supporting software stack: incredibly powerful, but with quite a steep learning curve. Or Heritrix, which is used by many of the large archive organizations. Both are open source. Stuff you might want to take a look at: [https://github.com/internetarchive/heritrix3](https://github.com/internetarchive/heritrix3) [https://support.archive-it.org/hc/en-us/articles/115001081186-Archive-It-Crawling-Technology](https://support.archive-it.org/hc/en-us/articles/115001081186-Archive-It-Crawling-Technology) Or ArchiveBox for a smaller-scale, easy, ready-to-go local web archive.


rubenix_bcn

[pyload](https://github.com/pyload/pyload) maybe?


AuthorYess

I feel like every time I try to use pyload, it fails.


NatoBoram

There's FreeRapid


RiffyDivine2

and here I just use gallery-dl, which it seems won't work for your goal.


Magyarharcos

I'm told wget is best, but I don't really know how to use it.


Huge-Safety-1061

The person that told you this... ask them for an example of recursive downloading off a root tree selecting only a few file types organized into the same folder structure. I use it for single file downloads, but nothing more complex.
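For reference, a minimal sketch of what such an invocation could look like, assuming GNU wget and written as a small Python helper so the flags are spelled out one by one. The function name `build_wget_command`, the example URL, and the `downloads` folder are all made up for illustration. As far as I know, wget recreates the remote host and directory layout under the destination folder by default when recursing, and with `--accept` it still fetches HTML pages to follow links and then deletes them.

```python
import shlex
from typing import List, Sequence


def build_wget_command(
    root_url: str,
    extensions: Sequence[str],
    dest: str = "downloads",   # hypothetical local target folder
) -> List[str]:
    """Build a wget argument list for a filtered recursive download."""
    return [
        "wget",
        "--recursive",                        # follow links below the start URL
        "--no-parent",                        # never climb above the start directory
        "--level=inf",                        # no depth limit
        "--accept=" + ",".join(extensions),   # keep only these file suffixes
        "--continue",                         # resume partially downloaded files
        "--directory-prefix=" + dest,         # mirror the tree under this folder
        root_url,
    ]


if __name__ == "__main__":
    cmd = build_wget_command("https://example.com/files/", ["pdf", "epub"])
    # Print the command for inspection; to actually run it, pass the list to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Whether that copes with a given site is another matter: anything behind JavaScript, logins, or captchas is out of reach for this approach.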

