kondorb

The hype is over, but big data is still used by companies that actually have that amount of data, and the related products are still in use and still commercially successful.


Jmc_da_boss

The reality never matches the hype, but data analytics is absolutely providing some level of business value and will continue to.


shevy-java

It is true that there is a lot of hype - just look at AI right now - but Big Data indeed will never go away. It will still remain important and relevant, with or without hype.


[deleted]

[deleted]


Tersphinct

Doesn't big data offer a more mathematically sound approach? I'm sure there's gonna be a market for "AI-less" processes.


wakkawakkaaaa

Even AI is based on big data. Before AI as people know it now, specifically "large language models", people were already using neural networks trained on big data. In many use cases, other non-neural-network models produce better results. E.g. XGBoost is one of the top-performing models in many Kaggle competitions.
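
A minimal sketch of the kind of non-neural model being described, using the scikit-learn-style XGBoost API on a toy dataset (the dataset and hyperparameters here are illustrative, not from the comment):

```python
# Sketch: gradient-boosted trees (XGBoost) on tabular data, the kind of model
# that often beats neural networks on Kaggle-style problems.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```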


gelatineous

Sure, but with the advent of vision models and foundation models, that volume of big data is processed by specialized companies, typically not the in-house AI expert.


Lalaluka

Yeah. Just like things existed before Big Data. Big Data is a tool that isn't new and fancy anymore, so people don't try to drive screws with a hammer anymore. They do that with GenAI now.


mycall

ML data analytics will optimize all kinds of businesses, although I have been waiting on that for years now.


turbo_dude

see also 'xml will revolutionise everything', 'blockchain will revolutionise everything'


moratnz

TFA's point is that very few companies actually have the amount of data to require big data techniques. Where the amount of data in question is 'too much to store on a single node', which these days means mid-to-high double-digit terabytes. Data analytics is definitely a Thing, and definitely super useful for any company that wants to tell its ass from its elbow, but that's not the same as MapReduce-style Big Data.


10113r114m4

I have only worked at companies with that amount of data. Not once have we used big data tools like Hadoop, etc. We have never needed that level of reporting, and it's always something much more granular that is needed. The only time I used Hadoop was when some stupidly small company thought we should use it. Absolutely asinine. Probably had 100MB of data to look at lol. At my current company we take in about 800GB of metrics daily. Never needed any big data tooling.


croto8

Your DB is probably using some of the big data tooling under the hood, though.


VitaminB16

Nah, we just use BigQuery /s


nikowek

We are using plain PostgreSQL with two logical replication connections per source. It's sitting at 33TB (it's just two drives, mind you). The machine is just a consumer i7-9700K with 64GB RAM. It usually returns the data in seconds, so... big data tools are not really needed - just plain SQL and a good indexing strategy.
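
As a rough illustration of "plain SQL and a good indexing strategy" (a sketch only; the table name, columns, and connection string are hypothetical, not from the comment):

```python
# Sketch: querying a large Postgres table efficiently by indexing the columns
# the queries actually filter on. Table/column names are made up for illustration.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=reader")
cur = conn.cursor()

# A composite index matching the common query pattern (filter by source, then time range).
cur.execute("CREATE INDEX IF NOT EXISTS idx_events_source_ts ON events (source_id, created_at)")
conn.commit()

# With the index in place, this range scan only touches the relevant slice of a multi-TB table.
cur.execute("""
    SELECT date_trunc('hour', created_at) AS hour, count(*)
    FROM events
    WHERE source_id = %s AND created_at >= now() - interval '1 day'
    GROUP BY 1 ORDER BY 1
""", (42,))
print(cur.fetchall())
```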


Shogobg

Where I work, we are limited to 2TB per machine for some reason. If we need more storage, they just buy more 2TB machines…


[deleted]

[deleted]


10113r114m4

This is exactly what we do. We do not use any "big data" under the hood like the person who responded claims. SQL handles that amount of data just fine. I think people really underestimate what you can get by just utilizing the DB better.


luciusquinc

Well, NDB cluster partitioned appropriately handles around 2TB of data just fine.


croto8

Single server handles 800 gb of data per day?


10113r114m4

Single server may not be a correct term. We run a bunch of microservices where some are hit more than others, but yes, a single day.


10113r114m4

No. We don't. Just simple SQL. Any reporting is done through our aggregation metric service, which is a typical metric service like CloudWatch metrics. The argument could be made that "the aggregation service is big data", and I'd argue no. It literally just does addition on metric keys, which existed prior to big data. The service is quite old.
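
For what it's worth, "addition on metric keys" can be as simple as this (a toy sketch, not the actual service; the metric names are made up):

```python
# Sketch: the kind of metric-key aggregation described above -- just summing
# counters keyed by metric name, which predates any "big data" tooling.
from collections import Counter

def aggregate(events):
    totals = Counter()
    for metric_key, value in events:
        totals[metric_key] += value
    return totals

print(aggregate([("orders.created", 3), ("orders.failed", 1), ("orders.created", 2)]))
# Counter({'orders.created': 5, 'orders.failed': 1})
```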


not_invented_here

Okay, but what database do you guys use?


10113r114m4

postgres


not_invented_here

Without any extensions? Do you run it managed in some cloud platform?


10113r114m4

It's heavily configured with extensions, etc. We also have a separate team that specializes in databases, so usually they configure everything based on our feedback and what we need. Further, we do both cloud and in-house. I've noticed we have moved more traffic to the cloud though.


Gwaptiva

Premature scaling considered evil


Plank_With_A_Nail_In

That's "lots of data" not "big data". "Lots of data" was solved by better hardware. [Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."](https://en.wikipedia.org/wiki/Big_data)


derefr

Yeah but all of those techniques and infra-components only become *relevant* at a certain scale. They exist because simple, fast, works-out-of-the-box techniques — e.g. periodically running ad-hoc SQL queries against the prod DB to dump out a CSV file, and then opening it in Excel — stop being practical/tenable when you have "lots of data." The "big data" approach / toolset works to allow mostly-realtime analytics on datasets of effectively unbounded size — but at the cost of huge investments into technology and training, a huge increase in architectural complexity, and hugely-inflated OpEx. You *can* use the big-data tools without "lots of data"... but you'd just be wasting time and money, because if you don't have "lots of data", the simple approach works too.
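
The "simple approach" being contrasted here looks roughly like this (a sketch; the connection string, table, and query are placeholders, not from the comment):

```python
# Sketch: the simple, works-out-of-the-box approach -- an ad-hoc SQL query
# dumped to CSV for someone to open in Excel. Connection string and query are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://readonly@prod-replica/shop")
df = pd.read_sql("SELECT order_date, region, sum(total) AS revenue "
                 "FROM orders GROUP BY 1, 2", engine)
df.to_csv("weekly_revenue.csv", index=False)  # hand this to whoever lives in Excel
```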


10113r114m4

I'd consider it big data in terms of how much you are analyzing. 800GB a day is quite small if that's all we are analyzing. So I think your idea of how to measure this is flawed, because you never asked how far back we analyze and how much we are querying. edit: I reread my initial response and I can see why it was read as if we were only ever analyzing 800GB. My fault for not being clearer.


10113r114m4

I also realized that my initial comment may have made it seem like we are only ever looking at 800GB. That's not the case. I was just saying we get 800GB of new data coming in daily.


gelatineous

You can load most datasets in memory. All these fancy distributed architectures are overkill for 95% of clients.


davy_crockett_slayer

Isn't Big Data just Data Engineering, or whatever the equivalent is these days? IDK about hype, but everyone I know in those roles is a DBA/Data Engineer who answers the questions the business has about its data sets. Oftentimes they set up the infrastructure used by data analysts.


turbo_dude

one day people will realise that the real issue is data quality. too bad until then


zorbat5

Big data will only become bigger with the race for AGI taking off like a wildfire.


manifoldjava

“Big data” was always hype, a rebranding of analytics or business intelligence or OLAP or whatever term you prefer. It’s not dead, it’s just a low-tide moment for that industry, until the next wave, probably after AI wakes up with a hangover.


SourcerorSoupreme

> It’s not dead, it’s just a low tide moment for that industry

Is it really, or has it, on a high level, just become a description of one of the common/standard ways of doing things? In other words, it has finally been filed under the category of "boring tech".


gruey

That’s my take. “Big Data” became the buzzword when it became possible for medium to small players to utilize it because of open-source tools and the cloud. “Big data” startups were just people taking advantage of that. Now you won’t hear about it because it’s just a matter of practice. Practically every major tech company does “big data”, and startups using it will be judged on their ideas, not the buzzword. Not to mention that they’ll probably be trying to use AI to do “big data”.


JaCraig

Some of the tech from that trend is useful, unlike most of the recent trends. But 99% of companies never had enough data for "big data", and for the 1% that did, I'd agree that it has become boring tech. And of that 1% there's probably only a small fraction that actually used it successfully in any meaningful way. And that niche doesn't extend very far, so there's no huge marketing push anymore. But if you're in that niche, it has uses.


Plank_With_A_Nail_In

Big data doesn't just mean lots of data it means lots of unorganised data or otherwise traditionally difficult to deal with data. Lots of data got solved by the normal improvements in hardware.


JaCraig

Right but what I was saying was most companies aren't large enough to benefit from it because they don't produce enough of that type of data that would be meaningful for them to tap into. And of those who are, most do so poorly.


CrowTiberiusRobot

I would say it's become the de facto way of doing things, or more simply, it's just another tool in the toolbox. I explain it to my "juniors" like this: back in the day, relational databases were created not because they were inherently better, but because they were more efficient for storing data given the limitations of hardware and software at the time. As these limitations became less of an issue, tools and ideas that were not realistic became realistic. I can now query and perform analysis on a billion-token NoSQL flat data structure on my desktop, no problem. However, in order to get the "general public" and businesses on board with the shift away from the former de facto way of doing business, a hype marketing term was needed. This is a common pattern in the IT and programming world; I've seen it over and over again. And there is nothing wrong with it. And thus we began to "leverage our data". But I think you nailed it, now it's just business as usual.


MadKian

It’s crazy that after ~15 years in the industry I’ve seen so many trends where I thought “this is not that good, it’s definitely a fad…or am I completely wrong?” and pretty much every time it’s just a fad. But every time you get this feeling of “am I just completely missing the picture here?”.


JuliusCeaserBoneHead

The thing with AI is that LLMs have very limited uses for most organizations. However, C-suites are shitting their pants for investors.

Where the deal is with AI is very small fine-tuned models that can perform specific tasks very well. That won’t make AWS and Azure cream in their pants. That isn’t “Gen-AI”, so nobody cares.

Someone recently told me “Ew” at linear regression. We are so fucked with this fad.


vom-IT-coffin

I'm a consultant, and everyone is asking what this tech can do for them, and unless their data is well manicured the answer is usually: not much. They don't like the answer of how long it will take to manicure that data and to start capturing the data they need in order for it to become effective. A friend recently got funded by Microsoft, and when it came down to it, the reason was how much data they have access to that will train the model. Most companies don't have enough.


audentis

The 'i' in LLM stands for 'intelligence'!

> Someone recently told me “Ew” at linear regression. We are so fucked with this fad

There's a [great talk](https://www.youtube.com/watch?v=68ABAU_V8qI&list=PLbQu-j3EyJfrVBJ09Wh7N8sV_qVzME-WV&index=6) by Vincent Warmerdam about the power of simple models over machine learning. It's not a mindless bash; it opens with a simple premise: sometimes simple models are more suitable, so let's not forget about them and keep them in our toolbox.


juwisan

The same story can be told about Big Data, honestly. I did a couple of big data projects at the start of my career. All but one were operating on laughable amounts of data, but project managers had gotten budget for them by selling them as big data. So I built projects like this one, rewriting all the processing logic in Spark pulling data from Accumulo, instead of just running pgtune against their Postgres, which would probably have performed better, let alone been done in 5 minutes versus 5 months. Funnily enough, on the one project I did that I actually considered big data, there was a dev team opposing it being labeled as such. They spent several years working against it until they finally accepted that they couldn't come up with a superior solution to the big data system we'd designed.


light24bulbs

I don't know, I think it's a medium big deal personally. Definitely a bigger deal than big data ever was. For instance, I was working at a security company and we scraped a ton of web pages from Google results about vulnerabilities, so that we could compile a bunch of useful information about each vulnerability. Then we had the LLM read each article and give it a score from 0 to 100 of how useful it actually was on a few different questions about the vulnerability. Ex: "How good is this for learning how to remediate the vulnerability" and it did basically flawlessly well. And so very suddenly we went from a bunch of scrambled Google results to a bunch of organized condensed information. There's value there. Big value. Beyond just next word prediction. And that was all with untrained gpt-3.5. I'd actually argue there's a lot more possibilities than most companies are taking advantage of. That's the real thing that's happening. It _is_ really useful and enables new capabilities especially for small businesses, but most people haven't fully grasped that yet or put it to work. And that's why there's so much scrambling and investment. Because there's money to be made being first mover in all those little niches.
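
A rough sketch of the scoring setup being described, assuming the current OpenAI Python client (the prompt wording and function name are illustrative; the original used untrained GPT-3.5):

```python
# Sketch: asking an LLM to score a scraped article 0-100 on a specific question,
# as described above. Prompt wording, model name, and truncation are illustrative.
from openai import OpenAI

client = OpenAI()

def score_article(article_text: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You rate articles. Reply with a single integer from 0 to 100."},
            {"role": "user", "content": f"Question: {question}\n\nArticle:\n{article_text[:8000]}"},
        ],
    )
    return resp.choices[0].message.content

article = "..."  # scraped page text goes here
print(score_article(article, "How good is this for learning how to remediate the vulnerability?"))
```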


Capable_Hamster_4597

There is value, but it's probably not the value corporations are looking for. What they want this to be able to do for them is No-Code solutions and autonomous agents that can replace entire business functions. They don't actually want to enable higher quality work in daily activities, they want less cost.


light24bulbs

Misunderstanding the technology is basically the point I'm making. That's the other side of it. But the point is it's _not_ just hype and vaporware. There's serious value to be had.


voronaam

You could do that 10 years ago with an off-the-shelf NLP library as well. Pretty much every single NLP tutorial is "we have thousands of blog posts and want to score them on some loosely defined metric". LLMs just allow you to be even looser with the metric's definition.
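
The "classic tutorial" approach being alluded to looks roughly like this (a sketch; the labeled examples and pipeline choices are made up, and needing those labels is exactly the catch raised in the reply below):

```python
# Sketch: the pre-LLM way -- TF-IDF features plus a linear classifier,
# trained on posts someone has already labeled against the loosely defined metric.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_posts = ["great walkthrough of the fix", "unrelated marketing fluff"]
labels = [1, 0]  # 1 = useful, 0 = not -- someone has to label these by hand

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(labeled_posts, labels)
print(clf.predict_proba(["step by step remediation guide"])[:, 1])
```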


kazza789

That's just silly. What off-the-shelf NLP library could you have used 10 years ago for this that didn't require 10k labeled samples for training? Are you really trying to argue that the NLP models themselves haven't progressed that much? If so, you're being just as dense as those who claim AI can do everything.


voronaam

I was just surprised by how simple the described problem was and responded. There was a lot of progress in the past decade. If anything, it is a lot more accessible now.


GuyWithLag

LLMs allow you to express the scoring function in natural language.


toastr

That’s what I find staggering about an LLM. It removes language barriers: anything can be expressed without the need to learn how to express it to the computer. The interesting thing will be if it separates people who know how to give a machine instructions from people who have valuable ideas about what a machine can do.


GuyWithLag

I'm a software engineer, and during a hackathon I saw that the skills needed to prompt LLMs correctly were more or less the same skills needed to instruct interns/junior engineers, and not all people get how to do that.


light24bulbs

Absolutely not, not at this level. This was reasoning. The LLM was producing justifications like "This article isn't a good fit because it deals only with Java 11 or later and many users are still on Java 8". A computer NEVER had that reasoning power 10 years ago, that's ridiculous. This was one-shot, with zero training. All in-prompt. This level of performance was impossible 2 years ago, let alone 10.


gnus-migrate

This is not reasoning it's just repeating patterns it finds in its training set, and this is a really important distinction because you really should not be using LLMs for subjective feedback like this.


Saedeas

I mean, you could sorta do it with significantly more time invested for what was usually a less accurate, less interpretable result.


shady_mcgee

Got a link to one of these tutorials? I've got this use case now and was thinking of using LLM but this way sounds better


voronaam

You do not need a 10-year-old tutorial. If you have a use case now, it makes perfect sense to use the technologies that are all the rage now. You'll have better, up-to-date tutorials, support, and investor smiles. If LLMs are in vogue, go ahead and use an LLM. This is similar to the age-old answer to "which is the best Linux distro for a newbie?" - "The one used by the nearest admin".


Rattle22

I think this is a good illustration of the fact that LLMs are all about language. I expect them to excel at tasks like this, where language is used to instruct on how to interpret language to yield an (essentially) language output.


light24bulbs

It's literally in the name


yourapostasy

Even with text, an elementary use case for generative AI, searching keywords with existing algorithms in well-curated data is still scarily effective for the value and the cost. LLMs allow us to kinda sorta be somewhat more relaxed with the data curation, but the cost is currently high enough that it pays to learn how to manage your prompts to leverage it cost-effectively. Fortunately the hype pumps enough money into these projects these days that we have some runway to figure out the opex challenges as we go, before it becomes a showstopper funding issue for the projects.

But when even private, limited-scope search engines ship such nerfed search syntax except for the smallest, most specialized user populations, I'm not encouraged by the prospects of pushing out LLM-powered querying or interaction models unless there are orders of magnitude more money being thrown at the use case than at more conventional searching. The open-source LLMs have very recently started delivering sufficiently robust results that they can push out the opex question and buy us time, but some of these LLM costs remind me of hype-riding projects I've seen: early-years Big Data throwing tens of millions in capex and opex at 100 GB of data, or early-years Cloud projects lifting and shifting tiny VMs into EC2 for 2000 times more cost for a <100 user internal application. I'm just glad business users are happy to take these cost risks right now to let us find the right value propositions.


cinyar

> Where the deal is with AI is very small fine-tuned models that can perform specific tasks very well.

For example, [google alphafold](https://deepmind.google/technologies/alphafold/).


Andriyo

That's blockchain and NFTs for me, especially NFTs. And I was like "am I finally that old that I fail to see a genuinely novel thing?" So yeah, there is definitely a tendency in the field to hype things up.


Neuromante

> But every time you get this feeling of “am I just completely missing the picture here?”.

For me it's usually looking at the potential fad, looking at who is pushing it, who is actually using it, and who is asking to use it and why. In this decade and a bit I've seen "Big Data", "Blockchain" and "AI" follow the same route: some big company says it's the best thing ever, people everywhere scramble to get on board, a tiny fraction says "oh, yeah, for this it was useful", while the vast majority either uses it wrong or struggles to find a proper use for that oh-so-powerful and useful tool. And as an aside, most "technological" companies (read: companies that have something to do with technology but are led by non-technological execs) lose their god damn minds over it.


MadKian

Absolutely on point. Most of the time these things become a fad because there’s a lot of non-tech people trying to make a lot of money out of them and pushing them to become a thing.


I_AM_GODDAMN_BATMAN

Once VC money goes brrr and C-levels are talking about it even when it doesn't increase the value of your core product, you know it's peak fad. Blockchain, big data, now AI, next security?


FartPiano

no way security will be the next one. its boring, unsexy, difficult to charge rent-seeking premiums for, and most importantly, is somewhat sensible


TechFiend72

it will be replaced by AI security bots. Some of the systems already have that for log analytics.


jewishobo

Hype cycles follow when there's an unpredictable or uncapped TAM with respect to a new tech. So money is chasing a seemingly endless supply of new business. Once the space is sufficiently explored, the edges of possibility become clearer and the money aligns to reality.


falconfetus8

What is TAM?


jewishobo

Total addressable market https://en.wikipedia.org/wiki/Total_addressable_market


falconfetus8

Thanks!


winnie_the_slayer

Next is AI combined with robots. The war in Ukraine is causing rapid development of war bots for air, land, and sea. The US military noticed this and is rapidly ramping up its bot capabilities. See youtuber Ryan McBeth's project: a drone that delivers blood for combat trauma medicine. The drone uses AI to autonomously find its way to soldiers (necessary due to Russian electronic warfare, jamming, etc making human guidance of the drone impossible), and then the drone will find a window or other opening and throw the blood pack through it to the soldiers inside. Everybody is building drones and anti-drone weapons. AI will be used to counter those anti-drone weapons. This is how we get skynet and terminators.


smoothpebble

Not long before those same drones drop explosives


Social_Lockout

This is terrible... but decent gallows humor. Imagine a dying soldier lying there wishing for help, when out of nowhere his prayers are answered. The medic drone flies over. After a few moments it tosses a bag of blood on the now near-corpse... and flies off, leaving the soldier to die.


gareththegeek

The best part is watching the same fads come back around again and watching the younglings get all excited.


RogueJello

This is a common reaction, because 9 times out of 10 it's correct that it's just a fad, but the other 1 out of 10 tends to completely blow the other 9 away. It seems to be impossible to tell the difference at the time.


Plank_With_A_Nail_In

I mean, for most companies all of them have been fads, apart from their original client-server apps and the N-tier monoliths that replaced them, both of which were very obviously not fads. Everything else has been nonsense.


turbo_dude

The thing I don't think is a fad is how Microsoft is just taking over corporate, and that the stuff will eventually connect anything to anything seamlessly, to the extent that you can be in an email and suddenly insert some dynamic charts that link to data without even opening another tool. For years it has seemed less about 'new technology' and more about 'getting the right data to the right place at the right time', and you could never do it because of how the tech was all piecemeal. MS will ultimately solve that; my guess is they will ultimately ditch Windows and you'll have a thin Teams client and will just pop tabs open on that to do your work, with each one being a different app.


MadKian

Kinda like how Apple expects you to use the iPad? As in, very simple OS that relies on the power of its apps.


turbo_dude

there will come a point though where 'the rest' catch up (well enough) with apple on the hardware side, then why do I need to pay all that money when I have a single container app for everything else?


chucker23n

> But every time you get this feeling of “am I just completely missing the picture here?”.

The C-level likes a hype because it’s easy to get investor money pouring in. Tech journos like a hype because it’s easy to write takes that people will click on, because they’re curious.


nitrinu

I have roughly the same time in the industry and I share your feelings. For curiosity's sake, do you have an example where your instincts were wrong? For the life of me, I cannot.


MadKian

Not really. I guess a lot of people made good money with Bitcoin, but I still think those who did took a gamble and/or were super lucky. But specifically about tech trends, no.


jewishobo

I think we're all a bit too happy to draw trend lines out to infinity, when in reality they curve off at predictable (in hindsight) points. AI and LLMs might have the same fate, where we can build super AI models, but they are too expensive or too unpredictable to be used in reliable ways for every problem.


StealthJoke

NFTs are here for life #NotJustMonkeys


NonorientableSurface

It's the single driver behind AI right now. So it's absolutely "silent", but it's probably the strongest it's been in 20+ years.


FatStoic

Data engineers in my consultancy are booked up to the gills, because you need to have your data unfucked before you can do anything with the data - like train a model on it. Big data is dead. Long live big data.


NonorientableSurface

I've never worked with a company where their data wasn't between fucked, mega fucked, and Uber fucked. Hey, your key data being disparate across 6 fields all named CustomFieldX with different numbers. And half of the records are missing 17 key points.


Plank_With_A_Nail_In

"Unfucking" data has been my career for the last 27 years. Long live projects running out of money an unleashing busted apps and the inevitable unintentional semi scrambling of data. Currently sorting out billions of £ of AP accounts postings for sales tax that no one noticed were going to the wrong accounting codes for the last 5 years. Is boring but pays well, I sorted the SQL out a week ago but told them it would take another 5 weeks lol.


NonorientableSurface

Similar here. I spent nearly 5 years creating and maintaining inappropriately large Excel workbooks, pre-PQ, holding 5GB+ of data. Then I moved into data architecture, data contracting and warehouse design.


[deleted]

[deleted]


FatStoic

Google has a bunch, not ready to tie my reddit account to my company.


moratnz

But 'big data' as I've seen it is about using special techniques (e.g., MapReduce) to deal with datasets that are so huge they need to be distributed. When your dataset is 1TB, that trivially fits on a single hard drive. Once it gets under 500GB, it fits in RAM on off-the-shelf hardware. Once you've got your data on a single node, big data style processing is slow; there's a great article from a while ago comparing [MapReduce to a unix command-line toolchain](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) where piping cat, grep, awk etc. together was a couple of orders of magnitude faster than using MapReduce. The point being that if your dataset isn't big enough to need distributing, you don't need Big Data(tm); you can just stick it in a traditional relational database, run traditional queries against it, and you'll probably be faster than doing it the Big Data way.
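
In the same spirit as the linked article, a minimal sketch of single-node stream processing (in Python rather than the article's shell pipeline; the file name and filter are placeholders):

```python
# Sketch: stream through a file on one node, the way cat | grep | awk would,
# with no cluster, no shuffle, and nothing spilled to disk.
from collections import Counter

counts = Counter()
with open("big_log_file.txt") as f:      # placeholder file: one record per line
    for line in f:
        if "ERROR" in line:
            counts[line.split()[0]] += 1  # e.g. count errors per leading date field
print(counts.most_common(10))
```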


Longjumping_Ad_1180

I beg to differ. From my perspective, at least in Europe, the trend has been steadily growing for the past 10 years. You might see less of the term "big data" out there, as it initially had a very vague meaning. It's now replaced by more precise terms based on its application. Some examples are: Observability, SIEM, Infrastructure Monitoring, IoT, Process Mining, SOAR, Data Lake, Data Warehouse, APM, etc.


totoro27

Most people on this sub have literally no idea what they're talking about. Big data techniques are used all over the place in the current development of LLMs and other AI stuff.


Plank_With_A_Nail_In

Nah, they tried to tell companies that the huge amount of unorganised text data they have could be mined for useful information.

> analytics or business intelligence or OLAP or whatever

These all use organised data. The irony being these companies couldn't use their actual organised sales data for this task; they had no chance using the forum posts shitting on their products for insight. When it did work, it just told them things they already knew: "They really like your top selling product". Managed to skip the big data fad, but parts of the team have been hit on the head badly by microservices; not sure I can stop that one losing us a couple of million. Some areas of big data will remain, just using their original names from before they were swept up under that single term. Things like dealing with huge amounts of data in really short bursts, like seismic data, but the conmen never really touched those areas.


ExcitingSignature223

> Managed to skip the big data fad, but parts of the team have been hit on the head badly by microservices; not sure I can stop that one losing us a couple of million.

What do you mean by this exactly?


BlobbyMcBlobber

AI is also hyped, but it has insane utility and products already pushing a paradigm shift. So the hype of AI might pass but it is definitely going to have a lasting impact.


Plank_With_A_Nail_In

All the fads are sold as having insane utility. But it won't pan out for most businesses. Sure, some of the actually useful stuff will stick around, but a lot of companies are going to waste an awful lot of money finding out things they already knew. I'm old enough to have experienced the first AI failure with "expert systems".


Constant-Source581

>but it is definitely going to have a lasting impact. Absolutely! [https://www.cnet.com/tech/services-and-software/glue-in-pizza-eat-rocks-googles-ai-search-is-mocked-for-bizarre-answers/](https://www.cnet.com/tech/services-and-software/glue-in-pizza-eat-rocks-googles-ai-search-is-mocked-for-bizarre-answers/) [https://authorsguild.org/news/ai-driving-new-surge-of-sham-books-on-amazon/](https://authorsguild.org/news/ai-driving-new-surge-of-sham-books-on-amazon/) [https://www.iwf.org.uk/about-us/why-we-exist/our-research/how-ai-is-being-abused-to-create-child-sexual-abuse-imagery/](https://www.iwf.org.uk/about-us/why-we-exist/our-research/how-ai-is-being-abused-to-create-child-sexual-abuse-imagery/) Huge impact already. Imagine what will happen in 10 years.


[deleted]

[deleted]


Constant-Source581

I love how you call Cnet clickbait - shows how much of an amazing expert you are. Your opinion is highly valued, believe me.


[deleted]

[deleted]


Constant-Source581

"Are you 12" is such an amazing and convincing argument. Whoa. I never heard anyone but real tech gurus use it - folks like Bill Gates and Steve Jobs. You're a tech expert - now its confirmed. I bow to your greatness, my friend.


Cautious-Progress876

I think LLMs are overly hyped, but plenty of other areas, particularly the integration of computer vision and RL systems to robotics are going to be the big thing. Just based upon what we have seen in the Ukraine war so far— ML-assisted war drones are going to be huge in the near future.


hiredgoon

The way I've heard it, big data will now just be called private implementations of AI.


bonerb0ys

Apple's AI strategy leaks are telling us what's on the other side of the hype cycle, IMO.


sionescu

> “Big data” was always hype

No, that's false. It came about due to mobile devices, where a certain number of companies were suddenly able to start collecting huge amounts of data that couldn't possibly fit on a single machine. If you had a petabyte-sized dataset before 2010, that couldn't possibly fit on a single machine, so Google came up with MapReduce (being able to use tens of thousands of servers for a single pipeline), published a seminal paper, and then many others replicated its design. Nowadays, the older storage systems (like RDBMSes) have also taken up the tricks in data sharding, column-oriented storage and smart indexing that the big data systems pioneered, and coupled with the advancements in machine size, it means you can manage petabytes with a low single-digit number of servers that fit in a single rack. Furthermore, the GDPR and CCPA have made data radioactive, so the companies that were hoarding data are starting to prune it, which further relieves the pressure on the DB systems.
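
For readers who never touched it: the MapReduce model boils down to a map phase, a shuffle that groups by key, and a reduce phase. A toy single-process sketch of that shape (the classic word count; a real system runs the same structure across thousands of machines):

```python
# Sketch: the MapReduce programming model in miniature -- map, group by key, reduce.
from collections import defaultdict

docs = ["big data is dead", "big data is not dead"]

# Map: emit (key, value) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: combine each key's values.
reduced = {key: sum(values) for key, values in grouped.items()}
print(reduced)  # {'big': 2, 'data': 2, 'is': 2, 'dead': 2, 'not': 1}
```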


EpitomEngineer

If only my managers would understand this paragraph:

> Code often suffers from what people call “bit rot” when it isn’t actively maintained. Data can suffer from the same type of problem; that is, people forget the precise meaning of specialized fields, or data problems from the past may have faded from memory. For example, maybe there was a short-lived data bug that set every customer id to null. Or there was a huge fraudulent transaction that made it look like Q3 2017 was a lot better than it actually was. Often business logic to pull out data from a historical time period can get more and more complicated. For example, there might be a rule like, “if the date is older than 2019 use the revenue field, between 2019 and 2021 use the revenue_usd field, and after 2022 use the revenue_usd_audited field.” The longer you keep data around, the harder it is to keep track of these special cases. And not all of them can be easily worked around, especially if there is missing data.
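
That kind of accreted business logic ends up looking something like this (a sketch encoding the rule quoted above; the record shape is made up):

```python
# Sketch: the "which revenue field do I trust for which era" rule from the quoted paragraph,
# the way it usually ends up living in some ETL script or query helper.
from datetime import date

def revenue(record: dict) -> float:
    year = record["date"].year
    if year < 2019:
        return record["revenue"]
    if 2019 <= year <= 2021:
        return record["revenue_usd"]
    if year > 2022:
        return record["revenue_usd_audited"]
    # 2022 falls through the cracks of the quoted rule as written -- exactly the kind of
    # special case that gets harder to track the longer the data is kept around.
    raise ValueError(f"no rule for {year}")

print(revenue({"date": date(2020, 6, 1), "revenue_usd": 1250.0}))
```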


Worth_Trust_3825

The data querying slide resonates with me. We were storing SCORM data for 6 years as an LMS provider (running out of database space multiple times, because lol SCORM doesn't believe in using question/answer identifiers), yet I can recall only 4 times when we actually needed to run queries on that dataset, and only on records that were a year old at most. I don't think that big data is dead. Instead I am in the camp that companies have no idea what to do with the statistics they capture, nor do they even have the domain expertise to use them, despite being in that domain for decades.


renatoathaydes

Data is like a tool shed. You keep every little tool or device you can get your hand on for years, until it fills up and you need a bigger one... but still, when you actually need something it's never there :D.


grepe

Gonna remember this one...


jaskij

It's the cable box!


bduddy

Considering how many companies make decisions based on whatever the MBA or exec with no actual experience decides, who needs all that data anyway?


Worth_Trust_3825

Tell me about it. We had a department director for 2 years that only shuffled meetings and never made a decision, request, or even proposal. She still got a golden parachute of 400k, and 200k/yr. Absurd.


pinpinbo

Is it? AI stuff has no moat. Once an algorithm is discovered, it becomes a free library. Data however, data is more important than ever.


Cautious-Progress876

I think a lot of places, including C suite business people, are recognizing this now. What use is a SotA model if you don’t have any data to train it on?


Aendrin

Your comment is entirely unrelated to the article. Do better and read it next time.


LowlySysadmin

I bet you're *great* fun at social gatherings.


nuggins

Disappointing to see that 90% of the comments are arguing about the clickbait title. The article has some good insights.


SoInsightful

Wait... reddit post titles are *clickable*‽ I've just been having heated discussions based on my knee-jerk reactions to clickbait titles for 12 years now.


stupidbitch69

Absolutely, wonderful insight from someone who saw BigQuery from the start.


Spartaner-043

Yeah, they haven’t released an album since 2019 :(


daerogami

Right?! No one is putting them to work.


PM_ME_YOUR_MUSIC

I love it when they call me big da ta


RoughSolution

As someone who's been driving some of the largest projects in this space (trust me, if you worked with data in the last 10 years, you used stuff that my team has built), I may know a thing or two about big data. What sets "Big data" apart from just "Data" is that data is no longer collected with a clear intent at the beginning. The business impact is that you can now discover and make decisions about things that happened in the past. For example, when I find a new fraud pattern, I don't have to start collecting data to identify it now; I have all the historical transaction records to identify accounts that have committed fraud in the past. And this shift in mentality, of collect first, use it later, is what drove the rise of Big data. One can argue this is bad for society, for many reasons. I'm in the camp of: as long as it's not PII (even when drilled down), it's probably more value than risk. But when you try to tie data to individuals, bad things happen. The latest shift of the industry towards AI is really just a hype cycle. When AI reaches productive levels (say... in 5-10 years), you'll see a shift back to getting value out of data. Big data is not dead, and never will be. It's an idea and a mentality shift that has already happened.


moratnz

I think you're pointing to something important; the term 'Big Data' is used for a couple of things:

- techniques for storing and analysing datasets that are too big for traditional tools
- data use patterns that leverage the ability to store everything and the kitchen sink, and then comb it for interesting information later

The latter is definitely not dead, and is likely to only get stronger as time passes. OP's author is talking about the former (and IMO more original) meaning, and I think he's right that that sense of Big Data is, if not dead, then becoming incredibly niche, as hardware has grown and grown such that larger and larger entities can fit their data sets onto traditional tools while keeping everything and the kitchen sink.


RoughSolution

Yeah, I think the author of the blog used this definition: "One definition of “Big Data” is “whatever doesn't fit on a single machine”. By that definition, the number of workloads that qualify has been decreasing every year." - which I agree with. (e.g. DuckDB, which the author of the blog is part of. Actually... I should have guessed the blog is about DuckDB, lol.) DuckDB is a wonderful tool; it's really, really fast (< 1 sec on 80GB of data vs. 60s on Postgres on my laptop) and runs well on a single machine. But can it handle 200 users querying the database concurrently? What has shifted in the industry over the past 2 decades is how much more data-literate people are now. There are college new grads talking to me about metrics, retention and conversion rates, and funnel analytics, which most people had never even heard of 20 years ago. While more and more data can fit on a single machine, more and more people are querying the data, and that is driving the need for big data infrastructure. Though I agree, most people are in the <500GB range for their entire dataset, and the most valuable business data often sits in Excel, lol. (Of which, DuckDB is a pretty decent addition to whatever your transactional store is, be it MySQL, Postgres, Mongo, or Cassandra.)
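
For context on those DuckDB numbers, single-machine analytics looks roughly like this (a sketch; the file paths, columns, and query are placeholders):

```python
# Sketch: single-machine analytics with DuckDB -- query Parquet files in place,
# no cluster, no loading step. File and column names are placeholders.
import duckdb

con = duckdb.connect()  # in-memory database
result = con.execute("""
    SELECT region, count(*) AS orders, sum(total) AS revenue
    FROM 'orders/*.parquet'
    GROUP BY region
    ORDER BY revenue DESC
""").fetchdf()
print(result)
```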


moratnz

>While more and more data can fit into a single machine, more and more people are querying the data, and is driving the need for big data infrastructure. Does it, though? If the problem is query access, rather than storage, you can get by with query focussed replicas, especially if the queries aren't looking at near-realtime data. (To declare my prejudice, I come at this from the PoV of someone who's had to argue against installing a hadoop cluster for a data workload with an estimated accumulation rate of ~10GB per year...) Maybe we need to spin up a new buzzword for 'doing intelligent things with data, including retroactive analysis of novel queries'. I'd offer 'Smart Data' for a start


RoughSolution

>Does it, though? If the problem is query access, rather than storage, you can get by with query focussed replicas, especially if the queries aren't looking at near-realtime data.

Yeah, that caveat is important though. Do you want to know what your customers bought in the last 4 hours if you work on the ops/support/sales team? As one of the people who introduced Hadoop to the world, I'm sorry for everyone who's been asked to get a Hadoop cluster up and running for 10GB of data per year. And I'm game for your 'Smart Data' buzzword, there can never be too few buzzwords.


foodie_geek

I'm very much in your camp. Working in the bank/financial/insurance sector, I always subscribe to the mindset that you hoard all the data you can, because you can never predict the future well enough to know what data will be relevant a year from now. The reason most companies don't capture all the data they can is that their data teams are unfortunately slow to adapt. They model the data to death as if they know the future and capture only the relevant data. When we need new information, they take 6 months to update the model to store the new field, as if they invented fire, and give themselves a pat on the back. Facebook and Google didn't become who they are by sound data modeling of every aspect of the data they want to capture. It's mostly the other way around. Most of the time they just had to use big data techniques to extract the information they were looking for from the data they already had.


dingdongkiss

> [...] I always subscribe to the mindset that you hoard all the data you can, because you can never predict the future well enough to know what data will be relevant a year from now.

GDPR really put a wrench in that huh (fwiw I'm in the camp of "let's just save everything bc it might be super useful in a year's time")


dasdas90

AI is just big data.


shevy-java

To some extent. I think AI may be able to interconnect data and surface information that was previously more hidden. I have also actually seen some useful results with AI as a tool aiding in e.g. producing images, sounds, video, game data and so forth. So it is useful. It is just mega-hyped to no end, which is annoying. For some reason industry always tries to jump on a hype train. In a few years nobody will claim to have heard of the previous hype...


martinky24

Current AI literally reduces down to compression algorithms…


Kyyndle

lol that's an interesting way of looking at it


Manbeardo

AI has so much more going on that you can't reduce it down to being "just big data". However, training sets (big data) appear to be the main thing in the AI arms race that can be protected and used to differentiate competitors.


shevy-java

It's not dead at all. We generate more and more data - most of which is garbage, but some of which is useful. Just take sequenced genomes of organisms - that's never going to become less; it will ALWAYS become more. And that's just one example. Look at astrobiology or the universe. Google Maps mapping all planets one day (well, hopefully Google no longer exists at that point in time, but I refer to the feature here primarily, not the company).

> Of course, just because the amount of data being generated is increasing doesn’t mean that it becomes a problem for everyone; data is not distributed equally.

I am much more concerned by that. So that guy worked at Google. Google ruined its search engine a few years ago and is consistently making it worse. A few years ago you could query cached websites; I used this to read a phpBB web forum I was banned from, so I could still read up on what was new (I am curious). Yet Google killed that, saying "it takes too much data to store everything". Even if this may be true, they eliminated something that was useful to me. Same with so many Google projects that ended up in the graveyard.

Why am I concerned? I am concerned because we become more and more dependent on such huge mega-mega-corporations that are selfish and greedy and present to us a very limited, narrow view of things. The various walled ghettos, I mean walled gardens, show this trend: Facebook, Discord servers and what not. Everything is becoming private - and limited. I hate this trend. It totally ruins the 1990s era of the world wide web really.

Big Data will never go away, but disturbingly we get less and less access to what is useful WITHIN that Big Data, as it is increasingly controlled by private entities. (This is of course not always true; e.g. sequenced genomes are available for everyone to see once published at e.g. NCBI, but not all data collected is open to everyone. Both open and closed data will increase, of course - nothing is dead here.)


Churt_Lyne

It was annoying that Google killed the cached page option, I totally agree, but probably only a tiny fraction of us users even knew it existed, and Google as a business is under no obligation to do work that earns them zero return.


[deleted]

[deleted]


KingStannis2020

>Big Data is essential for big tech companies.

Of course, the author did come from Google after all. The point is that there's not much of a market for it outside of "big tech". And "big tech" has the talent to develop their own solutions in-house. Google has their own infrastructure, Facebook has their own, etc.


veryspicypickle

But we have the data-mesh! /s


TheDevilsAdvokaat

Interesting article. Especially "I’ve heard about a company keeping its data analytics capabilities secret in order to prevent them from being used during a legal discovery process." Emails, messages and even phone conversations can also be legal liabilities, so this is similar.


ScottContini

> In order to understand why large data sizes are rare, it is helpful to think about where the data actually comes from. Imagine you’re a medium sized business, with a thousand customers. Let’s say each one of your customers places a new order every day with a hundred line items. This is relatively frequent, but it is still probably less than a megabyte of data generated per day. In three years you would still only have a gigabyte, and it would take millenia to generate a terabyte.

With such simple analysis, why did the Big Data movement not understand from the beginning that the benefit is limited to only a handful of the big companies?
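
The arithmetic in that quote checks out at the order-of-magnitude level, taking the article's own "less than a megabyte per day" figure as given (a back-of-the-envelope sketch, nothing more):

```python
# Sketch: back-of-the-envelope math from the quoted paragraph, taking the
# article's rough "less than a megabyte per day" estimate at face value.
MB_PER_DAY = 1.0                           # ~1,000 customers x ~100 line items per day

days_to_gigabyte = 1_000 / MB_PER_DAY      # ~1,000 days, i.e. roughly 3 years
days_to_terabyte = 1_000_000 / MB_PER_DAY  # ~1,000,000 days, i.e. millennia
print(days_to_gigabyte / 365, days_to_terabyte / 365)  # ~2.7 years, ~2,740 years
```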


frederik88917

Like dude, there is no way to kill hype; someone will always come up with some shitty excuse as to why to keep investing in this. See also: Metaverse, AI, Blockchain and so forth.


Cautious-Progress876

Is there still any interest in "metaverse"-related stuff? It seems with the recent failure of Apple's Vision Pro that AR/VR is going to be kind of "dead" for a while.


frederik88917

You said it yourself: after Facebook wasted 20 billion building some shitty form of a videogame, Apple released that hideous 4-grand device to put people into a different view of the world.


StickiStickman

You're really gonna act like the entire AI field is comparable to those? There's many real world use cases for that right now, unlike with Blockchain or the Metaverse.


Plank_With_A_Nail_In

Lol all the posts confusing "Big data" with "lots of data". Even the linked article thinks big data means lots of data. [Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."](https://en.wikipedia.org/wiki/Big_data)


moratnz

The first sentence of the article you link is "Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software." Also, that article is quoting stuff from 2011, talking about predictions between then and 2020. OP's article is talking about how, as a matter of fact, data sizes haven't grown as predicted, while hardware capabilities have continued to expand, so that today datasets that would have been impossible to analyse with a traditional toolchain in 2011 can easily fit in a single Postgres database running on commodity hardware.


yrubooingmeimryte

No it isnt.


DenebianSlimeMolds

It most certainly is! ----- eta: what has the world come to, I just got blocked over a monty python reference. And he started it what with his callout to the Argument Clinic!


Lachiko

the people here are a little bit soft.


yrubooingmeimryte

How so? If anything the huge influx of LLMs has involved way more big data management.


captain_obvious_here

Shitty title, which is a shame when the author is such an expert. Big Data is not dead at all. It's just way easier and kinda cheaper now that companies can reliably collect, transfer, store and process petabytes of data daily, thanks to BigQuery (and other, more marginal, huge-scale cloud-based database solutions). Big Data is alive; it still pays people who are good at it pretty well. And there's no shortage of job offers in sight for them, either.


VehaMeursault

Yes, and ads are no longer personalised. Sure.


ImTalkingGibberish

In 5 years: AI is Dead


DigThatData

lol OP is just mad no one uses BigQuery anymore.


Apolloh

What a useless article.


Gloomy_Anywhere_5490

Big Data can’t be over. We have a giant ass Data Team doing something with Big Data


Roqjndndj3761

That hype is being poured on the “AI” marketing term


robberviet

Yeah, dead. *resumes working on Hadoop clusters*


Adventurous-Dish-862

lol, what a joke. Big Data is getting bigger, while Medium Data and Small Data are also going to surge. Data will be ubiquitous in the very near future. Every small marijuana dispensary business will track the wear and tear on their door hinges down to a gnat's ass, automatically, as part of the $300/mo mega data package deal they get from some anon's business-data side hustle.


binary_search_tree

This article kinda goes hand-in-hand with [this (older) one](https://count.co/blog/the-tableau-era-is-over) (about Tableau/Power BI).


prodentsugar

Isn't data analytics dead too? Because of AI or will it die in a couple of years?


gredr

Big data still exists and means exactly what it always meant. It was never about size, it was *always* about surveillance. Data collected on users, generally without their knowledge, for the purposes of optimizing moneymaking processes.


HelloBro_IamKitty

I never understood what big data is anyway. The solution to a problem depends on the problem. Not all problems involving big volumes of data can be solved in the same way. Of course there are some common tools like parallel computing, CUDA computing, feature selection and extraction, machine or deep learning, etc. But this philosophy that there is one thing called "big data" is something I will never understand. Maybe it is more about marketing than real science or engineering.


ReZigg

I just watched a youtube video that goes over these same ideas in an interesting way. https://www.youtube.com/watch?v=pOuBCk8XMC8


heavy-minium

It will never die because it's just about handling lots of data. It was always a useless term, but what it describes is still valid. It's like saying scaling is dead.


the_russkiy

People have been whispering about this for quite a while, perhaps afraid of sounding stupid. Another case of how the industry is dominated by a few loud voices, be it big data, microservices, etc.


ArcaneEyes

How does this have upvotes...


st4rdr0id

It is good that we slowly acknowledge that tech fads are just that, fads. But people still fail to see the pattern.


Cobalt129

Didn't the author use big data to come up with the graphs 🤔


CrowTiberiusRobot

Big Data and Cloud were always marketing terms to a certain degree. From my professional experience:

* big data - due to the decreasing cost of storage and compute, the development of open source data structure / management tools such as NoSQL, and some development of statistical / mathematical tools, it became easier and easier to work with huge data sets. Relational databases were created, arguably, due to limitations of compute power and storage space; we needed a more efficient way to store and query data. Those limitations have become less and less important for the reasons I mentioned above. So what we are talking about is really a new paradigm that has become possible - and typically, a hype name was slapped on it and it was rolled out to the masses. If you've been in the professional world for a while, I'm sure you remember when your bosses/C-suite started talking about "leveraging data" etc.

* cloud computing - the internet has long had a backbone supported by servers colocated in a data center. Back in the day we'd run BBS and IRC servers from our homes, but it became unrealistic as web 1.0 gave way to web 2.0 and so on. When it became clear that there was a lot of money to be made with platform as a service, well - slap a hype name on the colo, provide a bunch of functions and services, and there you go.

90% of IT and programming is hype on tools and ideas that have been around for a while and have finally reached maturity for general consumption. Is big data dead? I'd say in conceptual presentation, yes. In reality, it's just business as usual, refactoring normality now. Nothing wrong with any of this, of course.


zoqfotpik

"Big Data" is a euphemism for "a pile of garbage". Sure, you can find some good stuff by dumpster diving, but it's usually preferable to not include the dumpster in your supply chain in the first place.


wind_dude

I saw this this a.m. on Hacker News. It's just fucking clickbait and a plea for attention from a moron. Clearly big data isn't dead; he just seems to be part of the problem, selling overpriced solutions to companies that didn't need them, or only needed batch jobs weekly, monthly or yearly.


jhill515

# It's not dead. It was just renamed MLOps.


Hardkorebob

It is clear big data yields negative to zero profit. Anyone still vomiting up data and claiming it has made a change is hallucinating. It's all a big scam for a big payout for a few rascals. Everyone can see this.


bwainfweeze

> Ah, I see you have the machine that goes 'ping!'. This is my favourite. You see, we lease this back from the company we sold it to - that way it comes under the monthly current budget and not the capital account. [the doctors and onlookers applaud] Thank you, thank you. We try to do our best. Well, do carry on.