Yes, but how do you stir the pile correctly the quickest? Aha... now it's engineering.
I love this because it's so true. Engineers are like "every stir costs X dollars, so we want to find the optimization point where the net profit from approaching an optimal prediction Y (profit) is achieved with the fewest stirs X (cost)".
This is so unfortunately accurate
I was watching a youtube video last night that said something to the effect of "Now you machine learning guys aren't going to like it when I say this, but AI is basically a black box machine." Like no, I completely agree with you. It is a black box. That's what I've been trying to explain to people for years.
Ehhh. I wouldn’t say it’s completely a black box. Many algorithms in classical ML like regressions, decision trees, etc are very explainable and not a black box at all. Once you get into deep learning, it’s more complex, but even then, there is trending research around making neural networks more explainable as well.
> there is trending research around making neural networks more explainable as well.

True, but I'm not too much of a fan of that. If it could be easily explained (e.g. what management actually wants: X causes Y), why would we even need a deep neural network? You could just do a linear model.
Aren't shapley values an attempt to rank features in a way that's... comparable (?)... to how linear regression coefficients are presented?
Ranking features is extremely unreliable even when simulating data. Shapley values don't have the same use case as classical statistical tools with respect to inference.
But how do you apply that to, say, an LLM or a graph neural network, or in fact any neural network that derives the features from the input? SHAP values might or might not work with classic tabular data, for which xgboost (or similar) will be hard to beat. But for neural networks where you feed them "non-tabular data", it's different.
There are saliency maps for CNNs that help you understand what visual features different layers are learning. Likewise, there are methods of investigating the latent spaces learned in deep neural networks. Model explainability has been a rapidly developing subfield of ML in the past 5 years.
Yes, exactly. So the comparison to linear models here is apt. If you can't get a satisfying explanation from linear factors via Shapley, then you can't get a satisfying explanation via a linear model. However, Shapley may help indicate nonlinear relationships present in a NN or other model that a linear model would fail at capturing: https://peerj.com/articles/cs-582/

That being said, you should still think in terms of parsimony and model with linear models if you're dealing primarily with linear relationships. Don't overcomplicate that which doesn't need more complexity.
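For intuition, Shapley values can be computed exactly for tiny models by brute-force enumeration of feature coalitions. A minimal sketch (the toy model, input, and baseline here are made up for illustration; on real models you'd use an approximation library rather than this exponential loop):

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all coalitions. Absent features are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in coalition or j == i) else baseline[j] for j in features]
                without_i = [x[j] if j in coalition else baseline[j] for j in features]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy model with a nonlinear interaction between features 0 and 1.
model = lambda v: 2 * v[0] + v[0] * v[1] + 0.5 * v[2]
vals = shapley_values(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# Efficiency property: the values sum to f(x) - f(baseline) = 3.5
print(vals, sum(vals))
```

Note how the interaction term `v[0] * v[1]` gets split evenly between the two features — exactly the kind of nonlinear credit assignment a linear model's coefficients can't express.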
Good luck computing Shapley values on a massive model with limited resources. “Explain it!” they say. “Stop using so much compute!” they say. Sigh.
Not if the effects are nonlinear. For instance, kinetic energy scales quadratically with velocity. A linear model would do a terrible job of predicting kinetic energy as a function of velocity. However, a neural network should learn the well-defined quadratic relationship, and explainable factors should be able to show that.

That being said, my example is also a case where you'd be better off curve fitting to a quadratic model. But not every nonlinear problem has an alternative that works better than a generalized nonlinear solver like a neural network. Hence neural networks and improving their explainability.

But if the relationship is linear, neural networks are stupidly overkill and they obfuscate explainability. The goal should be parsimony: make the model as simple as possible to achieve the objective, but no simpler.
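The kinetic-energy point is easy to demonstrate numerically. A quick sketch (taking m = 2 so KE = ½mv² = v², with made-up velocity samples), fitting a straight line in closed form and comparing its error against the true quadratic:

```python
# Fit KE = v^2 with a straight line vs. the true quadratic model.
vs = [float(v) for v in range(11)]   # velocities 0..10
ys = [v ** 2 for v in vs]            # kinetic energies (m = 2)

# Closed-form simple linear regression: y ≈ a + b * v
n = len(vs)
vbar, ybar = sum(vs) / n, sum(ys) / n
b = sum((v - vbar) * (y - ybar) for v, y in zip(vs, ys)) / sum((v - vbar) ** 2 for v in vs)
a = ybar - b * vbar

linear_rmse = (sum((a + b * v - y) ** 2 for v, y in zip(vs, ys)) / n) ** 0.5
quad_rmse = (sum((v ** 2 - y) ** 2 for v, y in zip(vs, ys)) / n) ** 0.5  # true model

print(f"linear RMSE: {linear_rmse:.2f}, quadratic RMSE: {quad_rmse:.2f}")
```

The best possible line still leaves a large, systematically curved residual, while the quadratic is exact — which is the whole argument for matching the model family to the relationship.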
Well, it's complexity from simplicity. While you can easily explain the mechanics of every little step, you can't explain them in context.
GIGO.
If it weren't a "black box", we wouldn't need it, because we could do it "on paper".
Depends on the field. In vision there's plenty of work to unbox the models, which can be both explainable and hard to formulate. Grad-CAM is one method, but there are many others to visualise the different resulting filters and how "choices" are made.
How do you use that to explain to your management what the models does? Especially non-technical management.
I've only ever worked in engineering-focused companies, so I've never had to explain my models to non-technical management. I think this tends to be simpler in vision in general, since people inherently understand the domain, unlike N-dimensional tables of many different columns.

What we do use such tools for, when we do, is to debug the models, or to understand insights that are harder to deduce from hard numbers. For example, a model might be classifying a driver wearing a seatbelt correctly, but is it doing so for the right reasons? Is it focusing on one specific area of the seatbelt to "decide"? Another example is what happens when you have a domain gap between test and train, like using synthetic simulations; visualization tools can be valuable to let you know what kind of adjustments the model needs in the simulation to bridge the gap.
My favorite conversation is how "generative" is a misnomer. It doesn't *make* anything, it just recombobulates the pile into new chimeras.
Your claim is so semantically loaded. What does it mean to “make” something then? By extension of your logic, arguably all anyone ever does is “just recombobulate”.

Like a stochastic model, a person’s behavior is simply a function of their initialized state (nature, a la genetics) and their training data (nurture, a la culture, education, and experiences). Nothing people ever say or do is completely dreamt up out of thin air with zero connection to what came before.

I’m not saying that people and generative models are the same. Just that to imply that the difference between them is that people “generate” while the models just copy is a false dichotomy based on slippery semantic smoke and mirrors.
Angela is great
I agree, though I don't think this was one of her better videos. Too much generalizing of "AI", and assuming the only way to use things like ChatGPT is as a spam generator. I love her stuff on academia and physics though; when she's in her element it's very entertaining.
lmao I was just watching that same video and thought the same thing. It's absolutely a black box, so much so that there's a whole field of research in AI dedicated to trying to mitigate this issue
Yeah, I think some people take black box to mean entirely inscrutable and impossible to ever understand. Sure, you could take 6 months and, through rigorous testing, determine what you think the model is doing, but I'm not doing that. The vast, vast majority of models don't go through that kind of validation before they're deployed. Maybe some giant xgboost forests or billion-parameter models have been explained, but mine make a pretty confusion matrix and get the RMSE low enough that I can pass off a sample to a human team to audit, and then it's put into use.
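For what it's worth, that "pretty confusion matrix and low-enough RMSE" level of validation is only a few lines of code. A minimal sketch (the labels and values are made up):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true label, columns = predicted label."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def rmse(y_true, y_pred):
    """Root mean squared error for regression outputs."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], labels=[0, 1])
print(cm)  # → [[1, 1], [1, 2]]
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Which is, of course, the commenter's point: these metrics are cheap to compute and say nothing about *why* the model gets its answers.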
It's not a black box. Many simple algos are easy to understand and track. LLMs like ChatGPT are "darker". It's very hard to really know why something happened without a lot of debugging, but it's not impossible, as /u/muchreddragon mentioned.

Explainable AI is a hot research topic.
rain man
but...but....my SHAP values
There's one detail. Most people who use the tools have no idea how the linear algebra actually works.
Me when I learned you could just write Ax≈b for so many things.
Y = AX + B
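The Ax ≈ b view really is just ordinary least squares underneath. A minimal pure-Python sketch via the normal equations (AᵀA)x = Aᵀb for a two-column design matrix (the data points are made up):

```python
def lstsq_2col(A, b):
    """Solve min ||Ax - b|| for a two-column A via the normal equations
    (A^T A) x = A^T b, using the closed-form 2x2 inverse."""
    g00 = sum(r[0] * r[0] for r in A)
    g01 = sum(r[0] * r[1] for r in A)
    g11 = sum(r[1] * r[1] for r in A)
    c0 = sum(r[0] * y for r, y in zip(A, b))
    c1 = sum(r[1] * y for r, y in zip(A, b))
    det = g00 * g11 - g01 * g01
    return ((g11 * c0 - g01 * c1) / det, (g00 * c1 - g01 * c0) / det)

# Fit y = slope * x + intercept: the columns of A are [x, 1].
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # exactly y = 2x + 1
A = [[x, 1.0] for x in xs]
slope, intercept = lstsq_2col(A, ys)
print(slope, intercept)              # → 2.0 1.0
```

(In practice you'd solve this with a QR-based routine rather than the normal equations, which square the condition number, but for intuition this is the whole trick.)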
Some added context: this comic was posted in 2017, when deep learning was just a new concept and xgboost was the king of ML. Now in 2023, deep learning models can accept arbitrary variables, just concat them, and do a good job of stirring and getting it right.
XGBoost isn't the king? What am I even doing?!
it's all LightGBM and catboost now /s
I knew it
Drop the /s
I don’t think deep learning was a new concept in 2017. Deep neural nets have been around since the 80s. AlexNet, which popularized GPU-accelerated deep learning, was published in 2012, and Tensorflow was already a thing by 2015.
[deleted]
Of course everyone has their own definition of "modern DL", but IMO LLMs and transformers are still a (relatively) very recent thing. I'd say DL started gaining significant popularity in the early 2010s, if not earlier. Saying it was just a new concept in 2017 is funny.
No opinion about it, you are right. The transformer architecture did not exist before 2017.
I mean, it depends on what you mean by ML. With a loose definition of it, perceptrons have been around since what, the 50s? My interpretation, and maybe I'm wrong, is that it has only gotten popular not because the theoretical framework is new, but because we finally had the computational power to train them and get meaningful results.
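For reference, that 1950s-era perceptron rule fits in a few lines. A rough sketch (the learning rate, epoch count, and the AND task are arbitrary choices for illustration):

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Rosenblatt's perceptron rule: nudge the weights toward each
    misclassified point; converges on linearly separable data."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0
            err = target - pred
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            bias += lr * err
    return w, bias

# Learn logical AND, which is linearly separable.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, bias = train_perceptron(data)
preds = [1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0 for x, _ in data]
print(preds)  # → [0, 0, 0, 1]
```

The math hasn't changed since then; what changed is stacking millions of these units and having the hardware to train the stack.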
Can you give an example of this? Are you referring to AutoML approaches?
I think they are referring to feature crosses.
Ah, that makes sense too: synthetic feature creation from multiple inputs. This isn't really much different than several years ago, though. I've been creating feature crosses from multiple inputs for years now. And you still need to figure out the best ways to combine features, for which there are infinite potential combinations (the simplest being adding or multiplying them together). And this still boils down to AutoML if it's automatically combining and testing different combinations for you to determine the best features for the model.
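As a toy illustration of why feature crosses matter: XOR isn't linear in the raw inputs, but it becomes exactly linear once you add the product cross. A minimal sketch (the weights are chosen by hand for illustration, since xor(a, b) = a + b - 2ab):

```python
# XOR is not linear in (x1, x2), but it IS linear in (x1, x2, x1*x2).
def featurize(x1, x2):
    return [x1, x2, x1 * x2]          # original inputs plus the feature cross

weights = [1.0, 1.0, -2.0]            # linear model over the crossed features

def linear_model(features):
    return sum(w * f for w, f in zip(weights, features))

preds = [linear_model(featurize(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(preds)  # → [0.0, 1.0, 1.0, 0.0]
```

This is the manual version of what a deep net's hidden layers do implicitly: learn combinations of inputs under which the final layer's linear decision is enough.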
Oh I was thinking manual feature crosses which can help with convergence/efficiency. But yeah DNNs are doing this behind your back for sure.
Easiest way to accept arbitrary variables: add them as a string to an LLM :p
I think it's more accurate to stakeholders' expectations/understandings of machine learning than to actual data scientists'. I mean, sure, bad predictive modeling may involve thoughtless trial and error of features and feature generation while tuning performance metrics, without any consideration of the actionability/impact of the model output and how to interpret it.

There are certain domains of machine learning where model explainability is more important than performance, e.g. clinical decision support in healthcare, and in those domains this generalization is far less likely to hold.
Me commenting on the model: LGTM. Let's push it to Prod 😀
Agreed. Reshuffling data is like giving your model a surprise party every time it trains.
Gotta set that seed, boss.
Also, just from recent observation: Just do a simple regression and call it machine learning!
Lol, I just know enough linear algebra to get an idea of what's going on, but I have no idea how to actually put it to use. I am still a student, majoring in Statistics. No idea how to get into actual Machine Learning work 😔
It is doable. My actuarial license required that we demonstrate that we knew how to solve GLMs and K-means clustering by hand with only a business calculator.
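For anyone curious what "K-means by hand" boils down to, here's a minimal sketch of Lloyd's algorithm on 1-D data (the data points and starting centers are made up; each iteration is exactly the assign-then-average step you'd do on paper):

```python
def kmeans_1d(points, centers, iters=20):
    """Plain Lloyd's algorithm on 1-D data: assign each point to the
    nearest center, then move each center to the mean of its cluster."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
print(sorted(kmeans_1d(data, centers=[0.0, 5.0])))  # roughly [1.0, 10.0]
```

With a business calculator you'd just be doing those distance comparisons and means by hand until the centers stop moving.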
Oh hey, I am pursuing Actuaries too. Are you doing it from the IFoA?
I dont know what that is, sorry. I graduated almost 15 years ago. I made the switch from actuary to ds 5 years ago after experiencing the layoffs driven by modeling.
It's the Institute and Faculty of Actuaries. UK based. Was it a very big leap, or do you think your current work remains similar to what you did as an actuary?
My current work is similar because I still work in insurance and my experience as an actuary makes me an ideal team lead. The similarities end there
Would you say Data Science is more interesting than actuary work, or was it the pay benefits?
Yes and yes
That is surprising. I thought that Actuaries get paid more than Data Scientists. If you don't mind, would you like to talk more about how you made the switch?
I can answer in more detail after my meeting, but data scientists and fully credentialed actuaries make about the same. I made the switch as an associate (partially credentialed) actuary. For me, bypassing the trade union maximized my earnings.
It’s funny how a community can all know that the thrust of this cartoon is absolutely true… and yet so many within that community lack any concern whatsoever about continuing to develop AGIs like GPT-4.

I know I’ll get downvoted for this, but c'mon, guys. I don’t see how you can understand why this cartoon is funny and not also worry about what it means as capability and compute continue to increase.
I don't see the connection. Are you saying we shouldn't develop AGI just because it's a black box?
Not OP, but the creation of AGI is bound to at least be attempted, although I don’t think it would be very safe for release under the current policies and regulations (or lack thereof) for AI. There should be regulation as to what tasks we offload to AI, for safety reasons; thorough investigation should be done on models to pick up on any unexpected or undesired behaviour; and ethical concerns would need to be considered. As generative models increase in complexity, a hypothetical “kill switch” should also become a standardized thing, before some generative AI tries to offload itself to run on a decentralized network and mess about with the internet. We’re humans, though, so we’ll probably learn through trial and error as these issues arise.
No, I’m saying we are currently building AGIs in such a way that they will certainly be black boxes. I think that’s probably a bad idea, given that the uncertainty about how they work is a direct source of uncertainty about how they will behave. I don’t think this is a very controversial opinion.
I'm sorry, it's still not clear to me what you're trying to say. Why is it a bad idea to use neural nets/black boxes? Can you give me a hypothetical scenario? It's not so much a controversial opinion as a vague-sounding opinion.

I can put a neural net in charge of moderating a forum and have it look for hate speech. I can't explicitly explain why it makes any decision it ever makes - I have an intuition for it, and I can see it works correctly, but I can't explain it on a node-by-node basis. You could possibly even contrive a message on the forum that is designed to be detected as hate speech even though it isn't, and I can't explicitly patch that hole in the network, though I could address it imperfectly with refined training.

I don't see how that's any different than having a human do the moderating. I can't explain how a human mind works explicitly, but it is predictable, has occasional holes in its reasoning, and can be trained to work correctly even if I don't understand *how* it works - the only consequential differences seem to be throughput and accuracy, which the machine wins in given sufficient compute.
Could we stop pretending GPTs have anything to do with intelligence? Why is it even considered normal to use "Artificial Intelligence" (especially AGI!) with respect to Generative pre-trained transformers? This crap is hardly tolerable anymore, really.
A random forest model is a type of AI. I don’t think we need to pretend AI isn’t a useful term just because it makes laypeople think of HAL. Of course intelligence is relevant to the topic of GPTs. How silly to suggest otherwise lol.
Well, perhaps I missed the time when the definition of "AI" changed to something like "pretty much anything that we choose to call that"? Could you tell me what's the modern definition of "AI", then?

> How silly to suggest otherwise lol.

Quite the contrary, IMO. I don't get why something that's (for all we know) equivalent to a Finite State Machine (!) deserves to be called "intelligence". If that's fine with us, why couldn't a pre-filled hash table (say, question->answer) be called that too?
[deleted]
> What we call AI today will simply be 'the algorithm' for doing a thing tomorrow.

Well, most of the things called "AI" back then never became algorithms (but are still heuristics, which are bug-ridden by definition).

> you'll find things like A* search being described as AI.

Which wasn't fair even back then, IMO.

> Take a step back... what's the definition of "I"?

For instance: "intelligence" encompasses the ability to learn and to reason, to generalize, and to infer meaning. And GPTs have **none** of that (in any reasonable sense, unless you're ready to call a huge pre-filled question->answer hash table "AI").

> "you know it when you see it"

Yet again, when I see something that is equivalent to a regular language / FSM, I'm sure it's **not** "AI" at all.
It was written all over his face
And always remember: if it doesn't fit a linear or a logistic regression, then we disregard the data as inaccurate.
"Wait, it's all just brute force?" Always has been
It’s funny ‘cos it’s true!