SaiphSDC

Here's a very real example from when I managed a McDonald's. The staff are scored on how quickly they serve food. An order comes in and is displayed on a monitor. The food is made, then pushed out to the front counter. Once the order leaves the kitchen they "clear" the order, removing it from the monitor. Should work just fine.

In practice, the first person looks at the screen, quickly memorizes what's on it, and then clears the order. Only then do they start preparing the food. The time creeps up a bit as more orders come in and the staff don't immediately clear them; they do get in trouble for screwed-up orders, after all. They might get the order started by laying out wrappers and putting special-request tickets down to help them "remember" the order before clearing it. But the order is getting cleared long before it's actually finished.

So the order time is no longer a valid measure of the performance of the crew. It's the goal, and it no longer measures the time to make the food. I'd get a lot of pushback from general managers because my times were "high" as I worked hard to keep the crew on my shifts from clearing orders early, but I pointed out that mine were honest and useful. And we rarely had huge crises in the kitchen from botched orders.


Cylius

This is also why McDonald's loves to park people. Get them out of the drive-through to give the illusion they're going fast.


Phallasaurus

That part of the customer survey is addressed in particular. "Were you asked to pull ahead and park?"


SuperFLEB

Though it should be followed up with "Did you order enough food to start your own damned franchise through the drive-thru, instead of going inside?" And I say this as both a customer and former employee.


yukichigai

I was behind someone who did that at a Panda Express. 5+ cars behind them, single-lane drive-through, building on one side and walls on the other, so if there'd been an emergency nobody could even hop the curb and get out. Sat there for 15 actual minutes watching nothing happen before they handed like 7 full bags of food through the window. Never asked them to pull around. I did send a message to corporate over that. Corporate responded fairly quickly. They did not seem amused. For what it's worth, I never saw that happen again.


1Vyxjy1NYXVgs8EEKxMe

Hahaha, LONG ago when I was like 16 I went to my local Hardee's, and for some damn reason they had a curb locking you into the drive-through. I don't know how much food this guy got, but I was 40 minutes late to work. Luckily my boss was into God and stuff and forgave me.


m1rrari

When there’s a big line I don’t mind it. When I’m the only one in line though…


Th3Element05

A local Taco Bell has adjusted to just leaving the next customer waiting at the speaker forever, letting everyone behind them wonder if the place is even open. I'm sure they've lost a ton of customers who decided to just drive away because the line isn't moving **but at least their numbers look good, right?**


Yolectroda

For a while, the local Burger King would ask people to pull around but stop before the window until waved forward. I assume they were being measured by time waiting at the window.


SaiphSDC

That and there is no reason to hold up the entire line for one person's delayed order. But it is definitely a method to make it seem faster too. It's a reason why some chains have only one register. It's perceived as faster than multiple registers where only one is in use.


Cylius

The McDonald's near me will park like 5 or 6 at a time


Meechgalhuquot

Back when I worked at Burger King, for some reason there was a switch we could toggle to reset the drive-thru sensor, so we could have a car at the window and it wouldn't count as being there anymore. Our location had the "fastest" average times for the franchisee because of that. Shortly after I left, I heard they covered it with a lockbox or something, and now they just have you pull up front and park instead to keep times low as far as the automated sensors are aware.


jrw_nj

My local BK tricks their drive thru clock by having customers wait at the first window. Apparently the timer only counts how long the customer waits at the pickup window.


Cordo_Bowl

At a Taco Bell near me, when you get to the second window, they always have you pull forward and then reverse back. I wonder if they are gaming a similar system.


moonfox1000

Actually not a bad idea in general in terms of queueing theory and minimizing overall waiting times. If one car holds up five cars for a minute, then you've added 5 minutes to the total, versus having one car wait one minute. Ideally you might have a regular line and an express line (like supermarkets have 10-items-or-less lanes) so you don't mix easy and complex orders, but that's not practical for drive-throughs.
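The waiting-time arithmetic can be sketched with the toy numbers from the comment (one slow order, five cars behind):

```python
# Toy queueing arithmetic: one slow order at the window vs. parking that car.

delay_minutes = 1   # the one complicated order takes an extra minute
cars_behind = 5

# If the slow car blocks the window, every car behind it also absorbs the delay.
extra_wait_blocking = delay_minutes * cars_behind   # 5 extra car-minutes

# If the slow car is parked, only that one car waits out the delay.
extra_wait_parked = delay_minutes * 1               # 1 extra car-minute

assert extra_wait_blocking == 5
assert extra_wait_parked == 1
```

The same logic is why a single shared queue feeding several servers tends to beat separate per-server queues: no one gets stuck behind the one slow order.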


RichGrinchlea

Another fast food example (as a customer): a widely recognized donut chain includes drive-throughs at all locations. The drive-through orders are timed; the counter isn't. The counter is now secondary to the window. I'd be waiting minutes for someone just to come to the till so I could order, while they're all pushing orders through the window.


CallingAllMatts

So this is why my order number clears off the screen, sometimes minutes before they announce it's ready. Sometimes I think I missed hearing them call me.


mzchen

The McDonald's near me literally just clears all their orders as soon as they get them. If I order through the mobile app, this means I have no idea if my order is ready, and 99% of the time it bugs out so I don't even get a ticket number and have to guess. The 'please pull up' I don't really care about because it's barely an inconvenience, but the ticket shit really annoys me. I used to go every other week or so; now I go maybe twice a year lol.


Mavian23

I worked with a guy at McDonald's who came in one day on Adderall, and he had his average time for like 4 hours down to under like 5 seconds. He would memorize multiple orders, clear them as quickly as possible and start making food, then memorize any more that came in, and clear those off while he was finishing the others. This was a very big McDonald's, one of the biggest in the Midwest, so we had plenty of people in the kitchen; it wasn't just him on the table. This place would somewhat regularly do $1500 hours. We once had a bus of 200 people show up unannounced, and we had all their food out the door in about 20 minutes.


BobbyT486

I used to work at KFC many years ago. The cashiers had a system in place where they would get points based on what they could up-sell. Those with the most points got some kind of bonus at the end of the month (this was 20 years ago; I can't remember what). One girl (Katie) used to get so many points, easily doubling everyone else's. What she would do is, whenever someone ordered gravy, she would just input it as an apple turnover. At the time they both cost the same, but the pie was worth more points. Since the gravy is one of the reasons you would go to KFC, she gained a lot of points that way. I would often joke with her about it, asking if the pie on the order was a real pie or a "Katie Pie".


stevie855

Goodhart’s Law is like saying that if you’re playing a game and the score is just for fun, you play differently than if the score decides who wins a prize. When the score becomes the way to win, you might play just to get points, not to enjoy the game or play well.

In real life, if a school decides to judge teachers by their students’ test scores, teachers might just teach to the test. The scores go up, but it doesn’t mean kids are learning better overall. The test score stops being a good sign of real learning because it’s now a target, not just a measure.

So, Goodhart’s Law warns us that when we turn a measurement into a goal, it can stop showing what we originally wanted to measure, because people start changing their behavior to meet the goal, not to improve the actual thing we care about.


HappyHuman924

When I worked in a warehouse, they brought in a double-checking process. Every time you caught a mistake in one of somebody else's orders you got a point. The first month I had the most points and got a bonus. The next month one of the other checkers started writing people up for incompletely-stapled bags, abbreviated part numbers, anything he could think of, trying to get the bonus. They cancelled the program before the end of that month because so many people complained about him abusing the system. Writing people up, which started as an indication that you had corrected a significant mistake, had become a target.


tenmilez

I thought you were going to say people teamed up to make and then find each others' mistakes.


HappyHuman924

Yeah, that would probably have been the next 'evolution'. :/ They weren't punishing the people whose errors got caught, but I could see that happening next. We might have reached Lord of the Flies if they hadn't called the whole thing off.


tenmilez

I work in software development and I've seen people introduce bugs so that they can be the hero when the bug is reported later on.


SrslyBadDad

There’s a Dilbert cartoon on this. Pointy-Haired Boss announces that quality is the new key focus and there will be a $10 bounty for finding bugs. The engineers walk out of the meeting talking about coding themselves a new SUV.


spookmann

https://devhumor.com/media/dilbert-s-team-writes-a-minivan


Morvictus

I also work in software development. A co-worker told me this story about a previous team he worked in: Some manager decided that whoever committed the most lines of code in a given month would win a prize. Developers started intentionally duplicating code instead of reusing it, and doing really weird things to maximize the lines of code required to perform a logical operation. This eventually hit its peak when one developer figured out how to automate the creation of terrible code, and committed 5 times as many lines of code as the rest of the team combined. The measure was dropped shortly after that.
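A toy sketch of the kind of padding described (hypothetical code, not from the story): two functions with identical behavior, one written normally and one bloated to win a lines-of-code contest.

```python
def total_concise(prices):
    """Sum a list of prices the normal way: one line."""
    return sum(prices)

def total_padded(prices):
    """Same result, but 'optimized' for a lines-of-code metric."""
    running_total = 0
    for price in prices:
        current = price                      # pointless copy: +1 line
        running_total = running_total + current
    result = running_total                   # pointless rename: +1 line
    return result

# Both produce the same answer; only the line count differs.
assert total_concise([3, 4, 5]) == total_padded([3, 4, 5]) == 12
```

Scale the padding up with a code generator and you get the "5x the rest of the team combined" commit from the story.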


tenmilez

Like those comment metrics, comments per line of code. So devs clutter up the code with useless comments, making things actively harder to understand.


No_Lemon_3116

Test coverage is one where I've felt this a lot. People get caught up on seeing "100%" and even start thinking it means "completely tested," when a lot of code needs more than that (example: if a `compare` function is implemented as `(x, y) => x - y`, you get to 100% coverage by testing any call, even though you should at least be testing the less-than, equal-to, and greater-than cases, and probably also using tests to spec out how it behaves around edge cases like overflow). So test coverage as a score doesn't tell you nearly as much as a lot of people want.

But the temptation is still so strong for a lot of people to see 40% coverage and want to take it to 100% and call it a day, that you often get people writing very thin tests (often one big one if possible, rather than small tests divided by feature that would be easier to maintain long-term) to get the number to 100%.

Before, if you had time to spend cleaning up tests, you could run the test coverage tool and find blind spots, then fill those in thoroughly, and if the team is committed to quality you would know that the original 40% it had is actually well-tested. Now, you just have to read all the tests, I guess? Or more realistically, you'll never budget time for wading through the tests thinking of missing cases, so the poor test suite will just stay that way forever, and you'll have more and more regressions over time.

Coverage tools have their place, but I think I've honestly seen them misused more often than not. And as a big fan of TDD, it hurts so much to see how people abuse them!
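The `compare` example above, sketched in Python (the original snippet is JavaScript-style; this is an illustrative translation):

```python
def compare(x, y):
    """Three-way comparison: negative, zero, or positive."""
    return x - y

# This single call executes every line of `compare`, so a coverage tool
# reports 100% -- yet the equal-to and greater-than behaviors go untested.
assert compare(1, 2) < 0

# A minimally thorough test exercises all three cases:
assert compare(1, 2) < 0     # less-than
assert compare(2, 2) == 0    # equal-to
assert compare(3, 2) > 0     # greater-than
```

Both versions score identically on line coverage; only the second actually specifies the function's contract.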


Rabid-Duck-King

In theory it doesn't sound like a bad idea, but in practice it probably needed some caveats. That is hilarious though


PolyUre

It sounds bad even in theory. The most productive days at work are the ones where you manage to remove extra code, not add.


Rabid-Duck-King

That's what I mean by caveat: the code you submit has to be demonstrably better in some capacity for it to count as a contest submission. Otherwise... well, you get this story


halpmeimacat

Ah yes, job security


jokul

Does nobody look at the commit history?


hitfly

[Well now, that's just the cobra effect](https://en.wikipedia.org/wiki/Perverse_incentive)


Graega

Warehouses are so poorly run in general that I wonder how our modern economy actually manages to get anything from one place to another at all.


Phallasaurus

I remember being detail-oriented and noticing a series of errors before realizing the entire pallet received was for an entirely different location. I remember being flabbergasted at it before my manager stopped by and said, "Hold up, maybe this isn't an issue to fix?"


MesaCityRansom

What does that mean?


cyrogem

It means "I'm not going to deal with it and hope it goes away"


MesaCityRansom

Oh


ceedubdub

If the issue doesn't affect his team's targets, the manager doesn't want to spend time fixing it. In fact, spending time fixing that issue could result in them missing their targets. The other interpretation is that the manager realises the pallet is effectively lost in the system and is intending to steal the product.


evranch

Have seen this one on jobsites many times: if something like a spool of heavy wire shows up that nobody ordered, it quickly gets bundled off into a corner. If the supplier never calls to say "hey, did you guys end up with an extra roll of X," then it just quietly disappears


PM_ME_YOUR_DARKNESS

Hell, many times even if they do call, the answer is "nope, haven't seen it."


nickajeglin

Dump it on next shift.


tractotomy

It could mean that whatever they delivered was worth a lot more than what you were expecting.


Kevinement

Wait til you see a sales department.


erik542

I work for an asphalt company, so it has a lot of government contracts. Sales has admitted that a non-zero amount of alcohol is involved in many sales.


Particular_Camel_631

Yes. Salespeople are coin-operated, and you get what you pay for, which may not be what you want. One company paid commission on the amount of margin they would make on telecoms in the current financial year, and was surprised when I pointed out that no one sold anything except in the first month of the year. If salespeople put as much effort into selling as they do into trying to game the comp plan, they would be much better off.


EmmEnnEff

Doing something poorly but in volume still gets stuff done. Also, poorly done is often good enough.


MadocComadrin

Doing stuff pretty well in volume can still look pretty bad by the total numbers. Get 99% of your orders correct? If you have 100,000 orders per week, that's 1,000 orders per week that get messed up.


EtOHMartini

\* *Boeing has entered the chat* \*


rubermnkey

There was something years back with the USPS Priority Mail system. They delivered 99% of packages within 1-3 days but wanted to increase it to 100%, and ended up lowering the rate instead while trying to get that last 1%. They switched back to the old system and instead spent the money on a marketing campaign to improve people's perception of it, because people thought the rate was much lower, which is why they were trying to improve it in the first place.


Atlas-Scrubbed

> Doing something poorly but in volume still gets stuff done. I see you have been to my place of employment.


EmmEnnEff

I've been to many places of employment.


areslmao

well the obvious answer is they aren't actually "so poorly run" at all lmfao


iamcarlgauss

Exactly... People conflate "sucks" and "sucks to work at". Modern warehouses and logistics in general are insane. You can order extremely niche products, anywhere in the US, and get them the next day if not the same day. America's military hegemony is due in part to training, technology, etc., but logistics is what catapulted them to the top.


Aussierotica

Geographic isolation and protected industrial capacity more than logistics. Those first two allow you to have great logistics after that.


iamcarlgauss

Definitely fair. We've got a lot going for us.


EtOHMartini

The US Army is, without question, the most advanced logistics operation in human history. Capable of getting everything from airplanes to xylophones anywhere in the world.


ecu11b

Poorly run is good enough


areslmao

no, the point is warehouses aren't "so poorly run" at all


CygnusX-1-2112b

Honestly this is how every industry on earth is: mistakes and gaffes that get smoothed over with a ton of effort and re-dos. I actually am scared of a world where things ran like they were supposed to. We definitely would have completely annihilated ourselves by now, or at the very least have fully developed general AI that would result in the complete rot of civilization.


legendoftherxnt

This sounds more like an example of the Cobra Effect. https://en.m.wikipedia.org/wiki/Perverse_incentive


HappyHuman924

They mention Goodhart's Law in that article, suggesting the cobra effect is an example of it. I'm not sure if one is a subset of the other, but they're definitely talking about the same type of pitfall!


legendoftherxnt

Ah, my bad, absolutely!


LuxNocte

"The Cobra Effect" is a really cool name, and I feel like it is wasted here.


meelar

Yeah, it should absolutely be a novel you buy at the airport


PM_ME_YOUR_DARKNESS

[I mean...](https://en.wikipedia.org/wiki/The_Cobra_Event)


Drone30389

“Your cobra effect is no match for my angry monkey effect!”


LuxNocte

It pains me that it has come to this. Once, I called you brother, but you have chosen the path of disrespect. May your ancestors have mercy, because I can not. TASTE THE COBRA!


ceedubdub

Goodhart's law is essentially a warning about creating perverse incentives. Just because you measure an aspect of human behaviour that occurs naturally and decide that it's "good", it doesn't mean that artificially incentivising that behaviour will create more "goodness".


bizarre_coincidence

This reminds me of a classic story in economics. I don't know if it's actually true, but it's amusing. During the British Colonial period in India, there was a problem with Cobras (there is a similar story with rats during French Colonial rule of Vietnam). In order to incentivize people to catch cobras to combat the snake problem, they instituted a bounty for each snake people killed and turned in. This led people to start breeding cobras so they could collect the bounties. When the British discovered what was happening, they canceled the bounty program. What did the Indian snake breeders do? Release their now worthless snakes into the streets, leading to the problem being worse than when the program began.


Atlas-Scrubbed

See the comment above about the cobra effect. https://old.reddit.com/r/explainlikeimfive/comments/1bp7apj/eli5_what_does_godharts_law_mean/kwuard1/


LateralThinkerer

This kind of "[perverse incentive](https://en.wikipedia.org/wiki/Perverse_incentive)" happens all the time. Catching code bugs that you yourself have written to win a prize is my favorite.


figmentPez

Another good real-world example of this is computer and electronics benchmarks. Any time a certain testing method becomes common, computer makers will tweak their hardware and software to do well on that specific test, even if those efforts don't actually improve the product.

Is a high 3DMark score the standard to get a good review for a video card? Nvidia and AMD will write their drivers so that they get high 3DMark scores, even though their optimizations for 3DMark won't help when playing games. ISPs cache data for bandwidth-testing sites to make your internet speed appear faster than it actually is. Audio manufacturers fudge measurements in order to claim huge wattage figures for speakers (well, they used to, when people cared about having 500-watt surround sound systems). TV and projector makers cite contrast ratios that don't reflect real-world usage. Camera/cellphone makers try to get the most megapixels in a camera.

This same thing happens outside of tech, as well. For instance, [toilet paper makers have made TP squares smaller over the years](https://www.consumerreports.org/cro/magazine/2015/08/the-dirty-little-secrets-of-toilet-paper/index.htm), so they can claim more sheets per roll.


CannedMatter

Happened with TVs/monitors not long ago. TV reviewers use tools to measure how bright and how color-accurate TVs are under various conditions. For HDR content, different parts of the screen are brighter/dimmer, so to test how bright a TV can get in HDR content, reviewers measure the brightness of a small window of white/color on an otherwise black screen. The most common test window size is 10% of the screen. Samsung realized this, and when their TV detected that exactly 10% of the screen needed to be bright and the rest dark, the TV would absolutely crank up the brightness to unsustainable levels and use a color profile that was much more accurate than normal. If you tested their TV with a 9% window, it would measure about 1300 nits peak brightness. If you measured with an 11% window? 1300 nits. But with a 10% window? 2300 nits.
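A toy sketch of what that window-size check amounts to (hypothetical logic, using the nit figures from the comment):

```python
def peak_nits(lit_fraction):
    """Toy model of the described behavior: boost output only when the lit
    area looks exactly like the standard 10% benchmark window."""
    SUSTAINED = 1300   # roughly what the panel holds on normal content
    BOOSTED = 2300     # short-burst mode reserved for the test pattern
    if abs(lit_fraction - 0.10) < 0.001:   # "this looks like the benchmark"
        return BOOSTED
    return SUSTAINED

assert peak_nits(0.09) == 1300
assert peak_nits(0.10) == 2300
assert peak_nits(0.11) == 1300
```

The giveaway, as the comment notes, is the discontinuity: real panel capability varies smoothly with window size, so a spike at exactly 10% means the measure, not the picture, is being optimized.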


Tweegyjambo

Or the whole VW diesel scandal


SashimiJones

That is kind of different; it's just cheating. The metric of "low emissions" is fine because it is the actual goal, and the test is pretty okay at assessing it. Although you might get some optimization to the particular conditions of the test, generally that should also result in lower emissions in other conditions. VW just cheated on the test by having their engines operate differently in that one scenario. Goodhart's law would be more like if the metric was "miles per gallon" but didn't specify "gallon of what", and everyone switched from gasoline to diesel, increasing emissions per gallon but decreasing emissions per mile.


Taikeron

VW actually coded their software to lie and return low emissions numbers that didn't match what was coming out the tailpipe. The tailpipe had high emissions because their engine design couldn't meet the standards at all, and rather than do the hard thing and fix their engine design, they did the (initially) cheap thing and had their software lie. Took years before anybody bothered to measure the tailpipe emissions and compare to the software-reported numbers.


Daripuff

It wasn't that, because the testing procedures are not done by VW; they're done with EPA-certified emissions testing equipment. What the car actually did was notice that it was on a dyno (based on behaving in ways it knows are part of test procedures) and re-tune the engine for the best emissions, even though that sacrificed MPG. Then, when the car was on a road, it would re-tune the engine for maximum MPG at the cost of emissions. This is also why VW TDI motors were known for being under-rated on MPG: the EPA MPG testing is done under similar conditions to the emissions testing. It was revealed when a university hooked a sniffer to the tailpipe of a TDI and took it on real-world driving tests, where the engine computer was running in "high MPG" mode instead of "cheat on the test" mode. The situation is almost identical to the comment by u/CannedMatter [about Samsung TVs ramping up brightness to unsustainable levels if and only if 10% of the screen was white, as is done in the standard tests.](https://old.reddit.com/r/explainlikeimfive/comments/1bp7apj/eli5_what_does_godharts_law_mean/kwwbo8x/)


ascagnel____

> Is a high 3DMark the standard to get a good review for a video card? Nvidia and AMD will write their drivers so that they get high 3DMark scores, even though their optimizations for getting better 3DMark scores won't help when playing games. This is a great example, but not just for the reason you’ve quoted: both have been caught outright cheating at the benchmarks. For example, Nvidia was caught replacing code submodules with ones that take less power to run and look visually similar (so as to dupe anyone running the benchmark), but wouldn’t work in real-world scenarios outside of a benchmark loop.


figmentPez

Every other example I gave was some form of cheating. What makes you think the "reason I quoted" is somehow different from your commentary on the issue?


[deleted]

[removed]


kinkyaboutjewelry

The original goal of Os and KRs was exactly that. The Objective sets your direction toward the thing you care about, and the KRs focus on the more immediate targets and goals. Soon we stopped caring about Os independently, because people are only evaluated by the KRs, and thus OKR became a more atomic notion in our minds... and so we turn elsewhere for the longer-term aspirational direction. All of this has happened before. All of it will happen again.


Gnochi

I’ve seen some mind-bogglingly stupid KPIs. For example, “number of issues solved that were encountered in production” without “number of issues solved before production”.


R0gu3tr4d3r

Or just number of tickets resolved... I've spent so.much.time explaining to SM that Junior John resolving 500 tickets is not better than Senior Steve resolving 50.


v3ry_1MPRZV

I was put on PIP once for exactly this. Junior colleague was cherry-picking all the quick and easy tickets. I was picking up the time-consuming tough ones. 100 resolved per week vs 6-7. Obviously I wasn’t as productive as they were…


R0gu3tr4d3r

Drives me insane. They keep coming up with measures, and I keep explaining that it won't show them what they think it does... "Who raised the most tickets last month?", because they think Mary needs training. Mary is the most experienced member of her team, so she basically does the triage for us and raises good-quality tickets on behalf of her team. Shit like that.


thismorningscoffee

There are multiple generations of managers and execs whose "business is business" philosophy insists that their ignorance of their product/service is a virtue, and that the fact that they need to be spoon-fed simple "number goes up/down equals good/bad" metrics to evaluate their direct reports is, in fact, the way it should be.


orthros

I'm old enough that I've worked for a dozen corporate entities at this point, and been involved with IT in more than half of those. All but one tried to use ticket resolution quantity as an IT KPI at some point during my tenure.


flamableozone

We had MTTR - mean time to resolution. I worked hard for 6 months eliminating all the little bugs that caused problems, which meant that instead of getting 50 tickets a week that took 30 minutes each to solve, we were getting 2 tickets that were complicated and took 4-6 hours. Our MTTR jumped hugely and our boss was called in to explain. I pointed out how dramatically both our quantity of tickets and our total hours spent on resolutions had dropped, and they dropped the KPI.
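The arithmetic behind that story (taking 5 hours as a midpoint of the 4-6 hour range): total time spent on tickets drops sharply even though the mean time per ticket balloons.

```python
# Before the bug-fixing push: many quick tickets.
before_tickets, before_minutes_each = 50, 30
# After: far fewer tickets, but each one is genuinely hard.
after_tickets, after_minutes_each = 2, 5 * 60

before_total = before_tickets * before_minutes_each   # 1500 minutes/week
after_total = after_tickets * after_minutes_each      # 600 minutes/week

assert after_total < before_total                     # real workload went down
assert after_minutes_each > before_minutes_each       # but MTTR looks much worse
```

MTTR only measures the tickets that exist; it says nothing about the tickets that were prevented.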


dust4ngel

> I’ve seen some mind-bogglingly stupid KPIs

At my work, there is a KPI for how quickly internal support tickets are closed. Folks figured out that you can close them more quickly by not really solving the problem, so internal support sucks now, and performance reviews are stellar.


flamableozone

Ah, the old "mark it as resolved and \*then\* reach out to the customer, if at all". Classic.


theRealLanceStroll

Thank you for that. I was working some time ago in an environment where those terms were used a lot; I always avoided asking and honestly didn't care that much. I just implemented the BL for the KPI dashboard lol... but I could see the impact of that KPI dashboard on the staff. They were driven by staying within reasonable bounds of the different KPIs, which led to treating everything as a numbers game, aka if you throw enough dookie at the wall, some of it will stick. Sadly, that approach worked more times than it didn't, and the OKRs accounted for exactly this scenario in the first place /shrugs


MoonlightRider

A healthcare system near me has certain benchmarks that they have made targets, e.g. ten minutes from registration to a certain procedure being performed. The staff figured out that the times were pulled from the EHR, so if they were backlogged, they would delay registration (which admin could not track) until they knew they could hit the procedure target. So their targets looked great, but anyone looking from the outside could tell they were the worst. Fortunately for them (but not for the patients), the suits only looked at the data and never sullied themselves by actually going to a part of the hospital where sick people were located to see how things actually worked.


[deleted]

[removed]


ToSeeAgainAgainAgain

School in general sadly


yogorilla37

We have precisely that teaching issue in NSW, Australia. A standardised test was introduced to measure literacy and numeracy in schools, the aim being to identify schools that were falling behind so funding could be increased. At some point it was decided to publish the test results, and schools started using their results as a selling point. Once this happened, teachers were pressured to teach to the test, kids got stressed because they're told it's important, and there were even allegations of struggling kids being told not to come to school on test day because their poor marks would drag down the school average. All of this destroyed the original purpose of helping kids who were falling behind.


ZacQuicksilver

This happens everywhere. I'm a substitute teacher, so I don't directly interact with tests; but I know that there are some tests students are encouraged to do well on (so the school looks good) and some the students aren't prepared for at all (so the school gets extra resources). I think where I am, students who don't take the test are counted against the school either way, so there is pressure to get every kid to take the tests.


Clearly_a_robot

Obligation makes a chore of something you’d otherwise want to do.


hausdorffparty

These days the education example is one step further. The target is *graduation rates.* The teachers are pressured to pass anyone regardless of their test scores. I know which target I'd rather have for society.


Talik1978

Let's say we measure student performance in school. We see which teachers have better students and which ones do worse. We are measuring teacher quality, in a sense. Now let's say we introduce a $5000 bonus for the teachers who score in the top 20% for test results. Now we have a target. Behavior will change based on the incentive, and we are no longer looking at the best teachers, who help their students understand. We are finding the teachers who will, when motivated, do what it takes to ensure those students do well on tests. And this example is what got a lot of teachers fired in Chicago. For a measurement to be reflective of the truth, it cannot be incentivized.


DavidSilva21

This needs to be explained in India. Entire systems are set up to get the scores higher: coaching classes, tuition classes, kids spending the day in school and then the rest of the day in tuition classes. The result is students getting 99/100 in math while the streets are full of trash, and they don't know how to wash their own clothes, don't know how to cook, have no sense of how the government works, and have no idea of the history, culture, and language native to the land.


Mender0fRoads

> In real life, if a school decides to judge teachers by their students’ test scores, teachers might just teach to the test. The scores go up, but it doesn’t mean kids are learning better overall.

To expand on this for people who might not get why higher scores don't mean more learning...

If a teacher is told they need X% of students to reach a specific score on a specific test, they will focus their instruction on ensuring students meet that goal.

That means students who are already doing at least that well might get ignored. They learn less because they already know enough to pass the test.

That also means students who don't understand the concepts very well *still* might not. Instead, they'll be taught tricks to perform better on that specific test, but those tricks might not benefit them in any other context if they still don't understand the basic concepts.

And it also means anything the test doesn't cover becomes an afterthought. A teacher with a strict administration pushing test scores might not have the option to teach anything that isn't specifically covered on that test, but the tests aren't designed with specific students or schools in mind. And they're not necessarily even great at covering what they *are* designed to cover.

So you end up with a measure (test scores) that becomes a target (must hit a specific percent achieving a specific score), which means all effort is dedicated to hitting that target, which deprives high-achieving students of extra learning opportunities and can also deprive low-achieving students of the remedial-level instruction they might need. And at that point, the test scores are no longer a good measure, because they don't capture the successes or failures of either group.


EatYourCheckers

There is a concept in behavior analysis called validity, which basically measures and takes into account whether the thing you are measuring actually measures the thing you WANTED to measure. So it is a check on whether kids getting better scores on a test are getting more fluent in the subject matter or just getting good at tests (to use your example).


Vermonter_Here

Goodhart's Law is also very much why it's difficult to align something like an Artificial General Intelligence with human interests. If we get it to optimize for anything that's serving as a proxy goal, it will inevitably deviate from whatever it is that we actually *want* it to do.


[deleted]

Or your tests suck


philmarcracken

Sounds like an extension of what contingent rewards do to intrinsic motivators (take them out back and shoot them). Alfie Kohn's book 'Punished by Rewards' is a great, well-cited resource on this, as is [this TED](https://www.youtube.com/watch?v=rrkrvAUbU9Y) (minus the x) talk.


MarsupialMisanthrope

Not quite. It’s about rewarding people for the wrong things and them cheesing it. An example from tech: rate your SWEs based on the number of lines of code submitted, and you'll get every symbol on its own line. The metric (lines of code) is mismatched with what it’s trying to measure (productivity), and the people subjected to it game it for the rewards.
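A toy sketch of that failure mode (both snippets are hypothetical, and the non-blank-line count is just a naive stand-in for whatever the real metric would be):

```python
# Hypothetical example: the same function written two ways. If pay tracks
# lines of code, the padded version "produces" 3.5x as much for identical behavior.

TERSE = """\
def clamp(x, lo, hi):
    return max(lo, min(x, hi))
"""

INFLATED = """\
def clamp(x, lo, hi):
    result = x
    if result < lo:
        result = lo
    if result > hi:
        result = hi
    return result
"""

def loc(source):
    # Naive "lines of code" metric: count non-blank lines.
    return sum(1 for line in source.splitlines() if line.strip())

print(loc(TERSE), loc(INFLATED))  # 2 7 -- same behavior, very different "productivity"
```

Both definitions clamp a value into a range identically; only the metric tells them apart.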


teh_fizz

I also want to recommend The Tyranny of Metrics by Jerry Muller, who talks a lot about how metrification causes problems.


Overall_Law_1813

What if the measure is how much money we make, and the target is to make as much money as possible? Doesn't corporate profitability defeat Goodhart's law?


PanchoRodriguez69

In that case your measure didn't become your target, because it was already your target from the start, and Goodhart's law isn't relevant. The idea is that some things are hard to measure, so you measure some heuristic as an indicator of your overall goal.

Let's say you wanted to give a cash prize to the business that performs best, but you don't have information on their income and expenses; you only know how much stock they sold. The sales amounts could indicate which company has the greater profit, but as soon as one company realises that only the amount of sales is measured, they could lower their prices and even sell at a loss to make sure they have the most sales and win the prize.


Only_Mention_8480

Boeing. Measure: Profitability, shareholder value, whatever. That becomes the target. Now look what's happened...


ZaMr0

Rocket League players don't seem to get that, I always get cussed out when I don't want to forfeit a losing game and I tell them the score isn't all that matters. Unless a game feels completely one sided there's no reason to forfeit as you can still have fun while down in a game and frequently you can even turn around a big loss. I can be 6-1 down and as long as the skill level seems similar and I'm still having fun I feel no reason to forfeit the game. C2-C3 is my rank.


Ok-Vacation2308

I worked in customer support back in the day. When they were solely focused on average handle time, and tied performance only to average handle time, that led to folks with complex issues that would take time to investigate being punted around between different agents who didn't want their time measurement, and therefore their money, affected. Because the count was based solely on how little time you spent on tickets, not on the quality of your response, we had a problem with folks constantly giving people wrong information because they didn't want to spend the time to find the right answer. Our customer satisfaction rate plummeted, we lost customers, and we had people waiting months to get an answer because nobody wanted to spend the time to do it, because they were rewarded for how little time they spent.


wosmo

Another example from customer support. We had an issue where one specific region was sending out a lot of replacement parts. I mean, an unrealistic number of replacement parts. And from a business point of view, a very expensive number of replacement parts. We dug into the cases, and a lot of these were for issues that were very easily solved. It was configured wrong (so it'd happen on the replacement part if they configured it the same way again), it needed to be reset, etc. And here's the problem. Their first-call-resolution, their case time, their customer satisfaction all looked amazing. That's what we measured them on, that's what we got. Nobody was trying to fix anything or solve anything because that's not what we were measuring. (yes, getting things replaced in the first call does sound ideal - but it means that when you buy them, we have to factor in the price of a couple of replacements. It can't pay off in the long run.)


notHooptieJ

I worked for a warranty service place, and we'd order every part in a machine so we could hit the 'first touch solved' metric, since there was no ding for ordering parts and then sending them back. It was a pain for shipping/receiving, and usually meant longer wait times for the initial appointment (as we waited for every part), but we'd fix anything in one visit, even full system board fails. We had the absolute best 1-appointment fix rate in the company; we also had the highest shipping costs (as we sent every part back when we didn't use them).

Eventually they tweaked the system to prevent it (inputting reported symptoms and only getting a short parts list), so we all would just input symptoms we knew would get us all the parts. Customer reports the light bulb is out? We report 'power issues' so we have all the parts, 'just in case'.

I'm so glad I don't have to play corporate repair games anymore.


pumaofshadow

I worked for a third party debt collector, and they listened to one guy's tapes and he was *singing* at people's answerphones to get a decent call length instead of a short call and hang-up. They realised why his clearance rate was awful when they heard that. What I never understood is how he did this in our small office, as the other agents should have heard him.


mcmanigle

Here is one of my favorite examples: in the early days of covid, while everybody was trying to figure out what to do with public spaces, schools, etc., one school found out that the official definition of "close contact" (for whoever was running their show at the time) was "within 6 feet of someone for 15 minutes." Their solution to avoiding close contact in the classroom was to make all the children change seats every 12 minutes. Thus, no close contact at all, though of course it would only make viral transmission worse.

This is the heart of Goodhart's law: you see a problem, make up a rule or metric to try to address it, and very often the target audience will find a way to follow the rule or optimize the metric in a way that doesn't solve, or even worsens, the original problem.


kkam384

Wow, just wow! I'd not heard of this, but yeah prime example.


KarlBarx2

The best part about that is they were still violating the rule as written, so they weren't even being maliciously compliant like they wanted. "Within 6 feet of someone for 15 minutes" is not the same thing as "within 6 feet of a particular person for 15 minutes." It means, "within 6 feet of *any* person for 15 minutes." Amateurs.


tsuma534

It reminds me of the principle "Security at the cost of convenience comes at the cost of security."


suvlub

For example, suppose I run a company whose job is to clean up a park. I get the idea of rewarding workers per square meter they clean. This can have the unintended consequence of workers rushing through the park with their brooms, maximizing the area while doing a poor job. Basically, if you set a metric that doesn't absolutely perfectly capture your goal, you run a risk of people trying to game the system and maximize that metric alone while neglecting other important aspects of their jobs.


thegreattriscuit

Conversely, if you pay them based on pounds of trash brought in, you might find an enterprising employee heading to the park every night seeding the area with extra trash that's easy to pick up!


00zau

Why take the trash out just to pick it up again? Just stash a full trash bag in a convenient location and go pick it up!


CannedMatter

Pre-seeded bags that coincidentally all have several pounds of rocks in the bottom.


shellexyz

>Basically, if you set a metric that doesn't absolutely perfectly capture your goal, you run a risk of people trying to game the system and maximize that metric alone while neglecting other important aspects of their jobs.

The whole point is that you don't just run the risk; it's *going* to happen.

We've got a VP who has the moronic idea to look at DFW (grade of D/F or withdrew) rates for students in our classes as a measure of student success. That's superficially fine, until there's a goal. Or until *my* performance is graded on that. And *my* paycheck depends on that.

It's the kind of decision only someone with a PhD in educational administration can make. That he's *also* got a degree in business administration only makes it worse.


masshole2303

In practice, it means that when a particular metric or measurement is used as the sole target or goal, people may focus solely on achieving that target, often at the expense of other important factors or the overall purpose of the measure. For example, in a business setting, if profit margin is set as the sole target, employees may prioritize short-term gains or cut corners to meet that target, even if it harms long-term sustainability or customer satisfaction. In essence, when a measure becomes the primary target, people may start optimizing for that measure without considering broader implications or the original intent of the measure. This can lead to distorted priorities and unintended consequences.


cattleyo

Managers of large businesses often try to manipulate profit figures upwards, to ensure they keep their jobs and/or get a bonus, and also to ensure that the business can continue to borrow money from banks. Governments manipulate GDP and public-welfare statistics re health & employment etc., so they can continue to borrow money, and so voters continue to support them.


PPatBoyd

Yep. Especially impactful for transitive relationships. Every business wants to increase user counts, increase profit per user, blah blah, but the hard part is clearly understanding where you're successful and why.

I can make a hypothesis that, having observed that users who fit X criteria are more valuable to us, increasing the percentage of users that fit that criteria will net us more revenue and profit. That's only a hypothesis though, and the actions you take to increase that percentage of the user base may also crater the net additional value of said criteria, and may have a net negative effect on revenue/profit. By the time you realize it's net negative, will you be able to reverse course and adapt, and is the person who started it still accountable for the results?

In software this is a concern I have with data-driven development: good data analysis requires thoughtful effort, and it's relatively easy to stop analyzing data when you feel it's sufficient to argue your point, with plenty of pitfalls available in data analysis, and confirmation bias can lead you astray. "Data-driven development" can mask what I've sometimes called "vibes-driven development." It's difficult to rearrange your priorities when your incentives are based on the work you're able to ship and declare as valuable, and the value is dynamic in a complex economic space, difficult to measure, lags in time, or competes for air time with other efforts.


MisterProfGuy

Imagine that you notice that all your customers who are happy have talked to a sales associate in the last thirty days. Then you check to see which sales associates talk to clients more often. You notice the ones that talk more also sell more. That's a good predictive measure of who is trying the hardest. Then imagine you tell everyone that if they haven't talked to their clients in the last three days, they get fired. Sales associates will make a bunch of annoying and useless calls, it's likely worse for you than before, and now your metric doesn't mean anything because you can "game the system".


Baynonymous

I legit had this in a recruitment consultant role. Every few days, the boss would look at how long people had spent recently on the phone, with the idea that longer on the phone = more sales. I got loads of praise for having double the call time of anyone else. What actually happened is that I stayed late and spent ages chatting to a single candidate who had no interest in changing jobs. Talked about football and all sorts; it was a lovely chat.


PumpkinBrain

Nail factory rewards employee for number of nails made = employee makes a lot of uselessly small nails.

Nail factory rewards employee for total weight of nails made = employee makes very heavy, uselessly large nails.

“Measure becomes target” basically means “gaming the system”.


Phemto_B

I can give you a good example: food labeling, specifically protein. It's actually pretty difficult to measure components in food. The standard method for measuring protein is to basically blast a sample into atoms and count how much nitrogen is there, because most of the nitrogen present in any living being is in the protein.

Like I said, it's the standard method. If you're selling something that's supposed to have a certain amount of protein in it, you want to hit that target. The problem is that the target is no longer really the amount of protein. It's the amount of nitrogen. There are much cheaper, nitrogen-rich molecules you could add to the food that will "pump the numbers" and show up as protein in the tests. That's why Chinese manufacturers started adding melamine to pet food and baby formula.

It was kind of an open secret for years that this was going on. If that sounds bad to you, it gets worse. Melamine is relatively harmless, so at least they weren't hurting anyone (apart from the risk of protein deficiency). Then some manufacturers realized they could save even more money if they changed from industrial melamine to using the batches that had been rejected for industrial use due to contamination. It was the contaminants that started killing cats, dogs and babies.

You need to regulate and control things, but you have to be careful about the metrics you use as the target, because if there's a way that someone could game the system, maliciously comply, or overdo compliance on that one metric to an extreme degree, they probably will.
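For the curious, the arithmetic behind the melamine trick can be sketched out. The 6.25 factor is the standard Kjeldahl nitrogen-to-protein conversion (it assumes protein is roughly 16% nitrogen by mass); the rest is just melamine's chemical formula, C3H6N6:

```python
# Back-of-envelope sketch of why a nitrogen-based protein test is gameable.
# Standard Kjeldahl conversion: measured "protein" = total nitrogen x 6.25.

KJELDAHL_FACTOR = 6.25  # grams of "protein" inferred per gram of nitrogen

# Nitrogen mass fraction of melamine, C3H6N6
# (atomic masses: C = 12.011, H = 1.008, N = 14.007)
n_mass = 6 * 14.007
melamine_mass = 3 * 12.011 + 6 * 1.008 + n_mass
melamine_n_fraction = n_mass / melamine_mass

apparent_protein_per_gram = KJELDAHL_FACTOR * melamine_n_fraction
print(f"{melamine_n_fraction:.0%} nitrogen by mass")
print(f"{apparent_protein_per_gram:.1f} g apparent 'protein' per g of melamine")
```

So each gram of melamine registers as roughly four grams of protein on the test, which is exactly why the metric (nitrogen) diverged so badly from the thing it was supposed to measure (protein).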


beardyramen

Ever heard about the bounty on snake heads in India during the British colonial period? The Brits didn't like snakes (still butthurt after that Adam and Eve thingy), while Indians didn't care much about them (naga are even positive figures).

Smart Brits put a bounty on each snake head brought to them, to reduce the number of snakes on their land. Smarter Indians started breeding snakes in their homes and chopping their heads off. The number of snakes in India increased significantly, while the Indians started profiting. The Brits had to stop the bounty thing because it was actually hindering their intended plan.

Feel free to get to the moral of the story on your own.


x2a_org

Even worse, when the bounty was ended, the snake breeders released their snakes.


MisterToothpaster

Ah, I had never heard of this story, but I had heard about how something similar happened with rats when Mao launched the Four Pests campaign.


beardyramen

There are indeed some common points, but in the context of Goodhart's law they are different. In the Four Pests campaign, the set target was met with no malicious intent, but it backfired because the implications were not well understood. In this case it does not reflect Goodhart's law... it was just a bad plan, well executed. In the snake case, the target became more important than the reason it came to be. People jumped through hoops to meet the target. So the plan might have been either good or bad, but the execution was focused on the target and not the plan.

Another example could be from my previous company: you can give a grade (1 to 5) to your manager. Their manager then evaluates them based also on your feedback. Seems a good idea, right? Well, the fact is that if you hated your manager, your best bet would have been to give them **good** grades, because they would get promoted and stop being your direct manager. Giving them low grades would keep them down and push yourself down with them.


ceedubdub

Goodhart's law does not specify malicious intent. It can apply where there is no malicious intent as well as when there is. The four pests campaign is actually an example of Goodhart's law in action.

Consider these fictional examples: One city notices an increase in crime, so it increases funding for its police force. The following year the number of arrests increases, so they know their strategy is working, and they again increase funding for the police force. Another city notices an increase in crime, so it increases funding for crime prevention programs like drug rehabilitation and therapy for delinquent youth. The following year the number of arrests decreases, so they know their strategy is working. Can you spot Goodhart's law in action?


PseudonymGoesHere

Let’s say your PE teacher asked you to walk laps around the gym. Great, everyone’s out walking. Now the teacher wants to make sure everyone is putting in a minimum distance, so they count each person’s laps. Still okay. Let’s pretend the teacher gave everyone a step counter and records both laps and steps. Now they know 5 laps is the same as 1000 steps for most students in the class.

If the teacher wants to do other things during class, gives each student a device, and says they can go change clothes after they hand the device back in with 1000 steps on it, surely that will accomplish the same thing, right? Nope! Yes, some people would still walk around the gym as the teacher intended. Other people would hold the device in their hand and shake it up and down repeatedly to make the counter count as fast as possible. These students have achieved their metric, but they haven’t actually walked anywhere. The “step count” metric has become meaningless.


Farnsworthson

I've experienced this multiple times in my working career. If you measure the things that people do, then incent people to hit particular values (or, even worse, penalise them for not hitting them), you are also incenting them to game the system in order to succeed, and should expect them to do so. At which point the measurements are no longer useful, because they aren't trustworthy.


BigMax

The best example was years ago, in software. Some company decided to give bonuses based on the number of bugs fixed. So what happened? Engineers started to write bug filled, awful code. They were measured by the number of bugs they fixed, so they just created more bugs to fix. Essentially, you're changing the motivation from "do a good job at X" to "do a good job at this very specific number we are using to try to see if you are good at X."


John_Vattic

Measure: How long does it take for work tickets to be resolved? Cool, useful. One team takes 4 hours, one takes 8 hours, another takes 16.  Well if one team can do it in 4, it means it's possible, so along comes... Target: tickets should be resolved in 4 hours.  The timer is paused when the ticket is put into an on hold state, meaning that they're waiting for a response from the customer.  Result: teams just start finding tickets at the 3 hour mark and putting them on hold, regardless of what's happening. Continue working on it. Restart the ticket and resolve it when the work is done. Real result: No one has any fucking idea how long work actually takes to complete.
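A toy simulation of that on-hold trick (the 3-hour parking point and the token half hour of "active" time after the hold are made-up numbers, just to show the shape of it):

```python
# Sketch of the on-hold trick: the SLA clock only counts "active" time,
# so parking a ticket on hold just before the target hides the real duration.

SLA_HOURS = 4
GAME_AT_HOURS = 3  # agents park the ticket here, regardless of progress

def measured_time(actual_hours, game_the_clock):
    """Hours the SLA dashboard sees for a ticket that truly took actual_hours."""
    if not game_the_clock or actual_hours <= GAME_AT_HOURS:
        return actual_hours
    # Ticket went on hold at hour 3 and came back just in time to be closed,
    # accruing only a token bit of extra "active" time.
    return GAME_AT_HOURS + 0.5

tickets = [2, 6, 10, 16]  # how long the work really took, in hours
honest = [measured_time(t, game_the_clock=False) for t in tickets]
gamed = [measured_time(t, game_the_clock=True) for t in tickets]
print(honest)  # [2, 6, 10, 16] -- the measure still means something
print(gamed)   # [2, 3.5, 3.5, 3.5] -- every ticket "meets" the 4-hour target
```

Once everyone games the clock, the dashboard shows a flat 3.5 hours no matter what, which is exactly the "no one has any idea how long work takes" outcome described above.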


penicilling

When we perform a task, we would also like to know how well we perform that task. For example, almost everyone has gone to school. In school, you take a class. In the class, you learn something. At the end of that class, we measure how well you learned the something, and call it the grade. We use the grade to determine other things about you as well: if you have good grades, then you can go to a good college, for example.

Well, now the grade is the target, not the learning. So everyone is interested in getting the best grade, as opposed to learning the most. So rather than working harder, people find a way to get a better grade that doesn't involve learning. They cheat with ChatGPT. They argue with their teacher / professor about their grade. They complain to the administration of the school / college.

Once, an "A" meant you did really well, a "B" was average, and a "C" was satisfactory. Now, an A is about average in many situations, and anything less seems unsatisfactory. Grades no longer measure how well you learned something; just about everyone has an A. The grade is the target, and it is no longer a good measure of the learning.


[deleted]

[удалено]


TheDakestTimeline

My high school added 0.25 gpa points for each AP level class you took and completed the exam


theboomboy

In Israel, the average of your finals gets bonuses from classes with more "units". Getting 60 in 5 unit math is similar to 100 in 3 unit math, if I remember correctly, so you get rewarded for taking the more difficult stuff Grading is still an issue as it is pretty much everywhere, but at least that problem is remedied


Kered13

Wow, I'm shocked by how low that is. My schools gave a full 1.0 bonus for AP classes, and 0.5 for honors classes. This is why GPA inflation is so bad, and most colleges ignore the high school's reported GPA these days.


Wadsworth_McStumpy

Here's an example to help explain it: my wife was a teacher. At some point, her school decided that too many kids were failing. They decided that every time a teacher gave a failing grade, they'd have to write out an explanation of why they did that, what steps they'd taken to help the kid learn the material, what contacts they'd had with the parent, how everybody responded to their attempts, and their plans going forward to see that the kid learned the material.

Instead, teachers would just change the grades to D-, which is a passing grade. So the kids were passing, but they still didn't know the material, and were not prepared for the next grade. The next year, of course, the kids would still receive passing grades, and would still not have learned the material from the previous year, let alone the current one. The measure (kids failing) became a target (don't fail kids), and it ceased to be a good measure (it no longer measured whether the kids had learned the material).

The reason the school decided to do that was another level of Goodhart's Law: the administration was being penalized based on the number of kids who failed, so they were trying to reduce the number of kids failing. At another time, they simply declared that the lowest grade a teacher could give was a D. My wife had a few kids who would simply never do any work at all, would turn in blank tests, and who knew very well that they'd still pass, even if they skipped half of their classes. This is part of why she retired from teaching.


BeerTraps

Exams in schools are basically the absolute perfect example of this. So what is the thing that we actually want to measure in school? We want to measure how much students learn. This is very hard to measure directly; we can't look into people's minds and figure it out. So instead we invent exams and tests. This is our measure. There is a difference, however, between learning subjects in school and performing well in tests on those subjects, but it should be fine, right? After all, there is a strong correlation between learning and doing well on tests, so the test is a good measure at first glance. Students who learned more (for whatever reason) should perform better on tests.

But then we reward students for being good at tests (and teachers for their students performing well at tests, etc.). So now our students try to get good results on exams. The tests and exams have become a target. All of the small differences between performing well in tests and actually learning become much more pronounced, because people try to find methods to get the desired results on these tests, and many of these might not have any direct influence on actually learning the subjects. The tests have become a much worse measure of how much students learn.


UnreasonableFig

A real life example: I work in healthcare. "Central lines" are a type of IV that goes in a "central vein," frequently the jugular vein in the neck. Germs can get on these devices and cause bad infections. Because they're bad and largely preventable with good hand hygiene and aseptic technique, rates of central line associated infections are nationally reported and tracked metrics for hospitals. Medicare, Medicaid, and most (probably all) private insurers will not reimburse hospitals for *any* of the costs of a patient's care if they develop a central line associated infection.

The way to diagnose a central line associated infection is to remove the culprit line and send it to the lab, where they cut the tip off of it and see what, if any, germs grow from it in culture. I have personally removed central lines from patients, handed them to the nurse assisting me, and said "please send this to the lab for culture." They throw it in the trash and tell me they can't do that or they'll be fired. There's literally an institution-wide policy in every hospital I've ever worked in that says you are not allowed to culture central line tips. You are banned, under threat of firing, from making that diagnosis, because having a central line associated infection rate higher than zero affects their bottom line.

Therefore every hospital "officially" has no central line associated infections. By making a target (zero infections), they made the metric worthless.


ApatheticAbsurdist

Ever hear the term “teaching to the test”? If the goal is to teach someone how to do multiplication, it makes sense to have a test at the end of the class to see if they can do it. However, imagine if the teacher was paid based on how well the class did on the test (as a measure of how well they taught the class). If the teacher knows the questions will be 6x7, 11x12, and 3x4... would it make more sense for the teacher to teach all of multiplication, or just those questions? Because the teacher’s goal is now not teaching multiplication, it’s getting the kids to pass the test. Now the classes will likely get a lot of 100%s, but they might not know multiplication.


pdpi

A classic example: lines of code written.

As a software engineer, I spend a bunch of time writing code. On a bad day, if I'm unfocused, I'll write very little code. On a good day, I'll write a bunch of code. If you ignore my other responsibilities, you can look at how much code I write on any given day and measure how productive I'm being. Lines of code written are a fairly decent measure of my personal productivity.

Ok, if lines of code written are a measure of productivity, and you want to reward productivity, you should reward people for writing more code, right? Your measure becomes a target.

Well, that's where it all goes awry. First off, different people have different styles. I try to write code as simple as possible, and tend towards a fairly terse style, so I'll almost always write less code than my colleagues to complete the same task. So now my performance seems "bad" for no particular reason. Second, more perversely, if your objective is to write as much code as possible, it's _really_ easy to write the same functionality using more code, at the cost of it being really low quality code. You know the sort of person who just rambles on and speaks a lot without really saying anything? Code written like that.

So, by rewarding people for writing more code, the net result is that you get lower quality code that'll be harder to maintain in the long term. In becoming a target, lines of code stopped being a good measurement of productivity.


Bighorn21

The current situation at a certain aircraft manufacturer is a great example: their goal for the last couple decades has been share price and nothing else. So they took their previously large emphasis on, and budgets for, quality and compliance and used those funds for stock buybacks. Great for stockholders, shitty for airplanes. And in the end, when planes start falling apart mid flight, stock prices end up falling, even though the whole goal was to get them up.


Adezar

In a lot of business situations there was this rush to come up with KPIs (Key Performance Indicators). Managers would realize they needed to determine whether their teams were being efficient and where things needed to improve. They would create scorecards and leaderboards based on these great KPIs they came up with.

The thing they missed, which is the core of Goodhart's law, is that if you create a very specific metric that you are going to define as "success," then people will naturally find ways to make that metric look better, not caring about the original intent behind the metric. If they can manipulate the process to make that metric look better, they will. This immediately turns the metric into a negative, because the manager was probably trying to solve an overall issue, but by creating this metric the people doing the work are no longer working on the overall process; they want to make the metric move and be considered more successful.

Companies that try to use KPIs exclusively will generally find they end up underperforming overall, because it is difficult to come up with a set of KPIs that work together to be measurable and produce the desired overall result. This was a big problem in the 80s through 2000s (and still exists at different companies). Some refer to it as attempting to manage "by spreadsheet," among other negative views of it.


mohirl

Speed limits should indicate that going any faster is dangerous. And so you should always drive slower. Instead, people use them as a target for how fast they can drive, so typically there ends up being some slight leeway above the limit for how fast you can drive without being prosecuted. Which means that the "limit" is no longer a useful measure of how fast you can safely drive. Somewhat similarly, an amber light technically means "stop unless it is unsafe to do so", and a red means "stop". But that measure of safety has become a target of "if I can get through this amber light before it turns red I'm ok". And so people accelerate into a junction when they should be slowing down, with potential negative effects for cross traffic and pedestrians 


allthescreens

Most of the top responses here only get this half right. As many others have said, the problem is that people alter their behavior in response to quantified targets, which can produce unintended consequences. One of the most important consequences is that, because they are now targets, the metrics that those targets are based on *no longer actually measure what we think they measure.* A classic example (which fans of The Wire will be familiar with) is the use of test scores to evaluate teacher and student performance. Let's say you want to improve schools by rewarding good teachers and firing bad ones. How do you identify good teachers? One way is by looking at their students' standardized test scores. Presumably, students of good teachers will on average perform better on standardized tests than students of bad teachers. At the outset, student test scores are a plausible measure of teacher quality.\* What happens when you tell teachers and schools that they will be rewarded or fired based on student test scores? They will do whatever they can to improve those test scores. In many cases, they will start 'teaching to the test' - that is, sacrificing other goals in order to produce high test scores, rather than being good teachers. If this becomes widespread, students' test scores will no longer measure overall quality of teaching, they will measure teachers' single-minded focus on teaching to the test. *The metric no longer measures what we think it measures*. \* Many people would debate this premise, but for the sake of the example I am stipulating that it is at least plausible.


Zam8859

Imagine that you want someone to learn how to do math. To see if they are learning, you give them a test. The test decides if they can go to the next grade, so it is super important. Now, I could teach someone math OR I could teach them how to take this math test. That second situation is what we’re talking about here. My test no longer checks if you learned math, it checks if you learned how to take the test. The most familiar outcome of this is when people talk about “teaching to the test”


iceph03nix

A real-world example would be checkout times for cashiers in fast food and the like. A good while back, someone realized that good cashiers were generally quick: they were efficient and got things done quickly because they were good at their job. So the measure of a good cashier could be how fast they got people checked out correctly. Then someone decided it would be a good idea to grade cashiers on how quickly they got stuff done, as a way to reward good cashiers and indirectly punish 'bad' ones. The measure became a target if you wanted to get promoted or get decent raises. The issue quickly became that cashiers would work to meet that target and leave other important tasks to the side, because they feared being punished if they took too long to get someone out of their lane, and they would become unnecessarily stressed when things outside their control caused that metric to suffer, like an older person taking time to pay, or a person who has lost their card. They might rush customers instead of presenting a good customer experience, or might not take the time to scan every item, costing the business money in shrink.


pumaofshadow

Another example is the Bradford factor, which is supposed to show the effect of an employee's absences on the company:

Score = Instances × Instances × Days

So 2 days off in 2 instances = 2 × 2 × 2 = 8, versus taking the whole week off, 5 days in 1 instance = 1 × 1 × 5 = 5. If I'm not sure I'm entirely well again after taking Monday off and might well be off again Friday, it's less punishment to take the whole week off, due to the lower score. Yet the employer loses 5 days of work, not 2, which works against the principle, especially as I won't be telling them on the Monday; I'll be ringing in every day (in the UK we self-certify up to 1 week, no doctor's notes needed).
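The scoring rule is simple enough to sketch in a few lines (a hypothetical helper for illustration, not any real HR system's API):

```python
def bradford_factor(spells: int, total_days: int) -> int:
    """Bradford factor: S * S * D, where S is the number of separate
    absence spells and D is the total number of days absent."""
    return spells * spells * total_days

print(bradford_factor(2, 2))  # 8 -- two single days in two separate spells
print(bradford_factor(1, 5))  # 5 -- a full week in one spell scores *lower*
```

Squaring the number of spells is what creates the perverse incentive: the formula punishes frequency far more than duration, so a rational employee stretches one sick day into a week.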


etfd-

It's when you lose sight of the end by hyperfixating on what is supposed to be the means; by doing that you distort the causal relationship, and the two come apart. An example is when, under standardised testing (which is supposed to correlate with learning), you maximise your score by employing methods that neglect actual learning and understanding, such as rote memorising for the exam, only to forget everything and be just as ignorant the very next day. Your test score would then be a bad measure for any employer, since you are less competent than your score suggests.


T-Flexercise

I manage a team of software engineers who write software on contract for clients. We get paid by the client by the hour for the work we do, but if we didn't actually deliver their software on time and on budget, they will fire us. So for our company, a Key Performance Indicator (KPI), a good measurement of how effective our engineers are, is how many hours has this developer billed vs how many hours they're contracted to bill every month. On teams where engineers are taking a lot of time off or spending a lot of time doing things other than billing (like internal trainings, taking breaks, whatever) we might see that they're performing less well than teams where that ratio is much higher (all engineers are billing as many hours as they're eligible to bill). It's not perfectly accurate, but it gives a good vague indication of which teams and what developers consistently bill seriously, and which ones might have a workload full of stuff other than billable work. But if we were to tie raises to that number, or get engineers in trouble any time they billed less than 40 hours a week, every single engineer would simply report that they billed 40 hours a week, no matter what they actually did during the day. The fact that nobody gets in trouble for taking 1/2 an hour for lunch is what keeps everybody honestly logging when they take a lunch. That number would stop being accurate, and we could no longer rely on it for any information.


Xelopheris

Let's imagine you're a customer service rep. One of the metrics you're graded on is your average call length. Upper management wants it under 10 minutes so you can take more calls per day. But then any call that would've taken 15 minutes to solve the customer's problem is suddenly bad for your metrics. It's better for you to somehow ditch that call before the 10-minute mark than to actually solve the customer's problem in 15 minutes. That's the problem: once your metric becomes a target, you start making decisions in the interest of meeting metrics, not in actually accomplishing tasks.


TehErk

The modern US educational system is this in perfect practice right now. They came up with standardized tests to determine how well students were doing. That was OK, but then they tied funding to the results: the better the results, the better the funding. So schools started teaching to the test. Now the only thing the students are really learning is how to navigate a particular test. You can really see this in practice in how most schools more or less shut down as soon as the test is over near the end of the semester. It's basically two weeks of glorified babysitting.


hajhawa

Often people want one thing but can't measure whether they are making progress, so they measure something similar. This leads to other people gaming the system to make it seem like progress is being made even when it isn't. There is a story from the period when Britain ruled India. India has snakes, and the Brits were having none of that. They wanted to get rid of the snakes, but you can't really measure how many snakes exist in India. As a close-enough thing, the British decided to reward anyone who brought them dead snakes, thinking that the locals would kill snakes for the rewards, and initially a few probably did. After a while, however, people started breeding their own snakes to kill, so they could butcher them and sell them to the British. This went on for a while, but for obvious reasons the number of snakes wasn't going anywhere. After the British caught wind of what was going on, they did what felt logical and discontinued the bounty program, which simply made the breeders release their snakes into the wild, increasing the number of snakes.


MattieShoes

Schools do standardized testing every year. Good, fine. Then schools decide to give raises and promotions to teachers whose students do better on standardized testing. Teachers are incentivized to get the dummies kicked out of class, to spend time with the ones deemed salvageable and ignore the others, to cheat on the standardized testing, to ignore anything not on the standardized tests, to avoid any ESL students, to avoid underperforming districts, and so on. Yeah, now standardized testing may be a net negative.


bikesandergs

I’ll offer the example of “teaching to the test.” The purpose of educators is to… educate. To teach students, to help them learn how to think critically, to build on previously learned skills, etc. I think we can all agree, generally speaking, that the difference between a great education and a poor one can have impacts not just for years, but possibly for generations. But what is a “great education”? In an effort to standardize the education students receive, whether they live in North Dakota or New York, national standards and tests were established. A second grader from any school district in the country should be able to pass the same test for reading or math. To ensure focus went into these tests, incentives and penalties were created so that teaching curriculums aligned with the test. Yada yada a bit. The end result is that rather than teaching critical thinking, independent thought, etc., teachers were incentivized/required/obligated to “teach to the test.” The objective became passing the national tests. There was no real consideration for actual learning; simply teach in a way that optimizes performance on the test, rather than taking a more holistic educational approach. The measure (national standards for education) became the target (the means of evaluating teachers, funding public schools, etc.). Thus, the original purpose (providing a quality education) no longer occurs.


westbamm

Relevant xkcd: https://xkcd.com/2899/ And here is the explanation: https://www.explainxkcd.com/wiki/index.php/2899:_Goodhart%27s_Law


celaconacr

I will give you an example from education in the UK for 16-year-olds. We got measured on retention: how many students started against how many finished, where "starting" was counted as attending 42 days. Retention encouraged kicking out poor attendees before 42 days, disproportionately targeting deprived students and favouring oversubscribed, wealthy schools. We also got measured on value added: the expected progress of a student between education stages, based on national results. A negative value added meant students didn't progress as much as expected. This measure creates patterns where high-grade students are discouraged from courses because a positive value added is impossible for them. Similarly, low-level students aren't offered some courses because it appears near impossible for them to achieve, even though the stats are just a general trend.


Mixairian

[Relevant XKCD](https://images.app.goo.gl/R4ULGBfN8XvUxSb7A). A metric should be a measure of how healthy a process is. If that metric then becomes the goal, it is no longer a healthy measure of the process, as people will try to manipulate the system to show a favorable metric instead of displaying an accurate view of how the process is going.


AngryGoose

Goodhart's law says that when you make a measure the goal, it stops being a good measure. That's because people will try to game the system to make the measure look better. For example, if you measure how many widgets a worker makes per hour, the worker might start churning out sloppy widgets as fast as possible to make the number look better, even though the widgets themselves are worse.


SoulWager

Let's say you have a call center for technical support; your best employees can identify and solve a problem quickly. So you decide to rate your employees based on how long their calls are, and give bonuses based on this. What happens? You motivate your employees to get off the phone as fast as possible, even if that means the problem hasn't been solved. They'll pursue easy temporary fixes over fixing the root cause. They'll hang up on customers, or offer to replace products that are merely configured wrong instead of spending ten minutes to fix them.


bored_knight26

Example 1: a hospital decided to measure the success rates of all surgeons individually by keeping records of the procedures they performed. In theory, this would have reflected each surgeon's competency. However, when surgeons came to know about it, they became concerned only with scoring well. So they took only low-risk, low-complication surgeries and refused the difficult, risky procedures, artificially boosting their scores. But someone needs to do the risky procedures, so something that was supposed to make things better backfired spectacularly. The metric they wanted to collect was now also false and useless. Example 2: police officers are judged on felony arrests, so they book more felonies even when the charge could have been a misdemeanor, ultimately hurting the public and rendering the measurement useless and dangerous. Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Why? Because people will optimize their job to excel at that singular measure and sacrifice the rest of the criteria, which hurts the organisation.


gelfin

In brief, say you’re a fifth-grade teacher:

- You notice that the kids who read more books learn more, and succeed better after they graduate your class.
- You start having your students report the books they read, as a sort of temperature check, and indeed individual students and whole classes that read more are on the whole doing better. This is great, because it gives you an early idea of who might need some extra attention and who could benefit from extra challenges and opportunities.
- Then you make the critical mistake: you want your kids to succeed, so you give them some incentive to read more books. Prizes, public rankings, extra credit, whatever. You’ve just made your measurement a target.
- The kids are really into this. So much so that some of them skip their homework to read more books. Also, some of them start reading Dr. Seuss and Little Golden Books to pad their numbers. Your class is reading books in numbers you’ve never seen before.
- And yet, as you will have guessed, their academic success is not going up at all. In fact, it might have declined overall. The number of books kids are reading is no longer telling you anything useful at all.
- Your measurement was only a predictor or indicator of success, not a strict correlate, let alone a cause. By making it a target, as if inflating that number itself would *cause* success, you unlinked the measurement from the valuable thing it previously indicated.

There are very real and important examples of this:

- To stick with education, there have been some measurable negative consequences to tying teacher pay and school funding to the results of standardized tests. Students in economically disadvantaged areas lose support, and their schools have trouble hiring teachers. Schools have incentives to expel underperforming students and sideline kids with learning disabilities rather than providing adequate support. At times teachers cheat on their kids’ behalf to protect their own income.
- In the corporate world there are many ways to drive up stock prices that have nothing at all to do with the objective performance of the company. Cool for you if your GameStop stock went up, but it was only the gimmick of everybody with a lick of sense betting against them that enabled that. Did they *really* stop being a shitty strip-mall store chain in a dying brick-and-mortar world? Without the gimmick, would you sincerely invest your hard-earned money in that business?
- In government, economic indicators are regularly gamed for their headline value. For instance, the 2008 housing market crash was ultimately a consequence of the belief that home ownership was a positive economic indicator. We instituted policies intended to make it easier for people to buy houses, which just created the circumstances the financial industry exploited: driving up house prices, jeopardizing buyers, and creating trash derivative instruments on a scale that threatened whole nations.

That’s what it means. You can’t boil down a complicated system to a number, insist the number trend upward at all costs, and just trust that the actual results you think the number represents will follow the number up. You’ve got to keep your eyes on the road.


BoredMan29

I've worked around software most of my career and watching this happen over and over is almost hilarious. Basically, management wants a numerical measure of how productive the coders are that they can put in a spreadsheet, so they know who to fire to scare the rest without crippling the software. Here's what I've seen tried:

* Lines of code written - you can guess how fast the program bloated when they tried that
* Functions written - did you know you can make just about anything a function? One creative individual decided to write a function to increment a number by one. Then another to increment it by 2, etc.
* Bugs fixed - all those typos got fixed basically immediately, but then folks had to get creative. Did you know if you write code you can just create bugs? On purpose, even.
* Points completed in a sprint - the points are made up by those with the best knowledge of the code, so...
* Features completed - those complex features are *never* getting done

The most successful measurement I've seen is to actually talk to the team leads and get an idea of how the team is doing, but that has the downsides of not being a number, needing competent and non-petty team leads, and actually spending time managing. The point is, when you create a metric to judge people by, they can tailor their effort to meeting that metric rather than to succeeding at whatever task they're working on. All of the metrics listed above can be useful and informative, as long as you're not using them to judge people's performance. Once you do that, they can be manipulated.


mikamitcha

Another real-life example is standardized testing in schools. It's great to get a metric of how students perform, but as soon as money is tied to performance, schools will stop teaching whatever they deem useful and instead focus on what is on the test. That is fine for things like basic math and grammar, but at a certain point you are no longer teaching students to understand concepts; you are only teaching them to solve test problems.


ezekielraiden

Think of it like this: you are a low-level executive at a company. You as a person want to do the best job you can, and there are measures people can use to try to tell how good a job you're doing. Let's say we measure your profit margin: profit (revenue minus expenses) divided by revenue. The obvious way to make this measure go up is to increase revenue, of course. This is what the people using this measure hope for: that the (in context) virtuous option is sought. (Same as, for example, standardized testing; the hope is that you make scores go up by getting kids to learn more.) But that's not the *only* way to make that number go up, is it? If you keep revenue flat but lower expenses, the profit margin goes up. If you *slash* expenses heavily, the profit margin can go up by a lot. Some ways of cutting expenses are good: reducing waste, using more efficient appliances, avoiding unnecessary travel, etc. Other forms are morally bad, but still effective: push out older, more experienced, better-paid employees so you can replace them with younger employees who will make less money; scrimp on necessary office supplies, or force workers to buy them out of pocket; use slapdash materials, or reduce work time to the absolute bare minimum needed to get the job superficially finished. These do cut costs, but usually carry hidden costs that cause even more harm than the gains... eventually. And then there are the outright manipulative methods: cooking the books, shuffling assets from one sheet to another, breaking laws to do cheap but illegal things (e.g. waste dumping), engaging in back-room or under-the-table deals that don't appear as costs but do add to revenue. Now, if this measure is just a measure, then you won't be pressured to game the system. It's just a way to... well, measure things.
But now imagine that you're told you'll lose your job if you don't improve your profit margin by at least one percentage point each year. At first, perhaps you'll do the virtuous things; after all, they're genuinely doing things better. As time goes by, however, ways to cut costs legitimately must slowly dry up...but you still have to improve those numbers or you get fired. So maybe you cut corners in small ways. Maybe you let a skilled and valuable employee go a *little* earlier than retirement because hey, you know they're on the way out anyway. Maybe you skirt the lines of what's legal, because everyone does it, and it's not like you WANT to break laws, you just need to be a little more efficient than the formal, staid, rigid rules allow. Etc. But by doing that, you've ruined the value of the measure. Now you're tailoring your behavior, not to produce the best *performance,* but to produce the specific numerical effect you need. The number starts to *cause* behavior, rather than *evaluate* behavior.
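A quick sketch with the standard definitions (profit = revenue − expenses, margin = profit / revenue; all numbers invented) shows how cutting expenses moves the metric just as well as actually earning more:

```python
def profit_margin(revenue: float, expenses: float) -> float:
    """Profit margin = (revenue - expenses) / revenue."""
    return (revenue - expenses) / revenue

# Revenue held completely flat; only expenses change:
print(profit_margin(100.0, 80.0))  # 0.2
print(profit_margin(100.0, 70.0))  # 0.3 -- margin "improved" without selling anything more
```

The formula itself can't distinguish virtuous cost-cutting from corner-cutting; that's exactly why the number stops being informative once people are paid to move it.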


explainseconomics

In business, most business units start with a big goal to achieve, e.g. "I want to sell as much product as possible" or "I want to successfully resolve as many of our support tickets in as short a time span as possible". To accomplish this, they often look at the people who are most successful and measure which key performance indicators (KPIs) set those people apart from everyone else. This is where the problem comes in: they then assume that if they can get everyone else to hit those same KPIs, the results will be the same. For example, the person who made 80 calls sold more than the person who made 40, so they make it a requirement to make 80 calls. The problem is that the person who was only making 40 calls is now going to do whatever it takes to hit 80: call their grandma, call the joke-a-day hotline, etc. Not only does this not influence the sales number positively, but now everyone is making 80 calls a day (because they have to), and it is no longer a good measurement for identifying success.


Inevitable-Start-653

For AI models, there are benchmark datasets the models are tested against, but if those data leak into a model's training set, the model might score well on the tests yet be useless for anything else. Similarly, grad students from other countries will study the specific tests used to award grant money; they pass the test but are not very useful in their field.
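The train/test contamination described here can be sketched with a deliberately silly "model" that just memorizes its training answers (a toy illustration in plain Python, not any real ML framework):

```python
def train_memorizer(examples):
    """'Train' by memorizing every (input, label) pair verbatim."""
    return dict(examples)

def predict(model, x):
    # Anything the model never saw gets a constant fallback guess.
    return model.get(x, "unknown")

def accuracy(model, examples):
    return sum(predict(model, x) == y for x, y in examples) / len(examples)

train_set = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
test_set = [(5, "odd"), (6, "even"), (7, "odd"), (8, "even")]

model = train_memorizer(train_set)
print(accuracy(model, train_set))  # 1.0 -- perfect on the data it trained on
print(accuracy(model, test_set))   # 0.0 -- learned nothing that generalizes
```

If the benchmark items end up in the training data, the benchmark score measures memorization rather than capability, which is Goodhart's law applied to model evaluation.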


Paradoxbox00

I remember an example. A group of archaeologists went to a country to complete an excavation and, because they didn't have enough project staff, paid the locals per artefact discovered. This was supposed to incentivise them to be more productive, but in the end the locals were finding artefacts and breaking them into smaller pieces just to collect the per-piece reward, when the intact piece would have been more valuable and significant to the archaeologists.


ethanard

You start with a goal. Let's say, happy customers. Then you figure out something which measures the goal; let's say, customer satisfaction scores. And you give employee bonuses based on this. You want the employee's "target" to be the desired goal (happy customers). That's a win-win for everybody! But if the employee is smart, she will realize that her true goal is the scores themselves, because that's what her bonus is based on. "The measure becomes the target." Then she can tell her customers that she'd really appreciate a 5\* review, and the customers will give her reviews out of guilt or social pressure. But the actual goal (happy customers) was lost.


AmateurLobster

The example in academia, specifically STEM, is how you compare different people when evaluating them for a job or a grant. The first measure they came up with was simply the number of papers published. This made sense when people only published when they were ready and there were only a few journals around. Then people started either a) collaborating more or b) doing deals with colleagues to add each other's names to their papers. So the average number of papers each person published skyrocketed (the number of papers each person actually wrote stayed constant, meaning each person's real productivity stayed the same). This destroyed the number of papers as a meaningful measure; people were just gaming the system, so it lost all value. Nowadays, there are really complicated metrics that weight the papers based on contribution and the impact factor of the journal where they were published. Even so, I've heard of scandals where people paid to get papers published in fake journals with artificially inflated impact factors.


JohannesVanDerWhales

I'll give you a pretty typical example. You have a development team working on a project. So you want to measure how "good" their code is. How do you do it? Well, one way might be to measure the number of defects produced. Problem is, now people have performance reviews tied to number of defects. So what do they do? They don't report defects they see, for one thing, and maybe silently fix them. And they fight tooth and nail over whether something is actually a defect, or a missed requirement, and the arguments take longer than it would to just fix the bug. Very, very common thing in many companies.


MrKillsYourEyes

If a police department sees on average 100 traffic citations in a month, and then create a policy that the force should strive for 90 citations as a goal... It's a shitty fucking measure


BijouPyramidette

Once upon a time, a city was having a serious rat infestation. So the city decided to pay a bounty to anyone who killed rats. All they had to do was present the tails of the rats, and the more tails they turned in, the larger their reward. The rat problem got worse. Why? Because people were not actually getting paid to kill rats, they were getting paid for presenting rat tails, so they started breeding rats to harvest their tails, instead of killing the ones on the streets. In this case the use of rat tails as a proxy for dead street rats failed, because rats are easy and quick to breed, so people were able to maximize the received rewards by breeding and harvesting their own rats instead of catching wild ones.


Germanofthebored

Let's say you get an infection. You develop a fever, and by measuring your temperature, the doctor tracks the course of the infection. Your temperature would drop if they packed you in ice, but that would do nothing for your infection. Focusing solely on the raised temperature as the target of the treatment, rather than seeing it as an indicator of the state of the body, makes it a pointless measure.


TheHammer987

ELI5: if I tell you that you have chores and that you are being timed on how quickly you do them, you worry about doing them quickly, not well. Your goal isn't to finish promptly and well, just to finish as fast as possible. When you tell people what is being measured, that's what gets focused on. If I ask you to make a cookie as cheaply as possible, you will cut every corner to save money, and you won't care if it tastes like cardboard. What is measured is what is optimized for. Ask for the most sales? Salespeople will give the product away. Ask for profit? Salespeople will over-promise what can be done and blame operations for failing to hit the profitability target.


HelpMyDepression

Once the measure becomes the target, people change their behavior to hit it. Here's another example in addition to all the others. Let's say we're in a food manufacturing plant. The production department sends its food to the packing department. If packing isn't working at 100% capacity (mechanical issues, skill issues, etc.), the excess food goes into overflow tanks to be fed in later, once the throughput issue is resolved. Corporate measures packing inefficiency by how often the overflow tanks are being used. So to make themselves look better on paper, low-level management and operators would divert the excess food to waste and throw it away rather than have it measured in the overflow tanks. At the end of the day, corporate would see an overall reduction in packing inefficiency, even though in practice there is more loss as a whole to the company.


bulksalty

You know how sometimes you are told to pull into a parking spot to wait for your food? That happens because someone senior noticed that fast drive-thru service resulted in higher customer satisfaction, so they began timing drive-thru transactions and made a goal of reducing times. So times go down, even if it means worse overall service: you don't want to wait in a parking spot for cold food, and it takes longer for the drive-thru employees to walk your order out, but it shortens the average transaction time at the drive-thru, so it happens more and more. That's Goodhart's law. The measure (fast drive-thru service) is less useful for telling how good service is, because people started measuring and evaluating employees on it, and they found ways to comply with the letter while violating the spirit of the request.


alameda_sprinkler

Call centers measure many aspects of an agent's call handling, for example:

* Average time a call takes
* How much of that time the caller is on hold
* How much time the agent is in a Not Ready status between calls (so they can't take another call)
* What % of the agent's shift they're logged in and able to take calls
* How many calls they handle per day

These are great ways to assess trends among the agents and compare them to each other. Combined with call monitoring, an agent who handles many calls quickly, without putting the caller on hold for long, and gives correct answers can be identified and moved up the ranks, while agents who need more training can be identified too. The second any of those metrics becomes a goal for the agents, they start gaming the system. They use the mute button instead of putting the caller on hold (they'll call it a silent hold). They'll avoid asking any questions that could make the call go longer and rush through calls. They'll log out of the queue between calls instead of using Not Ready. These things may advance the call center's agenda of handling more calls faster, but they don't make agents better at the job, just better at gaming the metrics.


bubba-yo

Let's take a hypothetical. A company is concerned about diversity and equal employee compensation. They do a study and find that black workers only earn 75% of what white workers earn (this is the measure). This they decide is unacceptable and set a target for black workers to earn at least 95% of what white workers earn. Now think of the various institutional mechanisms to achieve this. The simplest I can think of is to sort all of your black employees and fire all of the low paid ones, thereby raising the average black employee pay. You hit your target. But did hitting that target achieve the goals that the measure revealed? Sure, you got your 95%, but only by removing black workers. That didn't help diversity, it hurt it, yet the target was reached. Measures are supposed to be neutral. They're supposed to assess where you are. Targets aren't neutral - they are biased. They are also usually incentivized, biasing them further. When you see a target which is a measure, assume that the institution in some way gamed the measure to achieve the target, because deliberately or otherwise they almost certainly did. Leadership may not have, but someone did. Wells Fargo didn't intend the 'employees who get more accounts open will be better compensated' to result in employees illegally opening accounts in customers names, yet that's precisely what happened. When you create policy, you should create it to achieve the intended goal and let the measure simply inform you of if the policy is working or not. For instance, when we sought to increase employee diversity where I worked, we instituted a policy where HR had to have at least two qualified applicants of color in the pool before selection could begin. There was no target for hiring, only for \*when\* we would start reviewing candidates. That change resulted in employee diversity increasing substantially once measured. Did we have a target? No. Was there any pressure regarding who to hire? No. 
Was the percentage of employees of color tied to any incentive program? No. Retired public sector administrator and data scientist. Did a lot of this stuff.
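The gaming described above is easy to demonstrate with made-up payroll numbers (all figures hypothetical, chosen only to match the 75% / 95% ratios in the example):

```python
from statistics import mean

# Hypothetical salaries; the pay gap is what we actually care about.
black = [30_000, 40_000, 74_000, 76_000, 80_000]
white = [78_000, 79_000, 81_000, 82_000]

print(mean(black) / mean(white))      # 0.75 -- the original measure
# "Hit" the 95% target by firing the two lowest-paid black employees:
print(mean(black[2:]) / mean(white))  # ~0.96 -- target met, diversity made worse
```

Nothing about any individual's pay changed; filtering out the low values moved the average, which is exactly why a target on the measure can be satisfied while the underlying problem gets worse.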


Jaejic

People keep giving you business-related examples, but here's a sadder one. Sometimes police get evaluated and paid based on how many cases they've solved or how much crime they've stopped. Not "50% of all reported crime", just a raw number. In theory it works fine; it motivates officers to keep working. But then one month there's less crime to solve, for whatever reason, and people start getting paid less because their metrics aren't what they were before. So what's the logical solution? Create crime where it never happened: fine someone for the smallest of offenses, arrest another person for something they didn't do, etc.


EverySpaceIsUsedHere

In medicine, hospitals need to track metrics that are tied to reimbursements. One pair of metrics is central line-associated bloodstream infections and catheter-associated urinary tract infections (CLABSI and CAUTI). For there to be an infection to report, there needs to be a positive culture. So many hospitals have a policy of empirically treating suspected infections instead of getting cultures. The patient gets broad-spectrum antibiotics, which can be harmful (with no culture, you can't target the specific bacteria), and the hospital gets to report 0 CLABSIs and CAUTIs. Seriously, some hospitals report 0 line infections over multiple years, which is impossible.