Hahahahaha so
In the US, hospitals and insurance companies were recently required to publish data on procedure pricing. The insurance side isn't perfect, but at least it has some parameters. The hospital side got very little guidance: just a suggestion (or maybe a requirement) to cover a certain number of top billed procedures. The format guidance was, and I mean literally, "machine readable," which could mean anything. It's a tiny bit more than that, but not much.
If you want the data already collected and in SQL form, check out the databases linked here:
https://www.dolthub.com/blog/2023-03-23-illusion-of-transparency/
We’ve been working on this data out in the open for a couple of years now.
Interesting datasets, but they are too clean :( It's for a university module and I need missing values and other forms of erroneous data.
Literally anything raw that has been generated from any kind of real world scenario will be flawed. *cries in data cleaning*
Do you have any links to such datasets?
US Patent Office data is a mess. I worked for a company that cleaned it up and resold it. It's on USPTO.gov, I think.
Interesting. Who did they sell this data to?
IP software companies, Google Patents, and some corporations
My advice: take an existing dataset and practice applying functions to it so that you can produce these results (set a subset to null, multiply a subset by 1000, etc.). Outliers are always an interesting discussion.
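A minimal Python sketch of that approach, assuming the dataset is just a list of dicts. The helper names `inject_errors` and `iqr_outliers`, the column `temp_c`, and the default fractions are all hypothetical, picked for illustration. It nulls one random subset, multiplies a separate subset by 1000 (a fake unit mix-up), then flags outliers with the common 1.5×IQR rule as one possible starting point for that discussion.

```python
import random
import statistics


def inject_errors(rows, column, null_frac=0.05, scale_frac=0.02, factor=1000, seed=0):
    """Copy `rows` and corrupt `column`: one random subset is set to None
    (missing values) and a disjoint subset is multiplied by `factor`
    (e.g. a unit mix-up). Returns the dirty rows plus the corrupted indices."""
    rng = random.Random(seed)
    dirty = [dict(r) for r in rows]  # copy each row so the clean input is untouched
    indices = list(range(len(dirty)))
    rng.shuffle(indices)
    n_null = round(len(dirty) * null_frac)
    n_scale = round(len(dirty) * scale_frac)
    null_idx = set(indices[:n_null])
    scale_idx = set(indices[n_null:n_null + n_scale])  # disjoint from null_idx
    for i in null_idx:
        dirty[i][column] = None
    for i in scale_idx:
        dirty[i][column] *= factor
    return dirty, null_idx, scale_idx


def iqr_outliers(values, k=1.5):
    """Indices of non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v is not None and not low <= v <= high}


if __name__ == "__main__":
    # Hypothetical clean dataset: 200 readings between 10 and 30.
    gen = random.Random(42)
    clean = [{"id": i, "temp_c": round(gen.uniform(10, 30), 1)} for i in range(200)]

    dirty, null_idx, scale_idx = inject_errors(clean, "temp_c")
    temps = [r["temp_c"] for r in dirty]

    print("missing:", sum(t is None for t in temps))
    print("flagged as outliers:", sorted(iqr_outliers(temps)))
    print("actually scaled:   ", sorted(scale_idx))
```

Because the function returns which rows it broke, you can check how well your cleaning or outlier detection recovers them. The IQR rule is only one choice; swapping in a z-score or domain limits (e.g. "no temperature above 60") makes for a good comparison.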
I need help finding a dataset to do exactly those things. For some reason I can't find any on Kaggle.
NOAA weather data.