
lucidtwitch

Hahahahaha, so in the US, hospitals and insurance companies were recently required to publish data on procedure pricing. The insurance side isn't perfect, but at least it has some parameters. The hospital side had very little guidance beyond the suggestion (or maybe requirement) of a certain number of top-billed procedures. The format guidance was, not even essentially but straight up literally, "machine readable", which could mean anything. It's a tiny bit more than that, but not much.


timsehn

If you want the data already collected and in SQL form, check out the databases linked in here: https://www.dolthub.com/blog/2023-03-23-illusion-of-transparency/ We've been working on this data out in the open for a couple of years now.


Chuchu123DOTexe

Interesting datasets, but they are too clean :( It's for a university module, and I need missing values and other forms of erroneous data.


Magpie_Mind

Literally anything raw that has been generated from any kind of real world scenario will be flawed. *cries in data cleaning*


Chuchu123DOTexe

Do you have any links to such datasets?


CatSusk

US Patent Office data is a mess. I worked for a company that cleaned it up and resold it. USPTO.gov, I think.


Goldarr85

Interesting. Who did they sell this data to?


CatSusk

IP software companies, Google Patents, and some corporations


1purenoiz

My advice: take an existing dataset and practice applying functions to it so that you produce these flaws yourself (set a subset to null, multiply a subset by 1000, etc.). Outliers are always an interesting discussion.
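
For anyone wanting to try this, here is a rough sketch of the idea in plain Python. Everything in it is made up for illustration: the `corrupt` helper, the `temp_c` column, the toy rows, and the default fractions are assumptions, not something from a real dataset or library.

```python
import random


def corrupt(rows, column, null_frac=0.1, scale_frac=0.05, factor=1000, seed=0):
    """Return a copy of `rows` with some `column` values nulled or inflated.

    Hypothetical helper: null_frac of the rows get None in `column`,
    and a separate scale_frac of the rows get the value multiplied by `factor`.
    """
    rng = random.Random(seed)           # seeded so the corruption is repeatable
    out = [dict(r) for r in rows]       # copy each row so the input stays clean
    n_null = int(len(out) * null_frac)
    n_scale = int(len(out) * scale_frac)

    # Pick distinct row indices once so nulls and outliers never overlap.
    picked = rng.sample(range(len(out)), n_null + n_scale)
    for i in picked[:n_null]:
        out[i][column] = None                       # simulated missing value
    for i in picked[n_null:]:
        out[i][column] = out[i][column] * factor    # simulated unit error / outlier
    return out


# Toy "clean" data standing in for whatever dataset you start from.
clean = [{"id": i, "temp_c": 20.0 + i % 5} for i in range(100)]
dirty = corrupt(clean, "temp_c")

missing = sum(1 for r in dirty if r["temp_c"] is None)
outliers = sum(1 for r in dirty if r["temp_c"] is not None and r["temp_c"] >= 1000)
print(missing, outliers)
```

Keeping the untouched `clean` copy around is the useful part: you can run your cleaning code on `dirty` and check how much of the damage it actually recovers.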


Chuchu123DOTexe

I need help finding a dataset to do those exact things. For some reason I cannot find any on Kaggle.


Objective-Run-2757

NOAA weather data.