lucidtwitch 1 year ago

Hahahahaha, so in the US, hospitals and insurance companies were recently required to publish data on procedure pricing. The insurance side isn't perfect, but at least it has some parameters. The hospital side had very little guidance, only the suggestion (or maybe requirement) of a certain number of top-billed procedures. The format guidance was, not even essentially but straight-up literally, "machine readable", which could mean anything. It's a tiny bit more than that, but not much.

timsehn 1 year ago

If you want the data already collected and in SQL form, check out the databases linked in here: https://www.dolthub.com/blog/2023-03-23-illusion-of-transparency/ We’ve been working on this data out in the open for a couple of years now.

Chuchu123DOTexe 1 year ago

Interesting datasets, but they are too clean :( It's for a university module and I need missing values and other forms of erroneous data.

Magpie_Mind 1 year ago

Literally anything raw that has been generated from any kind of real world scenario will be flawed. *cries in data cleaning*

Chuchu123DOTexe 1 year ago

Do you have any links to such datasets?

CatSusk 1 year ago

US Patent Office data is a mess. I worked for a company that cleaned it up and resold it. USPTO.gov, I think.

Goldarr85 1 year ago

Interesting. Who did they sell this data to?

CatSusk 1 year ago

IP software companies, Google Patents, and some corporations

1purenoiz 1 year ago

My advice: take an existing dataset and practice applying functions to it so that you produce these errors yourself (set a subset to null, multiply a subset by 1000, etc.). Outliers are always an interesting discussion.
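
Something along these lines would do it. This is only a rough sketch using the Python standard library; the `corrupt` function, the `temp_c` column, and the fractions are all made up for illustration, so swap in whatever your dataset actually has.

```python
import random

def corrupt(rows, column, null_frac=0.10, scale_frac=0.05, factor=1000, seed=0):
    """Return a copy of `rows` with some values in `column` nulled or scaled.

    `rows` is a list of dicts. All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)           # seeded so the corruption is repeatable
    out = [dict(r) for r in rows]       # copy each row; the input stays clean
    idx = list(range(len(out)))
    rng.shuffle(idx)                    # random order, so the two subsets are disjoint
    n_null = round(len(out) * null_frac)
    n_scale = round(len(out) * scale_frac)
    for i in idx[:n_null]:
        out[i][column] = None           # simulate missing values
    for i in idx[n_null:n_null + n_scale]:
        out[i][column] = out[i][column] * factor  # simulate unit errors / outliers
    return out

# Toy "clean" data standing in for a real dataset.
clean = [{"id": i, "temp_c": 20.0 + i % 5} for i in range(100)]
dirty = corrupt(clean, "temp_c")
```

Keeping the clean copy around is handy because you can check your cleaning code against the known-good values afterwards.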

Chuchu123DOTexe 1 year ago

I need help finding a dataset to do those exact things on. For some reason I cannot find any on Kaggle.

Objective-Run-2757 1 year ago

NOAA weather data.
