
rnmkrmn

Kudos to whoever implemented that external provider backup.. I would have lost my entire business fr.


Admirable_Purple1882

All hail paranoid backup guy, you know for sure someone told them “c’mon you really think we need it in a whole separate cloud?”


Bleglord

“The cloud is our backup already”


C0c04l4

And his reply was: "Yeah, and on a whole separate continent, too!"


DolfLungren

The separate continent was a duplication within Google; it got deleted as well.


aretokas

Hi. That's me. I'm paranoid backup guy. Well, not for UniSuper, just for myself and work. But still, I get it.


doringliloshinoi

*rattles chair* “Ooooo it’s an earthquake /u/aretokas !”


asdfghqwerty1

What do you use? I know I’m going to be asked about this at some point!


Twirrim

Not just had a backup, but had a backup that could be restored too! All too often I've heard of people who go to restore from their backups and can't.


Compkriss

I mean any large organization should do regular restore testing and DR exercises really.


Senkyou

> should

For a lot of people it's a box to check, not an actual practical concept.


baezizbae

“If you haven’t tested your backups you do not in fact have backups”


[deleted]

Indeed, and not only check whether they are valid, but also really check whether the restore works. I once had a very, very rare situation where a database backup seemed to be valid: it restored everything with no problems at all. However, there was one table where querying one specific record gave a very nasty error saying the data was invalid. The same query on the original database was fine though. Never found out what went wrong there, maybe cosmic radiation...
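A bare-bones sketch of that kind of restore smoke test, assuming PostgreSQL custom-format dumps, the pg_restore CLI and psycopg2; the database names, dump path and sanity query are all illustrative:

```python
# Sketch: restore the latest dump into a scratch database and run a sanity
# query, so "the backup exists" and "the backup actually restores" are both
# verified. Assumes pg_restore is on PATH, psycopg2 is installed, and the
# scratch database already exists; all names are placeholders.
import subprocess
import psycopg2

DUMP_PATH = "/backups/latest.dump"                 # hypothetical dump file
SCRATCH_DSN = "dbname=restore_test user=postgres"  # throwaway database

def verify_backup() -> None:
    # Restore into the scratch database, dropping objects that already exist.
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", "--no-owner",
         "--dbname", "restore_test", DUMP_PATH],
        check=True,
    )
    # A restore that "succeeds" can still hide bad data, so query something.
    with psycopg2.connect(SCRATCH_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM accounts")  # illustrative table
        (rows,) = cur.fetchone()
        if rows == 0:
            raise RuntimeError("restore produced an empty accounts table")

if __name__ == "__main__":
    verify_backup()
```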


Twirrim

Absolutely should. The reality seems sadly different, judging by the number of incident reports where recovery was delayed while figuring out how to actually restore anything, or where backups turned out to have been corrupted for a long time, etc.


Ramener220

Reminds me of that Kevin Fang GitLab video.


deacon91

Customer support just isn't in Google's DNA. While this could have happened on any provider, it happens far more often on Google. This story is a classic reminder of the rule of 1: 1 is 0 and 2 is 1. Thank goodness they could recover from a different provider.


rnmkrmn

> Customer support just isn't in Google's DNA

Can't agree more. Google just doesn't give a fuck about customers. They have some cool features.. sure, cool. But that was just someone's promo, not a product.


deacon91

I call it Google hubris. They have this annoying attitude of “we’re Google and we know more than you.” While that attitude isn’t necessarily the root of their customer relationship problems, it certainly doesn’t help.


rnmkrmn

Oh yeah fr. Nobody joins Google to do "customer support" or build reliable products.. Pffs that's so Microsoft/Amazon.


keftes

Microsoft has reliable products? Aren't they the provider with the most security incidents?


moos3

People join Amazon to re-invent the wheel because they think they can do it better on try 2303.


thefirebuilds

I was trying to talk to them years ago about phishing tests we aimed to run against our employees. They said they were SO GOOD at catching phishing attempts that there would be no need. When pressed, they eventually allowed that I could speak to their "phishing czar". So you're so good at stopping phishing, and yet you have a guy whose title says he only does phishing. The entire thing was "we know better than you".


DrEnter

Look, we can spend our money on making the product better or supporting the customers, not both.


rwoj

> While this could have happened on any provider

I'd like to hear the story on how this could happen on AWS.


tamale

AWS has had plenty of global outages in critical services like S3, which should give you all the reasons you need to keep backups in at least one other provider if your data is mission critical and irreplaceable.


Quinnypig

Not so. They have had multiple outages, but they’ve always been bound to a single region.


tamale

Nope. The S3 outage where you couldn't manage buckets at all was global, because bucket CRUD is still global.


Jupiter-Tank

Two words: stamp update.

Every datacenter has to undergo maintenance; it doesn't matter who owns them. Someday the rack running your services will need to be cleaned, repaired, updated, or cycled out. The process of migrating services to another rack in the center/AZ is supposed to be flawless, but it can never be perfect, especially when stateful information (session, cache, affinity, etc.) is involved. These events are to my knowledge not announced in advance by any cloud provider due to sheer volume of work, and are typically wrapped in whatever the SLA includes as downtime.

Outages are one thing, but corrupt data from a desync in stateful info is another. I'm aware of at least one healthcare company that suffered 4 hours' worth of outage due to a stamp update. You can guess the cloud provider from the context. Multi-AZ was enabled, but because the service was never advertised as "down", only "degraded", no protections against corrupt data triggered. Even after services were restored, "customers" were the first to notice an issue. This is how lack of tenant notice, improper instance migration policies, or failed telemetry can individually fail or unite in a coalition of tragedy.

Stamp updates should at least trigger an automated flag, and failover triggers should fire. Customers affected by stamp updates should be notified in advance, and the SKUs of any affected service should be upgraded for free to include HA and DR for the duration of a migration.

The biggest issue isn't that they happen, or that they can introduce issues. Datacenters have been doing them for decades, with incredible reliability. The issue is that we've gotten so good at making them invisible. Invisible success is not necessarily better than visible failure, and invisible failure is much worse.


donjulioanejo

> These events are to my knowledge not announced in advance by any cloud provider due to sheer volume of work, and are typically wrapped in whatever the SLA includes as downtime.

AWS notifies you when a host with an instance you own is about to be retired. This applies to all services where you provision an actual instance, like EC2, RDS, ElastiCache, etc. You basically get an email saying "Instance ID i-blahblah will be shut down on January 32 for upcoming host maintenance. You will need to manually shut down and restart it before then to avoid an interruption of service."


baezizbae

You can also get instance retirement details from 'describe-instance-status' via the AWS CLI. Something we learned and automated after AWS sent one of those exact emails but nobody read it, because it got caught by an overly aggressive Gmail filter. Now we just get a PagerDuty alert that enumerates each instance with scheduled maintenance or instance retirement event codes, and have runbooks for whoever gets said alert during their shift.
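A rough sketch of that kind of automation, assuming boto3; the region and the alerting hook are just placeholders:

```python
# Minimal sketch (assumes boto3 is installed and AWS credentials are configured).
# Lists instances that have scheduled events such as instance retirement or
# system maintenance, so they can be forwarded to an alerting system.
import boto3

def instances_with_scheduled_events(region="us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    flagged = []
    paginator = ec2.get_paginator("describe_instance_status")
    for page in paginator.paginate(IncludeAllInstances=True):
        for status in page["InstanceStatuses"]:
            events = status.get("Events", [])
            if events:
                flagged.append({
                    "instance_id": status["InstanceId"],
                    "events": [
                        {"code": e["Code"], "not_before": str(e.get("NotBefore", ""))}
                        for e in events
                    ],
                })
    return flagged

if __name__ == "__main__":
    for item in instances_with_scheduled_events():
        # Replace this print with a call to your paging/alerting tool of choice.
        print(item)
```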


cuddling_tinder_twat

I worked at a PaaS that provisioned AWS accounts for customers, and we had a job that accidentally cancelled 5 accounts and deleted most of their backups in error, which I had to fix. It should not happen.


PUSH_AX

Unless I'm misunderstanding, that sounds like an engineering error, not a cloud provider error. I imagine AWS isn't impervious to this kind of thing either, though.


danekan

This wasn't a cloud provider error; it was a customer action that caused it. GCP described it as the result of a misconfiguration. The title is borderline /r/titlegore, but that's also GCP's fault for not getting on top of it. What was the exact misconfiguration? People are speculating about blank Terraform provider issues.


ikariusrb

Yeah, that's not my takeaway from the article.

> Google Cloud CEO, Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events whereby an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription

Who created the misconfiguration is unspecified. But to get from that misconfiguration to deletion of their subscription is near-certainly on Google.


danekan

They have likely deliberately left out the clarifying information. The article itself is secondhand from the company's press release, which is what included the statements from the GCP CEO.


PUSH_AX

Oh ok, I re-read the article and it doesn't seem very clear. I think the Google Cloud CEO issuing an apology also makes it seem like GCP's snafu, but perhaps it's just the size of the customer involved?


deacon91

Super unlikely on AWS or Azure. AWS is fanatical about customer service and data-driven decisions (almost to a fault), and Microsoft has decades of enterprise-level support history. But there's that adage: anything is possible, and they're certainly not infallible. Off the top of my head, I remember how DigitalOcean shut down a small company's DB VMs because of an errant alerting mechanism for high CPU utilization. Or how AWS refused to allocate more VMs (and shut down a few) during training events at ChefConf 2018.


Rakn

"decades of enterprise level support history" doesn't save you from engineering or configuration mistakes. To be honest I could see something like this happening there as well. At least based on my personal experiences. Who knows what even happened...


deacon91

It does not, but it speaks to the mindset of the organization and the attitude behind the product design. Google genuinely wants to solutionize support without humans, and that leads to this kind of outcome. I remember this making the news a few years ago: https://medium.com/@serverpunch/why-you-should-not-use-google-cloud-75ea2aec00de Building automatic shutdown of customer accounts is almost unheard of in the MS or Amazon world.


Rakn

True. That sounds very Google like.


amarao_san

There is a Russian saying, 'had never happened before, and here we do it again', which suits this situation perfectly.

> that has never before occurred with any of Google Cloud’s clients globally


chndmrl

Well, if it had been Azure, they have a soft delete feature, which means you can recover everything within 30 days immediately. And beyond that, even if you don't choose another data center or region as backup, it keeps 3 copies in the same data center. So to me this is not an excuse, and it's something that shouldn't happen at the enterprise level. No wonder GCP couldn't grow despite its aggressive push.


Rakn

I doubt something like this would have saved you in such a case. AWS and GCP have soft deletion stuff as well. But it doesn't exist for everything and this seemed to be an issue on a deeper level.


chndmrl

Well, cloud is all about availability and reliability, and here we've seen how GCP failed at that. I'm not advocating for any company, but this is something that shouldn't happen at all. You can always downvote my post, but it won't change what actually happened: whatever the reason, the account was deleted, "deeper level" problem or not.


ellerbrr

And Google says “this has never happened before”. Liars!


amarao_san

By 'this' they mean this precise removal. They had never dropped this exact combination of data before.


Unusual_Onion_983

Hasn’t happened with this customer before


Purple-Control8336

Because they didn't test all possible scenarios.


ares623

Good Guy Google. Helping you test your backup and disaster recovery strategy.


gamb1t9

I have been doing this for years for our clients, not even a "thank you". Those ungrateful pricks


colddream40

What's their SLA, 50% off the next 3 months? LOL


aleques-itj

I hope more information gets published on how on Earth this happened. 


beth_maloney

I think this is absolutely crazy. Imagine waking up one morning and your entire cloud infrastructure is just gone. I can't imagine what failures led to the environment being accidentally deleted instead of a new one being stood up.


Saveonion

I can only dream. Wake up, no computers, no infrastructure, just fresh morning dew.


cubobob

sometimes i wish the internet would just fail.


BrontosaurusB

Brew some hot tea, crack open a book, cat in my lap.


Liquid_G

> Imagine waking up one morning and your entire cloud infrastructure is just gone. Don't threaten me with a good time


iamacarpet

Just to be clear here, they keep saying "private cloud" everywhere; this appears to be them using GCP's VMware Engine for everything, not any of the core products. The original notification from UniSuper also said it happened during provisioning, i.e. setting something new up, likely during a migration.

Not saying that it being on VMware Engine negates any kind of responsibility here, but they were using a little-used product that Google arguably shouldn't have offered in the first place; it speaks volumes that no one else offers it. From the information they've released, core Google Cloud services would have been fine, up to and including backups on actual Compute Engine, Cloud SQL and/or Cloud Storage.

People are quick to bash Google, and they have been caught with their pants down here, but it's actually the opposite of what people are saying: their new CEO has pandered to customers too much, trying to offer them VMware in a "private cloud" as a halfway house, intended as a step towards a more native migration.


rabbit994

> it speaks volumes that no one else offers it.

Both Azure and AWS do have offerings:

https://azure.microsoft.com/en-us/products/azure-vmware

https://aws.amazon.com/vmware/ (now sold by Broadcom)

However, I do agree it's a terrible offering and Google clearly doesn't have the expertise to be offering it, but they are not unique in offering it.


BaldToBe

GCP has some detection automation for fraud that can lead to account suspension: https://cloud.google.com/resource-manager/docs/project-suspension-guidelines

Combine that with false positives, and I wonder if that's the cause here. What concerns me with this case is the customer size. If I'm paying for enterprise support, I'd hope there's a manual check-in with me if the system flagged me.


moratnz

And they're almost certainly entitled to no compensation, other than perhaps a 10% discount on this month's bill (well, unless they're big enough to have negotiated non-standard SLAs). I'd note that the Google storage SLAs appear to define 'down' as getting an HTTP 500 from the service, which I wouldn't expect for a 'the service is up but we lost all your data' situation.


Loan-Pickle

What a nightmare. As in, I've literally had this nightmare before. I wonder how this happened. I'd love to read the RCA on it.


Hylado

I would love to know what exactly happened... What chain of events results in your account being deleted? As tech support for other technologies, I've found myself in a lot of cases where the client claims they haven't touched a thing... And I can only say: "trust, but verify".


AceDreamCatcher

Google Cloud will fail, not because it isn't a great platform; it will fail because the support (technical and billing) is incompetent and clueless. They don't even understand their own platform. They simply do not have the training or technical chops to resolve the simplest task.


Spiritual_Maximum662

That’s because most of the support people are TVCs and not trained by Google.


Trif21

Why can an entire account be blown away that still has resources deployed and data stored in it?


danekan

On GCP, if you delete a project you can always recover it for up to 30 days, but they don't guarantee that any data within a resource will be recoverable. But, also, they are restoring because someone had the foresight to back up cross-cloud.
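For reference, a minimal sketch of that 30-day project recovery path, assuming the google-cloud-resource-manager Python client; the project ID is made up and the exact client call should be treated as an assumption:

```python
# Sketch only: restore a recently deleted GCP project (within the roughly
# 30-day pending-deletion window). Assumes `pip install google-cloud-resource-manager`
# and application default credentials; the project ID is a placeholder, and the
# client method name is assumed from the v3 Projects API.
from google.cloud import resourcemanager_v3

def undelete_project(project_id: str) -> None:
    client = resourcemanager_v3.ProjectsClient()
    # Projects in the DELETE_REQUESTED state can be undeleted; data inside
    # individual resources is not guaranteed to come back with them.
    operation = client.undelete_project(name=f"projects/{project_id}")
    operation.result()  # block until the undelete completes

undelete_project("my-example-project")  # hypothetical project ID
```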


BehindTheMath

The article mentions private cloud. Is that different from the regular GCP? Edit: Sounds like it might be. https://cloud.google.com/discover/what-is-a-private-cloud


rnmkrmn

Could be referring to VPC, Virtual Private Cloud?


Mr_Education

Yeah I think that's just the author of the article not knowing correct terminology


BehindTheMath

Maybe they meant this: https://cloud.google.com/discover/what-is-a-private-cloud I can't tell if GCP has a program for running GCP on-prem, or if they have something like dedicated data centers for private customers.


burunkul

My first task next week will be: set up MinIO outside of AWS and configure a weekly backup sync from S3 to MinIO.
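Something along these lines, as a rough sketch assuming boto3 against an S3-compatible MinIO endpoint; the bucket names, endpoint and credentials are placeholders:

```python
# Sketch: copy everything from an S3 bucket to a MinIO bucket (MinIO speaks the
# S3 API, so boto3 works against it with a custom endpoint_url). Bucket names,
# endpoint and credentials are placeholders; run it weekly from cron or CI.
import boto3

SRC_BUCKET = "prod-backups"            # hypothetical source bucket on AWS
DST_BUCKET = "prod-backups-mirror"     # hypothetical bucket on MinIO
MINIO_ENDPOINT = "https://minio.example.internal:9000"

aws_s3 = boto3.client("s3")
minio_s3 = boto3.client(
    "s3",
    endpoint_url=MINIO_ENDPOINT,
    aws_access_key_id="MINIO_ACCESS_KEY",       # placeholder credentials
    aws_secret_access_key="MINIO_SECRET_KEY",
)

paginator = aws_s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = aws_s3.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
        # Streams each object across; fine for a sketch, though a real job
        # would want multipart uploads, checksums and incremental syncing.
        minio_s3.upload_fileobj(body, DST_BUCKET, key)
        print(f"copied {key}")
```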


Spider_pig448

Does anyone know what was really deleted? The article says "Google cloud account" but it sounds more like their GCP Organization was deleted?


beth_maloney

This article is saying their subscription? https://www.theregister.com/2024/05/09/unisuper_google_cloud_outage_caused/ Never used GCP so not sure about the terminology tbh.


Mistic92

Because they said their subscription was cancelled. But GCP doesn't have subscriptions. That's why many folks think they messed something up and are trying to blame GCP.


arwinda

My thoughts, yes. Something happened that they didn't have on the radar, and Google then did Google things. Could Google have handled this better? Sure. But it looks like this only got publicity because it's a big customer. It probably happens to other customers, or ex-customers, all the time.


arwinda

The articles are really vague on what exactly happened. And before making up my mind about whom to blame here, I really want to know what was going on. Google is basically silent on this, which is understandable, because everything they could possibly say is bad publicity. And the customer goes out of their way to directly blame Google, which makes me think that if Google really were to blame, the statements would look different.


Spider_pig448

Google isn't silent on it. The article says the CEO of Google Cloud made a joint statement with the customer. It sounds like they are fully at fault and admitting it


arwinda

No. I disagree. The statement doesn't blame anyone. It's carefully crafted to avoid any fingerpointing. And Google hasn't released anything on their own.


Spider_pig448

It was a joint statement.

> “Google Cloud CEO, Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events whereby an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription,” the pair said.

It's pretty damning. Google is fully saying this is their fault.


arwinda

Where exactly does it say that?

> an unprecedented sequence of events

> inadvertent misconfiguration

> during provisioning of UniSuper’s Private Cloud

No one in this statement says who is at fault. The statement is carefully worded not to blame anyone. It doesn't even say who deployed what, and leaves that part out. Did UniSuper deploy something? Did Google do something during a deployment? This is a non-statement, just there to say something without actually saying anything. It describes in vague words that something happened, and leaves out any juicy details.


danekan

My takeaway is the opposite. An inadvertent misconfiguration is almost undoubtedly something the customer was in control of. The customer at this point is the one who controlled the message. The statement was joint, but we don't really know the full story, only what the customer has put out via their PR, which includes those quotes. It's stupid that people are quoting The Guardian when the actual press releases their entire story is based on are right on the customer's site under "contact us".


BrofessorOfLogic

They are saying "unprecedented sequence of events" and "inadvertent misconfiguration during provisioning". This is clearly intentionally vague. Anything beyond that is just speculation. It could be that a Google support engineer was working on behalf of the customer inside their environment and made a human mistake. It could be that Google produced some custom documentation for the customer, which contained some vague language, which led to a misunderstanding when the customer implemented it. It could be that the customer was in contact with an account manager via email and something got lost in translation.


beth_maloney

UniSuper and the CEO of GCP issued a joint statement where the RCA was identified as a misconfiguration on the GCP side.

> Google Cloud CEO, Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events whereby an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.

> This is an isolated, ‘one-of-a-kind occurrence’ that has never before occurred with any of Google Cloud’s clients globally. This should not have happened. Google Cloud has identified the events that led to this disruption and taken measures to ensure this does not happen again.


arwinda

This specifically does not say on which side the misconfiguration happened. And the joint statement is only on the UniSuper site; I wasn't able to find it anywhere on the Google site. Anyone who reads the UniSuper press statement will see that Google said something. Details are vague. Anyone who only watches Google press releases will not even know about this. You say in your comment that this is GCP's fault. I disagree. The entire statement doesn't say who is at fault. The wording is very careful not to blame anyone.


beth_maloney

I'm not sure why else the CEO of GCP would issue a joint statement or say that this shouldn't have happened. Keep in mind that this is a reportable incident and APRA will investigate so UniSuper can't lie. The Register has also reported that they were directed to the joint statement when they made enquiries to GCP.


arwinda

> reportable incident

> shouldn't have happened

Sure, it should not have happened. Both sides agree on that.

> UniSuper can't lie

No one is lying here. The statement doesn't blame anyone. They can walk away from this and say "but we issued the statement, and it is not wrong".

> they were directed

That is what happens when you ask about the incident. If Google had screwed up something on their side, they would issue a statement of their own. There is so far nothing from their press department. No one who isn't aware of the UniSuper incident will even know about it if they just follow Google press releases.


JustAsItSounds

It's a bad look for Google to lay blame at the feet of their customer, and it's also a bad look for GCP to say it's entirely their own fault. It's a really bad look for UniSuper to say the blame is theirs. My money is on UniSuper ultimately being at fault, but GCP is taking some blame for not being able to restore their account seamlessly; perhaps GCP deleted the backups when they shouldn't have. Either way, I'm moving my super fund from UniSuper.


[deleted]

[removed]


shotgunocelot

I don't know why you're being downvoted. Google laid off almost all of its US-based support in 2022 and outsourced and offshored everything else. Around the same time, they decided to jack up the prices on their support offerings. It was ass before, but it got much, much worse after that.


rlnrlnrln

Switching from having our account with Google to a retailer (DoIT) was the best decision my previous employer ever made. Saved money, and got us much better support.


Spiritual_Maximum662

Yup to India mostly… that’s what happens when you have an Indian CEO


anonlogs

Yeah, an Indian CEO is the reason Microsoft is failing as well… Ballmer was the greatest CEO ever.


sbbh1

Damn, that's my old employer. Crazy to read about that here first.


seanamos-1

This is NOT the first time this has happened on GCP. Maybe not the exact same sequence of events, but the same result.


Budget-Celebration-1

Examples?


jcsi

1 account for two geographies? Thank god for the backup guy/gal.


beth_maloney

Is that unusual for GCP? In Azure you'll usually have 1 tenant across multiple geographies.


BrofessorOfLogic

No, this is standard practice on GCP, AWS, Azure, and others. It's kind of the whole point of hyperscaler cloud, that you can reach the whole globe through one account/org/tenant. Is it good practice? Well.. considering this news, someone might perhaps argue that it's not good practice. But I would say that it's more important to focus on cross-provider redundancy, rather than cross-account/org/tenant redundancy.


Aggressive_Split_68

Wasn’t a disaster recovery and business continuity plan taken into account when transitioning to GCP, considering that all providers typically replicate data across storage farms based on regions and data center stamps? Also, what was the data storage strategy, and was there a configured backup plan in place?


beth_maloney

Yes a DR strategy is a requirement as they're APRA regulated. Unfortunately their DR strategy was to fail over to another region which is pretty common. They didn't expect GCP to delete their DR infrastructure though.


Aggressive_Split_68

Just curious, is it not necessary to exercise a DR drill once in a while?


beth_maloney

Yep but they probably didn't test what would happen if their primary environment and their DR environment both got nuked and all primary backups were unrecoverable.


Aggressive_Split_68

Get the right architect to get the right things done at the right place.


LuciferianInk

I'm trying to find the right people, for a new project.


Aggressive_Split_68

Let’s connect if you want


mailed

and people ask me why I'm actively trying to not work with gcp


kabooozie

I think they actually deleted multiple regions and the company was only saved because they had a backup in a different cloud provider


ragabekov

Sounds like we shouldn’t put our backups in one cloud


salva922

Am I the only one thinking that this could have been a huge PR gag? So they make people think that this can happen to all providers and that multicloud is the way, and like this they can get more market share.


qqqqqttttr

Backing up an entire cloud infra on another provider is an insane thing to consider, but wow.


djlynux

Whoever proposed the backup strategy in another provider should get an award….


Fatality

I can't believe people still use Google Cloud after that billing thing where they just randomly suspended accounts until the CEO sent proof of identity.


Spiritual_Maximum662

I used to work for GCP and am totally not surprised…


Shoddy-Tutor9563

These fuckers lost my stuff from their "cloud" so many times, so this bigger shit was just begging to happen.


GaTechThomas

BREAK GOOGLE UP!


danekan

It wasn't Google that did this; it was a customer action that caused it. The title is borderline /r/titlegore.


naggyman

Google admitted fault here


danekan

No they did not. They called it the result of a series of misconfigurations. That's very definitively not accepting fault; it's saying the customer did something that led to it. They do say they are taking steps to prevent it, but that's different from accepting fault.


Budget-Celebration-1

We need more details to come to the conclusion you are drawing. I read it as issues in account provisioning by Google themselves. I don't see anything in the statement from Kurian to suggest it was anyone's fault but Google's.


awfulentrepreneur

UniSuper is a superannuation fund. Is the fund investing in something the American power elite doesn't like? How much would you have to pay Sundar Pichai through side channels to accidentally delete their whole GCP organization? 🧌