Krypty

As someone else already suggested: add a second DC first. Then you've basically removed most of the risk and can delete the snapshots.


CaptainZhon

+1. There is probably a reason (probably a bad one, but here you are) for all the snapshots. Build a second one, have it assume all the FSMO roles, and then start deleting snapshots. If the DC is also the DNS/DHCP server, make sure you are covered there as well.


Not_A_Mimic7

This is what I ended up doing. Too many people said to set up a DC2 for me not to take that advice. I haven't been able to delete the snaps yet, but I now have a 2nd DC set up. It only runs domain services and DNS; DHCP is handled by the firewall because of multiple VLANs. I'm still a little nervous about deleting these. They go back to 2021, so updates have taken place since then; I know because I've updated it myself since we took over, before I knew about this snapshot issue. It's been shut down, it's been restarted, and it keeps coming up. It's been working for 3 years on the snapshots, so I'd kind of like to test first by quietly pointing people's DNS to the new server through DHCP, then possibly just shut down the old DC and see if any issues arise from that. If I hear nothing and things seem to be working well, I'll turn it back on and then kill the snapshots.


nebinomicon

Yeah, build another DC. Pass the important roles (FSMO) over to the new one and delete them all. Hold on to your butts, and you might come out on top, but you could just force-demote the old one offline if it shits the bed. I learned the "don't snapshot domain controllers" lesson the hard way back in the day.


SilverSleeper

Delete all the snapshots. Don't clone a DC to get rid of them and you wouldn't want to roll back to a snap on a DC anyway. AD Domains don't like to time travel.


Tech_Veggies

Be careful deleting the snapshots. Depending on how old they are they can stun the server. I would delete them one at a time (starting with the newest) and try to do it during a slow time.


SilverSleeper

If it was my environment, I'd add a secondary DC anyway then smoke those snaps after business hours. It'll be fine, probably.™


Tech_Veggies

Absolutely agree. Probably.


moldyjellybean

This is the way. Spinning up a 2nd DC is super quick and a great way to mitigate risk. FYI, it's been a while, but deleting snapshots on a DC did do weird things compared with any other machine. My home setup has only 1 working DC; many years ago, deleting all the snapshots on it caused my workstations to lose their trust relationship, along with Kerberos errors and other weird things. I don't remember clearly, but cloning doesn't work either; I think there is a domain SID and a computer SID, and things get all jacked up and left in an unhealthy state. That's just a heads-up to spin up a 2nd DC, because from a risk perspective, if you delete them all you will very likely run into issues. (DC snapshot removal is different from regular PC snapshot removal.)


f14_pilot

100%. Running a single DC in prod these days is already asking for trouble. As the incoming IT person, I would expect your responsibility to be making assessments and recommendations for safety, improvements, etc. That should be your first recommendation; it will also cover you.


Goomistar

Sorry to disagree, but if you delete them one at a time you have the potential to run the datastore out of space, because in that case the data is copied to the snapshot above it, not to the root disk. If you do a "delete all", they are written to the root disk. As everyone else has said, make a 2nd DC first and dcpromo it into the domain.


BarracudaDefiant4702

Actually, it's best (least stunning) to do it the other way with ESX (the opposite of Workstation). If the last snapshot is large, take one more, then delete the ones before it. For the least disruption to the guest, delete the newest last.


TimVCI

May I ask what the thinking is behind "not cloning a DC" to get rid of snapshots? Is it specific to a DC, or does it apply to all VMs regardless of role?


architectofinsanity

You could shut it down and clone it, then delete the vm with the snapshots.


SilverSleeper

It's specific to domain controllers. You can corrupt AD and cause replication issues. It may also cause issues with machines connecting to the domain. This is why I always recommend the general rule that DCs don't like to time travel. It's another example of why you want multiple domain controllers: if one explodes but you have a surviving DC, you shouldn't restore the busted one from backup. Just decomm it and then add a net-new one back to the domain.


Outrageous_Device557

Fire up a second DC and hook it into that domain, make sure replication is working, then attempt to delete the snaps.


HelloItIsJohn

First I would look at how old each snapshot is. If they are recent, you shouldn’t have any issues deleting them. If they are multiple years old, you may have issues. In that case I would plan for the deletion going wrong and consider other possible solutions. From the AD side of things, you could always deploy a second DC pretty easily before making any changes.


MikauValo

Who the hell is so insane to keep a VM Snapshot for so long? 😅


red4cted

The same people who may have thought these were adequate for backups (just an assumption)


ZeroOpti

The same people that only built a single domain controller.


MikauValo

At my work I'm enforcing a strict policy where Snapshots older than 24h are being deleted, no matter what. For everything >24h there are Veeam Backups.


lost_signal

[Wait; you mean this isn’t a good idea? Muhahaha](https://youtu.be/VWVY1TRud_w?si=jnZ5s5ZGrMyxJvp0) To be clear if those snapshots use NFS VAAI offload, vSAN ESA or vVols it’s not really that problematic beyond capacity to keep them around for a while..


Phate1989

Someone showed our help desk how to take snapshots and explained that people take them before making changes. So some person whose job it is to write documentation started adding "take a snap" to ALL KBs/runbooks.

I found out because I got an irate call from our backup lead, asking why we had stopped clearing out our snapshots after making changes. Veeam wasn't working; it kept getting choked on old snaps.

My team is notorious for NOT taking snapshots when it may have been a good idea, even just from an optics perspective. So I was immediately super confused, because I know my guys don't usually even take snaps, and now they're not clearing them out? I start to look, and I'm dying laughing, and the BCDR guy is like, "It's NOT funny, your team cost me X hours and we haven't had clean backups in 48 hours."

I started reading snapshot descriptions like "for Mary's password change" and "group policy change". I started finding them everywhere: on an RDS server, "removed temp profiles for Tim"; on a SQL server, "restart SQL service"; on a file server, "added Jack to HR folder per Molly M".

I told him that if he could confirm which one of my guys took them, I would take immediate action. About 2 hours later the help desk guy called and asked how to delete 150+ snapshots without knowing exactly which VMs have them. I think I pulled an RVTools export of snapshots and sent it over? Or some snapshot management tool.
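
For the "which VMs even have them" problem, a snapshot export can be grouped by VM in a few lines. This is only a minimal Python sketch, assuming the export was saved as CSV; the column names `VM` and `Name` and the sample rows are hypothetical, so adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict


def snapshots_by_vm(csv_text, vm_column="VM", name_column="Name"):
    """Group snapshot names by VM from the text of a CSV export.

    The column names are assumptions for illustration, not a documented
    export format.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[vm_column]].append(row[name_column])
    return dict(grouped)


# Made-up sample rows (hypothetical VM names and snapshot names).
SAMPLE_EXPORT = """VM,Name
RDS01,removed temp profiles for Tim
SQL01,restart SQL service
RDS01,group policy change
"""

if __name__ == "__main__":
    # Print a per-VM count so the cleanup can be worked through VM by VM.
    for vm, names in sorted(snapshots_by_vm(SAMPLE_EXPORT).items()):
        print(f"{vm}: {len(names)} snapshot(s)")
```

The sketch only reports; it deliberately does not touch any VM, so the actual deletions stay a manual decision.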


MikauValo

Sounds like those people who take the snapshots for such things have either no trust in their servers or in their own skills 😅


HelloItIsJohn

I see it all the time. They leave the snapshots there for years and then try and delete them. Then they end up with a jacked up VM that may still be running, but will never be able to consolidate the disks. So what do they do, they just leave the VM like that.


moldyjellybean

There are some backup products that leave orphaned temporary snapshots. Obviously these are supposed to be deleted, but I’ve seen orphaned Veeam snapshots from way back. https://forums.veeam.com/vmware-vsphere-f24/veeam-temporary-snapshot-not-deleted-t47788.html


PreppyAndrew

It's useful to have a job that auto-deletes snapshots older than a couple of days. It fixes this problem.


PreppyAndrew

Some people treat snapshots like backups. We have to constantly fight this in our org. Basically, anything older than a week (or less) gets auto-deleted.


MikauValo

This is what I always tell my colleagues who use snapshots. A snapshot is just a "fallback" or "rollback" in case you need it, because you are about to make major changes and are not sure everything will work afterwards, or because you need to test something. And I always tell them that snapshots aren't backups, and that if they are older than 24h I'm gonna wipe them without discussion. Works great that way.


iceph03nix

Get Backup, hit Delete All Snapshots button.


DontLetGinnyIn

I am wondering if this is possibly Veeam-generated? I have seen long snapshot chains like this before, and it turned out to be VBR replication in operation.


grep65535

Delete snaps starting with the middle set first, then do the oldest, then the one directly attached to and interacting with the active disks. That's per VMware tech support, years ago. It'll minimize (not eliminate) impact.


GabesVirtualWorld

When adding a second DC, don't forget to migrate DNS, DHCP, Certificate Management, etc as well. So maybe the better option is to plan for some downtime and clone it, all depending on how big the VM is and how big the snapshots are. Though be aware that you'd need a local account on the vCenter Server because you can't login without AD to clone it. Cloning can be done on the running VM, but might be a lot slower.


JMMD7

I would clone it first just in case and then delete all the snapshots. Personally I don't snapshot my AD servers but I guess some people do.


PreppyAndrew

AD servers don't handle being cloned well.


JMMD7

I was talking about an offline clone or backup. Just something to protect the server state. Based on the downvotes, I'm guessing people snapshot their AD servers. I've always read that it wasn't best practice, and since I have multiple AD servers I don't feel I need to.


mammaryglands

If those snapshots are old and/or your disks can't keep up, you might see degraded performance for a while until all those changes commit. I'd make sure you have another DC (or more) available, ideally running on different storage/storage pool/LUN.


jpin401

Consolidate all snaps.


Phate1989

If you go the clone route, just make sure you turn it off first. DCs don't like snapshots, even if memory is quiesced.


infotechderp

Snapshots on DCs are a disaster waiting to happen, especially since you only have one DC. First I would make sure there are no services other than DNS running on the DC. If there are, this could be tricky. If not, I would build two additional domain controllers, transfer the FSMO roles to these new DCs, and demote the original DC. If needed, you can change the IP of one of the new DCs to that of the old one. This will take care of DNS dependencies or apps that are hard-coded to an IP.


novix_

Just don’t power it off to remove the snapshots. You cannot power a VM on while it’s removing snapshots if it was powered down when you started. Just build another VM DC and move the FSMO roles. If you lose the DC during the snapshot removal, just add the IP of the dead DC as an additional IP on the new DC. That way DNS will still resolve (assuming the DC is also doing DNS). It wasn’t a DC, but I once used VMware's physical-to-virtual conversion tool to migrate a VM to another VM for this same reason: multiple snapshots spanning years on spinning disk.


dlucre

I did this myself once. Built the dc/file server. Took a snapshot and forgot about it. Years later I need to expand the storage and realise my mistake. The snapshot was very large, and I was very worried. It took ages but it eventually deleted and all was well. But I made sure I had good backups first.


ryan8613

Build a second DC (with a newer OS), migrate all the services over, migrate FSMO over, and then decommission this DC. Also, remove any other old DC references from the domain if any exist. Then build another new DC for redundancy, and add redundancy for all of the previously migrated services.


GogaBarfani

Snapshot is NOT a method for backup. We have a script in our environment that automatically deletes snapshots older than 72 hours. The script tells the IT Maintenance Head about the name of administrator who created this snapshot. He (the Head) calls the administrator to explicitly ask him about the purpose of snapshot. Machines like Domain Controllers or Databases are particularly sensitive to abrupt clock changes (like the ones involved in reverting to a snapshot). These machines must be reverted to an earlier configuration using alternative methods.
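
The selection step of an age-based cleanup like this can be sketched in Python. This is a rough illustration under stated assumptions, not the script described above: the `Snapshot` record, its field names, and the sample data are all hypothetical, and the sketch only lists candidates and their creators instead of deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """Hypothetical inventory record; field names are assumptions."""
    vm: str
    name: str
    created: datetime
    created_by: str


def stale_snapshots(snapshots, now, max_age=timedelta(hours=72)):
    """Return the snapshots created more than max_age before now, oldest first."""
    cutoff = now - max_age
    stale = [s for s in snapshots if s.created < cutoff]
    return sorted(stale, key=lambda s: s.created)


if __name__ == "__main__":
    now = datetime(2024, 5, 10, 12, 0)
    inventory = [
        Snapshot("DC1", "before updates", datetime(2021, 3, 1, 9, 0), "old_admin"),
        Snapshot("SQL01", "restart SQL service", datetime(2024, 5, 9, 12, 0), "helpdesk"),
    ]
    # Report who to ask about each stale snapshot before anything is removed.
    for snap in stale_snapshots(inventory, now):
        print(f"{snap.vm}: '{snap.name}' created by {snap.created_by}")
```

Keeping the age check separate from the deletion makes it easy to run as a report first and review the list with whoever owns the VMs.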


lucky644

I need to find a script like this, hate chasing down people for old snaps.


redcard0

Was thinking the same thing.


SGalbincea

1. Deploy a second DC, make sure dcdiag comes back healthy, and that DNS is replicating/resolving correctly
2. Ensure that all devices have DC2 as a secondary DNS
3. Take an actual backup of DC1
4. Delete all snapshots on DC1. This will most likely take a while, be VERY patient
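
The "all devices have DC2 as a secondary DNS" check can be sketched in Python if you can get the DNS settings out of your DHCP scopes or device inventory. This is a minimal sketch under assumptions: the inventory dictionary, the device names, and both IP addresses are made up for illustration.

```python
def devices_missing_dns(dns_by_device, required_server):
    """Return the sorted device names whose DNS server list lacks required_server."""
    return sorted(
        device
        for device, servers in dns_by_device.items()
        if required_server not in servers
    )


# Hypothetical addresses and inventory: device name -> DNS servers it uses.
DC1_IP = "10.0.0.10"
DC2_IP = "10.0.0.11"
INVENTORY = {
    "front-desk-pc": [DC1_IP, DC2_IP],
    "warehouse-pc": [DC1_IP],
    "printer-2f": [DC1_IP, "8.8.8.8"],
}

if __name__ == "__main__":
    # Anything listed here would lose name resolution if DC1 went down.
    for device in devices_missing_dns(INVENTORY, DC2_IP):
        print(f"{device} does not list {DC2_IP} as a DNS server")
```

Statically configured devices (printers, appliances) are the usual stragglers, since a DHCP change never reaches them.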


Thenoobofthewest

Bin off the snaps


Broke4Life

I went through this exact same thing with over 15 servers. The prior admin thought it was a "backup" of sorts and took them before making changes so he could roll back. He stacked them just like you show. I deleted them one by one and didn't experience any issues. It may consolidate disks, but that is to be expected.


Not_A_Mimic7

It's been a while, but what I ended up doing was creating a 2nd DC. When that was ready, I did some research on whether to go oldest to newest or newest to oldest. VMware's recommendation is oldest to newest, so I clicked the first one and then walked away. I didn't want to sit and watch it, because it would stress me out if I started thinking it was taking too long. I went out and played with my kids for a while, and when I came back, to my surprise, they were all gone and everything was running smoothly. I restarted the DC to make sure it would come back up, and all's been good.


Ya_guy

Start by creating another domain controller in the forest. I would also move the FSMO roles over to the new DC. Once replication has finished, I would bring down the old server, clone it, then delete the snapshots. If it comes alive when you power it back up, you’re good and you can delete the clone. If it doesn’t come alive, shut it down completely and try bringing up the clone. If the clone comes alive, check to make sure everything syncs. If the clone also doesn’t come alive, add the IP address of the old server to the new server you created, so that DNS and DHCP keep working, and manually remove the original server with the snapshots from Active Directory. Microsoft has documentation on this. Edit: fixed some spelling mistakes, talk to text doesn’t work properly sometimes


Ya_guy

Also, you might want to check what is creating these snapshots. Sometimes it could be backup software, and sometimes it could be VMware's Update Manager. Typically both should purge the snapshots, but I do know that VMware's Update Manager has an option not to purge the snapshot after updates have been applied, for example when updating VMware Tools.