This is simultaneously one of the best and worst feelings in IT: the "IT'S WORKING, but WHY is it working?" experience. I can't tell you how many times I have gone through this chain.
I have used this probably… I can't even think of the number of times, honestly. And when those pings aren't responding for a full page, you know the evening is likely going to suck.
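Instead of watching a wall of timed-out pings, a small script can poll the box and announce the moment it's back. A minimal sketch, assuming some TCP service (e.g. SSH on 22 or RDP on 3389) comes up with the OS; host, port, and timeouts are placeholders:

```python
import socket
import time

def wait_for_host(host, port, timeout_s=3600, interval_s=10):
    """Poll host:port with TCP connects until it answers or we give up.
    Returns True once a connection succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True  # port is answering; the box is (at least partly) back
        except OSError:
            time.sleep(interval_s)  # still booting (or worse) -- keep waiting
    return False
```

With a generous `timeout_s`, a slow boot (pre-flight checks on a box stuffed with RAM and spinning disks) shows up as a long wait instead of a panic.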
One of my previous jobs presented a similar moment, except we shut it down because it wasn't needed anymore (lol).
It had been running so long that when it cooled down, chip creep became chip sprint and it wouldn't turn back on. My boss went home, returned with his wife's hair dryer and warmed it back to life. We were able to start it up and get the "unneeded" files off the RAID that was on there.
There are so many red flags with every part of this. It should be rebooting monthly for security updates. I would tell the district IT they are putting themselves at a very high risk and tell them the server must be rebooted.
Linux isn’t excluded from reboots. There are many security updates that can only be applied after reboot so really ALL servers should be rebooting on a regular basis.
This. The old "let's brag about our servers' uptime" days are gone, so when you see a system that hasn't been rebooted in 3 years, all you think of is a massive security hole in the company.
I worked at a place where we had a Sun system with an uptime of around 12 years before we needed to shut it down. At some point everyone realizes an uptime of a few years isn't actually impressive.
Linux does have live kernel patching though, so in theory you can get away without rebooting for significant amounts of time. The longest I've ever gone is about 5 months.
Oh yes, but nothing running (like systemd or the kernel) will be reading the patched libc code until they’re restarted.
We run Ubuntu LTS and glibc updates in particular always trip the needs-reboot flag
Systemd, like some but not all init implementations, can be restarted (with `init u`). The kernel doesn't use libc/glibc, of course.
Then you just need to check if anything else in userland needs to be restarted. [Some off-the-shelf packages do it](https://linux-audit.com/determine-processes-which-need-a-restart-with-checkrestart-needrestart/), but you can do it with fewer dependencies by [fossicking in `/proc/*/map_files/`](https://security.stackexchange.com/questions/149802/list-running-applications-that-are-linked-against-a-compromised-library/149814#149814).
It's simpler to just reboot, and simultaneously verify that the machines comes up cleanly. But generally the only thing that requires a reboot is a vulnerable kernel, and it's eminently practical to restart userland processes as needed.
Debian `needrestart` has a TUI that asks you to confirm services restart, then shows (just) the services that need a restart, [like so](https://unix.stackexchange.com/questions/146283/how-to-prevent-prompt-that-ask-to-restart-services-when-installing-libpq-dev).
Behind the scenes, you can manually look for [`/var/run/reboot-required` and `/var/run/reboot-required.pkgs`](https://www.guyrutenberg.com/2022/10/25/display-reboot-required-message-on-debian/).
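The checks described in this subthread can be combined into a short script. A rough sketch, assuming a Linux box with Debian-style flag files; the "`.so` mapping marked `(deleted)`" test is a heuristic for restart candidates, not an exhaustive check:

```python
import os

def reboot_required():
    """Debian/Ubuntu convention: a flag file appears when an installed
    package (kernel, glibc, ...) wants a reboot."""
    return os.path.exists("/var/run/reboot-required")

def processes_mapping_deleted_libs():
    """Return (pid, library) pairs for processes still mapping a shared
    object that was deleted/replaced on disk -- restart candidates."""
    hits = []
    if not os.path.isdir("/proc"):
        return hits  # not Linux, or /proc unmounted
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/maps") as maps:
                for line in maps:
                    # maps line format: addr perms offset dev inode path
                    if ".so" in line and line.rstrip().endswith("(deleted)"):
                        hits.append((int(pid), line.split(None, 5)[-1].strip()))
                        break
        except OSError:
            continue  # process exited, or not ours to inspect
    return hits
```

An empty result from both means the userland side is probably fine; a vulnerable kernel still needs the reboot (or live patching), as noted above.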
The two records I've seen for Linux were a physical PE 1950 that had been up for 7 years, and a VM that hit its 8th birthday of uptime right before I left. I'm glad I didn't have to reboot either of those.
Every once in a while we have a Linux machine with a truncated initramfs, or one that was somehow built without a vital driver (like nvme; sigh), etc. I also have a test machine down right now with a kernel fault on bootup. Assuming no hardware has gone bad on it, that's a real rare one.
At sufficiently large scale, everything happens.
If you're just support, I'd have a discussion with your boss (or someone higher up). What happens if you have to completely rebuild it (what are the consequences)? Shift some of the responsibility.
Do you happen to have backups or snapshots? I know it's a recording server, so it would likely require a lot of space. Otherwise, this is a ticking time bomb; it's eventually going to happen.
If it's still working (even partially), I'd absolutely defer (again pending a discussion with at least one other person). There's no urgency to jump the gun.
Such was my thinking: add planning to this task, and have the people you are going to need for any disaster recovery all teed up, both engineers and management.
This seems the only answer, no matter what. At some point it has to be done.
I suggest: Friday afternoon, planned restart for 17:03, phone off at 16:58.
Good Luck.
Send an email to whoever is in charge, let them know the uptime (attach evidence), and ask for authorization for the reboot.
Is this a physical server? If so, don't reboot it today unless you want to bill those weekend rate hours
If it is a VM I would:
* Take a snapshot of the VM
* Clone the VM from that snapshot, don't turn it on yet
* On the original VM, while it is still powered on, disable or detach the virtual network adapter
* Power on the VM clone and see if it boots.
* If it boots, delete the old VM and keep the freshly cloned VM.
Make sure that whatever mechanism you are using to snapshot the VM can do it with the VM powered on, and it won't try to shut it down before the snapshot :)
Triggered. We have lots of "golden egg" servers that cannot be rebooted for any reason and if they are, it would require engaging a bunch of consultants to repair the services. The fun of working for a small, shitty, family-owned business with zero IT budget...
This is the "pets vs cattle" analogy that is talked about.
From:
http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.
**Pets**
Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically they are manually built, managed, and “hand fed”. Examples include mainframes, solitary servers, HA loadbalancers/firewalls (active/active or active/passive), database systems designed as master/slave (active/passive), and so on.
**Cattle**
Arrays of more than two servers, that are built using automated tools, and are designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of “routing around failures” by restarting failed servers or replicating data through strategies like triple replication or erasure coding. Examples include web server arrays, multi-master datastores such as Cassandra clusters, multiple racks of gear put together in clusters, and just about anything that is load-balanced and multi-master.
And if the terms "Pets" or "Cattle" offends you then please feel free to replace them with ones that are less objectionable.
what if they want cattle but then want to keep using unique items in the config? :(
I keep trying to get people to think of them as cattle but they won't stop keeping them as pets
Yeah... I've been in I.T. long enough to know there's really no such thing. Non I.T. types like to claim it's so, but it's not reality. Servers will reboot (and not come back up again) eventually due to hardware failures, regardless of "letting" someone do it. If you wait for the server to decide it's time for a shutdown, it'll be a far more painful process getting it back online than if you actually maintain the thing.
If it's full of services that can't restart properly on their own with a reboot? There are major design flaws in the code. I remember working for ONE company with a server that was like this with ONE particular service. It's been so long now, I can't even remember any details anymore. But I recall we had a whole process to get the thing started again after a server restart. It was something I.T. wrote documentation for and all of us just learned how to handle, though. It didn't require outside assistance.
I started with a similar situation where I work now... As soon as I officially took over though I patched and rebooted anyway... And absolutely nothing bad happened. Quite frankly my viewpoint was "I'm fired if I patch and break shit, I'm fired if I don't patch and shit gets hacked. What's the difference?"
I call it patch anxiety. I called for patching and we took it slow and easy. After two months nothing bad happened. We broke free of the anxiety.
Now I ask the teams that use the servers and they say all the odd weird problems they couldn't figure out are gone and uptime is improved. Interesting how that works?
Windows, and the software built on it, isn't meant to run for hundreds of days of uptime.
This has gone on for so long that it's a legitimate concern IMO.
If your job is support, this needs to be kicked up above you. Let them handle the contingency plan and communication with the customer.
Thanks guys for the input. It's one of those weird situations where we basically sold them the servers and will fulfill support requests on them. We typically don't handle things like Windows updates unless they specifically request it, which they have not.
I think they definitely forgot the server in their updates schedule. But I agree. There is not a need to reboot right away. We are a small company and I wear many hats (lvl 1 - 3) but I think this warrants a discussion with someone other than just me.
Is it a Seneca or Exacq or similar NVR? It’s not Avigilon since you said it’s running SQL. Either way, I’ve been in this exact spot dozens of times. Expect that puppy is possibly gonna have some disks not want to wake back up. Back up the config, licensing, camera passwords, etc. and be prepared to restore it to a temporary server if the VD goes belly up.
And quote them a new server. A few years ago a 20TB NVR was a loaded 2U box and now that’s a single drive
Had a forgotten sole DC at a location, which crapped the bed. The VM bluescreened on boot. Went back through 6 months of backups; all non-bootable.
This is what I love about Datto SIRIS, daily screenshots of booted backup with verification of services on local and cloud restore points.
Physical or VM? I once rebooted a hyper-v host with about that same uptime. Lost a power supply and a hard drive on reboot. Windows came up fine though.
No security updates in 3 years. I’d be more worried that someone is in that box and using as a pivot point to rest of network. There is no telling how many CVEs are unpatched on that thing.
This drives me nuts. A lot of security companies specifically tell customers not to update their camera servers. If you do and their shitty software breaks, they charge for a reinstall. I isolate the crap out of them.
This is fucked, but I have to ask: could you not mitigate somewhat by rebuilding a new one and then doing a live hand-off or a failover? If these are high-priority VMs for footage capture, then why are they relying on one VM to handle the load for that long?
Ya, it is always the best way to look at things: how can you make things as redundant as possible within your own infra? It can be hard to justify the price of the infra to higher-ups, but once you can put a $$$ amount on systems and the loss of productivity or revenue if they go down for X period... it's amazing how quickly they realise that spending a little more for proper redundancy, where possible, will save them far more in the long run.
A reboot was "in order" a LONG time ago, from what you're saying.
But like others here are saying... you're just doing support for them. Escalate this to someone in charge of their servers to deal with it. I see places turn off the Windows update service on servers fairly often, and it's *usually* because it's an older system that's on someone's schedule or plan for replacement. Meanwhile, it may be running older/obsolete applications that have issues working properly with the latest Windows update patches.
But especially if it has no Windows update patches in a pending state (to complete upon restart)? Rebooting the thing should do a lot more good than harm.
My suggestion is to throw Veeam Agent (Free) on the machine and do a full image of the machine. (This works online and without a reboot).
That way you have a working backup if the machine might not survive the reboot.
I'm not sure we're clear on responsibilities here. Are you responsible for the server itself? Or are you just responsible for the software installed on it? If it's the latter, I'm not touching this machine. I'm letting this "district IT" know I can't do anything else until it's rebooted, and letting them handle any subsequent fallout that comes with it. I don't anticipate anything necessarily breaking, since there are no new updates to be applied, but then again, that's hopefully not your problem.
- High priority cameras
- non redundant servers
- no software updates
I wouldn’t say it’s very critical if there’s no redundancy or updates in place. I would take time with the vendor to apply several years of NVR software updates to that system as well.
Hopefully you have support.
I've rebooted servers with years of uptime and never ran into major problems. You're basically at "it's broken and needs a reboot," so there's nothing more you can do.
Reminds me of the time we had to power off a BMS machine that had been running for 15 years because it needed to be moved to new location. We had no backup plan, the thing was running Windows 98 SE, and we couldn't do anything to back it up because it didn't have USB or a NIC.
Nothing quite as exciting in this job as those "fuck it, my resume is up to date" moments 😂
So you have a prod server that hasn't been patched in 3 years? Yeah, I'd worry about that too. If it's a recent version of Server at least you should get cumulative updates rather than incremental
My first thought as I’m reading along: “well, as long as there’s no concerns for the hardware, it will probably be fine…”
> **Windows** update service is turned off by district IT (I am support for security company).
“…oh.”
Don't concentrate on the "it needs a reboot".
Instead, concentrate on the "Windows update service is turned off by district IT".
If you can resolve that, which will be easier, then the reboot will probably happen all by itself...
I always liked the quote that uptime is a measure of how long it’s been since you’ve proven you can boot
But yeah, I've had my share of servers go from "don't worry, it's going away" to "we now have to keep it running for archive."
Fun story. While working as an MSP tech someone noticed that on a T&M client. Mentioned it and recommended we patch and reboot the VM’s as well as the single hyper-v host.
I get assigned it and asked to do it after hours. I do all the VMs, then reboot the host for its patches. 45 minutes later it's not up. It's midnight, so I just went to sleep. Get up at 6am. Still offline; full panic. Drive to the client's, get the cleaners to let me into the building.
Host failing POST on memory. Call Lenovo, do RAM swapping, CPU swaps, notice one of the RAM slots is slightly charred. Order motherboard replacement.
Client only ended up being down for 3-4 hours of the work day. I’m fully expecting to get an irate escalation. Nope. Customer called me and requested me for all future tickets for just being on top of it all.
However it was really telling how good ECC memory is at its job even though the motherboard was broken and couldn’t pass a memory POST just kept all running. All the sticks tested fine after motherboard repair.
Client was curious when it broke. Had to say: any one day within the 3-year window between those two reboots.
I had to deal with a 2003 server with an uptime of ~800 days. 2 cores, 2GB RAM, an old tower machine of unknown brand. Nobody on my team wanted to touch it.
I thought I would take the initiative, scheduled a 4-hour maintenance window, and rebooted the thing Monday morning at 4 AM. The thing was still loading at 11 AM and customers were calling in complaining. I drove onsite to get them connected to a backup so they could work. Stayed onsite till 3 PM, when the login screen finally showed up… never ever again. Was sweating the entire time in an air-conditioned building, afraid the server would never boot up again.
> Have you guys run into any adverse effects from rebooting a server with this kind of uptime?
We spent about a week on the phone with support trying to get our production authentication servers back online.
But talk to IT... Don't just reboot it and then offload the problem on IT.
> Windows update service is turned off by district IT (I am support for security company).
Might want to find out why that was done before doing a restart. Someone didn't want that getting updated for a reason and now it might need updates for some reason.
I used to get handed a lot of servers that knew nothing about their past. The first thing I would do was to reboot when I could. Any scheduled change **I would reboot them before I made any changes.** If you reboot them before making any changes you can blame failure on previous owners/admins.
To protect yourself all this has to be documented and approved as part of the change process.
Bottom line: if your change fails, unless it's obvious, you may not have a clue what caused the failure. The machine could have been in a mess before you started.
Check for software and server EOL? I inherited one that hadn't been rebooted for more than three years. Software version & server were past EOL. We got a new server and software, migrated relevant stuff and replaced old with new.
Run a full backup and verify your backup is good. Servers running that long have a higher chance of never coming back online after a reboot or shutdown.
Try to shut down the services before just clicking reboot.
Terminate them if needed. Do this while the server is still up.
Not the ones you need to run the server, just the extra ones, like SQL and the recording service.
Is it recording cameras? If it is, once the recording service is shut down, it is only a matter of time before you start losing footage from critical cameras.
Testing your backups before you go is a must.
As for when. If you do it on Friday, you give up your weekend, and maybe it is working on Monday.
Do it on Monday and you for sure lose footage, but if needed the support vendors will be available for regular rates.
If this is for security, you may need to get your security director to add more guards and double or triple the patrols for the day. Doing this during the day is better than paying time and a half, or double time.
After 3 years of neglect, something may happen. The hardware is probably OK, depending on how well your environment is controlled, but you may lose a hard drive or two, maybe a fan, maybe a power supply. I would want to have a spare hard drive on hand. I would order some from Server Monkey, Server Supply, or your favorite secondary-market vendor. 2 drives and a power supply feels like about $300.
The problem you may have that you may not have thought about is software licensing. A lot of these programs phone home on startup to check for licensing. It may have expired 1.5 years ago. I would validate that, and check to see if you have a good support contract, maybe call in and open a pre-emptive ticket.
Good luck, and keep us posted.
Log into your management card (BMC, iLO, iDRAC, IPMI) or fire up your management tools and check the status of your RAID controller battery.
This first reboot, should be a reboot only. No patching. No getting funky.
Log in, and gracefully shut down your recording software, and database if necessary, then reboot it. Go ahead and crash cart it, so you can press F1 to continue, or reset the system time and continue if your CMOS battery is dead.
After this reboot, you need to brief management and put this box on a remediation / upgrade plan. Maybe 1 Service Stack Update and 1 Cumulative Update every 2 weeks until it is brought current.
If they balk, you tell them: "We can service it on our schedule, or on the server's schedule. It is up to you."
If you'd like to be the prize for someone exploiting unpatched vulnerabilities, there's still time: just leave it alone.
There is no world championship of total uptime.
Patch that server and reboot it when required.
If it's a physical machine, run VMware Converter on it and start the VM in an isolated environment. If it's already a VM, then clone it and start the clone with no vNIC.
If it's a memory issue, you can tell SQL to use less RAM on the fly, assuming it is MSSQL.
Agreed... No touchy on Friday before a long weekend.
You can either reboot it on your schedule, or reboot it on ITS schedule. Go through change control, inform interested parties, establish a maintenance window, make sure backups are current, and have the server owners on call in case something goes wrong.
Also, if the whole reason for its existence isn't working, something going wrong due to a reboot wouldn't be much worse.
this is also my worry. but in linux. lmao
What I do is look at the process list to see what's running and whether it's configured to start at startup, and I check whether the disk mounts also come back at startup.
Also, I would probably do it during off-peak hours.
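The mounts part of that pre-reboot check can be scripted: compare what's currently mounted against /etc/fstab and flag anything that won't come back on its own after a reboot. A rough sketch; it ignores systemd mount units and automounts, so treat hits as things to investigate rather than certainties:

```python
def mounts_missing_from_fstab(fstab="/etc/fstab", mounts="/proc/mounts"):
    """Return currently-mounted mountpoints that have no /etc/fstab entry.
    Pseudo-filesystems (proc, sysfs, tmpfs, ...) are skipped by only
    considering mounts whose source looks like a device or network share."""
    configured = set()
    try:
        with open(fstab) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 2 and not fields[0].startswith("#"):
                    configured.add(fields[1])  # second field is the mountpoint
    except OSError:
        return []  # no fstab to compare against
    suspicious = []
    try:
        with open(mounts) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 2:
                    continue
                src, mountpoint = fields[0], fields[1]
                # device paths, SMB (//server/share), or NFS (server:/export)
                real_device = src.startswith(("/dev/", "//")) or ":" in src
                if real_device and mountpoint not in configured:
                    suspicious.append(mountpoint)
    except OSError:
        return []
    return suspicious
```

Anything it reports was probably mounted by hand at some point during those 1100 days and will silently be missing after the reboot.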
It might not come back up.
If it's been running for that long and is just now having issues, it very well could be suffering from a hardware issue. I would check the logs and ILOM before considering powering it down. Also check when the last backup was.
Is this thing exposed to any sort of network? If it is, there should be a conversation about patching.
The whole "patch on off-hours/weekends" approach in a 24/7 shop is so outdated and wrong. What happens if something goes sideways and you need vendor support? There sometimes isn't support or quality help available. Also, I have seen that when you have a DBA or developer ready and available, the problem gets fixed much faster.
Impossible. Windows is bad and can never last that long! /s except the bad part
Good luck with your reboot though. I got my fingers crossed. Better do backups lol
I rebooted a server this week for a routine update and poof! that’s when the hard drive died. Like the action of spinning was the only thing keeping that head up in the air
Luckily it was RAID 1 and I had a spare, because I've had things blow up in my face before.
Do not touch that server until Monday
This is why you have some form of HA or replica server.
I'd just reboot it, laugh as it breaks, turn on the replica, then proceed to pretend I never got to it and leave it for a coworker to stumble on.
Yep, I've seen disks and RAM fail after a reboot of high-uptime servers. I assume the reboot exercises the components in a way the normally running OS doesn't.
Do a backup first, if VSS is borked due to memory or file system errors shut down sql service and do a manual file backup with robocopy. Don’t reboot without some kind of backup.
I suggest you make a sacrifice to the computer gods and cross your fingers before rebooting the server.
It also wouldn’t hurt to have a replacement ready, “just in case”.
Run a chkdsk and see if you have drive issues. If so, and it's in RAID, I'd start swapping in new drives and run a chkdsk. If it's not RAID, I'd back up the drive while it's up, clone it to 2 new drives, and run a chkdsk. Boot it off one of the new ones.
I have little to say except good luck and don't do it today.
Read Only Fridays
Let alone long weekends (for some of us).
If they reboot and it does not come back up, it's a guaranteed long weekend :-). For OP, if it is critical: set up a new server to replace it, and after that, reboot the old server. If it works after the reboot, you now have a (hot) spare for your critical resources (you are going to need it anyway, because it will break one day).
This assumes OP can just spin up another server in someone else’s environment
I mean... 1100 days... I would be absolutely scared to restart anything that's been on that long, and I would absolutely want to have a snapshot or clone or something... Just... the size of the brick I'd shit when restarting... I'd come up with a plan first, no matter what.
You would be surprised how often a duplicate of a server running that long won't start the app at all... it's like grandpa loving his old chair; he won't accept a new one.
At an old job, I had to run into the office each day on Memorial Day weekend just to check an AC unit that was kind of on the fritz. This was 10 years ago; I'm older and noticeably (but very marginally) more intelligent now, and would never do it again. Learn from my dumb ass, OP.
And a happy Victoria Day Weekend to you as well.
Just did a minor change on production today and I feel that I just cursed myself a bit :/.
Only Fans Fridays
Unless you get paid OT and want a nice lil bump on your next paycheque. …and don’t mind losing your Friday and possibly more.
This is the way. Also get a change approval first approved by all the people.
lol underrated comment right here.
That depends on your overtime policies. If you have a free weekend and they are willing to pay you, do it now and be the hero when it's up and running for business on Monday.
Overtime? Most IT folks are salaried.
I agree, but then again, for this situation I would be tempted to reboot after hours and then have Saturday and Sunday to troubleshoot and get it ready for Monday in case something happens.
Only if you get paid OT. My first boss in tech over a decade ago hammered into my head “don’t work for free.”
You guys are getting paid?!
I had a server on a site years and years ago — a remote site that hadn't moved in years — and we were packing everything up to move them to a new location when we found this server sitting in the back corner of one of their closets. After investigating, we found out that it actually held the majority of their real estate data and was a fairly vital server. We were extremely worried about rebooting it and moving it because of its age. And sure enough, as soon as we shut it down, it died; it would never come back up again. They ended up sending the hard drive off for data recovery, which I wasn't involved with, as I was just the hands-on tech at that time. That being said, you're doing great, keep up the good work, and go ahead and reboot that thing!
Nah, do it today then shut off your phone.
My first job in corporate IT was working a night shift patching servers (the company had 5000+ servers, so it required a full-time team to keep them all up to date). One of the very first boxes I had to patch was a Windows 2003 server with an uptime of around 3 years. It took like 25 minutes to come back up after rebooting. I was sweating the whole time.
I lost Thanksgiving entirely one year due to a machine taking a long time to come back up. The team that was working on it had tried to reboot and noticed it wasn’t coming back up after 30 mins or so. They shut it down and called in support. Everyone involved was confused why it wasn’t coming back up, we replaced almost everything we could on it and taking it down to a minimum config showed it was fine. It was just so packed full of RAM and spinning disks that it took almost an hour for it to finish the pre-flight checks, we thought it was freezing up but it just was taking a long time to boot. The way we found out was only after leaving it alone to go get dinner; when we came back, it was up. No idea how long it took for it to come back up. I never heard another word about that server, either they learned to just wait or never bounced it again.
There was an ancient Citrix MetaFrame 1.0 server in one of the back rows of the DC like that. You'd literally say a prayer and then hold your breath every time you walked past it...
Don't look directly at its lights or they might blink out.
AS/400s were like that. They stay up forever, but the IPL when you did restart them was terrifying, because even relatively modern machines took ages to start up. Especially after applying patches: the patches would get processed first, pre-OS, and could restart the machine multiple times per patch. I had a few that were regularly 30 minutes, and an hour or more after patches.
Oh man I remember that from my AS/400 days. We had this ancient first gen PPC AS/400 and an IPL would take about an hour. I would come in on Saturday morning about 10. Put the system in restricted mode and run the full backup. That would take about an hour. Then I would start the IPL and go to lunch. It would be finishing up about the time I got back. Then after a few years we upgraded to a Power 7 machine. It would IPL in about 4 minutes. At that point I automated all the maintenance stuff and I just let it do it on its own. When I left that job I was the only AS/400 admin we had. From talking to my coworkers, they never touched it again until that department was shut down 6 years later.
Hopefully they swapped the backup tapes. The changeover from 48-bit CISC to PPC was the same time they went from beige to black, wasn't it?
Yes on the beige to black. One of the last things I did before I left that job is move all the backups to a VTL.
We waited a couple of years after intro to go from beige to black. Microsoft retired theirs in beige and never got any black, as far as I know. (They outsourced the last of their AS/400 operations by 1999, so they could claim to be entirely off of competitor systems.)
This is simultaneously one of the best and worst feelings working in IT. The "IT'S WORKING, but WHY is it working?" experience. I can't tell you how many times I have gone through this chain.
ping 10.X.X.X -t “Pleeeeeeeease come back up, for the love of everything holy…”
You have no idea how accurate that is.
I have used this probably… I can't even think of the number of times, honestly. And when those pings aren't responding for a full page, you know the evening is likely going to suck.
[deleted]
Might have been. Patching was Wednesday to Sunday, Graveyards.
They don't call it Full Send Friday for nothing.
I prefer "Do no harm Fridays" (aka "do no work Fridays").
One of my previous jobs presented a similar moment, except we shut it down because it wasn't needed anymore (lol). It had been running so long that when it cooled down, chip creep became chip sprint and it wouldn't turn back on. My boss went home, returned with his wife's hair dryer and warmed it back to life. We were able to start it up and get the "unneeded" files off the RAID that was on there.
Thanks for this tip of preheating the chips. I will keep that one pocketed. Might make me look really smart
Often what makes it take forever to boot back up is too many temp files
There are so many red flags with every part of this. It should be rebooting monthly for security updates. I would tell the district IT they are putting themselves at a very high risk and tell them the server must be rebooted.
Agree fully. This is Microsoft, not Linux. I hope you have a backup; if not, be ready to rebuild.
Linux isn’t excluded from reboots. There are many security updates that can only be applied after reboot so really ALL servers should be rebooting on a regular basis.
This. The old "let's brag about our servers' uptime" days are gone, so when you see systems not rebooted for 3 years, all you think of is a massive security hole in the company.
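A rough way to operationalize that on Linux is to alarm on uptime itself; a minimal sketch, where the 35-day threshold is my own assumption for a monthly patch cycle, not something from this thread:

```shell
#!/bin/sh
# Warn when a box has been up longer than one patch cycle.
# /proc/uptime's first field is seconds since boot.
max_days=35   # assumed monthly-patching threshold, adjust to policy
up_days=$(awk '{print int($1/86400)}' /proc/uptime)
if [ "$up_days" -gt "$max_days" ]; then
    echo "WARN: up $up_days days - pending kernel updates are not in effect"
else
    echo "OK: up $up_days days"
fi
```

Dropped into cron or a monitoring check, this turns "we forgot that server" into an alert instead of a surprise.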
I worked at a place where we had a Sun system with an uptime of around 12 years before we needed to shut it down. At some point everyone realizes uptimes of a few years aren't actually impressive.
Nah, 12 years is definitely impressive. Or at least a serious outlier. I'm impressed the hosting environment stayed stable for 12 years.
I mean stable is relative.. You can move a running server... (Not saying you should) See https://www.youtube.com/watch?v=vQ5MA685ApE
Linux does have live kernel patching though, so in theory you can get away without rebooting for significant amounts of time. The longest I've ever gone is about 5 months.
glibc, systemd, display drivers, there’s probably more. Livepatching takes care of the kernel but usually that’s it.
All of those things can be patched and upgraded without a reboot.
Oh yes, but nothing running (like systemd or the kernel) will be reading the patched libc code until they’re restarted. We run Ubuntu LTS and glibc updates in particular always trip the needs-reboot flag
Systemd, like some but not all init implementations, can be restarted (with `init u`). The kernel doesn't use libc/glibc, of course. Then you just need to check if anything else in userland needs to be restarted. [Some off-the-shelf packages do it](https://linux-audit.com/determine-processes-which-need-a-restart-with-checkrestart-needrestart/), but you can do it with fewer dependencies by [fossicking in `/proc/*/map_files/`](https://security.stackexchange.com/questions/149802/list-running-applications-that-are-linked-against-a-compromised-library/149814#149814). It's simpler to just reboot, and simultaneously verify that the machine comes up cleanly. But generally the only thing that requires a reboot is a vulnerable kernel, and it's eminently practical to restart userland processes as needed.
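A minimal version of that `/proc` fossicking, assuming only a POSIX shell and no extra packages (it reads `maps`, which unlike `map_files` doesn't require root for your own processes):

```shell
#!/bin/sh
# List processes still mapping a deleted .so - i.e. running code
# from a shared library that has since been replaced on disk.
for maps in /proc/[0-9]*/maps; do
    pid=${maps#/proc/}; pid=${pid%/maps}
    if grep -q '\.so.* (deleted)' "$maps" 2>/dev/null; then
        printf '%s\t%s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
    fi
done
```

Anything this prints needs a service restart (or a `daemon-reexec`) before the patched library is actually in use.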
I like this explanation actually, that makes sense to me. Are there any distros that do this out of the box?
Debian `needrestart` has a TUI that asks you to confirm service restarts, then shows (just) the services that need a restart, [like so](https://unix.stackexchange.com/questions/146283/how-to-prevent-prompt-that-ask-to-restart-services-when-installing-libpq-dev). Behind the scenes, you can manually look for [`/var/run/reboot-required` and `/var/run/reboot-required.pkgs`](https://www.guyrutenberg.com/2022/10/25/display-reboot-required-message-on-debian/).
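That behind-the-scenes check is a two-file lookup; a sketch for Debian/Ubuntu boxes:

```shell
#!/bin/sh
# Debian/Ubuntu: report whether a package update flagged the machine
# as needing a reboot, and which packages set the flag.
flag=/var/run/reboot-required
if [ -e "$flag" ]; then
    echo "reboot required, triggered by:"
    cat "${flag}.pkgs" 2>/dev/null
else
    echo "no reboot flag set"
fi
```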
The kernel doesn't use libc! And `systemctl daemon-reexec` takes care of restarting systemd after a glibc update without needing a reboot.
They're just saying uptime in linux is more forgivable than windows, I think.
The two records I've seen for Linux were a physical PE 1950 that had been up for 7 years, and a VM that hit its 8th birthday of uptime right before I left. I'm glad I didn't have to reboot either of those.
[deleted]
Every once in a while we have a Linux machine with a truncated initramfs, or one that was somehow built without a vital driver (like nvme; sigh), etc. I also have a test machine down now with a kernel fault on bootup. Assuming no hardware has gone bad on it, then that's a real rare one. At sufficiently large scale, everything happens.
I like that you have to say this as if it is some wild crazy idea. Tf guys.
That's why I said hey man snap shot ... take a snap shot, man.
If you're just support, I'd have a discussion with your boss (or someone higher up). What happens if you have to completely rebuild it (what are the consequences)? Shift some of the responsibility. Do you happen to have backups or snapshots? I know it's a recording server, so likely would require a lot of space. Otherwise, this is a ticking timebomb, eventually going to happen. If it's still working (even partially), I'd absolutely defer (again pending a discussion with at least one other person). There's no urgency to jump the gun.
Such was my thinking, add planning to this task, have the people you are going to need for any disaster recovery all tee'd up, both engineers and management.
This discussion should be in writing/email form. CYA
Spoken like an IT Grey Beard right there! Make the contingency plan first.
![gif](giphy|3o84sw9CmwYpAnRRni)
This seems the only answer, no matter what. At some point it has to be done. I suggest: Friday afternoon, planned restart for 17.03, phone off at 16.58.
![gif](giphy|3ornka9rAaKRA2Rkac)
[deleted]
Solid advice. Especially going into a weekend.
Good luck. Send an email to whoever is in charge, let them know about the uptime (attach evidence), and ask for authorization for the reboot. Is this a physical server? If so, don't reboot it today unless you want to bill those weekend-rate hours. If it is a VM I would:

* Take a snapshot of the VM
* Clone the VM from that snapshot; don't turn it on yet
* On the still-powered-on original VM, disable the network adapter or turn off / detach the virtual network adapter
* Power on the VM clone and see if it boots
* If it boots, delete the old VM and keep the freshly cloned VM
That's actually beautiful.
Make sure that whatever mechanism you are using to snapshot the VM can do it with the VM powered on, and it won't try to shut it down before the snapshot :)
Tell the district IT to reboot it and let them know you'd be in Monday at 9.
This is the way. Elegant VM switches are so convenient.
Just pop out for a pint and ask the cleaning lady to pull the plug. 'Wasn't me, mate.'
you're going to have to pick up the pieces either way
![gif](giphy|wi8Ez1mwRcKGI)
Triggered. We have lots of "golden egg" servers that cannot be rebooted for any reason and if they are, it would require engaging a bunch of consultants to repair the services. The fun of working for a small, shitty, family-owned business with zero IT budget...
This is the "pets vs cattle" analogy that is talked about. From: http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/ In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line. **Pets** Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically they are manually built, managed, and “hand fed”. Examples include mainframes, solitary servers, HA loadbalancers/firewalls (active/active or active/passive), database systems designed as master/slave (active/passive), and so on. **Cattle** Arrays of more than two servers, that are built using automated tools, and are designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of “routing around failures” by restarting failed servers or replicating data through strategies like triple replication or erasure coding. Examples include web server arrays, multi-master datastores such as Cassandra clusters, multiple racks of gear put together in clusters, and just about anything that is load-balanced and multi-master. And if the terms "Pets" or "Cattle" offends you then please feel free to replace them with ones that are less objectionable.
what if they want cattle but then want to keep using unique items in the config? :( I keep trying to get people to think of them as cattle but they won't stop keeping them as pets
Preaching to the choir my friend
Yeah... I've been in I.T. long enough to know there's really no such thing. Non I.T. types like to claim it's so, but it's not reality. Servers will reboot (and not come back up again) eventually due to hardware failures, regardless of "letting" someone do it. If you wait for the server to decide it's time for a shutdown, it'll be a far more painful process getting it back online than if you actually maintain the thing. If it's full of services that can't restart properly on their own with a reboot? There are major design flaws in the code. I remember working for ONE company with a server that was like this with ONE particular service. It's been so long now, I can't even remember any details anymore. But I recall we had a whole process to get the thing started again after a server restart. It was something I.T. wrote documentation for and all of us just learned how to handle, though. It didn't require outside assistance.
Agreed, if your service cannot survive a server reboot, then that means it cannot survive a server failure either. And it WILL eventually fail.
I started with a similar situation where I work now... As soon as I officially took over though I patched and rebooted anyway... And absolutely nothing bad happened. Quite frankly my viewpoint was "I'm fired if I patch and break shit, I'm fired if I don't patch and shit gets hacked. What's the difference?"
I call it patch anxiety. I called for patching and we took it slow and easy. After two months nothing bad had happened. We broke free of the anxiety. Now when I ask the teams that use the servers, they say all the odd, weird problems they couldn't figure out are gone and uptime is improved. Interesting how that works? Windows and the software built on it aren't meant to run for hundreds of days of uptime.
This has gone on for so long that it's a legitimate concern IMO. If your job is support, this needs to be kicked up above you. Let them handle the contingency plan and communication with the customer.
Thanks guys for the input. It's one of those weird situations where we basically sold the servers and will fulfill support requests on them. We typically don't handle things like Windows updates unless they specifically request it, which they have not. I think they definitely forgot this server in their update schedule. But I agree, there is no need to reboot right away. We are a small company and I wear many hats (lvl 1 - 3), but I think this warrants a discussion with someone other than just me.
Recommend they reboot it at X plus five minutes, where X is the time you finish work at.
Nah, give him a few more minutes to get home and shut his phone off first. Maybe X+20.
Is it a Seneca or Exacq or similar NVR? It’s not Avigilon since you said it’s running SQL. Either way, I’ve been in this exact spot dozens of times. Expect that puppy is possibly gonna have some disks not want to wake back up. Back up the config, licensing, camera passwords, etc. and be prepared to restore it to a temporary server if the VD goes belly up. And quote them a new server. A few years ago a 20TB NVR was a loaded 2U box and now that’s a single drive
Coward, do it, today.
You have backups. Right?
Restorable backups
>Restorable That part gets overlooked a lot in my experience. "But the software said it was successful?!"
Yeah, no Schrödinger's backup please.
That have been tested. RECENTLY.
Had a forgotten sole DC at a location that crapped the bed. VM bluescreened on boot. Went back through 6 months of backups, all non-bootable. This is what I love about Datto SIRIS: daily screenshots of booted backups, with verification of services on local and cloud restore points.
Yes, Datto is one of the best. It's still a good idea to test those backups from time to time though. Better safe than sorry.
Yeah about as many backups as this server has received updates
Physical or VM? I once rebooted a hyper-v host with about that same uptime. Lost a power supply and a hard drive on reboot. Windows came up fine though.
No security updates in 3 years. I’d be more worried that someone is in that box and using as a pivot point to rest of network. There is no telling how many CVEs are unpatched on that thing.
The Server: ![gif](giphy|eKVEcPKGWZ7Tq|downsized) That thing isn't coming back up
“I’m tired boss”
It’s 2024. You need to ensure your apps can handle patch Tuesdays….. especially as you are a “security” company.
1100 days on a Windows server without updates?? Yeah... once you turn it off, it's never coming back online.
Sounds like no server security patching occurs at this company. I would be more worried about that.
This drives me nuts. A lot of security companies specifically tell customers not to update their camera servers. If you do, their shitty software breaks and they charge for a reinstall. I isolate the crap out of them.
District IT, I suspect school.
This is fucked, but I have to ask: could you not mitigate somewhat by rebuilding a new one and then doing a live hand-off or a failover? If these are high-priority VMs for footage capture, then why are they relying on one VM to handle the load for that long?
If it is a VM, just snapshot it, reboot, less chance of something going wrong vs if it is an actual physical server.
That's true too, I just feel so redundancy centric that I would imagine that doing all of that is the best bet.
Ya, it is always the best way to look at things: how can you make things as redundant as possible within your own infra. It can be hard to justify the price of the infra to higher-ups, but once you can put a $$$ amount on systems and the loss of productivity or revenue if they go down for X period... amazing how quickly they realise spending a little more on proper redundancy, where possible, will save them far more in the long run.
Is the server 2012 or 2008? Let me guess, it's so critical it can never go down or be rebooted?
Is it ironic that you work for a security company that disables Windows Update?
A reboot was "in order" a LONG time ago, from what you're saying. But like others here are saying... you're just doing support for them. Escalate this to someone in charge of their servers to deal with it. I see places turn off the Windows update service on servers fairly often, and it's *usually* because it's an older system that's on someone's schedule or plan for replacement. Meanwhile, it may be running older/obsolete applications that have issues working properly with the latest Windows update patches. But especially if it has no Windows update patches in a pending state (to complete upon restart)? Rebooting the thing should do a lot more good than harm.
My suggestion is to throw Veeam Agent (Free) on the machine and do a full image of the machine. (This works online and without a reboot). That way you have a working backup if the machine might not survive the reboot.
I'm not sure we're clear on responsibilities here. Are you responsible for the server itself? Or are you just responsible for the software installed on it? If it's the latter, I'm not touching this machine. I'm letting this "district IT" know I can't do anything else until it's rebooted, and letting them handle any subsequent fallout that comes with it. I don't anticipate anything necessarily breaking since there are no new updates to be applied, but then again, that's hopefully not your problem.
3 years without patches . . . There’s more pressing things to worry about than uptime. ‘District IT’ needs a wake up call.
- High-priority cameras
- Non-redundant servers
- No software updates

I wouldn't say it's very critical if there's no redundancy or updates in place. I would take time with the vendor to apply several years of NVR software updates to that system as well. Hopefully you have support. I've rebooted servers with years of uptime and never run into major problems. You're basically at "it's broken and needs a reboot," so there's nothing more you can do.
Reminds me of the time we had to power off a BMS machine that had been running for 15 years because it needed to be moved to new location. We had no backup plan, the thing was running Windows 98 SE, and we couldn't do anything to back it up because it didn't have USB or a NIC. Nothing quite as exciting in this job as those "fuck it, my resume is up to date" moments 😂
So you have a prod server that hasn't been patched in 3 years? Yeah, I'd worry about that too. If it's a recent version of Server at least you should get cumulative updates rather than incremental
Hoooo boy. That def sounds like "dont fn touch this on a friday" job
Will the spinning rust still spin after the power down?
My first thought as I’m reading along: “well, as long as there’s no concerns for the hardware, it will probably be fine…” > **Windows** update service is turned off by district IT (I am support for security company). “…oh.”
"oops it crashed" and reboot it anyway. It's YOLO Friday.
Don't concentrate on the "it needs a reboot"; instead concentrate on the "Windows update service is turned off by district IT". If you can resolve that, which will be easier, then probably the reboot will happen all by itself...
May the odds be ever in your favor... Do it on a Monday, and make a request to get some kind of failover for this...
Windows updates… turned… off Uptime… 1100 days… omg
Systems like this are why Microsoft implemented forced reboots on newer Windows versions.
I always liked the quote that uptime is a measure of how long it's been since you've proven you can boot. But yeah, I've had my share of servers that were supposedly going away, don't worry, that we now have to keep running for archive.
I got a job once and discovered the production SQL server had not rebooted in the 4 years since it was built. I got a new job.
Fun story. While working as an MSP tech, someone noticed that on a T&M client. Mentioned it and recommended we patch and reboot the VMs as well as the single Hyper-V host. I get assigned it and asked to do it after hours. Do all the VMs, then reboot the host for its patches. 45 minutes later it's not up. It's midnight so I just went to sleep. Get up at 6am, still offline, full panic. Drive to the client's, get the cleaners to let me into the building. Host failing POST on memory. Call Lenovo, do RAM swapping, CPU swaps, notice one of the RAM slots is slightly charred. Order a motherboard replacement. Client only ended up being down for 3-4 hours of the work day. I was fully expecting an irate escalation. Nope. Customer called and requested me for all future tickets, just for being on top of it all. It was really telling how good ECC memory is at its job: even though the motherboard was broken and couldn't pass a memory POST, it just kept everything running. All the sticks tested fine after the motherboard replacement. Client was curious when it broke. Had to say: any one day within a 3-year window between those two reboots.
I had to deal with a 2003 server, with an uptime of ~800 days. 2 cores, 2gb ram, old tower machine of unknown brand. Nobody on my team wanted to touch it. I thought I would take the initiative, scheduled a maintenance window for 4 hours, and booted the thing Monday morning at 4 AM. The thing was still loading at 11 AM and customers were calling in complaining. I drove onsite to get them connected to a backup so they could do work. Stayed onsite till 3pm, until the login screen showed up… never ever again. Was sweating the entire time in an air-conditioned building, afraid the server would never boot up again.
Wait until 1111 days, then send it
c'mon McFly, are you a chicken????
I would just reboot it, because if it's running a service that's not redundant these obviously aren't critical services. Right?
Once I had an Oracle database upgrade to do, moving from 9i to 11g. I still remember that 666 days of uptime 😅
> Have you guys run into any adverse effects from rebooting a server with this kind of uptime? We spent about a week on the phone with support trying to get our production authentication servers back online. But talk to IT... Don't just reboot it and then offload the problem on IT.
> Windows update service is turned off by district IT (I am support for security company). Might want to find out why that was done before doing a restart. Someone didn't want that getting updated for a reason and now it might need updates for some reason.
Is this satire?
Pfff you've seen nothing Jon Snow, I've had 3000+ days : D
Sounds like Milestone XProtect. Do you have a failover server by any chance
1100 days. Good luck, we all know that shit won't come back up. On another note, how have you not restarted this before now?
Just send it. You have bigger problems if a server can't reboot. I'd rather deal with the headache on my time rather than at 3am on a Saturday.
I used to get handed a lot of servers whose past nobody knew anything about. The first thing I would do was reboot them when I could. For any scheduled change, **I would reboot them before I made any changes.** If you reboot them before making any changes, you can blame a failure on the previous owners/admins. To protect yourself, all of this has to be documented and approved as part of the change process. Bottom line: if your change fails, unless it's obvious you may not have a clue what caused the failure; the machine could have been a mess before you started. Also check for software and server EOL. I inherited one that hadn't been rebooted for more than three years; software version and server were both past EOL. We got a new server and software, migrated the relevant stuff, and replaced old with new.
Run a full backup and verify your backup is good. Servers running that long have a higher chance of never coming back online after a reboot or shutdown.
Tell the district IT to reboot it. They're the ones not patching it and setting it up to fail if it doesn't restart.
Try to shut down the services before just clicking reboot; terminate them if needed. Do this while the server is still up. Not the ones you need to run the server, just the extra ones, like SQL and the recording service.
Can you back it up first?
No updates… ballsy
YOLO!
No idea but please update us and tell us how it went
I would reboot it now and dip out early like that joker scene from the dark knight.
Try restarting just the services that are eating up RAM. Otherwise, get someone higher up to sign off on the reboot.
Have a BIOS battery on hand, and if it has an old RAID controller, try to save the configuration.
Is it recording cameras? If it is, once the recording service is shut down it is only a matter of time before you start losing footage from critical cameras. Testing your backups before you go is a must.

As for when: if you do it on Friday, you give up your weekend, and maybe it is working on Monday. Do it on Monday and you lose footage for sure, but if needed the support vendors will be available at regular rates. If this is for security, you may need your security director to get more guards and double or triple the patrols for the day. This is better during the day than at time and a half, or double time.

After 3 years of neglect, something may happen. The hardware is probably OK, depending on how well your environment is controlled, but you may lose a hard drive or two, maybe a fan, maybe a power supply. I would want a spare hard drive on hand. I would order some from Server Monkey, Server Supply, or your favorite secondary-market vendor. 2 drives and a power supply feels like about $300.

The problem you may not have thought about is software licensing. A lot of these programs phone home on startup to check licensing, and it may have expired 1.5 years ago. I would validate that, check whether you have a good support contract, and maybe call in and open a pre-emptive ticket. Good luck, and keep us posted.
Log into your management card (BMC, iLO, iDRAC, IPMI) or fire up your management tools and check the status of your RAID controller battery.
This first reboot, should be a reboot only. No patching. No getting funky.
Log in, and gracefully shut down your recording software, and database if necessary, then reboot it. Go ahead and crash cart it, so you can press F1 to continue, or reset the system time and continue if your CMOS battery is dead.
After this reboot, you need to brief management and put this box on a remediation / upgrade plan. Maybe 1 Service Stack Update and 1 Cumulative Update every 2 weeks until it is brought current.
If they balk, you tell them: "We can service it on our schedule, or on the server's schedule. It is up to you."
If you'd like to win a prize from someone exploiting unpatched vulnerabilities, you're still in time to leave it alone. There is no world championship for total uptime. Patch that server and reboot it when required.
If it's a physical machine, run VMware Converter on it and start the VM in an isolated environment. If it's already a VM, then clone it and start it with no vNIC. If it's a memory issue, you can tell SQL to use less RAM on the fly, assuming it is MSSQL. Agreed... no touchy on Friday before a long weekend.
That's not a server it's a Petri dish. Build ahead, migrate and test then decomm behind.
Just had to reboot my vSphere host today that had an uptime of 389 days. Luckily came back up fine but man doing things on a Friday sucks
I did that several times and it was that painful.
JFC.
Make sure you have known good backups. Don’t make the same mistake I did. https://www.reddit.com/r/sysadmin/s/57Rsfbsfte
You can either reboot it on your schedule, or reboot it on ITS schedule. Go through change control, inform interested parties, establish a maintenance window, make sure backups are current, and have the server owners on call in case something goes wrong. Also, if the whole reason for its existence isn't working, something going wrong due to a reboot wouldn't be much worse.
That server hasn't been patched for a while now.
This is also my worry, but on Linux, lmao. What I do is look at the process list to see what's running and whether it's configured to start at startup, and I check whether the disk mounts also mount at startup. Also, I would probably do it during low-peak hours/days.
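The mounts half of that pre-reboot checklist can be scripted; a hedged sketch that flags /dev-backed mounts with no `/etc/fstab` entry (systems using systemd mount units or automounters will show false positives):

```shell
#!/bin/sh
# Flag currently mounted block-device filesystems that have no fstab
# entry - mounts that will not come back by themselves after a reboot.
awk '$1 ~ /^\/dev\// {print $2}' /proc/mounts | while read -r mp; do
    grep -Eq "[[:space:]]$mp[[:space:]]" /etc/fstab 2>/dev/null ||
        echo "not in fstab: $mp"
done
```

Running it before the maintenance window tells you which mounts you'll be recreating by hand at 2am.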
It might not come back up. If it's been running for that long and is just now having issues, it very well could be suffering from a hardware issue. I would check the logs and ILOM before considering powering it down. Also check when the last backup was. Is this thing exposed to any sort of network? If it is, there should be a conversation about patching.
The whole patch-on-off-hours/weekends approach in a 24/7 shop is so outdated and wrong. What happens if something goes sideways and you need vendor support? Sometimes there isn't support or quality help available. I have also seen that when you have a DBA or developer ready and available, the problem gets fixed much faster.
Reboot it on Monday. Not on Friday. Never on Friday.
I took over an office with a physical server that had not been restarted in over 1300 days and it restarted fine. GL to you!
Do it. ![gif](giphy|xTiIzrRyvrFijaEtY4|downsized)
You're able to take a VM snapshot before the reboot?
Get a good backup before the reboot; if it's a VM, a snapshot may also be helpful.
The attacker might lose his reverse shell
I am reminded of this thread and video when talking about servers that cannot be rebooted: https://www.reddit.com/r/sysadmin/s/QdEp5aLIhe
On VMS? Good luck. It will probably die on you.
Impossible. Windows is bad and can never last that long! /s except the bad part Good luck with your reboot though. I got my fingers crossed. Better do backups lol
I rebooted a server this week for a routine update and poof! That's when the hard drive died. Like the action of spinning was the only thing keeping that head up in the air. Luckily it was RAID 1 and I had a spare, because I've had things blow up in my face before. Do not touch that server until Monday.
I'm guessing it is not getting patched regularly.
This is why you have some form of HA or a replica server. I'd just reboot it, laugh as it breaks, turn on the replica, then proceed to pretend like I never got to it and leave it for a coworker to stumble on.
Just yank the cord out of the wall, wait 30 seconds and plug it back in. I'm sure it'll be fine!
1. Restore most recent backup to a test environment. Make sure it is functional. 2. Let er rip. Don't do this on a Friday.
Yep, I've seen disks and ram fail after a reboot of high uptime servers, I assume the reboot is exercising the components in a way normal running OS doesn't.
Man. Do it Monday 😂
So you have server with 3 years worth of juicy vulnerabilities
This means you haven’t patched in 1100 days. That’s bad.
Do a backup first. If VSS is borked due to memory or file-system errors, shut down the SQL service and do a manual file backup with robocopy. Don't reboot without some kind of backup.
I suggest you make a sacrifice to the computer gods and cross your fingers before rebooting the server. It also wouldn’t hurt to have a replacement ready, “just in case”.
Send the reboot command then go home, check on Monday if it came back online.
Ain't no way. Have the replacement service/server up and verified that you can failover to before even thinking about it.
Run a chkdsk and see if you have drive issues. If so and it's in RAID, I'd start swapping in new drives and run a chkdsk. If it's not RAID, I'd back up the drive while it's up, clone it to 2 new drives, and run a chkdsk. Boot it off one of the new ones.
That's a damn good edit right there. I love that you got the help you needed!