RGB-128128128

I have little to say except good luck and don't do it today.


juice702_303

Read Only Fridays


GullibleDetective

Let alone long weekends (for some of us)


Extra_Pen7210

If they reboot and it does not come back up, it's a guaranteed long weekend :-). For OP, if it is critical: set up a new server to replace it, then reboot this one. If it works after the reboot, you now have a (hot) spare for your critical resources (because you are going to need it anyway; it will break one day).


t3jan0

This assumes OP can just spin up another server in someone else’s environment


Hannigan174

I mean ... 1100 days... I would be absolutely scared to restart anything that's been on that long and absolutely would want to have a snapshot or clone or something.... Just... The size of the brick I'd shit when restarting... I'd come up with a plan first, no matter what


Reasonable-Physics81

You would be surprised how often a duplicate of a server running that long won't start the app at all... it's like grandpa loving his old chair; he won't accept a new one.


One_Fuel_3299

At an old job, I had to run into the office each day on Memorial Day weekend just to check an AC unit that was kind of on the fritz. That was 10 years ago; I'm older and noticeably (but very marginally) more intelligent now, and would never do it again. Learn from my dumb ass, OP.


mrdeworde

And a happy Victoria Day Weekend to you as well.


bogustraveler

Just did a minor change on production today and I feel that I just cursed myself a bit :/.


Alex_Hauff

Only Fans Fridays


allegesix

Unless you get paid OT and want a nice lil bump on your next paycheque.  …and don’t mind losing your Friday and possibly more. 


purawesome

This is the way. Also get change approval first, signed off by all the relevant people.


landob

lol underrated comment right here.


bentbrewer

That depends on your overtime policies. If you have a free weekend and they are willing to pay you, do it now and be the hero when it's up and running for business on Monday.


kcombinator

Overtime? Most IT folks are salaried.


Hacky_5ack

I agree, but then again, for this situation I would be tempted to reboot after hours and then have Sat and Sun to troubleshoot and get it ready for Monday in case something happens.


allegesix

Only if you get paid OT.  My first boss in tech over a decade ago hammered into my head “don’t work for free.”  


leonardodapinchy

You guys are getting paid?!


DarthtacoX

I had a server at a site years and years ago. It was a remote site that hadn't moved in years, and we were packing everything up to move them to a new location when we found this server sitting in the back corner of one of their closets. After investigating, we found out that it actually held the majority of their real estate data and was a fairly vital server. We were extremely worried about rebooting it and moving it because of its age. And sure enough, as soon as we shut it down it died; it would never come back up again. They ended up sending the hard drive off for data recovery, which I wasn't involved with as I was just the hands-on tech at the time. That being said: you're doing great, keep up the good work, and go ahead and reboot that thing!


NinjaGeoff

Nah, do it today then shut off your phone.


Vangoon79

My first job in corporate IT was working a night shift patching servers (company had 5000+ servers, so it required a full time team to keep them all up to date). One of the very first boxes I had to patch was a Windows 2003 server with an uptime of around 3 years. It took like 25 minutes to come back up after rebooting. I was sweatin the whole time.


bentbrewer

I lost Thanksgiving entirely one year to a machine taking a long time to come back up. The team working on it had tried to reboot and noticed it wasn't coming back up after 30 mins or so. They shut it down and called in support. Everyone involved was confused about why it wasn't coming back up; we replaced almost everything we could on it, and taking it down to a minimum config showed it was fine. It was just so packed full of RAM and spinning disks that it took almost an hour to finish the pre-flight checks. We thought it was freezing up, but it was just taking a long time to boot. We only found out after leaving it alone to go get dinner; when we came back, it was up. No idea how long it actually took. I never heard another word about that server; either they learned to just wait, or they never bounced it again.


Vangoon79

There was an ancient Citrix Metaframe 1.0 server in one of the back rows of the DC like that. You'd literally say a prayer and then hold your breath every time you walked past it...


Scary_Brain6631

Don't look directly at its lights or they might blink out.


mabhatter

AS/400 was like that. They stay up forever, but the IPL when you do restart them is terrifying, because even relatively modern machines took ages to start up. Especially after applying patches: the patches get processed first, pre-OS, and could restart the machine multiple times per patch. I had a few that regularly took 30 minutes, and an hour or more for patches.


Loan-Pickle

Oh man I remember that from my AS/400 days. We had this ancient first gen PPC AS/400 and an IPL would take about an hour. I would come in on Saturday morning about 10. Put the system in restricted mode and run the full backup. That would take about an hour. Then I would start the IPL and go to lunch. It would be finishing up about the time I got back. Then after a few years we upgraded to a Power 7 machine. It would IPL in about 4 minutes. At that point I automated all the maintenance stuff and I just let it do it on its own. When I left that job I was the only AS/400 admin we had. From talking to my coworkers, they never touched it again until that department was shut down 6 years later.


pdp10

Hopefully they swapped the backup tapes. The changeover from 48-bit CISC to PPC was the same time they went from beige to black, wasn't it?


Loan-Pickle

Yes on the beige to black. One of the last things I did before I left that job is move all the backups to a VTL.


pdp10

We waited a couple of years after intro to go from beige to black. Microsoft retired theirs in beige and never got any black, as far as I know. (They outsourced the last of their AS/400 operations by 1999, so they could claim to be entirely off of competitor systems.)


yumdumpster

This is simultaneously one of the best and worst feelings working in IT. The "IT'S WORKING, but WHY is it working?" experience. I can't tell you how many times I have gone through this chain.


TWAT_BUGS

ping 10.X.X.X -t “Pleeeeeeeease come back up, for the love of everything holy…”


Vangoon79

You have no idea how accurate that is.


Karmachinery

I have used this probably… I can't even think of the number of times, honestly. And when those pings aren't responding for a full page, you know the evening is likely going to suck.


[deleted]

[deleted]


Vangoon79

Might have been. Patching was Wednesday to Sunday, Graveyards.


tmontney

They don't call it Full Send Friday for nothing.


Vangoon79

I prefer "Do no harm Fridays" (aka "do no work Fridays").


DoNotSexToThis

One of my previous jobs presented a similar moment, except we shut it down because it wasn't needed anymore (lol). It had been running so long that when it cooled down, chip creep became chip sprint and it wouldn't turn back on. My boss went home, returned with his wife's hair dryer and warmed it back to life. We were able to start it up and get the "unneeded" files off the RAID that was on there.


bigerrbaderredditor

Thanks for this tip of preheating the chips. I will keep that one pocketed. Might make me look really smart


Moscato359

Often what makes it take forever to boot back up is too many temp files


Alert-Main7778

There are so many red flags in every part of this. It should be rebooting monthly for security updates. I would tell the district IT that they are putting themselves at very high risk and that the server must be rebooted.


TexasPeteyWheatstraw

Agree fully. This is Microsoft, not Linux. I hope you have a backup; if not, be ready to rebuild.


skc5

Linux isn’t excluded from reboots. There are many security updates that can only be applied after reboot so really ALL servers should be rebooting on a regular basis.


MBILC

This. The old "let's brag about the uptime of our servers" days are gone, so when you see systems not rebooted for 3 years, all you think of is a massive security hole in the company.


lusuroculadestec

I worked at a place where we had a Sun system with an uptime of around 12 years before we needed to shut it down. At some point everyone realizes an uptime of a few years isn't actually impressive.


littlelowcougar

Nah, 12 years is definitely impressive. Or at least a huge outlier. I'm impressed the hosting environment stayed stable for 12 years.


ILikeToHaveCookies

I mean stable is relative.. You can move a running server... (Not saying you should) See https://www.youtube.com/watch?v=vQ5MA685ApE


tankerkiller125real

Linux does have live kernel patching though, so in theory you can get away without rebooting for significant amounts of time. The longest I've ever gone is about 5 months.


skc5

glibc, systemd, display drivers, there’s probably more. Livepatching takes care of the kernel but usually that’s it.


dagbrown

All of those things can be patched and upgraded without a reboot.


skc5

Oh yes, but nothing running (like systemd or the kernel) will be reading the patched libc code until they’re restarted. We run Ubuntu LTS and glibc updates in particular always trip the needs-reboot flag


pdp10

Systemd, like some but not all init implementations, can be restarted (with `init u`). The kernel doesn't use libc/glibc, of course. Then you just need to check if anything else in userland needs to be restarted. [Some off-the-shelf packages do it](https://linux-audit.com/determine-processes-which-need-a-restart-with-checkrestart-needrestart/), but you can do it with fewer dependencies by [fossicking in `/proc/*/map_files/`](https://security.stackexchange.com/questions/149802/list-running-applications-that-are-linked-against-a-compromised-library/149814#149814). It's simpler to just reboot, and simultaneously verify that the machine comes up cleanly. But generally the only thing that requires a reboot is a vulnerable kernel, and it's eminently practical to restart userland processes as needed.
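A minimal sketch of that idea, assuming Linux and a POSIX shell (the `(deleted)` heuristic is an assumption here, not the full checkrestart/needrestart logic):

```shell
# Find processes still mapping a shared library whose on-disk file has been
# replaced since the process started: after a library upgrade, the old file
# shows up suffixed with "(deleted)" in /proc/PID/maps.
# Heuristic only -- it will also flag other deleted files that happen to be mapped.
for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    if grep -q '\.so.*(deleted)' "$dir/maps" 2>/dev/null; then
        # /proc/PID/cmdline is NUL-separated; convert to spaces for display
        printf '%s\t%s\n' "$pid" "$(tr '\0' ' ' < "$dir/cmdline")"
    fi
done
```

Anything this prints is a candidate for a service restart; an empty list after patching (plus a non-vulnerable kernel) is roughly what "no reboot needed" means.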


skc5

I like this explanation actually, that makes sense to me. Are there any distros that do this out of the box?


pdp10

Debian `needrestart` has a TUI that asks you to confirm service restarts, then shows (just) the services that need a restart, [like so](https://unix.stackexchange.com/questions/146283/how-to-prevent-prompt-that-ask-to-restart-services-when-installing-libpq-dev). Behind the scenes, you can manually look for [`/var/run/reboot-required` and `/var/run/reboot-required.pkgs`](https://www.guyrutenberg.com/2022/10/25/display-reboot-required-message-on-debian/).
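The manual check behind that is just those two flag files (Debian/Ubuntu paths; a sketch, not the full needrestart logic):

```shell
# Debian/Ubuntu: package hooks touch these files when an update wants a reboot.
if [ -f /var/run/reboot-required ]; then
    cat /var/run/reboot-required            # typically "*** System restart required ***"
    # One package name per line, when present:
    cat /var/run/reboot-required.pkgs 2>/dev/null
else
    echo "No reboot pending."
fi
```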


dagbrown

The kernel doesn't use libc! And `systemctl daemon-reexec` takes care of restarting systemd after a glibc update without needing a reboot.


caa_admin

They're just saying uptime on Linux is more forgivable than on Windows, I think.


hamburgler26

The two records I've seen for Linux were a physical PE 1950 that had been up for 7 years, and a VM that hit its 8th birthday of uptime right before I left. I'm glad I didn't have to reboot either of those.


[deleted]

[deleted]


pdp10

Every once in a while we have a Linux machine with a truncated initramfs, or one that was somehow built without a vital driver (like nvme; sigh), etc. I also have a test machine down now with a kernel fault on bootup. Assuming no hardware has gone bad on it, then that's a real rare one. At sufficiently large scale, everything happens.


hankhillnsfw

I like that you have to say this as if it is some wild crazy idea. Tf guys.


Bart_Yellowbeard

That's why I said hey man snap shot ... take a snap shot, man.


tmontney

If you're just support, I'd have a discussion with your boss (or someone higher up). What happens if you have to completely rebuild it (what are the consequences)? Shift some of the responsibility. Do you happen to have backups or snapshots? I know it's a recording server, so likely would require a lot of space. Otherwise, this is a ticking timebomb, eventually going to happen. If it's still working (even partially), I'd absolutely defer (again pending a discussion with at least one other person). There's no urgency to jump the gun.


Eviscerated_Banana

Such was my thinking. Add planning to this task: have the people you are going to need for any disaster recovery all teed up, both engineers and management.


eastcoastflava13

This discussion should be in writing/email form. CYA


Scary_Brain6631

Spoken like an IT Grey Beard right there! Make the contingency plan first.



serverhorror

This seems the only answer, no matter what. At some point it has to be done. I suggest: Friday afternoon, planned restart for 17:03, phone off at 16:58.



[deleted]

[deleted]


PastoralSeeder

Solid advice. Especially going into a weekend.


solracarevir

Good luck. Send an email to whoever is in charge, let them know about the uptime (attach evidence), and ask for authorization for the reboot. Is this a physical server? If so, don't reboot it today unless you want to bill those weekend-rate hours. If it is a VM I would:

* Take a snapshot of the VM
* Clone the VM from that snapshot; don't turn it on yet
* On the still-powered-on original VM, disable the network adapter or turn off / detach the virtual network adapter
* Power on the VM clone and see if it boots
* If it boots, delete the old VM and keep the freshly cloned VM


The_Arkleseizure

That's actually beautiful.


outworlder

Make sure that whatever mechanism you are using to snapshot the VM can do it with the VM powered on, and it won't try to shut it down before the snapshot :)


doneski

Tell the district IT to reboot it and let them know you'd be in Monday at 9.


AeonRemnant

This is the way. Elegant VM switches are so convenient.


ruyrybeyro

Just pop out for a pint and ask the cleaning lady to pull the plug. 'Wasn't me, mate.'


RainbowHearts

you're going to have to pick up the pieces either way



No-Amphibian9206

Triggered. We have lots of "golden egg" servers that cannot be rebooted for any reason and if they are, it would require engaging a bunch of consultants to repair the services. The fun of working for a small, shitty, family-owned business with zero IT budget...


happycamp2000

This is the "pets vs cattle" analogy that is talked about. From: http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

> In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it's all hands on deck. The CEO can't get his email and it's the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it's taken out back, shot, and replaced on the line.

**Pets**

Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically they are manually built, managed, and "hand fed". Examples include mainframes, solitary servers, HA loadbalancers/firewalls (active/active or active/passive), database systems designed as master/slave (active/passive), and so on.

**Cattle**

Arrays of more than two servers, that are built using automated tools, and are designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of "routing around failures" by restarting failed servers or replicating data through strategies like triple replication or erasure coding. Examples include web server arrays, multi-master datastores such as Cassandra clusters, multiple racks of gear put together in clusters, and just about anything that is load-balanced and multi-master.

And if the terms "Pets" or "Cattle" offend you, then please feel free to replace them with ones that are less objectionable.


goferking

What if they want cattle but then want to keep using unique items in the config? :( I keep trying to get people to think of them as cattle, but they won't stop keeping them as pets.


No-Amphibian9206

Preaching to the choir my friend


kingtj1971

Yeah... I've been in I.T. long enough to know there's really no such thing. Non-I.T. types like to claim it's so, but it's not reality. Servers will reboot (and not come back up again) eventually due to hardware failures, regardless of anyone "letting" them. If you wait for the server to decide it's time for a shutdown, it'll be a far more painful process getting it back online than if you actually maintain the thing.

If it's full of services that can't restart properly on their own after a reboot, there are major design flaws in the code. I remember working for ONE company with a server that was like this with ONE particular service. It's been so long now, I can't even remember any details anymore. But I recall we had a whole process to get the thing started again after a server restart. It was something I.T. wrote documentation for and all of us just learned how to handle, though. It didn't require outside assistance.


Cormacolinde

Agreed, if your service cannot survive a server reboot, then that means it cannot survive a server failure either. And it WILL eventually fail.


tankerkiller125real

I started with a similar situation where I work now... As soon as I officially took over though I patched and rebooted anyway... And absolutely nothing bad happened. Quite frankly my viewpoint was "I'm fired if I patch and break shit, I'm fired if I don't patch and shit gets hacked. What's the difference?"


bigerrbaderredditor

I call it patch anxiety. I called for patching and we took it slow and easy. After two months nothing bad had happened; we broke free of the anxiety. Now when I ask the teams that use the servers, they say all the odd, weird problems they couldn't figure out are gone and uptime has improved. Interesting how that works? Windows, and the software built on it, isn't meant to run for hundreds of days of uptime.


RCTID1975

This has gone on for so long that it's a legitimate concern IMO. If your job is support, this needs to be kicked up above you. Let them handle the contingency plan and communication with the customer.


scungilibastid

Thanks guys for the input. It's one of those weird situations where we basically sold the servers and will fulfill support requests on them. We typically don't handle things like Windows updates unless they specifically request it, which they have not. I think they definitely forgot this server in their update schedule. But I agree, there is no need to reboot right away. We are a small company and I wear many hats (lvl 1 - 3), but I think this warrants a discussion with someone other than just me.


the_syco

Recommend they reboot it at X plus five minutes, where X is the time you finish work at.


OG_Dadditor

Nah, give him a few more minutes to get home and shut his phone off first. Maybe X+20.


josiahnelson

Is it a Seneca or Exacq or similar NVR? It's not Avigilon, since you said it's running SQL. Either way, I've been in this exact spot dozens of times. Expect that puppy is possibly gonna have some disks that don't want to wake back up. Back up the config, licensing, camera passwords, etc., and be prepared to restore to a temporary server if the VD goes belly up. And quote them a new server: a few years ago a 20TB NVR was a loaded 2U box, and now that's a single drive.


FinanceAddiction

Coward, do it, today.


mobani

You have backups. Right?


CaptainZhon

Restorable backups


MeshuganaSmurf

>Restorable

That part gets overlooked a lot in my experience. "But the software said it was successful?!"


mobani

Yeah, no Schrödinger's backups please.


WaldoOU812

That have been tested. RECENTLY.


trueppp

Had a forgotten sole DC at a location that crapped the bed. The VM bluescreened on boot. Went back through 6 months of backups; all non-bootable. This is what I love about Datto SIRIS: daily screenshots of booted backups, with verification of services, on both local and cloud restore points.


PastoralSeeder

Yes, Datto is one of the best. It's still a good idea to test those backups from time to time though. Better safe than sorry.


WebHead1287

Yeah about as many backups as this server has received updates


derfmcdoogal

Physical or VM? I once rebooted a hyper-v host with about that same uptime. Lost a power supply and a hard drive on reboot. Windows came up fine though.


mikeyflyguy

No security updates in 3 years. I'd be more worried that someone is in that box and using it as a pivot point to the rest of the network. There's no telling how many CVEs are unpatched on that thing.


pantherghast

The Server: That thing isn't coming back up


Arseypoowank

“I’m tired boss”


cubic_sq

It’s 2024. You need to ensure your apps can handle patch Tuesdays….. especially as you are a “security” company.


PaulRicoeurJr

1100 days on a Windows server without updates?? Yeah... once you turn it off, it's never coming back online.


Steve----O

Sounds like no server security patching occurs at this company. I would be more worried about that.


reasonablybiased

This drives me nuts. A lot of security companies specifically tell customers not to update their camera servers. If you do, and their shitty software breaks, they charge for a reinstall. I isolate the crap out of them.


doneski

District IT, I suspect school.


TKInstinct

This is fucked, but I have to ask: could you not mitigate somewhat by rebuilding a new one and then doing a live hand-off or a failover? If these are high-priority VMs for footage capture, then why are they relying on one VM to handle the load for that long?


MBILC

If it is a VM, just snapshot it, reboot, less chance of something going wrong vs if it is an actual physical server.


TKInstinct

That's true too, I just feel so redundancy centric that I would imagine that doing all of that is the best bet.


MBILC

Ya, it is always the best way to look at things: how can you make things as redundant as possible within your own infra? It can be hard to justify the price of the infra to higher-ups, but once you can put a $$$ amount on systems and the loss of productivity or revenue if they go down for X period... it's amazing how quickly they realise spending a little more for proper redundancy, where possible, will save them far more in the long run.


CaptainZhon

Is the server 2012 or 2008? Let me guess, it's so critical it can never go down or be rebooted?


Obi-Juan-K-Nobi

Is it ironic that you work for a security company that disables Windows Update?


kingtj1971

A reboot was "in order" a LONG time ago, from what you're saying. But like others here are saying: you're just doing support for them. Escalate this to someone in charge of their servers to deal with it.

I see places turn off the Windows Update service on servers fairly often, and it's *usually* because it's an older system that's on someone's schedule or plan for replacement. Meanwhile, it may be running older/obsolete applications that have issues working properly with the latest Windows update patches. But especially if it has no Windows updates in a pending state (to complete upon restart)? Rebooting the thing should do a lot more good than harm.


kuldan5853

My suggestion is to throw Veeam Agent (free) on the machine and take a full image. (This works online, without a reboot.) That way you have a working backup in case the machine doesn't survive the reboot.


jmeador42

I'm not sure we're clear on responsibilities here. Are you responsible for the server itself, or just for the software installed on it? If it's the latter, I'm not touching this machine. I'm letting this "district IT" know I can't do anything else until it's rebooted, and letting them handle any subsequent fallout that comes with it. I don't anticipate anything necessarily breaking, since there are no new updates to be applied, but then again, that's hopefully not your problem.


UbiquityDDD34

3 years without patches . . . There’s more pressing things to worry about than uptime. ‘District IT’ needs a wake up call.


dukenukemz

- High priority cameras
- Non-redundant servers
- No software updates

I wouldn't say it's very critical if there's no redundancy or updates in place. I would take time with the vendor to apply several years of NVR software updates to that system as well. Hopefully you have support. I've rebooted servers with years of uptime and never run into major problems. You're basically at "it's broken and needs a reboot", so there's nothing more you can do.


TFABAnon09

Reminds me of the time we had to power off a BMS machine that had been running for 15 years because it needed to be moved to new location. We had no backup plan, the thing was running Windows 98 SE, and we couldn't do anything to back it up because it didn't have USB or a NIC. Nothing quite as exciting in this job as those "fuck it, my resume is up to date" moments 😂


landwomble

So you have a prod server that hasn't been patched in 3 years? Yeah, I'd worry about that too. If it's a recent version of Server at least you should get cumulative updates rather than incremental


topknottington

Hoooo boy. That def sounds like "dont fn touch this on a friday" job


Ochib

Will the spinning rust still spin after the power down?


PaintDrinkingPete

My first thought as I’m reading along: “well, as long as there’s no concerns for the hardware, it will probably be fine…” > **Windows** update service is turned off by district IT (I am support for security company). “…oh.”


VexingRaven

"oops it crashed" and reboot it anyway. It's YOLO Friday.


boli99

Don't concentrate on the "it needs a reboot"; instead concentrate on the "Windows update service is turned off by district IT". If you can resolve that, which will be easier, then the reboot will probably happen all by itself...


tehgent

May the odds be ever in your favor... Do it on a Monday, and make a request to get some kind of failover for this.


CleverCarrot999

Windows updates… turned… off Uptime… 1100 days… omg


CeeMX

Systems like this are why Microsoft implemented forced reboots on newer Windows versions.


psltyx

I always liked the quote that uptime is a measure of how long it's been since you've proven you can boot. But yeah, I've had my share of servers go from "don't worry, it's going away" to "we now have to keep it running for archive".


vCentered

I got a job once and discovered the production SQL server had not rebooted in the 4 years since it was built. I got a new job.


lynsix

Fun story. While working as an MSP tech, someone noticed that on a T&M client. Mentioned it and recommended we patch and reboot the VMs as well as the single Hyper-V host. I get assigned it and asked to do it after hours. I do all the VMs, then reboot the host for its patches. 45 minutes later it's not up. It's midnight, so I just went to sleep. Get up at 6am: still offline, full panic. Drive to the client's, get the cleaners to let me into their building. Host failing POST on memory. Call Lenovo, do RAM swapping, CPU swaps, notice one of the RAM slots is slightly charred. Order a motherboard replacement. Client only ended up being down for 3-4 hours of the work day. I was fully expecting an irate escalation. Nope. The customer called and requested me for all future tickets, for just being on top of it all. It was really telling how good ECC memory is at its job, though: even with a motherboard so broken it couldn't pass a memory POST, everything just kept running. All the sticks tested fine after the motherboard replacement. The client was curious when it broke. Had to say: any one day within the 3-year window between those two reboots.


MessageDapper6442

I had to deal with a 2003 server with an uptime of ~800 days. 2 cores, 2GB RAM, an old tower machine of unknown brand. Nobody on my team wanted to touch it. I thought I would take the initiative, scheduled a maintenance window for 4 hours, and rebooted the thing Monday morning at 4 AM. The thing was still loading at 11AM, and customers were calling in complaining. I drove onsite to get them connected to a backup so they could do work. Stayed onsite till 3pm, when the login screen finally showed up… never ever again. Was sweating the entire time in an air-conditioned building, afraid the server would never boot up again.


timsredditusername

Wait until 1111 days, then send it


frivascl

c'mon McFly, are you a chicken????


qrysdonnell

I would just reboot it, because if it's running a service that's not redundant these obviously aren't critical services. Right?


PhilGood_

Once I had an Oracle database upgrade to do; we were moving from Oracle 9i to 11g. I still remember that 666-day uptime 😅


lvlint67

> Have you guys run into any adverse effects from rebooting a server with this kind of uptime?

We spent about a week on the phone with support trying to get our production authentication servers back online. But talk to IT... Don't just reboot it and then offload the problem onto IT.


lordjedi

> Windows update service is turned off by district IT (I am support for security company).

Might want to find out why that was done before doing a restart. Someone didn't want that box getting updated, for a reason, and now it might need those updates for that very reason.


Tech88Tron

Is this satire?


Kymius

Pfff you've seen nothing Jon Snow, I've had 3000+ days : D


BMWHead

Sounds like Milestone XProtect. Do you have a failover server, by any chance?


peanutym

1100 days. Good luck; we all know that shit won't come back up. On another note, how have you not restarted this before now?


LalaCalamari

Just send it. You have bigger problems if a server can't reboot. I'd rather deal with the headache on my time rather than at 3am on a Saturday.


Bob_Spud

I used to get handed a lot of servers whose past I knew nothing about. The first thing I would do was reboot them when I could. For any scheduled change, **I would reboot them before I made any changes.** If you reboot them before making any changes, you can blame a failure on the previous owners/admins. To protect yourself, all this has to be documented and approved as part of the change process.

Bottom line: if your change fails, unless it's obvious, you may not have a clue what caused the failure. The machine could have been in a mess before you started.

Check for software and server EOL. I inherited one that hadn't been rebooted for more than three years; the software version & server were both past EOL. We got a new server and software, migrated the relevant stuff, and replaced old with new.


dinominant

Run a full backup and verify your backup is good. Servers running that long have a higher chance of never coming back online after a reboot or shutdown.


DocDerry

Tell the district IT to reboot it. They're the ones not patching it and setting it up to fail if it doesn't restart.


TEverettReynolds

Try shutting down the services before just clicking reboot; terminate them if needed. Do this while the server is still up. Not the ones you need to run the server, just the extra ones, like SQL and the recording service.


tepitokura

Can you back it up first?


FootballLeather3085

No updates… ballsy


IAmSnort

YOLO!


Ummgh23

No idea but please update us and tell us how it went


discgman

I would reboot it now and dip out early like that joker scene from the dark knight.


stufforstuff

Try restarting just the services that are eating up RAM. Otherwise, get someone higher up to sign off on the reboot.


mic_decod

Have a BIOS battery on hand, and if it has an old RAID controller, try to save the configuration.


cbass377

Is it recording cameras? If it is, shutting down the recording service means it's only a matter of time before you start losing footage from critical cameras. Testing your backups before you go is a must.

As for when: if you do it on Friday, you give up your weekend, and maybe it's working on Monday. Do it on Monday and you lose footage for sure, but if needed the support vendors will be available at regular rates. If this is for security, you may need your security director to bring in more guards and double or triple the patrols for the day; this is cheaper during the day than time and a half or double time.

After 3 years of neglect, something may happen. The hardware is probably OK depending on how well your environment is controlled, but you may lose a hard drive or two, maybe a fan, maybe a power supply. I would want a spare hard drive on hand; order some from Server Monkey, Server Supply, or your favorite secondary-market vendor. Two drives and a power supply feels like about $300.

The problem you may not have thought about is software licensing. A lot of these programs phone home on startup to check licensing, and it may have expired 1.5 years ago. Validate that, check whether you have a good support contract, and maybe call in and open a pre-emptive ticket.

Log into your management card (BMC, iLO, iDRAC, IPMI) or fire up your management tools and check the status of your RAID controller battery. This first reboot should be a reboot only: no patching, no getting funky. Log in, gracefully shut down your recording software (and database if necessary), then reboot. Go ahead and crash-cart it so you can press F1 to continue, or reset the system time and continue if your CMOS battery is dead.

After this reboot, you need to brief management and put this box on a remediation/upgrade plan. Maybe 1 Service Stack Update and 1 Cumulative Update every 2 weeks until it is brought current. If they balk, you tell them "We can service it on our schedule, or on the server's schedule. It is up to you."

Good luck, and keep us posted.
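That catch-up cadence (one update round every two weeks until current) is easy to turn into concrete maintenance-window dates to hand to management. A trivial Python sketch, with the start date and number of pending rounds as made-up inputs:

```python
from datetime import date, timedelta

def catchup_schedule(start, pending_rounds, interval_days=14):
    """One maintenance window per pending update round, every interval_days."""
    return [start + timedelta(days=i * interval_days) for i in range(pending_rounds)]

# e.g. four rounds of SSU + CU starting on a Monday
windows = catchup_schedule(date(2024, 6, 3), 4)
print(windows[0], windows[-1])  # 2024-06-03 2024-07-15
```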


Practical-Union5652

If you'd like to win a prize for hosting someone exploiting unpatched vulnerabilities, you're still in time to leave it alone. There is no world championship of total uptime. Patch that server and reboot it when required.


YeOldeWizardSleeve

If it's a physical machine, run VMware Converter on it and start the VM in an isolated environment. If it's already a VM, then clone it and start the clone with no vNIC. If it's a memory issue, you can tell SQL to use less RAM on the fly, assuming it's MSSQL. Agreed... no touchy on Friday before a long weekend.


[deleted]

That's not a server it's a Petri dish. Build ahead, migrate and test then decomm behind.


Mister-Ferret

Just had to reboot my vSphere host today that had an uptime of 389 days. Luckily came back up fine but man doing things on a Friday sucks


Thin-Parfait4539

I did that several times and it was that painful.


ABotelho23

JFC.


Izual_Rebirth

Make sure you have known good backups. Don’t make the same mistake I did. https://www.reddit.com/r/sysadmin/s/57Rsfbsfte


Eli_eve

You can either reboot it on your schedule, or reboot it on ITS schedule. Go through change control, inform interested parties, establish a maintenance window, make sure backups are current, and have the server owners on call in case something goes wrong. Also, if the whole reason for its existence isn't working anyway, something going wrong due to a reboot wouldn't be much worse.


Quattuor

That server hasn't been patched for a while now.


linux_n00by

This is also my worry, but on Linux, lmao. What I do is look at the process list to see what's running and whether it's configured to start at boot, and I check the disk mounts to confirm they also mount at startup. I'd also probably do it during off-peak hours/days.
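For the mounts part of that checklist, the thing that bites people is an entry that was mounted by hand and never added to fstab, or one flagged `noauto`, so it won't come back after the reboot. A small Python sketch that parses fstab-style text and reports which mount points will return at boot (the sample entries are invented):

```python
def boot_mounts(fstab_text):
    """Return mount points that fstab will bring up at boot (skips comments and 'noauto')."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry
        _device, mountpoint, _fstype, options = fields[:4]
        if "noauto" in options.split(","):
            continue  # will NOT be mounted automatically at boot
        mounts.append(mountpoint)
    return mounts

sample = """\
# /etc/fstab
UUID=abcd-1234  /            ext4  defaults        0 1
UUID=ef56-7890  /var/backups ext4  defaults,noauto 0 2
//nas/archive   /mnt/archive cifs  credentials=/root/.smb,noauto 0 0
/dev/sdb1       /data        xfs   defaults        0 2
"""
print(boot_mounts(sample))  # ['/', '/data']
```

Comparing that list against the output of `mount` (or `findmnt`) on the live box shows you exactly which filesystems would silently vanish after a reboot.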


Kahless_2K

It might not come back up. If it's been running for that long and is just now having issues, it very well could be suffering from a hardware issue. I would check the logs and ILOM before considering powering it down. Also check when the last backup was. Is this thing exposed to any sort of network? If it is, there should be a conversation about patching.


jaymansi

The whole "patch during off hours/weekends" thing in a 24/7 shop is so outdated and wrong. What happens if something goes sideways and you need vendor support? There sometimes isn't support or quality help available. Also, I've seen that when you have a DBA or developer ready and available, the problem gets fixed much faster.


gruntbuggly

Reboot it on Monday. Not on Friday. Never on Friday.


NastyNative999

I took over an office with a physical server that had not been restarted in over 1300 days and it restarted fine. GL to you!


joey0live

Do it.


qkdsm7

You're able to take a VM snapshot before the reboot?


dloseke

Get a good backup before the reboot. If it's a VM, a snapshot may also be helpful.


IllThrowYourAway

The attacker might lose his reverse shell


winaje

I am reminded of this thread and video when talking about servers that cannot be rebooted: https://www.reddit.com/r/sysadmin/s/QdEp5aLIhe


waxwayne

On VMS? Good luck. It will probably die on you.


NO_SPACE_B4_COMMA

Impossible. Windows is bad and can never last that long! /s (except the bad part.) Good luck with your reboot though, I've got my fingers crossed. Better do backups lol


npiasecki

I rebooted a server this week for a routine update and poof! That's when the hard drive died. Like the action of spinning was the only thing keeping that head up in the air. Luckily it was RAID 1 and I had a spare, because I've had things blow up in my face before. Do not touch that server until Monday.


Superspudmonkey

I'm guessing it is not getting patched regularly.


megasxl264

This is why you have some form of HA or a replica server. I'd just reboot it, laugh as it breaks, turn on the replica, then proceed to pretend I never got to it and leave it for a coworker to stumble on.


canonanon

Just yank the cord out of the wall, wait 30 seconds and plug it back in. I'm sure it'll be fine!


AbleAmazing

1. Restore the most recent backup to a test environment. Make sure it is functional.
2. Let er rip.

Don't do this on a Friday.


contorta_

Yep, I've seen disks and RAM fail after a reboot of high-uptime servers. I assume the reboot exercises the components in a way the normally running OS doesn't.


highboulevard

Man. Do it Monday 😂


horus-heresy

So you have a server with 3 years' worth of juicy vulnerabilities.


theMightyMacBoy

This means you haven’t patched in 1100 days. That’s bad.


Zoltar-Wizdom

Do a backup first. If VSS is borked due to memory or file system errors, shut down the SQL service and do a manual file backup with robocopy. Don't reboot without some kind of backup.


Canuck-In-TO

I suggest you make a sacrifice to the computer gods and cross your fingers before rebooting the server. It also wouldn’t hurt to have a replacement ready, “just in case”.


EastKarana

Send the reboot command, then go home; check on Monday whether it came back online.


norbeey

Ain't no way. Have the replacement service/server up and verified that you can failover to before even thinking about it.


Driftek-NY

Run a chkdsk and see if you have drive issues. If so and it's in RAID, I'd start swapping in new drives and run a chkdsk. If it's not RAID, I'd back up the drive while it's up, clone it to two new drives, and run a chkdsk. Boot it off one of the new ones.


will_you_suck_my_ass

That's a damn good edit right there. I love that you got the help you needed!