Sunday, January 28, 2007

Upgrading Plesk, what a nightmare!

I started upgrading 3 servers at 8pm tonight (Saturday). It's now 12:27am (Sunday). No server is up yet.

First, 1 server is hung in the reimaging process. I probably could have avoided the reimage altogether by using the stock Plesk key and clicking retrieve key, but I realized that too late. As it is, I'm stuck waiting for some stupid 4 hour window to lapse before 1&1 will escalate the problem to a tech who can actually fix it. The person I talked to didn't even know where to go in the control panel to reimage the server. Oh boy!

So the other servers reimaged in record time, and then I put my trusty CentOS load on there and let my cool little script install Plesk 8.0.1. Everything was fine until.... I tried to restore the backup that I made on the 7.5.4 version of the server. Ugggggg! It seems you can't restore a 7.5.4 backup onto an 8.0.1 server. Strange, because I would have sworn that I had done that on another system. Apparently not!

So now I'm sitting here waiting for that one server to get reimaged. The other 2 servers are reloading a fresh install of CentOS 4.4, and then I'll install Plesk 7.5.4 manually using the autoinstall app. Then I should be able to restore the 7.5.4 backup. Then I plan to make a Plesk 8.0 backup using their downloadable tool. Afterwards I'll update the servers in place, since it'll just be a Plesk update and not an OS update.

So it's now 1:20am. I've got Plesk 7.5.4 installed on the 2 servers that reimaged earlier. Guess what? They won't retrieve 7.5.4 keys. Ugggg!!!! OK, so 2 out of the 3 servers have backups of the /etc directory from before all this mess started. I'll pull the keys out of there. Hopefully I can use the same key on 2 servers as long as it's not at the same time. I have 8.0 keys for all 3 servers. The only really big downside to this step? Both of the keys are stored in really large backup sets that have to be ftp'd across from the backup server. Uggg!
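Once a backup set has finally been transferred, the key can be read out of it without unpacking the whole thing to disk. Here is a minimal Python sketch of that idea. It assumes the backup is a gzipped tarball and that the key file's path ends in psa.key; both are assumptions for illustration, not something the backups above are confirmed to match, and the demo archive is built in memory purely so the example runs on its own.

```python
import io
import tarfile


def find_key_members(archive, suffix="psa.key"):
    """Return names of regular-file members whose path ends with `suffix`.

    The default suffix is an assumed key file name, not a confirmed one.
    """
    return [m.name for m in archive.getmembers()
            if m.isfile() and m.name.endswith(suffix)]


def extract_member_bytes(archive, name):
    """Read one member's contents without writing the archive to disk."""
    handle = archive.extractfile(name)
    if handle is None:
        raise ValueError(f"{name} is not a regular file")
    return handle.read()


def build_demo_backup():
    """Build a tiny in-memory tarball standing in for a real /etc backup."""
    buffer = io.BytesIO()
    files = [
        ("etc/hosts", b"127.0.0.1 localhost\n"),
        ("etc/psa/psa.key", b"DEMO-KEY\n"),  # hypothetical path and contents
    ]
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path, data in files:
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tarfile.open(fileobj=build_demo_backup(), mode="r:gz") as tar:
        for name in find_key_members(tar):
            print(name, extract_member_bytes(tar, name))
```

With a real backup you would open the file by path with tarfile.open instead of the demo buffer. This does nothing about the slow FTP transfer itself; it only saves the disk space and time of a full extract.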

Still waiting for 2am, when hopefully (fingers crossed) that 1&1 oxygen wasting being will escalate to a real tech. We'll see.

So 2am has come and gone. All I could get out of the 1&1 folks was that it had been escalated to the server techs and I'm supposed to wait around for some magical email. Hmmm, I wonder what they would do if my email was on the server that was down????

Progress is definitely being made on the other 2 servers though. One is almost done now. I made an 8.0-compatible backup and stored it on the FTP server. Right now I'm just waiting on a few additional Plesk 8.0.1 updates to finish installing. Then I need to add my favorite AtomicRocketTurtle packages and that server will be done.... SWEET! Just got the report that the server is up to date! Now I can turn the email back on and forget about it.

Well, I found a few more things I could do on that server that I thought I had finished. Now I'm just waiting for a Nagios alert to clear. Presto! It's clear. Now I can mark off one server.

So what's the status on the downed server? Still nothing from 1&1 and the control panel is still telling me the server is being reimaged. I'll call them at 6am when it's been down for 8 hours. Maybe I can get some attention then. I'm starting to think the techs are asleep in the data center somewhere.

And the other server that made it through the reimage? It finally completed its restore of the 7.5.4 backup. Now I'm running 'yum update psa' to upgrade it to 8.0.1. Crossing my fingers...

Yes, I'm still at it. 2 servers down and 1 to go. Here's the kicker to this last server. I finally got someone at 1&1 to admit they don't have server techs in the data center on the weekends. He started talking about being short-handed and stuff, and then I asked him point blank: do they staff techs on the weekend? Aaaaaaah, no. Then he tried to convince me that they weren't needed because this was all automated. Right......

OK, so the plan is to set up a VMware VM on another network and snag the Plesk key. That should be good for a few weeks at least. That's more than enough time to get this stuff straightened out. Once I have a stable server running again, I'll do a migration one evening and be done with it!

Oh, and here's a valuable note on Plesk 8.0 on CentOS. It seems with the newer versions of Plesk you now have the ability to turn 'safe_mode' on and off at the domain level. That's fine, except during the upgrade from 7.5.4 to 8.0.1 the system turned 'safe_mode' on for all the domains. I probably chased my tail for 2 hours working on Joomla sites. They kept telling me they couldn't find the templates. Google searches kept pointing to corrupted files and permissions. Then finally I found a blog post comment that mentioned turning it off. Presto!
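If you'd rather check every domain at once than click through them one by one, something like the following Python sketch could help. It assumes the per-domain setting ends up as an Apache directive such as "php_admin_flag safe_mode on" in each domain's web server config; that layout is an assumption here, so confirm where your install actually keeps the setting before trusting the result. The domain names and config snippets in the demo are made up.

```python
import re

# Matches directives like "php_admin_flag safe_mode on" (also php_flag and
# php_admin_value / php_value forms). Lines starting with "#" do not match.
SAFE_MODE_RE = re.compile(
    r"^\s*php_(?:admin_)?(?:flag|value)\s+safe_mode\s+(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def safe_mode_enabled(config_text):
    """Return True if any safe_mode directive in the text is switched on."""
    values = (v.strip("\"'").lower() for v in SAFE_MODE_RE.findall(config_text))
    return any(v in ("on", "1", "true") for v in values)


def domains_with_safe_mode(configs):
    """Given {domain: config_text}, return the sorted domains with safe_mode on."""
    return sorted(d for d, text in configs.items() if safe_mode_enabled(text))


if __name__ == "__main__":
    # Hypothetical config snippets, one per domain.
    demo = {
        "joomla.example": "<Directory /x>\n  php_admin_flag safe_mode on\n</Directory>\n",
        "static.example": "php_admin_flag safe_mode off\n",
        "old.example": "# php_admin_flag safe_mode on\n",
    }
    print(domains_with_safe_mode(demo))
```

In practice you would fill the dictionary by reading each domain's config file from disk. The sketch only reports; the setting itself is still best changed through the Plesk control panel so the change survives a config regeneration.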

So things are going as planned, sort of. For some reason the Plesk 7.5.4 installation on CentOS 4.4 fought me tooth and nail the whole way. I think I got sloppy in the VM and added the Atomic repo too early. Nonetheless, I now have an updated 7.5.4 Plesk server awaiting a dump file. Oh, that. Yes, I'm waiting on the download of the dump file from the 1&1 servers. I'm tapping out the circuit at 5M on the VM side. At least I know that the IPCop firewall protecting it isn't a bottleneck. 33 minutes to go according to the scp progress bar.

I've kind of sort of given up on 1&1 for today. Hopefully they'll get the server fixed by tomorrow. In the meantime, I think I've got a slick way to get this customer back up thanks to VMware.

Hmmm, 30 minutes, that's not enough time to get some rest. Maybe I'll go fumble around with the VTBook card. I've got it working with FC4 but I'd like to have it working with FC6 because that's what is on my main laptop.

Hopefully I can take a break. The third server is up. Web pages are being served and email is flowing in. The spam filtering needs some tweaking and the Postini mailmgr wedge needs to be put in, but those things can wait since I still have to wait for the full system backups to download. That'll be another hour or so.

Wonder when someone from 1&1 is going to look at the server?

