VHT downtime yesterday (and again) 10/2020

ancientdad

Administrator
Staff member
As everyone likely knows already, our site suffered serious downtime yesterday caused by "a major building power outage in our Mitsubishi UPS system that cut power to all of our equipment" (statement by our host on their site). While we never expected this length of downtime when we signed up with them, based on the experience of many others (including our own nineteen70moto and his site ApexSpeed.com, which has been with our host for 20 years), this is the very first outage lasting more than literally minutes in that 20-year period. So, while yesterday was discouraging to say the least, I fully expect things to return to their usual exceptional reliability going forward. We are fortunate not to have suffered any loss of business or ad revenue as many others did, certainly one of the benefits of being self-funded.

Our apologies for the downtime, and thanks for everyone's patience. Let's get back to the "business" of enjoying our favorite vintage Honda twins. (y)
 
It just goes to show that this place gets addicting ! I was lost yesterday ! Lol !!!
 
My VHT withdrawal symptoms got SO bad that I actually got some wrenching time in before my afternoon nap.......
 
Here's some further information.

"A little over 24 hours ago there was a massive power outage in Orlando, partially due to the passing weather from Hurricane Zeta. Our server host uses a very sophisticated backup power generator when situations like this happen that kicks in with a bridge that transfers power from the grid to the backup generators without allowing for any loss of server uptime. That bridge between power sources blew up taking the whole backup system with it. No data was lost anywhere, but all of the redundancies that were built into this system that has been running for 23 years without incident collapsed into darkness. When this failure happened, motherboards and power supplies were lost across the system. They have been slowly repaired and replaced as servers and power are restored."

This is the reported power outage that began the catastrophic chain of events leading to our downtime. Note that it began early yesterday (EDT).

https://downdetector.com/status/duke-energy/news/344639-problems-at-duke-energy/
 
Are stickies and other "important" stuff backed up somehow ? I'd hate to see things like Carburetor Rebuilding and stuff like that vanishing !!!!
 
Oof. Same UPS and generator for 23 years? They might’ve been overdue for this sort of thing [emoji2371]


Ed
1972 Honda CL350
 
Lucky (or unlucky) for me, I was way too busy yesterday fighting escalations at work to have had any serious VHT withdrawal. But yeah, it kinda sucked going to my favourite forum and finding crickets.
 
Oof. Same UPS and generator for 23 years? They might’ve been overdue for this sort of thing [emoji2371]


Ed
1972 Honda CL350

I think that statement was a generality about their total uptime and not necessarily a statement of the age of those components. I can't imagine them having the uptime success they've had over the last 23 years by using the same equipment that long.
 
Are stickies and other "important" stuff backed up somehow ? I'd hate to see things like Carburetor Rebuilding and stuff like that vanishing !!!!

As the creator of some of the stickies, I can assure you I have copies of the ones I did, but know that all the data is backed up by our server service regularly as they can't afford any client losses either.
 
Can't disagree with that part. They didn't say nearly enough yesterday to put anyone at ease, and it would have been simple enough to say that a power outage caused the overall failure without divulging anything further.
 
Even a regular "we're still working on it" update would have been more than enough, I think, to put their customers at ease. I don't blame their customers for getting pissed off and wanting to go elsewhere for service. If you run a business and rely on your web presence for sales and revenue, you'd be pissed off as well if you heard nothing but crickets. How would you even know if FQ was still viable?

It's really unacceptable from their customers' standpoint. Hopefully they can chalk this up as a lesson learned and plan things better.
 
Yeah, I agree. And hopefully they've also learned a hardware lesson and we won't have to worry about this happening anymore either.
 
A backup web page (offsite) for status would be a good thing to have. That way you still have a place for your customers to go to for updates when your entire domain goes dark. I can't believe that they didn't even plan for this to be honest. This is a rookie mistake and not one you'd expect from an experienced hosting provider.
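It wouldn't even take much, either - just a tiny checker running somewhere outside their own network that refreshes a static page on a second host. A rough sketch of the idea in Python (the URL and file path are made up for illustration, not FQ's actual setup):

import json
import time
import urllib.request

# Hypothetical sketch: poll the main site from OUTSIDE the host's network and
# write a small status file that a separately hosted status page can display.
SITE = "https://www.example-forum.com"               # placeholder URL
STATUS_FILE = "/var/www/status-page/status.json"     # served by a different provider

def site_is_up(url, timeout=10):
    """Return True if the site answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

if __name__ == "__main__":
    status = {
        "site": SITE,
        "up": site_is_up(SITE),
        "checked_at": time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime()),
    }
    # Overwrite the file each run; a cron job every few minutes is plenty.
    with open(STATUS_FILE, "w") as f:
        json.dump(status, f, indent=2)

Point a status subdomain at that second host and customers always have somewhere to look, even when the main domain is completely dark.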
 
Thanks for keeping us informed Tom. Early AM I was afraid Daniel had found a way to Nuke us!:cry:
 
I realize that, and I question the severity of the storms that resulted from the bands of rain that came our way from Zeta, but there's no denying the Duke Energy power outage reported for the time yesterday morning when we went down. Trust me, when Duke Energy actually reports something negative, it's only because they can't spin their way out of it. We're fortunate to have Withlacoochee River Electric Coop where I live, but Duke is all around us both north and south and I'm glad we don't have to put up with them and their thieving corporate tactics (taking money from the State of Florida years back for a nuke plant they started but never completed, just one of their actions).
 
A backup web page (offsite) for status would be a good thing to have. That way you still have a place for your customers to go to for updates when your entire domain goes dark. I can't believe that they didn't even plan for this to be honest. This is a rookie mistake and not one you'd expect from an experienced hosting provider.

Can't disagree with that either. Unfortunately, you or I telling them that probably wouldn't have any effect on their decisions. Considering their overall record across 20+ years, I seriously doubt this will ever happen again since they're bound to learn something from it, whether or not it improves how they communicate the situation to their clients. If they actually lost a fair number of those clients as a result of yesterday's event, that alone should be the wake-up call to make some changes.
 
No, this isn't the first outage, nor will it be the last

https://futurequest.net/forums/showthread.php?t=27557


It's the lack of response and updates that's the problem.

Okay, I get it - so the lack of downtime that Doug experienced over 20 years obviously isn't company-wide across all their servers. I'm sure you can find similar deficiencies for most server hosts if you dig around in their service archives, just as any major manufacturer has defects in its mass-produced products; some of those you buy, some your friends or neighbors or millions of others buy, so mileage may vary. YES, they should communicate better - totally agree with that again.

However, I'm not jumping ship with all the effort it would take (and my lack of knowledge to comfortably do it) just to start over with someone else after being with FQ all of 5 months.
 
It's pretty obvious that FQ's crisis management is in bad shape, if it even exists.
While it's an inconvenience for us, I can understand the businesses that rely on them being upset and possibly reconsidering remaining customers. I don't think there's need for concern for us at this point in time, but time will tell. If this repeats, it'll definitely be looked into.
 
So they're clearly working through a few more issues yet; the response time got slower about 5 minutes ago and we were offline briefly. After what FQ described going through yesterday, it isn't surprising that they're still catching up on all that needs to be addressed.

Have patience.
 
So they're clearly working through a few more issues yet; the response time got slower about 5 minutes ago and we were offline briefly. After what FQ described going through yesterday, it isn't surprising that they're still catching up on all that needs to be addressed.

Have patience.

Yup. Watch their actions for a while. All your points concerning hosting providers are valid.

If you ever decide to jump ship I can be of assistance if needed. I’ve moved domains and databases around the world as part of my IT tenure. It’s not difficult at all for those familiar with the process.


Sent from my iPhone using Tapatalk
 
I'm revisiting this situation to make sure everyone's PMs are working.

If you can't PM someone since VHT's server crashed last week, please let me know and I'll look into it further.
 
I get the feeling the problems are isolated, possibly related to who might have been logged in at the time... like the many who log in with their mobile device and never actually log out. But it also affected my PMs and nothing else, and the morning it happened I was just logging in when I discovered the server was down. I actually saw the splash screen, but then it went away.

Hopefully that's all we have to chase.
 
My CX forum that is owned by VS is experiencing tech difficulties related to storage; I wonder if HT is having similar issues?
 
Interesting. I saw HT was briefly down this morning but it was only about 2 minutes, and since it's Tuesday I figured it might have been software updates because that's when they've done them in the past.
 
I've been watching the server load numbers in the Admin section of VHT and noticed that they have been higher since the 24-hour outage, so I asked Doug about it this morning. He has his site and a few of his clients' sites on the same server as ours, and he said the equipment used to bridge between power supplies, which blew up in October during a massive local power outage, has not been replaced yet. They are waiting on new hardware to arrive but are experiencing supply chain issues due to Covid. We had what seemed to be a short outage about a week ago, but it turns out it was actually longer (though just overnight for most of us in the US), and it was related to the same issue they are waiting on parts to repair.

Just wanted to update the status of our situation and add yet another thing to the laundry list of effects of the pandemic and the crazy year of 2020.
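For anyone wondering what those load numbers actually are: on a Linux server they're basically the 1, 5 and 15 minute load averages compared against the number of CPU cores. A rough sketch of the idea in Python (just for illustration, not the actual admin panel code):

import os

# Rough sketch: read the 1/5/15 minute load averages the way most admin
# panels do, and compare them to the number of CPU cores available.
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count() or 1

print(f"load averages: {load1:.2f} {load5:.2f} {load15:.2f} ({cores} cores)")

# A sustained load well above the core count is the usual sign a server is straining.
if load15 > cores:
    print("15-minute load is above the core count - worth keeping an eye on")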
 
I'm sure many are aware we had yet another roughly 3-hour span of downtime that ended about a half hour ago or so. As I understand it from Doug, who knows the people at our host as well as anyone, they are still awaiting the parts to repair their power bridge due to Covid. Still, this does not excuse the beyond-terrible lack of communication about the status information we should be regularly apprised of when these things happen.

At present we are still committed to FutureQuest, as we still have a little over 5 months of hosting prepaid as well as Doug's long-standing personal experience with them, but I'm open to suggestions for another host if someone has a fair amount of experience with whoever they suggest. We're not going to jump without some concrete personal experience to go on, but my faith in FQ is no longer as strong as it once was, if only because of their poor communication.
 
I'm sure many are aware we had yet another roughly 3-hour span of downtime that ended about a half hour ago or so. As I understand it from Doug, who knows the people at our host as well as anyone, they are still awaiting the parts to repair their power bridge due to Covid. Still, this does not excuse the beyond-terrible lack of communication about the status information we should be regularly apprised of when these things happen.

At present we are still committed to FutureQuest, as we still have a little over 5 months of hosting prepaid as well as Doug's long-standing personal experience with them, but I'm open to suggestions for another host if someone has a fair amount of experience with whoever they suggest. We're not going to jump without some concrete personal experience to go on, but my faith in FQ is no longer as strong as it once was, if only because of their poor communication.


Have a look at Rochen Hosting out of the UK. They have servers in the US as well. Many benefits: active multiple backups (you can restore a complete domain to any of multiple points in the past weeks), full redundancy between host locations (outages are not noticeable), multiple databases and hosting packages in the same account (by allowing subdomains and forwarders), and support staff available 24/7. I have used them for twenty years and they have not changed hands once during that time. I currently host four domains on my reseller account (but don't use a reseller - make your own account - resellers can disappear, and the host is responsible for being available to you).

I have had zero downtime, and in the rare instances where I have screwed things up beyond my talents, their technical staff has saved me within hours.


They are a little pricier than some, but the feature package alone is justification. Have a look at their solutions. I'd be happy to be a part of any move if required. As I intimated a while ago, I have moved domains and DBs around the world multiple times; it's not a difficult process - it just requires knowledge. However, note that they offer free migration of your current domain, data and files. That is a 24-hour process.

I can discuss their merits by phone any day with a bit of notice.
 
Now that our host is back up, I decided to look at their forum to see if they said anything about yesterday. They did:

"[FQuest Notice] Secondary core router power supply failure

The secondary core routing network was taken offline from a failed power supply and we have switched our network fully back over to the primary. Unfortunately this requires a fair amount of manual work to perform as the fail over routines are engineered for automatic Primary => Secondary failures. There are many technical reasons for this, but mostly to ensure that the secondaries aren't in a marginal state causing major MSTP storms to the inner network. This was one of the problems we saw a few years ago and developed procedures to remove any risk of that happening.

We are still investigating the cause of the secondary network hardware failure, but all indications are pointing to a failed power supply.

During the last power outage, while we are getting everything back online, the primary routing core was responding in a marginal manner where the decision was made to cut out the primary network and drop back to the secondary which appeared to have been solid. Once everything was back online we left the secondary backup network in operation as the primary conduits (manually pinned to be safe). A few days after the event, the problems with the primary network were sorted out and fixed but due to being so deep into the holiday season we elected to hold off on switching everything back around till after the New Year. Historically the backup network has never really had any major problems which is why we left it pinned up and isolated out the primaries to ensure there was no unwanted cross talk until primary => secondary could be fully meshed back together - which in of itself is a disruptive event that needs to have a scheduled maintenance window.

All in all, we really did try to do what was best for the stability of our network after coming back from the chaos caused by the last major power outage. Ergo, we didn't want to rock the boat with networking. Yet it is now apparent there was hidden damage to the power supply that didn't even show up in our monitoring system. We watch power supplies for fluctuations through onboard chipset monitoring systems, and this one was all green - until it instantly cut out.

As it stands now, everything is now back on the primary network - which is how it normally runs and we'll be replacing the blown secondary core router. The primary core router was fully checked out while it was offline and we don't believe there are any power supply issues with it.

The Secondary=>Primary meshing work does not need a maintenance window as it isn't a disruptive event. Even if it might think about being disruptive, we completely isolate the secondary network while doing the work.
__________________
--
Terra
sysAdmin
FutureQuest, Inc."

And as you'd expect, some of the comments to that were pretty brutal from those who have been with FQ for a long time (like ApexSpeed, though he was not one of those who commented), and the prevailing tone was the same as last time:

"Well, I'm glad you're back on primary power. I've continued to give FQ the benefit of the doubt regarding these outages, but the total silence on Twitter or Facebook or any other channel for the 3rd or 4th time in the past two months is the last straw. After being with you for the past 11 years, it's time to say goodbye. (My account and those of several of my colleagues had been managed by Artemis, who sadly passed away last year, which is why it looks like I've only been part of the community for a few years.) We'll all be leaving by the end of the year. I hope you work out your stability issues and, more importantly, you learn how to communicate with your customers.
__________________
Joe
Cetacean Research Technology"
----------------------------------------------------------
"Not to sounds like a broken record, but....I will.

WHERE WAS THE *#&*(#&$ COMMUNICATION DURING THIS LATEST ISSUE?
WHERE?
NO, SERIOUSLY!
NOT ON TWITTER. NOT ON FACEBOOK. NOT ON THE NON-FUNCTIONING FUTUREQUEST.
It's a simple question, and one we've all been asking for going on two months now.
When the crap hits the fan, WHERE do we get the information?

I literally JUST now got a notification that FQ just posted to Twitter. Great. Where were you 4 hours ago?!

SERIOUSLY! Unbelievable."
-------------------------------------------------------------
"When I have to tell people, "I don't know and have no way to find out", my reputation is shot. Your fault, my fault or nobody's fault I have to live with the consequencies. After your last snafu we trusted that you understood the importance of communication and that you would put a priority on that. Even a major technical issue should not come out looking like the end of the world. You do not need an eighteen-wheeler to deliver a wheel barrow of information. You did not learn your lesson. Unfortunately, we are learning ours."
-------------------------------------------------------------
"My site went down today 15 minutes before I was to direct over a dozen of my clients to go there for a time-sensitive document. It was absolute dumb luck that I had alternative means to get them this information today. If I didn't I'd have been well and truly screwed.
I've been really patient. I've been really faithful. I've LONG praised FQ to the skies and beyond. But the ongoing communication issues are unconscionable, disgraceful, and bordering on unethical.

Now, for the final time: WHAT IS THE GAMEPLAN: SHORT-TERM, MEDIUM-TERM, LONG-TERM? When your site (and everyone else's) unexpectedly goes out HOW DO WE GET AN UPDATE? Don't tell me the long-term plan first. I don't care. You can tell me that when it happens. The email went out completely, what, two weeks ago? While you're making plans, today happened! If our sites and email go out tomorrow, I need to know--RIGHT NOW--where can I go to get the simple message from FQ, "we know; we're on it." Not 4 hours after the fact. Not 2 hours. Not 20 minutes. I'd say a reasonable timeline: within 5 minutes of FQ being aware of a problem, I should know where to go to get that simple message: "we know; we're on it."

You know, it's almost like I (and about a gazillion others) have mentioned this once or twice (or a million times) since October.

So: WHAT IS THE GAMEPLAN?"
----------------------------------------------------------------------
This is, in part, their response:

"I was called in at the tail end of the event, as I had been working on the SAN all night and was out to get rest for tonight's SAN work. Once I got in and helped to assess the postmortem and got a clearer picture of what was going I was able to get up a post here and also on Twitter. Due to a multitude of hacking attempts against our Twitter account, we have it locked down and currently I'm the only one that can unlock it (tied to my phone and private external email server) until we find a better way to ensure security. This account lock down is only temporary and what was needed to be done at the time, even if sub-optimal. It is also quite high on the priority list to resolve.

In regards to Facebook, we are looking at shutting down our presence there due to our disagreements with privacy concerns."
----------------------------------------------------------------------

So... they are aware, and obviously many others are as well. We're giving serious consideration to the information birdland has provided and we will keep everyone posted.
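As a side note, for anyone wondering what Terra means by watching power supplies "through onboard chipset monitoring": on most server hardware that's BMC/IPMI sensor data. The gist looks something like this (just a sketch assuming a Linux box with ipmitool installed; sensor names and output format vary by hardware, and as they found out, a supply can read "ok" right up until it dies):

import subprocess

# Rough sketch: ask the BMC for its power supply sensor readings and flag
# anything the hardware itself reports as not "ok".
def power_supply_readings():
    out = subprocess.run(
        ["ipmitool", "sdr", "type", "Power Supply"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    for line in power_supply_readings():
        # ipmitool prints one sensor per line; "ok" in the status column is good,
        # anything else (cr, nr, ns, ...) is worth an alert.
        if "| ok" in line:
            print(line)
        else:
            print("CHECK THIS:", line)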
 
Wow.... Really? Their entire, single means of communication is Twitter, tied to one single employee's phone? WTF!!?

How in the world is FQ still in business with this kind of nonchalant and rookie practice?
How hard would it be to fire off an email to your customers (they do have a customer contact list, I would hope) just to say "we're working on it."
Pretty lame, if you ask me, to try to deflect the blame onto Twitter and Facebook.

Bottom line is FQ dropped the ball yet again. Outages happen, everyone knows that, but the lack of transparency and communication will kill whatever trust and loyalty your customers have pretty quickly, IME.

Seems to me that FQ's management team needs to go back to Business 101.
 
Wow.... Really? Their entire, single means of communication is Twitter, tied to one single employee's phone? WTF!!?

How in the world is FQ still in business with this kind of nonchalant and rookie practice?
How hard would it be to fire off an email to your customers (they do have a customer contact list, I would hope) just to say "we're working on it."
Pretty lame, if you ask me, to try to deflect the blame onto Twitter and Facebook.
I strongly suspect that FQ is running their website, email, Facebook, Twitter and everything else through their own servers, so when the servers crash, access to all of that instantly dies. A smart disaster plan would be to run that stuff through a different ISP.
 
I strongly suspect that FQ is running their website, email, Facebook, Twitter and everything else through their own servers, so when the servers crash, access to all of that instantly dies. A smart disaster plan would be to run that stuff through a different ISP.


EXACTLY!

Apparently they learned nothing from the last major outage or other outages before.
Given all the outages and the complaints about lack of communication on threads over the years, I don't think they're taking this very seriously.
 