When redundancy isn't
Sep. 20th, 2018 12:33 am
I'm reminded of a joking phrase used in WAN operations: "backhoe fade".
That's when you lose your connection because some idiot dug up the cable with a backhoe.
A rather infamous incident shut down the Internet in New England back in the late 70s.
Seems that while the customer had specified separate routing for the pair of T-1(?) lines that carried the Internet, the provider had routed the connections via separate cables... in the *same* trench.
Needless to say, the customer had words with the provider. And the provider revised their rules so that "separate routing" meant different cables *routed* differently, so that one accident couldn't take both out...
A manufacturing place I used to work at got separate power feeds from two different power companies because they had processes that didn't take well to sudden power loss.
One power line came in from the north, one from the south. Only single point of failure was the company's substation that they both connected to.
And they had a *large* room full of batteries to enable shutting down those critical processes gracefully.
Sadly, we still lost power a few times in the dozen years I worked there.
no subject
Date: 2018-09-20 10:52 am (UTC)
Although, when you think about it... the same reasoning could apply to humanity in general. We need an off-planet back-up.
no subject
Date: 2018-09-20 02:42 pm (UTC)
That way they had the current backup in the other building.
And if something clobbered both buildings, then the backups weren't needed anyway because we'd have been out of business.
Common failure mode for backups is having just *one* set of tapes/disks/whatever.
If something goes wrong *during* a backup, you've just hosed your only backup.
Also a good idea to make sure the backups are *readable* and actually contain the data they are supposed to.
Lots of places have cheerfully plugged in the backup, only to find that either it isn't readable or the process was mis-configured and the critical data was never written to the tapes.
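For what it's worth, here's a rough sketch of the "make sure it's readable" check: restore the backup to a scratch directory and compare checksums against the live files. The function names (`sha256_of`, `verify_restore`) and the directory layout are made up for illustration, and it assumes the live data hasn't changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return relative paths that are missing or different in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        # A file that never made it to the backup counts as a failure too.
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return problems
```

An empty list means every file came back intact. Anything else is the list of files you'd want to know about *before* the emergency.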
A recommendation from *way* back (late 70s) is to have a backup for every day of the week. They can be "incremental" backups.
On the last working day of the week, you do a complete backup, and archive it. You also shuffle the tapes by one day and open a new one which will become the "first day of the week" tape, with the former "first day" becoming second day, etc.
This eliminates worn out media as a problem.
You also make "end of month", end of quarter and end of year backups (extra tapes in this case). And you archive them. This way you can go back and figure out the way things were in the past with reasonable granularity.
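The rotation described above can be sketched roughly like this. The names (`rotate_weekly`, `extra_archives`) are invented for the example. It assumes the tape that falls off the end of the shuffle is the one leaving the daily rotation, and it checks calendar month ends rather than the last *working* day, which a real schedule would need to handle.

```python
from collections import deque
from datetime import date, timedelta


def rotate_weekly(tapes, new_tape):
    """Shuffle the daily tapes by one slot at the end of the week.

    tapes[0] is the "first day of the week" tape. The new tape takes that
    slot, everything else moves down a day, and the last tape drops out.
    Returns (new tape order, tape that left the rotation).
    """
    rotated = deque(tapes)
    retired = rotated.pop()        # assumption: the oldest slot leaves the rotation
    rotated.appendleft(new_tape)   # fresh media becomes the first-day tape
    return list(rotated), retired


def extra_archives(day: date) -> list:
    """Which extra archive tapes to cut on this date (calendar-based)."""
    labels = []
    if (day + timedelta(days=1)).month != day.month:
        labels.append("end-of-month")
        if day.month % 3 == 0:
            labels.append("end-of-quarter")
        if day.month == 12:
            labels.append("end-of-year")
    return labels
```

For example, `rotate_weekly(["mon", "tue", "wed", "thu", "fri"], "new")` hands back `["new", "mon", "tue", "wed", "thu"]` with `"fri"` coming out of rotation, so no single tape stays in daily use for more than a handful of weeks.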
A backup for the planet would really have to be more like the "disaster recovery" centers some big corporations have. Not just "stuff" at a different site, but something set up to be able to run things until you can rebuild.
That'll require a *very* self-sufficient colony.
Recommended is
no subject
Date: 2018-09-21 12:36 pm (UTC)
For some reason I know several people who do computer stuff for businesses. One told the tale of how he got called in (new client who didn't have a regular IT) for a server emergency and had to replace several failed components. When he tried to restore from backup he discovered there was no tape in the backup drive.
no subject
Date: 2018-09-21 05:37 pm (UTC)
I was good friends with the manager and the "tech support" person at the Radio Shack Computer Center back in the 80s.
I was talking with the tech support person one day and she told me about one long term problem she finally solved with this one customer.
TRS-DOS had this built-in capability to limit backup copies of programs. Basically as part of the file properties, there was a byte that got checked when you did a "backup" (equivalent of MS-DOS's DISKCOPY) of a floppy.
If the byte was 255 (the default) the file was "unlimited". If the value was 1-254, it got decremented by one, and the copy had it set to 0. If it was zero, the file wasn't copied.
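As a rough model of that counter logic (not actual TRS-DOS code, just the rules as described above): `backup_file` and `UNLIMITED` are names invented for the sketch, and the idea that an unlimited file stays unlimited on the copy is an assumption, since only the limited case is spelled out.

```python
from typing import Optional, Tuple

UNLIMITED = 255  # the default: no limit on backups


def backup_file(master_count: int) -> Tuple[int, Optional[int]]:
    """Return (master's counter after the backup, counter written to the copy).

    The copy's counter is None when the file isn't copied at all.
    """
    if not 0 <= master_count <= 255:
        raise ValueError("the counter is a single byte (0-255)")
    if master_count == UNLIMITED:
        # Assumption: an unlimited file is still unlimited on the copy.
        return UNLIMITED, UNLIMITED
    if master_count == 0:
        return 0, None  # out of backups, so the file is skipped
    # One backup used up on the master; the copy itself can't be copied again.
    return master_count - 1, 0
```

So a master with a count of 3 is good for three backups, none of which can be backed up themselves, and after that it's a trip back to the store to get the count reset.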
So, on to the problem.
This one customer kept having his backups of the business program he was using go bad.
So he'd have to bring in the master disk and she'd have to reset the number of backups.
She'd talked him thru the backup process over the phone many times and was about ready to pull her hair out.
She'd been on the phone with him yet another time, going thru things step by step. Having him go into excruciating detail.
cust: "It says backup complete. I take the master out of drive 0 and put it back in the sleeve in the binder."
tech: "Ok, sounds good."
cust: "I take the backup out of drive 1, put it in the sleeve and stick it up on the file cabinet."
tech: "Wait a sec. 'Stick it up on the file cabinet'?"
cust: "Yeah, I take it out of the drive, put it in the sleeve and use this magnet to stick it to the side of the file cabinet..."
tech (with *great* restraint): "Ok, I think we've found the problem..."
no subject
Date: 2018-09-20 01:17 pm (UTC)
I worked in the Division of Planning in the Kentucky Transportation Cabinet, Department of Highways. Among other important tasks, the highway maps were made there. That meant multiple, expensive Intergraph graphics machines.
Everyone was glad when we moved out of a ratty old building (originally constructed in the Thirties (!) as a garage (!)) into a brand new one. As specified in the contract, in case of power failure we had a building-wide battery backup designed to keep the power stable until the natural gas fueled generator came up to speed.
The first time the power went off the battery backup came on with a bang. Actually, several of them. There were burn spots on the carpet where single-use surge protectors overloaded.
The problem was that the building contract was awarded to the lowest bidder. Afterwards, there were no funds to correct the deficiencies. We couldn't even get the elevators working right until the Highway Commissioner came over for a meeting and got stuck.
no subject
Date: 2018-09-20 02:56 pm (UTC)
At that company I worked for, the plumbing contractor had to do that. Seems they'd ignored the *explicit* instruction that they were to use some special stuff (teflon tape? something else?) to seal the joints in the DI water lines.
Instead they used regular "pipe dope" or some such. So the hyper pure water was contaminated and messed stuff up.
Even better, the lines ran *thru* the concrete floor slabs. So to fix them they had to snake "liners" thru them.
Many years later they had a *different* problem with the DI lines. Since they couldn't be chlorinated, they developed bacterial colonies, which caused stains on the silicon wafers. (They went *nuts* trying to figure out what was causing the stains until somebody noticed that in cleaning the hoses for the lapping machines this "sludge" came out: bacterial mats.)
That led to having to shut down stuff one day and flush the lines with 20% (or was it 40%?) hydrogen peroxide. Nasty stuff, but since it breaks down to water and oxygen (and converted the organics to things like water, CO2 and nitrogen) it did the job without contaminating things.
I don't know what the long term solution was, but me, I'd have bought an ozonator like some places use for water purification, and treated the DI water with that.
no subject
Date: 2018-09-20 08:40 pm (UTC)
Ozone treatment of water is an idea which should have been pursued sooner.
H2O2 breaks down into water, oxygen and *heat*. Great for cleaning out organics and some other materials, with little chance of contamination. Just mind the evolved heat.
Like ozone, it's pretty neat stuff, but its potential has never been reached, due to certain problems.
Here are some of Dr. John D. Clark's comments on the attempts to use high-test peroxide as a storable oxidizer. "The cleanliness required was not merely surgical - it was levitical. Merely preparing an aluminum tank to hold peroxide was a project, a diverting ceremonial that could take days. Scrubbing, alkaline washes, acid washes, flushing, passivation with dilute peroxide —it went on and on. And even when it was successfully completed, the peroxide would still decompose slowly; not enough to start a runaway chain reaction, but enough to build up an oxygen pressure in a sealed tank, and make packaging impossible. And it is a nerve-wracking experience to put your ear against a propellant tank and hear it go "glub" - long pause - "glub" and so on. After such an experience many people, myself (particularly) included, tended to look dubiously at peroxide and to pass it by on the other side."
no subject
Date: 2018-09-20 09:46 pm (UTC)
The Me 163 Komet used peroxide (80-85%!!) & hydrazine. Both would do nasty things to anything organic.
Oh yeah, one of the things that will act as a catalyst to break down peroxide is blood. So be sure not to bleed around the stuff... :-)
Still, those are tame compared to chlorine trifluoride (ClF3) and FOOF.