![]() ![]() Once the software's out in the wild and interacting with other systems, all that falls away. In devops heaven, test scripts and protocols are developed alongside the actual software - well, maybe. Go back to flight simulators, which for regulatory reasons have to be developed alongside the aircraft they train pilots for. This sort of thinking is a failure of imagination and engineering. Where such models are used, they necessarily involve a lot of abstraction, certainly too much to capture detailed configuration changes in components. This works, except when it doesn't - large, complex, dynamic systems are usually too much all those things to be modelled realistically, if they can be at all. Why isn't this principle adopted in large, complex systems such as those in major service and network outfits? To some extent, they are - the testing and validation processes take place in some sort of system that tries to react to something like the real thing. In aviation, those places are called flight simulators. Don't.) This principle, of making your mistakes in a place that mercilessly demonstrates their consequences without them being consequential, is the gold standard in safety nets. ( El Reg takes no responsibility if you type into the wrong window. Just spin up a new virtual Linux machine and have at it. Nobody who has seen this happen ever forgets - or repeats - it. The error messages that follow as you try to make sense of your apocalypse will be a lesson in digital madness.ĭon't try this at home? You absolutely should. You may not even notice at first, as it will set about its suicide mission with quiet efficiency, but eventually some weird error will appear as what's left of the running software reaches for an essential file that's not there. ![]() Run it with root privileges at the root directory, which is just a sudo and a cd / away, and you have wiped out your entire universe. For those of you whose palms aren’t sweating instinctively at the sight of this little beauty, it means "Remove all files in this directory and all directories beneath it." It's hugely useful when freeing up space or removing old installations, and the system won't let you get rid of things you don't have access privileges to. Take the infamous Unix/Linux command 'rm -r *'. Part of the problem is that our technology will do what we tell it, and the difference between very useful and existentially threatening can be wafer thin. Simple typos and their cousin, Mr Misconfiguration, can unleash chaos to anyone. But these excrescences are corporate and cultural: the typo-induced Azure outage is an industry wide phenomenon that good people perpetrate. Leaning on everyone to move to Windows 11 while nearly half of Windows 10 PCs fail the hardware requirements. ![]() Stuffing its OS with adware disguised as a help system. It is easy to rag on Microsoft for oh so many reasons. A typo triggered unforeseen cascading errors, which continued attempts to restore order dragged on for ten embarrassing hours. You can read the gory details here, they’re just as compelling as an episode of Air Crash Investigation. A careful, intricate, tested and approved rewiring of the Azure DevOps suite got sent out into the world, only for South Brazil to go dark as it started to eat customer instances. Some poor sod in Microsoft just had the mother of all elys. Opinion “ELY (n.) – The first, tiniest inkling that something, somewhere has gone terribly wrong.” One the finest definitions from Douglas Adams and John Lloyd’s The Meaning of Liff, it describes perfectly the start of that nightmare scenario for ops in a big service provider - the rising tide of alerts after that update went live. ![]()
0 Comments
Leave a Reply. |