…and in particular, often goes very badly. not only in life in general, but also in the IT world :)
you probably have dozens of stories to tell, if not hundreds. someone misconfigured a port, everything worked until it stopped … and when it stopped, it dragged the whole network down with it. big time. the whole data center.
why do we keep making the same mistakes? automation improves the situation slightly, but sometimes it dramatically speeds up things going bad. I have already written about the ‘black box’ method - we do not use it in IT. yes, we have met at various ‘fuckup nights’ type events (kudos to Bartek Górczyński, who, drawing on his extensive experience of many IT drama cases, spoke bluntly and openly), but for ‘hard’ infrastructure there is not much, and what there is gets treated as an example rather than as a ready recipe for what conclusions to draw. Andrzej Gab and Robert Woźny started organizing such a series at PLNOG.
and we decided to take the subject more broadly, with special attention to ‘lessons learned’.
I encourage you to submit your dramas. I will definitely share the few that I can. and by the way - maybe another one is ahead of me, because soon I’m moving my blog from Ghost to WordPress ;)
Happy New Year!