Designing distributed systems means considering failure scenarios—both likely and less so. Will the network let you down? (Almost assuredly.) Will some portion of your IaaS misbehave? (Have you met computers?) We build in graceful degradation for much of our automation but often neglect the (just-as-essential) human interactions.
The classic “hard problems” of cache invalidation and naming things revolve around our understanding of what's correct and true and our agreements with one another on scope and relevance. Communication is essential for making context-dependent decisions. Whether we're attempting to determine the current state of reality or distinguish logical boundaries, democratized observability is key to answering our questions.
As the fractal complexity of our distributed systems grows, we need to mindfully choose practices that work with our tooling. You can't buy a silver bullet, but you can forge one from the collaborative efforts of your team.
Bridget is a Principal Cloud Developer Advocate at Microsoft. Her CS degree emphasized theory, but she now deals with the concrete (if “cloud” can be considered tangible). After 15 years as an operations engineer, she traded being on call for being on a plane. A frequent speaker and program committee member for tech conferences, she leads the devopsdays organization globally and the devops community at home in Minneapolis. She podcasts with Arrested DevOps, blogs at bridgetkromhout.com, and is active in a Twitterverse near you.