Improving System Resiliency via Chaos Engineering

Submitted by Joseph Woodward

Talk Abstract:

Advances in cloud technology mean systems are becoming increasingly distributed and complex. Large monoliths are being split up into microservices, we're depending on more remote services, and Functions as a Service (FaaS) and serverless architectures are becoming increasingly common. The very nature of distributed systems means they're far more prone to failure than similarly scoped monoliths, which makes predicting or preventing possible failure modes increasingly difficult.

In this talk we'll look at how we can harness Chaos Engineering, a discipline pioneered by Netflix, to better understand our systems and their failure modes, and how we can use this information to improve overall system resiliency and reliability.