By integrating technology into the eyewear vertical and keeping a deep focus on consumer happiness, Lenskart has grown more than 100% year on year over the last two years to become the largest optical business in India today.
Lenskart serves not only fashion products but also prescription eyeglasses and contact lenses, which are custom made for every customer. This makes it a niche product company that also supplies essential commodities to its customers. We take delivery and ease of ordering, both online and in offline stores, very seriously. Any downtime could cost us business and could also delay the delivery of a customer's eyeglasses or contact lenses. As an engineering team that is always accepting challenges, we started looking at chaos engineering seriously after we faced some major downtimes on our platform caused by system failures and DDoS attacks. It didn't take us long to realize that just running shop on the cloud was not good enough. We have to be prepared for chaotic situations, and beyond that, we have to simulate them and find the weak points in our architecture. So we first started with manual and scripted chaos, but those experiments were hard to reproduce and took a lot of effort to plan and execute. We then started exploring whether a framework could help us maintain our chaos experiments in the form of templates. After looking at a couple of tools, we settled on Litmus.
We started using Litmus from our DevOps Kubernetes cluster, where we first tested stateful services such as Redis Cluster and Elasticsearch. We then gradually started testing our own application services with Litmus, and have moved step by step from our integration environments to pre-production. We regularly refine our hypotheses and write relevant experiments for these services. We haven't yet integrated these experiments into our CI/CD pipelines, but we plan to run them with every release in the future.
We are quite new to the Litmus framework, but what we have really gained from it is the templating of our chaos experiments and the ability to maintain them in our version control system. This has really helped us with the reproducibility of the experiments. It has also opened new opportunities for us: we can write our own custom experiments that specifically target our in-house and public services. We have started ranking the experiments and adding them to our experiment suite so that we can now measure the resiliency of our services. This will also help us be more targeted in our resiliency journey in the future.
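To illustrate how ranked experiments can roll up into a single resiliency figure, here is a minimal Python sketch. It is not our production scoring logic: the `ExperimentResult` type, the weights, the experiment names, and the sample numbers are all hypothetical. It assumes each experiment carries a rank (weight) and reports how many of its checks passed, and it computes a weighted average.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExperimentResult:
    """Outcome of one chaos experiment run (hypothetical shape)."""
    name: str
    weight: int          # assumed rank: higher means more critical
    passed_checks: int   # checks (e.g. health probes) that held during chaos
    total_checks: int    # checks evaluated in total


def resilience_score(results: Iterable[ExperimentResult]) -> float:
    """Weighted average of per-experiment pass ratios, scaled to 0-100."""
    results = list(results)
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0  # nothing ranked yet, so there is nothing to measure
    weighted = sum(
        r.weight * (r.passed_checks / r.total_checks)
        for r in results
        if r.total_checks > 0  # runs with no checks contribute zero
    )
    return 100.0 * weighted / total_weight


# Illustrative data only; these are not real measurements.
SAMPLE = [
    ExperimentResult("redis-pod-delete", weight=10, passed_checks=4, total_checks=4),
    ExperimentResult("es-node-drain", weight=6, passed_checks=1, total_checks=2),
    ExperimentResult("api-network-latency", weight=4, passed_checks=0, total_checks=2),
]

if __name__ == "__main__":
    print(f"suite resilience: {resilience_score(SAMPLE):.1f}")
```

Weighting by rank means a failure in a critical experiment pulls the score down more than a failure in a low-priority one, which is one way to keep the suite focused on the services that matter most.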
Litmus is highly extensible and integrates with other tools to enable the creation of custom experiments. Kubernetes developers & SREs use Litmus to manage chaos in a declarative manner and find weaknesses in their applications and infrastructure.
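As a rough illustration of what declarative, templated chaos can look like, the Python sketch below renders a `ChaosEngine` manifest from a few parameters. The field names follow the `litmuschaos.io/v1alpha1` `ChaosEngine` resource as we understand it and should be checked against the installed Litmus version. The helper `chaos_engine`, the resource name, the namespace, the label, and the `litmus-admin` service account are assumptions made for the example.

```python
import json
from typing import Any, Dict, Mapping


def chaos_engine(
    name: str,
    namespace: str,
    app_label: str,
    experiment: str,
    env: Mapping[str, Any],
    app_kind: str = "deployment",
    service_account: str = "litmus-admin",  # assumed service account name
) -> Dict[str, Any]:
    """Build a ChaosEngine manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Which application the experiment targets.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": app_kind,
            },
            "chaosServiceAccount": service_account,
            "experiments": [
                {
                    "name": experiment,
                    "spec": {
                        "components": {
                            # Tunables are passed as string-valued env vars.
                            "env": [
                                {"name": key, "value": str(value)}
                                for key, value in sorted(env.items())
                            ]
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = chaos_engine(
        name="redis-chaos",            # hypothetical names throughout
        namespace="integration",
        app_label="app=redis",
        experiment="pod-delete",
        env={"TOTAL_CHAOS_DURATION": 30, "CHAOS_INTERVAL": 10},
    )
    # JSON is valid YAML, so the output can be committed or piped to kubectl.
    print(json.dumps(manifest, indent=2))
```

Keeping the parameters separate from the manifest structure is what makes an experiment reproducible: the same template can be committed once and replayed against different services or environments by changing only the arguments.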
Get started with Litmus