Lots and lots of users gathered to bring our servers to their knees in the first Twingly Meltdown. Our tech team was quite sure that a group of regular users couldn’t do any damage to our servers. So towards the end of the Meltdown Hour, I shared a small script with everyone in the Twingly Skype chat that made it possible for every user to send thousands of simultaneous requests.
Before people started using the script, everything ran flawlessly. Our database servers, which are protected by caching, hardly showed any load at all. On the web front-ends, the traffic produced visible CPU load, and some users in faraway countries experienced periods of slow searches. Overall, though, search was fast.
When lots of users started running the more aggressive load script, the tone in the tech team chat turned a little more hostile. Because the script used the undocumented and far less tested JSON API, it exposed a bug in the code. Web servers are put under much more strain when they have to handle errors during execution; technically, this is the difference between performance testing and stress testing. As a result, the experience of other users was affected, and at times they got an error message while searching.
A big thank you to all participants; we hope you had a lot of fun trying to crash Twingly. It helped a lot to see how the system behaves under strain, since real-world traffic is really hard to simulate properly.
The top image shows the web log stats from the day of the event. It clearly shows that by the end of the event, far fewer users managed to send many more searches.