We've been closely monitoring our systems throughout the day today (Feb 3rd), and site speed has been better than average for a Tuesday. It's not quite time to declare victory, but we expect site performance to remain up to our standards going forward.
In brief, we experienced excessive slowness while user sessions were being passed across our farm of web servers. (A bit of background: all actions taken by our users are spread across numerous web servers for scalability and maximum speed. We pass information between those servers behind the scenes to maintain a seamless experience as users bounce from one machine to another.) This slowness led our users to retry their actions and logins more and more often, which added load and created more slowness, and the problem compounded itself. We believe this happened yesterday because a threshold was crossed at which our session management reached its limit and became congested.
We are fixing this in several steps. Yesterday afternoon we made improvements to streamline our session management, a step towards fixing the underlying cause of the original speed issue. More changes along those lines will come today. In coming releases we're going to limit the ability to retry certain actions, so that no single user can multiply their own load. Lastly, we are accelerating our plans to replace our session-handling system entirely with something many times faster.
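For readers curious what a retry limit can look like in practice, here is a minimal Python sketch. It is an illustration only and not our actual implementation: the class name `RetryLimiter`, the `allow` method, and the default limits (3 attempts per 10 seconds) are all hypothetical. The idea is simply to count each user's recent attempts at a given action and refuse further attempts once a cap is reached, so a slow response cannot be turned into many simultaneous requests by one user.

```python
import time
from collections import defaultdict, deque


class RetryLimiter:
    """Hypothetical sketch: cap how often one user may repeat an action
    within a sliding time window. Names and limits are assumptions."""

    def __init__(self, max_attempts=3, window_seconds=10.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._attempts = defaultdict(deque)  # (user_id, action) -> attempt timestamps

    def allow(self, user_id, action):
        """Return True and record the attempt if the user is under the cap."""
        now = self.clock()
        attempts = self._attempts[(user_id, action)]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # over the cap: reject instead of adding load
        attempts.append(now)
        return True
```

A caller would check `limiter.allow(user_id, "login")` before doing the expensive work and show a "please wait" message when it returns `False`. Each user and action is tracked separately, so one impatient user does not affect anyone else.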
Some customers have asked if we're unprepared to handle the load of a growing customer base. Ironically, yesterday's problems were the result of mistakes made pursuing that goal. We recently expanded our hardware capacity to handle future transactional load, but the plumbing between those new systems couldn't keep up. We've now got a clear idea of how to remedy those flaws.
I hope anyone with questions will contact us: firstname.lastname@example.org