Thursday, August 14, 2008

Debugging at APT - Part 3: Fiddler

Fiddler has been around for a while, but I was only introduced to it a few months ago. Earlier this year, we were able to budget some time toward improving our software's performance, rather than continuing to add new features as quickly as we had been. Fiddler was especially useful for telling us just how much work our web servers (and our clients' browsers) were doing, and where some of our bottlenecks and improvement opportunities were.

In my first post I mentioned that we have some pretty complex pages throughout our software. Before optimizing them, we were making hundreds of HTTP requests per page (including the now-infamous 85 iframes), and we were applying the same super-strict "no cache" rules to our static files that we applied to our dynamically generated content. As a result, even our simplest pages required well over 100 KB of HTTP traffic, and some of the most complex pages were closer to 2 MB. Fiddler's biggest strength for us was that it let us see exactly how each of the caching directives you can send in HTTP headers affects browser behavior and network traffic.
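To make the static-versus-dynamic distinction concrete, here is a minimal Python sketch of the idea. It is not taken from our code base: the function name, the list of extensions, and the one-day lifetime are all assumptions chosen for illustration. Static files get a cacheable header, and everything else keeps the strict "no cache" treatment.

```python
import posixpath

# Hypothetical list of extensions treated as static files (an assumption,
# not the list any particular application uses).
STATIC_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".ico"}


def cache_headers(path, static_max_age=86400):
    """Return illustrative HTTP caching headers for a request path.

    static_max_age is in seconds; one day is an arbitrary example value.
    """
    # Ignore any query string before looking at the file extension.
    path_only = path.split("?", 1)[0]
    extension = posixpath.splitext(path_only)[1].lower()

    if extension in STATIC_EXTENSIONS:
        # Static file: let the browser and intermediaries reuse it.
        return {"Cache-Control": f"public, max-age={static_max_age}"}

    # Dynamically generated content: keep the strict rules.
    return {
        "Cache-Control": "no-cache, no-store, must-revalidate",
        "Pragma": "no-cache",
        "Expires": "0",
    }


if __name__ == "__main__":
    print(cache_headers("/images/logo.png"))
    print(cache_headers("/reports/view.aspx?id=3"))
```

With headers like the first set in place, a repeat visit should show up in Fiddler with far fewer requests for static files, which is the kind of before-and-after comparison that made the tool so useful to us.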

Coincidentally, right after I learned about Fiddler, I was asked to help troubleshoot a performance concern raised by one of our newer clients. They were sporadically and unpredictably experiencing "This page cannot be displayed (cannot find server or DNS error)" messages. It was inconsistent but seemed to happen primarily during the client's core business hours, and none of our other clients were experiencing comparable symptoms, so we figured it could be a network problem on their end. A combination of Fiddler (on one client user's computer) and Wireshark (on our web server) showed us a pretty clear pattern of dropped and resubmitted requests at the HTTP level. At the same time, our own internal logging showed an extraordinarily high variance in round-trip performance.

It turned out that the client's internal firewall was sporadically dropping some outgoing requests and incoming responses during times of peak traffic on their general-purpose network. The solution was to establish a VPN tunnel, using a separate infrastructure that they already had in place for their vendors and business partners. In hindsight, we might have reached the same conclusion even more quickly if we had run Fiddler on both sides and compared the traces.
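Comparing traces from both sides could look something like the sketch below. It assumes something we did not actually have: that each request carries a unique identifier (for example, a token in the query string) that can be pulled out of both the client-side and server-side captures. The function name and the sample identifiers are hypothetical.

```python
from collections import Counter


def compare_traces(client_request_ids, server_request_ids):
    """Compare request identifiers seen on the client and on the server.

    Both arguments are iterables of identifiers, one entry per request
    observed. A resubmitted request appears more than once.
    """
    client = Counter(client_request_ids)
    server = Counter(server_request_ids)

    # Copies the client sent that the server never saw. Counter subtraction
    # keeps only positive differences.
    lost_outbound = dict(client - server)

    # Identifiers the client sent more than once, i.e. retries.
    resubmitted = sorted(rid for rid, count in client.items() if count > 1)

    return {"lost_outbound": lost_outbound, "resubmitted": resubmitted}


if __name__ == "__main__":
    client_side = ["req-1", "req-2", "req-2", "req-3"]
    server_side = ["req-1", "req-2"]
    print(compare_traces(client_side, server_side))
```

A pile of identifiers in lost_outbound that clusters around peak hours would point at something between the browser and the server, rather than at either endpoint.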
