Friday, May 2, 2008

Fault-Tolerant ColdFusion Error Reporting


ColdFusion’s default reporting mechanism for uncaught errors is to dump the error information to the console. This has several disadvantages, two of them being that someone has to monitor the console and that whoever monitors it sees only the dumped information. To address this issue, ColdFusion Administrator has a setting for a Site-wide Error Handler. This is a custom template that is executed when an uncaught error is encountered, and it allows for sophisticated post-processing of the error.

At APT, we take advantage of this feature and have a very robust Site-wide Error Handler. Among other things, it retrieves additional information about the user and setup, retrieves information about the product’s state, diagnoses the error, logs the error to a database, and sends an e-mail to relevant engineering and delivery team members. But there is a catch: if an error is encountered within the Site-wide Error Handler, it stops processing and dumps that error to the console. Should this happen, we are right back where we started.

The solution is simple: make sure your Site-wide Error Handler never fails. However, if you have a complicated template, this is easier said than done. In this situation, we must make the Site-wide Error Handler fault-tolerant.

How do we achieve fault-tolerance? The foundation is liberal use of try-catch blocks within your Site-wide Error Handler. While it may at first appear sloppy and clutter up the code, it is critical for isolating the effects of errors, making sure the maximum amount of processing takes place, and ensuring the most information possible makes it back to you.

Look at this simple example template:


<cftry>
    <!--- Retrieve user information --->
    <cfcatch type="any"> <!--- Handle exception ---> </cfcatch>
</cftry>
<cftry>
    <!--- Retrieve information about the product’s state --->
    <cfcatch type="any"> <!--- Handle exception ---> </cfcatch>
</cftry>
<cftry>
    <!--- Send an e-mail with error and additional retrieved information --->
    <cfcatch type="any"> <!--- Handle exception ---> </cfcatch>
</cftry>



If retrieving user information fails, we will still be sent an e-mail containing the error information and information about the product’s state. This is exactly the isolation we are going for. Subcomponents within the try-catch blocks can be wrapped in their own try-catch blocks as well, creating an even more granular and fault-tolerant template.
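
To make the idea of nested blocks concrete, here is a minimal sketch of the "retrieve user information" task split into two independently protected lookups. The variable userInfo and the session key session.userName are hypothetical names chosen for illustration; they are not taken from APT's actual handler.

```cfml
<!--- Sketch only: userInfo and session.userName are hypothetical names. --->
<cfset userInfo = structNew()>
<cftry>
    <!--- Inner block 1: a failed user-name lookup must not stop the browser lookup. --->
    <cftry>
        <cfset userInfo.name = session.userName>
        <cfcatch type="any">
            <cfset userInfo.name = "(unavailable)">
        </cfcatch>
    </cftry>
    <!--- Inner block 2: record the browser from the standard CGI scope. --->
    <cftry>
        <cfset userInfo.browser = cgi.http_user_agent>
        <cfcatch type="any">
            <cfset userInfo.browser = "(unavailable)">
        </cfcatch>
    </cftry>
    <cfcatch type="any">
        <!--- Outer block: catches anything the inner blocks did not. --->
    </cfcatch>
</cftry>
```

If the session lookup throws, userInfo.name falls back to a placeholder and the browser lookup still runs, so the later e-mail step receives whatever could be collected.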

When errors within your Site-wide Error Handler are caught, be sure you attempt to report information on them instead of letting them get swallowed up silently. In the above example, the information about the error encountered while retrieving user information should be included in the final e-mail sent. This feedback is critical in debugging and improving the Site-wide Error Handler, and you will be glad you have it. However, be sure to wrap this reporting in its own try-catch in case there is an error with reporting the error. The overall theme is: the more paranoid the better!
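
One possible way to carry those caught errors through to the final e-mail is sketched below. It is an illustration under stated assumptions, not APT's handler: the array handlerErrors, the session key session.userName, the e-mail addresses, and the log file name siteWideHandler are all hypothetical, and the details of the original application error are left out to keep the sketch short.

```cfml
<!--- Sketch only: handlerErrors, session.userName, the addresses, and the log name are hypothetical. --->
<cfset handlerErrors = arrayNew(1)>

<cftry>
    <cfset userName = session.userName>
    <cfcatch type="any">
        <cfset userName = "(unavailable)">
        <cfset lookupError = cfcatch.message>
        <!--- Recording the failure is itself wrapped, in case it fails too. --->
        <cftry>
            <cfset arrayAppend(handlerErrors, "User lookup failed: " & lookupError)>
            <cfcatch type="any"></cfcatch>
        </cftry>
    </cfcatch>
</cftry>

<cftry>
    <cfmail to="errors@example.com" from="app@example.com" subject="Application error" type="text">
User: #userName#
Problems inside the error handler:
#arrayToList(handlerErrors, chr(10))#
    </cfmail>
    <cfcatch type="any">
        <!--- Last resort: if the e-mail cannot be sent, try to write to a log file. --->
        <cftry>
            <cflog file="siteWideHandler" type="error" text="Could not send error e-mail: #cfcatch.message#">
            <cfcatch type="any"></cfcatch>
        </cftry>
    </cfcatch>
</cftry>
```

Collecting the messages in one array keeps each task's catch block small, and the e-mail step only has to print whatever the array holds by the time it runs.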

By treating your Site-wide Error Handler as a series of granular tasks contained within try-catch blocks, the effect of failures will be limited while the maximum amount of error information safely makes its way back to you. This will shorten the time it takes to identify, diagnose, and fix bugs and thus improve the quality of your software. <cftry> and <cfcatch> tags are essentially free, but engineering time is not.
