HA Systems and Techniques
Design Patterns for High Availability
is a good, brief introduction to HA design principles.
A Modern Taxonomy of High Availability
by Ron I. Resnick does a good job of defining many of the common terms used in the HA community and provides an overview of the field.
Getting HA with Fault Management
from OSE is a nice introductory article about dealing with faults in an HA system.
Practical Byzantine Fault Tolerance
- Miguel Castro and Barbara Liskov
Patterns for Fault Tolerance
A Perspective on the State of Research in Fault-Tolerant Systems
by Charles Weinstock & David Gluch
P.G. Neumann, Practical Architectures for Survivable Systems and Networks, Phase-One Final Report. January 28, 1999.
http://www.csl.sri.com/neumann/arl-one.html
Fred B. Schneider
has a number of interesting papers related to HA.
Fault-Tolerant CORBA and Java
- HA middleware
Checklist for Banking Risk Management (Revised 1998 Edition)
Philip Koopman's Home at CMU