In Symposium on Reliable Distributed Systems, pages 41-50, September 2009.
Acceptance rate: 22%. Number of submissions: 104.
This paper develops reliable distributed publish/subscribe algorithms that maintain service availability in the face of concurrent crash failures of up to δ brokers. Reliability in our context refers to per-source in-order, exactly-once delivery of publications to matching subscribers. To handle failures, brokers maintain data structures that enable them to reconnect the topology and compute new forwarding paths on the fly. This enables fast reaction to failures and improves the system's availability. Moreover, we present a recovery procedure that recovering brokers execute in order to re-enter the system and synchronize their routing information.