Partition-tolerant Distributed Publish/Subscribe Systems

Reza Sherafat Kazemzadeh and Hans-Arno Jacobsen.

University of Toronto, 2010.
Pages 1-10.


In this paper, we develop {\em reliable} distributed publish/subscribe algorithms that can tolerate the concurrent failure of up to $\delta$ brokers or links. In our approach, $\delta$ is a configuration parameter that determines the level of fault tolerance of the system, and reliability refers to exactly-once and per-source in-order delivery of publications to clients with matching subscriptions. We propose protocols that address three problems in the presence of broker or link failures: {\em (i)} subscription propagation; {\em (ii)} event forwarding; and {\em (iii)} broker recovery. To precisely study the effect of multiple failures on the operation of the system, we introduce two types of network partitions, which we term {\em partition islands} and {\em partition barriers}. Our approach transparently bypasses partition islands while guaranteeing reliable publication delivery at all times. For barriers, however, we maintain reliability by withholding delivery of publications that might violate the reliability requirements. Finally, we study the effectiveness of our approach when the number of concurrent failures exceeds $\delta$. Through experimental evaluation, we demonstrate that a system configured with a modest value of $\delta=3$ reliably delivers $97\%$ of publications in the presence of failures of up to $17\%$ of brokers.
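The reliability guarantee defined above (exactly-once, per-source in-order delivery) can be illustrated at the subscriber end with a generic sequence-number filter. This is a minimal sketch, not the authors' protocol: it assumes that each publication carries a source identifier and a per-source sequence number starting at 0, and the class name `InOrderDeliveryFilter` is hypothetical. Duplicates are dropped, and a publication that arrives early is buffered until every earlier publication from the same source has been delivered.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InOrderDeliveryFilter:
    """Hypothetical subscriber-side filter enforcing exactly-once,
    per-source in-order delivery.

    Assumes every publication is tagged with (source, seq), where seq is a
    per-source sequence number starting at 0.
    """

    def __init__(self) -> None:
        # Next sequence number expected from each source.
        self.next_seq: Dict[str, int] = defaultdict(int)
        # Publications that arrived ahead of a gap, keyed by sequence number.
        self.pending: Dict[str, Dict[int, str]] = defaultdict(dict)

    def receive(self, source: str, seq: int, payload: str) -> List[Tuple[str, int, str]]:
        """Accept one copy of a publication; return what can be delivered now."""
        if seq < self.next_seq[source] or seq in self.pending[source]:
            return []  # duplicate: already delivered or already buffered
        self.pending[source][seq] = payload
        delivered: List[Tuple[str, int, str]] = []
        # Release the contiguous run that starts at the expected number.
        while self.next_seq[source] in self.pending[source]:
            s = self.next_seq[source]
            delivered.append((source, s, self.pending[source].pop(s)))
            self.next_seq[source] = s + 1
        return delivered


if __name__ == "__main__":
    f = InOrderDeliveryFilter()
    out = []
    # P1's seq 1 arrives before seq 0, and seq 0 is received twice.
    for src, seq, p in [("P1", 1, "b"), ("P1", 0, "a"), ("P1", 0, "a"),
                        ("P2", 0, "x"), ("P1", 2, "c")]:
        out.extend(f.receive(src, seq, p))
    print(out)  # [('P1', 0, 'a'), ('P1', 1, 'b'), ('P2', 0, 'x'), ('P1', 2, 'c')]
```

A filter like this only enforces the delivery semantics at the endpoint. It does not address how brokers propagate subscriptions, forward events around failed brokers or links, or recover state, which are the problems the paper's protocols target.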

