Detecting Internet Outages with Precise Active Probing (extended)

Lin Quan, John Heidemann, and Yuri Pradkin
USC/Information Sciences Institute

Lin Quan, John Heidemann and Yuri Pradkin 2012. Detecting Internet Outages with Precise Active Probing (extended). Technical Report ISI-TR-2012-678b. USC/Information Sciences Institute.

Abstract

Parts of the Internet are down every day, from the intentional shutdown of the Egyptian Internet in Jan. 2011 and natural disasters such as the Mar. 2011 Japanese earthquake, to the thousands of small outages caused by localized accidents, and human error, maintenance, or choices. Understanding these events requires efficient and accurate detection methods, motivating our new system to detect network outages by active probing. We show that a single computer can track outages across the entire analyzable IPv4 Internet, probing a sample of 20 addresses in all 2.5M responsive /24 address blocks. We show that our approach is significantly more accurate than the best current methods, with 31% fewer false conclusions, while providing 14% greater coverage and requiring about the same probing traffic. We develop new algorithms to identify outages and cluster them to events, providing the first visualization of outages. We carefully validate our approach, showing consistent results over two years and from three different sites. Using public BGP archives and news sources we confirm 83% of large events. For a random sample of 50 observed events, we find 38% in partial control-plane information, reaffirming prior work that small outages are often not caused by BGP. Through controlled emulation we show that our approach detects 100% of full-block outages that last at least twice our probing interval. Finally, we report on Internet stability as a whole, and the size and duration of typical outages, using core-to-edge observations with much larger coverage than prior mesh-based studies. We find that about 0.3% of the Internet is likely to be unreachable at any time, suggesting the Internet provides only 2.5 “nines” of availability.
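
The "2.5 nines" figure follows from the 0.3% unreachability estimate under the usual convention that the number of nines is the negative base-10 logarithm of the unavailable fraction. The sketch below is only an illustration of that arithmetic; the function name `nines_from_unavailability` is hypothetical and does not come from the report.

```python
import math


def nines_from_unavailability(unavailable_fraction: float) -> float:
    """Return the number of "nines" of availability.

    Assumes the common convention nines = -log10(unavailable fraction),
    so 0.001 unavailable (99.9% available) gives 3 nines.
    This helper is illustrative and is not taken from the report.
    """
    if not 0.0 < unavailable_fraction < 1.0:
        raise ValueError("unavailable_fraction must be strictly between 0 and 1")
    return -math.log10(unavailable_fraction)


# About 0.3% of the Internet unreachable at any time, as stated in the abstract.
internet_nines = nines_from_unavailability(0.003)
print(f"{internet_nines:.2f} nines")
```

With an unavailable fraction of 0.003 the result is roughly 2.5, which matches the abstract's rounding.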

Reference

@techreport{Quan12a,
  author = {Quan, Lin and Heidemann, John and Pradkin, Yuri},
  title = {Detecting Internet Outages with Precise
                    Active Probing (extended)},
  institution = {USC/Information Sciences Institute},
  year = {2012},
  sortdate = {2012-02-01},
  project = {ant, lacrend, lander, madcat, duoi},
  jsubject = {routing},
  number = {ISI-TR-2012-678b},
  month = feb,
  note = {Updated May 2012; TR-678 supersedes ISI-TR-2011-672},
  location = {johnh: pafile},
  keywords = {routing outage detection, active probing,
                    network outages, revision of [Quan11a]},
  url = {http://www.isi.edu/%7ejohnh/PAPERS/Quan12a.html},
  pdfurl = {http://www.isi.edu/%7ejohnh/PAPERS/Quan12a.pdf},
  otherurl = {ftp://ftp.isi.edu/isi-pubs/tr-678.pdf},
  myorganization = {USC/Information Sciences Institute},
  copyrightholder = {authors}
}