AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics

In Proceedings of the 6th IEEE/ACM Symposium on Edge Computing (SEC 2021)

Dhruv Kumar

University of Minnesota, Twin Cities

Sohaib Ahmad

University of Massachusetts, Amherst

Abhishek Chandra

University of Minnesota, Twin Cities

Principal Investigator

Ramesh Sitaraman

University of Massachusetts, Amherst

Principal Investigator

Abstract

Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing it. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit DCs, and destination DCs. We identify a rich set of research questions aimed at reducing traffic costs in an aggregation network. We present an optimization formulation for answering these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach achieves a 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
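
To make the idea of partial aggregation closer to end-users concrete, the following is a minimal illustrative sketch in Python. It is not AggNet's implementation or its heuristic; the edge names, the hashtag-count workload, and the helper functions `partial_aggregate` and `merge` are all assumptions made up for this example. It only shows why pre-aggregating per-key counts at each edge can reduce the number of records that must cross the WAN while leaving the final result unchanged.

```python
from collections import Counter

def partial_aggregate(events):
    """Collapse raw (key, count) events into one partial count per key.

    Hypothetical helper: stands in for aggregation done at an edge server.
    """
    partial = Counter()
    for key, count in events:
        partial[key] += count
    return partial

def merge(partials):
    """Combine partial counts into final counts.

    Hypothetical helper: stands in for the merge at the destination DC.
    """
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts key by key
    return total

# Made-up per-edge event streams for one time window (assumed data).
edge_streams = {
    "edge-a": [("#edge", 1), ("#cloud", 1), ("#edge", 1), ("#edge", 1)],
    "edge-b": [("#cloud", 1), ("#cloud", 1), ("#edge", 1)],
}

# Without edge aggregation, every raw event is sent over the WAN.
raw_records = sum(len(events) for events in edge_streams.values())

# With edge aggregation, each edge sends one record per distinct key.
partials = [partial_aggregate(events) for events in edge_streams.values()]
aggregated_records = sum(len(p) for p in partials)

final_counts = merge(partials)

print(f"records over WAN without edge aggregation: {raw_records}")
print(f"records over WAN with edge aggregation:    {aggregated_records}")
print(f"final counts: {dict(final_counts)}")
```

The saving in this toy example depends entirely on how often keys repeat within a window at each edge; deciding where in an edge-cloud hierarchy to aggregate is the kind of question the paper addresses with its optimization formulation and heuristic.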

Website made by Kanishk Kacholia