Nebula Edge Cloud
Overview
Centralized cloud systems are the de facto platform for large-scale data analysis in many domains. However, this model is a poor fit for analyses whose data is produced in a distributed fashion around the globe. For geo-distributed data analysis, a centralized cloud requires the data to be moved to a central location for processing over the wide-area network (WAN), which is highly heterogeneous, slow, and often costly.
The Nebula edge cloud project explores the use of distributed edge resources to mitigate the overhead that centralized clouds face for geo-distributed, data-intensive applications. Nebula uses volunteer storage and compute resources that are connected over a wide-area network. It provides a lightweight architecture and implements a number of optimizations that enable efficient use of edge resources for in-situ data-intensive computing, including location-aware data and computation placement, replication, and recovery. Nebula has been designed with the following goals in mind:
- Support for geo-distributed data-intensive computing
- Location-aware resource management
- Secure execution environment
- Ease of use
- Fault tolerance
Nebula components
- Nebula Central:
Nebula Central is the front end of the Nebula ecosystem. It provides a simple, easy-to-use web-based portal through which nodes join the system, users upload and download their data, and application writers inject applications into the system. It also provides tools to manage and monitor application execution.
- DataStore:
The DataStore is a simple per-application storage service that supports efficient and location-aware data storage in Nebula. Each DataStore consists of data nodes that store the actual data, and a DataStore Master that keeps track of the storage system metadata and makes data placement decisions.
- ComputePool:
The ComputePool provides per-application computational resources through a set of compute nodes. Compute nodes within a ComputePool are scheduled by a ComputePool Master that coordinates their execution. The compute nodes use the DataStore to access and retrieve data, and they are assigned tasks based on application-specific requirements and data location.
- Nebula Monitor:
The Nebula Monitor tracks the performance of nodes and the characteristics of the network. The monitoring information consists of node computation speeds, memory and storage capacities, and network bandwidth, as well as health information such as node and link failures. This information is updated dynamically and is used by the DataStore and ComputePool Masters for data placement, scheduling, and fault tolerance.
- Resource Manager:
The Resource Manager provides support for sharing compute nodes among multiple ComputePools. Nodes are shared dynamically, depending on factors such as the current availability of the resources, the number of concurrent applications, and the reliability of the resources.
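To make the interaction between these components more concrete, the following is a minimal sketch of location-aware task assignment. It is not Nebula's actual scheduler: the names `ComputeNode`, `estimate_finish_time`, and `pick_compute_node`, the cost model (transfer time plus processing time), and the units are all assumptions made for illustration. The bandwidth table and the `healthy` flag stand in for the kind of information the Nebula Monitor is described as collecting.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    """Hypothetical view of a compute node as a monitor might report it."""
    name: str
    speed_mb_per_s: float  # assumed metric: input data processed per second
    healthy: bool = True   # assumed health flag derived from failure reports


def estimate_finish_time(node, data_node, input_mb, bandwidth_mb_per_s):
    """Estimate seconds needed to fetch the input from data_node and process it.

    bandwidth_mb_per_s maps (data node name, compute node name) to the
    measured bandwidth between them. A missing or zero entry is treated
    as an unreachable link.
    """
    bandwidth = bandwidth_mb_per_s.get((data_node, node.name), 0.0)
    if bandwidth <= 0 or node.speed_mb_per_s <= 0:
        return float("inf")
    transfer_time = input_mb / bandwidth
    compute_time = input_mb / node.speed_mb_per_s
    return transfer_time + compute_time


def pick_compute_node(nodes, data_node, input_mb, bandwidth_mb_per_s):
    """Return the healthy node with the lowest estimated finish time, or None."""
    best_node = None
    best_time = float("inf")
    for node in nodes:
        if not node.healthy:
            continue  # skip nodes reported as failed
        finish = estimate_finish_time(node, data_node, input_mb, bandwidth_mb_per_s)
        if finish < best_time:
            best_node, best_time = node, finish
    return best_node


# Example: node "a" is slower than "b" but has a much faster link to the data.
nodes = [
    ComputeNode("a", speed_mb_per_s=10.0),
    ComputeNode("b", speed_mb_per_s=50.0),
    ComputeNode("c", speed_mb_per_s=100.0, healthy=False),
]
bandwidth = {("ds1", "a"): 100.0, ("ds1", "b"): 5.0, ("ds1", "c"): 200.0}
chosen = pick_compute_node(nodes, "ds1", input_mb=100.0, bandwidth_mb_per_s=bandwidth)
print(chosen.name if chosen else "no node available")
```

In this example the fastest node is skipped because it is marked unhealthy, and the slower node with the better link to the data wins over the faster node behind a slow link. This is the kind of trade-off that makes data location relevant to scheduling over a WAN.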
Demo Video
People
- Principal Investigators
- Students
- Albert Jonathan - albert@cs.umn.edu
- Kwangsung Oh - ohkwang@cs.umn.edu
Publications
- Ensuring Reliability in Geo-Distributed Edge Cloud
Albert Jonathan, Muhammed Uluyol, Kwangsung Oh, Abhishek Chandra, and Jon Weissman
IEEE International Symposium on Resilient Communication Systems (ISRCS) 2017
- Nebula: Distributed Edge Cloud for Data Intensive Computing
Albert Jonathan, Mathew Ryden, Kwangsung Oh, Abhishek Chandra, and Jon Weissman
IEEE Transactions on Parallel and Distributed Systems (TPDS) 2017
- Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications
Albert Jonathan, Abhishek Chandra, and Jon Weissman
IEEE International Conference on Cloud Engineering (IC2E), Berlin, Germany, April 2016.
- Nebula: Distributed Edge Cloud for Data Intensive Computing
Mathew Ryden, Kwangsung Oh, Abhishek Chandra and Jon B. Weissman
IEEE International Conference on Cloud Engineering (IC2E), Boston, Massachusetts, March 2014.
- Nebula: Data Intensive Computing over Widely Distributed Voluntary Resources
Mathew Ryden, Abhishek Chandra and Jon Weissman
Technical Report TR12-007, Department of Computer Science and Engineering, University of Minnesota, March 2013.
- Early Experience with the Distributed Nebula Cloud
Pradeep Sundarrajan, Mathew Ryden, Rohit Nair, Abhishek Gupta, Abhishek Chandra and Jon B. Weissman
4th International Workshop of Data-Intensive Computing (DIDC with HPDC), 2011.
- Nebulas: Using Distributed Voluntary Resources to Build Clouds
Abhishek Chandra and Jon B. Weissman
Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 2009.
Posters


Sponsor
We would like to acknowledge NSF Grant CSR-1162405, which supported this research.