The common theme of my research is data systems. I am interested in supporting new complex workloads, integrating new technologies, offering algorithmic improvements in system design, and building new data management technology for the cloud.

Data Systems for Hybrid Analytics

Data-intensive real-time applications need data systems that ingest incoming data with very efficient updates. They also need fast access to old and fresh data alike, for both historical analytics (i.e., long-term decisions) and real-time analytics. Building systems that support such hybrid transactional/analytical processing (HTAP), however, challenges state-of-the-art data system architectures. It requires rethinking resource management, data layout, data organization, indexing, physical design, and query processing.

In my research, I propose new solutions that balance the conflicting requirements of read-intensive and update-intensive workloads. I do this by designing new access methods (such as UpBit, Monkey, and Slalom). I also apply to data system design and tuning optimization techniques that have not traditionally been used for real-time decisions. A natural outcome of this research is a new classification of access method designs, which gives researchers and practitioners a clearer understanding of the access method design space and its tradeoffs. Several open questions remain: how to combine seemingly incompatible designs, how to adapt to incoming workloads, and how to monitor and predict workload changes.

Persistent Deletes

The new paradigm of large-scale data management in the cloud has focused on fast ingestion rates and efficient access times. It has left data privacy, data stewardship, and the cost of data retention as secondary goals. A key aspect of data privacy and stewardship is timely and definitive deletion whenever an application, a user, or legislation requires it. Beyond privacy, efficient deletion also helps manage storage resources, because it limits the storage space consumed by data already scheduled for deletion and the energy that space uses. At cloud scale, this excess storage utilization and energy consumption can amount to petabytes of wasted storage space and tens of millions of dollars.

In light of these challenges, we need to rethink the design of modern data systems for the cloud. Beyond the classic performance goals, data systems should also support efficient persistent deletes. A persistent delete physically erases a data object from every layer of the cloud hierarchy, including data systems, caches, file systems, and device drivers.

Integrating New Storage Hardware

New storage and processing devices challenge the traditional assumptions behind data system design. Rotational disks have been thoroughly modeled and studied, but modern devices behave very differently. In my research, I propose a new way to view this problem: make hardware models more flexible without making them complex, with usability of the model as the primary goal. I focus on capturing two fundamental behavioral changes in storage hardware: concurrency and the cost asymmetry of random accesses. With better models and a better understanding of how new hardware behaves, we can integrate new storage devices efficiently and build algorithms that exploit them more fully.
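The following sketch makes the two properties concrete. It is a minimal cost model that adds just two parameters to the classic view of a disk that serves one request at a time. The function `io_cost` and its parameters are hypothetical illustrations, not the model developed in this research:

- `alpha` is the write/read cost asymmetry. The sketch assumes a random write costs `alpha` times a random read.
- `k` is device concurrency. The sketch assumes up to `k` independent requests complete in the time of one.

```python
import math


def io_cost(reads: int, writes: int, alpha: float, k: int) -> float:
    """Estimate I/O cost in units of one random read (hypothetical model).

    alpha: assumed read/write asymmetry; one random write costs alpha reads.
    k: assumed device concurrency; up to k independent requests complete
       in the time of a single request.
    Reads and writes are assumed to be issued in separate batches.
    """
    if reads < 0 or writes < 0:
        raise ValueError("request counts must be non-negative")
    if k < 1 or alpha <= 0:
        raise ValueError("k must be >= 1 and alpha must be positive")
    read_rounds = math.ceil(reads / k)    # batches of up to k concurrent reads
    write_rounds = math.ceil(writes / k)  # batches of up to k concurrent writes
    return read_rounds + alpha * write_rounds


# Rotational-disk-like device: no concurrency, symmetric costs.
print(io_cost(100, 10, alpha=1.0, k=1))  # 110.0
# Flash-like device: 8-way concurrency, writes twice as costly as reads.
print(io_cost(100, 10, alpha=2.0, k=8))  # 17.0
```

With `k=1` and `alpha=1`, the model reduces to counting I/Os, as for a rotational disk. Raising `k` rewards algorithms that issue independent requests in parallel. Raising `alpha` rewards algorithms that trade extra reads for fewer writes.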

Access Method Design Using Workload and Dataset Knowledge

In this line of work, I study how we can use knowledge of the nature of the workload and of the (pre-existing or inherent) organization of the data to (i) avoid redundant work, (ii) reduce the update and/or read cost, and (iii) select the best access and query processing strategy. In my research, I propose new ways to identify workload and dataset properties through monitoring and learning. I also propose new access method designs that transparently exploit any degree of pre-existing data organization to operate optimally.

Opportunities

MS & PhD Opportunities: I am actively recruiting motivated individuals for the Master's and PhD programs. If you are interested in research on data systems, especially data analytics, cloud data management, and hardware-aware data system architectures, you are encouraged to apply and to contact me directly! This year's graduate program deadline is December 15. Details on the application requirements are available on the BU CS website.