My current research focuses on three areas: data system architectures for hybrid transactional/analytical workloads, the integration of new storage technologies into the data system software stack, and algorithmic improvements to access method design that use knowledge of the workload and the dataset.
Data-intensive real-time applications need data systems that support very efficient updates of incoming data. In addition, these systems must provide fast access to both old and fresh data, for historical analytics (i.e., long-term decisions) as well as for real-time analytics. Building systems that support such hybrid transactional/analytical processing (HTAP), however, challenges state-of-the-art data system architectures; it requires rethinking resource management, data layout, data organization, indexing, physical design, and query processing.
My research proposes new solutions that balance the conflicting requirements of read-intensive and update-intensive workloads, both through new access method designs (such as UpBit, Monkey, and Slalom) and by applying optimization techniques not traditionally used for real-time decisions to data system design and tuning. A natural outcome of this research is a new classification of access method designs that helps researchers and practitioners alike better understand the access method design space and its tradeoffs. Several open questions remain, including how to combine seemingly incompatible designs, how to adapt to incoming workloads, and how to monitor and predict workload changes.
New storage and processing devices challenge the traditional assumptions behind data system design. Rotational disks have been thoroughly modeled and studied; modern devices, however, behave very differently. My research proposes a new way to view this problem: increasing the flexibility of hardware modeling without resorting to a complex model, because usability of the model is the main goal. I focus on capturing the fundamental behavioral changes in storage hardware: device concurrency and the cost asymmetry of random accesses. With better models and a better understanding of how new hardware behaves, we can integrate new storage devices efficiently and build algorithms that exploit them more fully.
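To make the two properties above concrete, the following is a minimal sketch of a cost model with only two device parameters. It is an illustration, not the model developed in this research: the names `DeviceModel`, `asymmetry`, `concurrency`, and `batch_cost` are hypothetical, writes are assumed to cost `asymmetry` times as much as reads, and up to `concurrency` requests of the same type are assumed to complete in the time of one. The example parameter values are made up, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceModel:
    """Hypothetical two-parameter storage device model (illustrative assumption).

    read_cost:   cost of one random read, in arbitrary units
    asymmetry:   how many times more one random write costs than one random read
    concurrency: how many same-type requests the device serves in parallel
    """
    read_cost: float
    asymmetry: float
    concurrency: int

    def batch_cost(self, reads: int, writes: int) -> float:
        # Assumption: requests are served in "rounds" of up to `concurrency`
        # requests of the same type, and each round costs as much as a single
        # request of that type.
        read_rounds = math.ceil(reads / self.concurrency)
        write_rounds = math.ceil(writes / self.concurrency)
        return self.read_cost * (read_rounds + self.asymmetry * write_rounds)


# Made-up parameters: a disk-like device (no parallelism, symmetric costs)
# and a flash-like device (8-way parallelism, writes twice as expensive).
disk_like = DeviceModel(read_cost=1.0, asymmetry=1.0, concurrency=1)
flash_like = DeviceModel(read_cost=1.0, asymmetry=2.0, concurrency=8)

print(disk_like.batch_cost(reads=16, writes=4))   # 20.0
print(flash_like.batch_cost(reads=16, writes=4))  # 4.0
```

Even this toy version shows why a one-request-at-a-time model that suits rotational disks can mislead on modern devices: issuing requests in batches pays off only when `concurrency` exceeds 1, and avoiding writes matters more as `asymmetry` grows.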
In this line of work I study how to use knowledge of the workload and of the (pre-existing or inherent) organization of the data to (i) avoid redundant work, (ii) reduce the update and/or read cost, and (iii) select the best access and query processing strategy. My research proposes new ways to identify workload and dataset properties through monitoring and learning, as well as new access method designs that operate optimally by transparently exploiting any degree of pre-existing data organization.
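As a small illustration of turning a dataset property into a strategy decision, the sketch below measures how sorted a stream of keys already is and picks a strategy from that measurement. The metric (fraction of adjacent pairs already in order), the `threshold` value, and the function names `sortedness` and `choose_strategy` are all hypothetical choices for this example, not the techniques proposed in this research.

```python
from typing import Sequence


def sortedness(keys: Sequence[int]) -> float:
    """Fraction of adjacent key pairs already in non-decreasing order.

    Hypothetical metric: 1.0 means fully sorted; for distinct keys in
    random order the expected value is about 0.5.
    """
    if len(keys) < 2:
        return 1.0
    in_order = sum(1 for a, b in zip(keys, keys[1:]) if a <= b)
    return in_order / (len(keys) - 1)


def choose_strategy(keys: Sequence[int], threshold: float = 0.9) -> str:
    # Hypothetical policy: nearly sorted input suggests that appending in
    # arrival order and handling the few out-of-order keys separately would
    # avoid redundant sorting work; otherwise fall back to a full sort.
    # This function only returns a label; it implements neither strategy.
    return "append" if sortedness(keys) >= threshold else "sort"


print(sortedness([1, 2, 3, 5, 4, 6]))       # 0.8
print(choose_strategy(list(range(100))))    # append
print(choose_strategy([1, 2, 3, 5, 4, 6]))  # sort
```

A real system would track such a property incrementally as data arrive rather than rescanning, but the pattern of measuring first and then choosing the cheaper strategy is the same.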
MS & PhD Opportunities: I am actively recruiting motivated students for the Master's and PhD programs. If you are interested in research on data systems, especially in data analytics and hardware-aware data system architectures, you are encouraged to apply and to contact me directly! This year's graduate program deadline is December 15. Details on the application requirements are available on the BU CS website.
Postdoc Opportunities: I am also looking for a motivated postdoctoral researcher in the area of data systems for hybrid analytics. To apply, please contact me with a CV, a research statement, and the names of three references.