About Us
Our Computer Systems Architecture and Data Science (CASS) laboratory has primary research interests spanning high-performance computing and storage systems. A common thread among our research projects is fast data access and resource sharing with cost- and energy-efficient management at different levels of the memory and storage hierarchies in supercomputers and parallel and distributed computer systems. More recently, CASS research interests have expanded to interdisciplinary computing, computer architecture, and low-power computing. A significant complementary thread among CASS's research projects emphasizes developing new software tools and hardware platforms to stimulate advances in science and engineering research, where large digital data collections are increasingly prevalent. CASS research projects have been sponsored by several federal funding agencies, including the National Science Foundation, the Department of Energy, and NASA.
News
Our research paper "ALISE: Accelerating Large Language Model Serving with Speculative Scheduling" will be presented at the 2024 ACM/IEEE International Conference on Computer-Aided Design (ICCAD 2024). https://2024.iccad.com/
Exciting news! Our research paper “ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching” has been accepted by the prestigious 51st IEEE/ACM International Symposium on Computer Architecture (ISCA ’24)! 🎉📜
In this work, we propose a novel algorithm-system co-design approach to alleviate the I/O bottlenecks in LLM inference. Our approach employs a Sparse Window Attention (SWA) mechanism that identifies the high-importance tokens for generative inference with low overhead and minimal accuracy drop, as well as a three-phase dynamic scheduling strategy that optimizes the trade-offs between computation and caching.
https://lnkd.in/eUmy4_Wf
Stay tuned for more details about our presentation at ISCA ’24!
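To give a feel for the sparsity-aware KV caching idea described above, here is a minimal, illustrative Python sketch. It is our own simplification, not the ALISA implementation: the function name select_kv_tokens, the window_size and top_k parameters, and the column-sum attention heuristic used as an importance score are all assumptions made for illustration.

```python
import numpy as np

def select_kv_tokens(attn_weights, window_size=64, top_k=128):
    """Illustrative sparsity-aware KV-cache selection (not the ALISA code).

    attn_weights: (num_steps, seq_len) attention weights from recent decoding
    steps; rows are queries, columns are cached tokens. Keeps (1) the most
    recent `window_size` tokens (local window) and (2) the `top_k`
    highest-importance earlier tokens, where importance is approximated here
    by the column-sum of attention weights.
    """
    seq_len = attn_weights.shape[1]
    importance = attn_weights.sum(axis=0)      # per-token importance score

    # Always keep the local window of the most recent tokens.
    window_start = max(0, seq_len - window_size)
    keep = set(range(window_start, seq_len))

    # From the older tokens, keep the top-k most attended ones.
    older = np.arange(window_start)
    if older.size > 0 and top_k > 0:
        ranked = older[np.argsort(importance[older])[::-1][:top_k]]
        keep.update(ranked.tolist())

    return sorted(keep)   # indices of KV entries to retain in the cache

# Toy usage: 8 recent decoding steps attending over 512 cached tokens.
rng = np.random.default_rng(0)
weights = rng.random((8, 512))
kept = select_kv_tokens(weights, window_size=64, top_k=128)
print(f"retaining {len(kept)} of 512 KV entries")
```

In this toy setting, only 192 of 512 KV entries are retained, which is the kind of cache reduction that makes the computation/caching trade-off in the three-phase scheduling strategy worthwhile.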
Professor Wang is serving as Program Co-Chair for the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (HiPC 2022), December 18-21, 2022, Bangalore, India. The 16th International Conference on Networking, Architecture, and Storage (NAS 2022) will be held in Philadelphia, PA.
- CT-Net: Channel Tensorization Network for Video Classification. International Conference on Learning Representations (ICLR '21), 2021.
- Lelantus: Fine-Granularity Copy-On-Write Operations for Secure Non-Volatile Memories. ACM/IEEE 47th International Symposium on Computer Architecture (ISCA 2020), pp. 597-609. https://www.iscaconf.org/isca2020
Current Research Projects
- SHF: Small: Revamping I/O Architectures Using Machine Learning Techniques on Big Compute Machines.
- NSF PPoSS Planning: Data-Centric Computing for Scalable Heterogeneous Memory and Storage Systems Architecture.
- SHF: Small: Developing a Highly Efficient and Accurate Approximation System for Warehouse-Scale Computers with the Sub-dataset Distribution Aware Approach.