Jun Wang’s achievements – Computer Systems Architecture and Data Science Laboratory

Professor Jun Wang’s Achievements:


My primary research interests span high-performance computing and big data systems. A common thread among my research projects is fast data access and resource sharing, with cost- and energy-efficient management at different levels of the memory and storage hierarchies in supercomputers and in parallel and distributed computer systems. My secondary research interests cover data science, interdisciplinary computing, computer architecture, and low-power computing. A significant complementary thread among my research projects is the development of new software tools and hardware platforms to stimulate advances in science and engineering research, where large digital data collections are increasingly prevalent. I summarize my contributions as follows:

Today, scientists are combining two key aspects of computing, a combination that will greatly increase the volume of data generated by scientific computing. These two elements, raw experimental data and computation-rich high-resolution simulations, together have the potential to revolutionize all fields of science, from biology to astrophysics. Since 2008, my group has developed a new computing paradigm, data-intensive HPC analytics, to help scientists and engineers better handle this onslaught of data in scientific inquiry. Our interdisciplinary research framework, developed in 2009 and 2010, is a new data-intensive HPC analytics platform that enables scientists and engineers to conduct big data analyses with complex access patterns both faster and more easily than with state-of-the-art solutions. We estimate that this could not only save millions of dollars of physicists’ labor at Los Alamos National Laboratory, but also significantly shorten the development cycle of analysis programs. Recently, we have been applying such data-intensive HPC analytics in clouds.

Power consumption has long been a major issue for data centers and data servers. According to a 2007 Environmental Protection Agency report, enterprise-class data centers accounted for 38 percent of the electricity used by servers and data centers nationally, and national energy consumption by servers and data centers could nearly double again in another five years. I began investigating low-power computing in 2005. My group is among the first to use coding techniques not only to conserve power and energy but also to boost performance for RAID storage, a building block of the multi-billion-dollar server and data center industry. Our new energy-efficient storage system prototype demonstrates that 30% to 50% or more of the energy could be conserved. This in turn points to billions of dollars of potential energy savings in today’s servers and data centers.

In 2007, my research delivered a high-performance storage software package of roughly 2,000 lines of C code to the Parallel Virtual File System (PVFS) group at Argonne National Laboratory.

In 2008, we invented a new data distribution scheme for the ever-important multi-replication storage systems that are deployed in today’s data centers. This work is the first to provide a complete, theoretical answer on how to realize maximum parallelism in multi-replication data storage architectures.

In addition, my research group has carried out the following work:

In 2011, we proposed a novel power management scheme called MAR (modeless, adaptive, rule-based) for multiprocessor systems, which minimizes CPU power consumption under performance constraints.

In 2011, we proposed VisIO, a new I/O library for visualization applications. On a representative dataset, VPIC, run across 128 nodes, it showed a 64.4% read performance improvement compared to the provided Lustre installation.

In 2008, we presented a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely Bloom filter arrays with different levels of accuracy, are used on each metadata server.

In 2006, we proposed exploiting two-dimensional locality to improve P2P system search efficiency. We presented a locality-aware P2P system architecture called Foreseer, which explicitly exploits geographical locality and temporal locality by constructing a neighbor overlay and a friend overlay, respectively.

In 2002, we developed a novel reordering write buffer for Log-structured File Systems (LFS), cutting the system’s overall write cost by up to 53%.

In 2002, we developed a novel User-space, Customized File System called UCFS that can drastically improve the I/O performance of proxy servers.
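
To make the Hierarchical Bloom Filter Arrays idea above more concrete, the following is a minimal Python sketch of how Bloom filter arrays can map a filename to candidate metadata servers. It is an illustration under stated assumptions, not the published HBA implementation: the class and function names (`BloomFilter`, `BloomFilterArray`, `locate`), the SHA-256 based hashing, the filter sizes, and the policy of consulting the more accurate level first are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are derived from SHA-256."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class BloomFilterArray:
    """One Bloom filter per metadata server (hypothetical layout)."""

    def __init__(self, num_servers, num_bits, num_hashes):
        self.filters = [
            BloomFilter(num_bits, num_hashes) for _ in range(num_servers)
        ]

    def record(self, filename, server_id):
        # Note that server_id holds the metadata for filename.
        self.filters[server_id].add(filename)

    def candidates(self, filename):
        # Every server whose filter reports a (possible) match.
        return [
            sid for sid, flt in enumerate(self.filters) if filename in flt
        ]


def locate(filename, accurate_level, coarse_level):
    """Return candidate server ids, consulting the accurate level first.

    Assumption for this sketch: a unique hit at the accurate level is
    trusted; otherwise the coarse level's candidates are returned and the
    caller resolves zero or multiple hits (for example, by asking servers).
    """
    hits = accurate_level.candidates(filename)
    if len(hits) == 1:
        return hits
    return coarse_level.candidates(filename)
```

In this sketch the two levels differ only in bits per filter and number of hash functions, which is one simple way to obtain "different levels of accuracy"; which filenames are recorded at which level is left to the caller and is not specified by the text above.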