
      Alian Research Group

            Computer Systems Laboratory
            School of Electrical and Computer Engineering
            Cornell University

Links: LinkedIn | GitHub

About

The computation in future datacenters will be distributed over a heterogeneous array of processing elements, packaged modularly within a server's boundaries. Inter- and intra-server data movement will bottleneck such a computing landscape. The vision of ARG is to seamlessly integrate processor, memory, and network architecture through co-design with operating systems, the network software stack, and software libraries to minimize data movement in future datacenters. ARG is part of the Computer Systems Lab at ECE Cornell.

We have several open Ph.D. positions and one postdoc position. Please read this if you are interested in joining us!

Current Projects

System Design for Near-Memory Acceleration at Scale

The strict separation of responsibilities between compute and memory has given rise to the memory wall. While there is extensive research on accelerating application kernels near or inside memory, the adoption of such accelerators at the system level remains an open research question. We are designing cross-stack solutions to distribute computations to memory without intrusive hardware and software changes, aiming to accelerate large-scale services.

  • Near-memory acceleration of datacenter taxes [MICRO'23][HPCA'24]
  • Application-transparent, near-memory distributed processing [MICRO'18]

Fusion of Accelerators, I/O, and General-Purpose Cores

Future datacenters will integrate a diverse array of heterogeneous accelerators alongside general-purpose cores. As applications execute on this platform, each phase will run on either a general-purpose or specialized compute element. We are developing solutions to seamlessly fuse these general-purpose and specialized compute elements at runtime, creating a composable, accelerated compute chain.

Development and Deployment Co-Design of Micro-Services

Due to the sheer scale of the cloud services we use every minute of our lives, clear abstraction layers have emerged in the development and deployment of large-scale services. While this decoupling enhances programmer productivity, it can also lead to significant inefficiencies, especially with the ever-increasing heterogeneity in datacenter compute fabrics. We are designing hardware and software solutions to efficiently deploy large-scale services in the heterogeneous cloud.

  • Profiling Service Weaver [Under Development]
  • Specialized hardware threads for networking [Under Development]

Acceleration of Compound AI Systems

Today's AI systems are not just LLMs; they combine LLMs with other components to form a Compound AI System. We are designing systems to accelerate Compound AI Systems end to end (a minimal illustrative sketch follows below).

  • Near-memory Acceleration of Retrieval Augmented Generation (RAG) [ASPLOS'25]
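
The sketch below is purely illustrative and is not the design from the ASPLOS'25 paper: it wires a placeholder retriever into a placeholder generator, which is the basic shape of a RAG-style compound AI pipeline. The Document class and the retrieve and generate functions are hypothetical stand-ins for real components such as a vector index and an LLM.

    # Minimal, self-contained sketch of a compound AI (RAG-style) pipeline.
    # Every component here is a hypothetical placeholder, not ARG's design.
    from dataclasses import dataclass

    @dataclass
    class Document:
        doc_id: int
        text: str
        score: float = 0.0

    def retrieve(query, corpus, k=3):
        """Placeholder retriever: rank documents by naive term overlap."""
        terms = set(query.lower().split())
        for doc in corpus:
            doc.score = len(terms & set(doc.text.lower().split()))
        return sorted(corpus, key=lambda d: d.score, reverse=True)[:k]

    def generate(query, context):
        """Placeholder generator: a real pipeline would call an LLM here."""
        ctx = " | ".join(d.text for d in context)
        return f"Answer to '{query}' grounded in: {ctx}"

    if __name__ == "__main__":
        corpus = [
            Document(0, "near-memory processing reduces data movement"),
            Document(1, "retrieval augmented generation combines search and LLMs"),
            Document(2, "gem5 simulates full systems including I/O"),
        ]
        query = "how does retrieval augmented generation work"
        print(generate(query, retrieve(query, corpus)))

In a deployed compound AI system, stages such as retrieval and generation may run on different hardware, which is why the end-to-end pipeline, rather than any single kernel, is the acceleration target.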

Specializing the Memory Subsystem

There is significant focus on specializing compute, but less emphasis on specializing the memory hierarchy to support that compute. We are developing a holistic understanding of various memory and interconnection technologies to tailor the memory subsystem accordingly.

  • Survey of various DRAM technologies [Under Development]
  • Per-bank bandwidth regulation for shared Last-Level Cache [RTSS'24]
  • Near-memory datacenter network [MICRO'19]

Architectural Simulation and Tool Development

Software-based simulation is the backbone of computer architecture research and development. Architectural simulators such as gem5 are widely used in academia and industry. Traditionally, however, these simulators have focused on the CPU and memory subsystems, often overlooking the I/O subsystem and the complex interplay among software, OS, hardware, and network. We are extending the gem5 simulator to model modern network technologies and run the latest software stack. Additionally, we are using generative AI to reduce the steep learning curve of gem5 and to increase the productivity of design space exploration.
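
For readers unfamiliar with gem5, the sketch below shows roughly what a configuration script built on gem5's standard library looks like. It is a generic syscall-emulation "hello world," not one of our extended network models, and the exact module paths and resource name may differ across gem5 versions.

    # Rough sketch of a gem5 standard-library configuration script.
    # Module paths and resource names vary by gem5 version; illustrative only.
    from gem5.components.boards.simple_board import SimpleBoard
    from gem5.components.cachehierarchies.classic.no_cache import NoCache
    from gem5.components.memory.single_channel import SingleChannelDDR3_1600
    from gem5.components.processors.cpu_types import CPUTypes
    from gem5.components.processors.simple_processor import SimpleProcessor
    from gem5.isas import ISA
    from gem5.resources.resource import obtain_resource
    from gem5.simulate.simulator import Simulator

    # A simple board: one timing x86 core, one DDR3 channel, no caches.
    board = SimpleBoard(
        clk_freq="3GHz",
        processor=SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.X86, num_cores=1),
        memory=SingleChannelDDR3_1600(size="1GiB"),
        cache_hierarchy=NoCache(),
    )

    # Run a prebuilt static binary from gem5-resources in syscall-emulation mode.
    board.set_se_binary_workload(obtain_resource("x86-hello64-static"))

    Simulator(board=board).run()

Such a script is executed with the gem5 binary (e.g., build/X86/gem5.opt script.py) rather than a plain Python interpreter.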

Publications

  • Derrick Quinn, Mohammad Nouri, Neel Patel, Alireza Salimi, Sukhan Lee, Hamed Zamani, and Mohammad Alian, "Accelerating Retrieval-Augmented Generation," ASPLOS 2025 [paper][slides]
  • Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun, "Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems," RTSS 2024 [paper][slides]
  • Johnson Umeike, Siddharth Agarwal, Nikita Lazarev, Mohammad Alian, "Userspace Networking in gem5," ISPASS 2024 [paper][open source][slides]
  • Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-Ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, and Hadi Esmaeilzadeh, "Domain-Specific Computational Storage for Serverless Computing," ASPLOS 2024 [paper][slides]
  • Neel Patel, Amin Mamandipoor, Mohammad Nouri, and Mohammad Alian, "SmartDIMM: In-Memory Acceleration of Upper Layer I/O Protocols," HPCA 2024 [paper][slides] [artifacts available]
  • Shu-Ting Wang, Hanyang Xu, Amin Mamandipoor, Rohan Mahapatra, Byung Hoon Ahn, Soroush Ghodrati, Krishnan Kailas, Mohammad Alian, Hadi Esmaeilzadeh, "Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators," HPCA 2024 [paper][slides]
  • Neel Patel, Amin Mamandipoor, Derrick Quinn, and Mohammad Alian, "XFM: Accelerated Software-Defined Far Memory," MICRO 2023 [artifacts available, functional, and reproduced][paper][slides]
  • Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun, Mohammad Alian, "Profiling gem5 Simulator," ISPASS 2023 [paper][slides]
  • Mohammad Alian, Siddharth Agarwal, Jongmin Shin, Neel Patel, Yifan Yuan, Daehoon Kim, Ren Wang, Nam Sung Kim, "IDIO: Network-driven, inbound network data orchestration on server processors," MICRO 2022 [paper][slides]
  • Ki-Dong Kang, Gyeongseo Park, Hyosang Kim, Mohammad Alian, Nam Sung Kim, and Daehoon Kim, "NMAP: Power Management Based on Network Packet Processing Mode Transition for Latency-Critical Workloads," MICRO 2021 [paper]
  • Yifan Yuan, Mohammad Alian, Yipeng Wang, Ilia Kurakin, Ren Wang, Charlie Tai, Nam Sung Kim, "Don't Forget the I/O When Allocating Your LLC," ISCA 2021 [technology adopted by Intel®] [paper][slides]
  • Mohammad Alian, Jongmin Shin, Ki-Dong Kang, Ren Wang, Alexandros Daglis, Daehoon Kim, Nam Sung Kim, "IDIO: Orchestrating Inbound Network Data on Server Processors," IEEE Computer Architecture Letters (CAL) 2020 [paper]
  • Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, Hadi Esmaeilzadeh, "Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks," MICRO 2020 [paper][slides]
  • Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, et al. "The gem5 simulator: Version 20.0+," arXiv preprint 2020 [paper]
  • Mohammad Alian, Yifan Yuan, Jie Zhang, Ren Wang, Myoungsoo Jung, and Nam Sung Kim, "Data direct I/O characterization for future I/O system exploration," ISPASS 2020 [paper][slides]
  • Mohammad Alian, and Nam Sung Kim, "NetDIMM: Low-latency, near-memory network interface architecture," MICRO 2019 [paper][slides]
  • Mohammad Alian, Seung Won Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-mei Hwu, and Nam Sung Kim, "Application-transparent near-memory processing architecture with memory channel network," MICRO 2018 [best paper nominee][industry product] [paper][slides]
  • Youjie Li, Jongse Park, Mohammad Alian, Yifan Yuan, Zheng Qu, Peitian Pan, Ren Wang, Alexander Gerhard Schwing, Hadi Esmaeilzadeh, and Nam Sung Kim, "A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks," MICRO 2018 [hardware prototype demonstration] [paper][slides]
  • Mohammad Alian, Krishna Parasuram Srinivasan, and Nam Sung Kim, "Simulating PCI-Express interconnect for future system exploration," IISWC 2018 [best paper nominee] [paper][slides]
  • Mohammad Alian, Gabor Dozsa, Umur Darbaz, Stephan Diestelhorst, Daehoon Kim, and Nam Sung Kim, "dist-gem5: Distributed simulation of computer clusters," ISPASS 2017 [best paper nominee][open source] [paper][slides]
  • Mohammad Alian, Ahmed Abulila, Lokesh Jindal, Daehoon Kim, and Nam Sung Kim, "NCAP: Network-driven, packet context-aware power management for client-server architecture," HPCA 2017 [best paper nominee][IEEE Micro honorable mention] [paper][slides]

Current Members

Past Members

    Johnson Umeike   M.S. Graduate   First Employment: Ph.D. student at the University of Maryland

Research Sponsors

  • National Science Foundation
  • Semiconductor Research Corporation
  • Samsung Electronics
  • NVIDIA (Equipment Donation)
  • Ampere Computing (Equipment Donation)