Alian Research Group

Computer Systems Laboratory
School of Electrical and Computer Engineering
Cornell University

Links: LinkedIn | GitHub

About

The computation in future datacenters will be distributed over a heterogeneous array of processing elements, packaged modularly within a server's boundaries. Inter- and intra-server data movement will bottleneck such a computing landscape. The vision of ARG is to seamlessly integrate processor, memory, and network architecture through co-design with operating systems, the network software stack, and software libraries to minimize data movement in future datacenters. ARG is part of the Computer Systems Laboratory at Cornell ECE.

Please read this if you are interested in joining us!

Current Projects

System Design for Near-Memory Acceleration at Scale

The strict separation of responsibilities between compute and memory has given rise to the memory wall. While there is extensive research on accelerating application kernels near or inside memory, the adoption of such accelerators at the system level remains an open research question. We are designing cross-stack solutions to distribute computations to memory without intrusive hardware and software changes, aiming to accelerate large-scale services.

Near-memory acceleration of datacenter taxes [MICRO'23][HPCA'24]
Application-transparent, near-memory distributed processing [MICRO'18]

Fusion of Accelerators, IO, and General Purpose Cores

Future datacenters will integrate a diverse array of heterogeneous accelerators alongside general-purpose cores. As applications execute on this platform, each phase will run on either a general-purpose or specialized compute element. We are developing solutions to seamlessly fuse these general-purpose and specialized compute elements at runtime, creating a composable, accelerated compute chain.

Data motion acceleration [HPCA'24]
Intelligent data direct IO [ISCA'21][MICRO'22]

Development and Deployment Co-Design of Micro-Services

Due to the sheer scale of the cloud services we use every minute of our lives, clear abstraction layers have emerged in the development and deployment of large-scale services. While this decoupling enhances programmer productivity, it can also lead to significant inefficiencies, especially with the ever-increasing heterogeneity in datacenter compute fabrics. We are designing hardware and software solutions to efficiently deploy large-scale services in the heterogeneous cloud.

Profiling service weaver [Under Development]
Simultaneous Data Delivery Threads [CAL'25]

Acceleration of Compound AI Systems

Today's AI systems are not just LLMs; they also include other components that together form a Compound AI System. We are designing systems to accelerate end-to-end Compound AI Systems.

Near-memory Acceleration of Retrieval Augmented Generation (RAG) [ASPLOS'25]

Specializing Memory Subsystem

There is significant focus on specializing compute, but less emphasis on specializing the memory hierarchy to support that compute. We are developing a holistic understanding of various memory and interconnection technologies to tailor the memory subsystem accordingly.

Survey of various DRAM technologies [Under Development]
Per-bank bandwidth regulation for shared Last-Level Cache [RTSS'24]
Near-memory datacenter network [MICRO'19]

Architectural Simulation and Tool Development

Software-based simulation is the backbone of computer architecture research and development. Architectural simulators such as gem5 are widely used by academia and industry. However, traditionally, the focus of architectural simulators has been primarily on simulating CPU and memory subsystems, often overlooking the I/O subsystem and the complex interplay between software, OS, hardware, and network. We are extending the gem5 simulator to model modern network technologies and run the latest software stack. Additionally, we are using generative AI to reduce the steep learning curve of gem5 and to increase the productivity of design space exploration.

Accurate network simulation in gem5 [IISWC'18][ISPASS'20][ISPASS'24]
Accelerating gem5 [ISPASS'17][ISPASS'23]
Open source DRAM datapath model [ISPASS'25]

Publications

2025

Derrick Quinn, Ezgi Yücel, José Martínez, Mohammad Alian, "Multi-Stage Data-Centric Dense Retrieval," IEEE MICRO 2025 [paper]
Zuoming Fu, Alex Manley, Mohammad Alian, "gem5 Co-Pilot: AI Assistant Agent for Architectural Design Space Acceleration," CAMS 2025 [paper]
Derrick Quinn, Ezgi Yücel, Jinkwon Kim, José Martínez, Mohammad Alian, "LongSight: Compute-Enabled Memory to Accelerate Large-Context LLMs via Sparse Attention," MICRO 2025 [paper]
Neel Patel, Ren Wang, Mohammad Alian, "RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors," ACM TACO 2025 [paper]
Derrick Quinn, Neel Patel, Mohammad Alian, "Compute-Enabled CXL Memory Expansion for Efficient Retrieval Augmented Generation," IEEE MICRO 2025 [paper]
S M Mojahidul Ahsan, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, and Tamzidul Hoque, "A Reconfigurable and Accurate Circuit-Level Substrate for DRAM Design and Analysis," GLSVLSI 2025 [paper]
Neel Patel, Mohammad Alian, "XRT: An Accelerator-Aware Runtime for Accelerated Chip Multiprocessors," USENIX ATC 2025 [paper]
Derrick Quinn, Ezgi Yücel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh Patel, José Martínez, Mohammad Alian, "DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign," ISCA 2025 [paper]
Amin Mamandipoor, Huy Tran, and Mohammad Alian, "SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads," CAL 2025 [paper]
Ahsan Mojahidul, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, and Tamzidul Hoque, "A Flexible and Accurate Circuit-Level Substrate for Future DRAM Design and Analysis," ISPASS 2025 [poster]
Derrick Quinn, Mohammad Nouri, Neel Patel, Alireza Salimi, Sukhan Lee, Hamed Zamani, and Mohammad Alian, "Accelerating Retrieval-Augmented Generation," ASPLOS 2025 [paper][slides]

2024

Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun, "Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems," RTSS 2024 [paper][slides]
Johnson Umeike, Siddharth Agarwal, Nikita Lazarev, Mohammad Alian, "Userspace Networking in gem5," ISPASS 2024 [paper][open source][slides]
Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-Ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, and Hadi Esmaeilzadeh, "Domain-Specific Computational Storage for Serverless Computing," ASPLOS 2024 [paper][slides]
Neel Patel, Amin Mamandipoor, Mohammad Nouri, and Mohammad Alian, "SmartDIMM: In-Memory Acceleration of Upper Layer I/O Protocols," HPCA 2024 [paper][slides] [artifacts available]
Shu-Ting Wang, Hanyang Xu, Amin Mamandipoor, Rohan Mahapatra, Byung Hoon Ahn, Soroush Ghodrati, Krishnan Kailas, Mohammad Alian, Hadi Esmaeilzadeh, "Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators," HPCA 2024 [paper][slides]

2023

Neel Patel, Amin Mamandipoor, Derrick Quinn, and Mohammad Alian, "XFM: Accelerated Software-Defined Far Memory," MICRO 2023 [artifacts available, functional, and reproduced][paper][slides]
Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun, Mohammad Alian, "Profiling gem5 Simulator," ISPASS 2023 [paper][slides]

2022 and Older

Mohammad Alian, Siddharth Agarwal, Jongmin Shin, Neel Patel, Yifan Yuan, Daehoon Kim, Ren Wang, Nam Sung Kim, "IDIO: Network-driven, inbound network data orchestration on server processors," MICRO 2022 [paper][slides]
Ki-Dong Kang, Gyeongseo Park, Hyosang Kim, Mohammad Alian, Nam Sung Kim, and Daehoon Kim, "NMAP: Power Management Based on Network Packet Processing Mode Transition for Latency-Critical Workloads," MICRO 2021 [paper]
Yifan Yuan, Mohammad Alian, Yipeng Wang, Ilia Kurakin, Ren Wang, Charlie Tai, Nam Sung Kim, "Don't Forget the I/O When Allocating Your LLC," ISCA 2021 [technology adapted by Intel®] [paper][slides]
Mohammad Alian, Jongmin Shin, Ki-Dong Kang, Ren Wang, Alexandros Daglis, Daehoon Kim, Nam Sung Kim, "IDIO: Orchestrating Inbound Network Data on Server Processors," IEEE Computer Architecture Letters (CAL) 2020 [paper]
Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, Hadi Esmaeilzadeh, "Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks," MICRO 2020 [paper][slides]
Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, et al., "The gem5 simulator: Version 20.0+," arXiv preprint 2020 [paper]
Mohammad Alian, Yifan Yuan, Jie Zhang, Ren Wang, Myoungsoo Jung, and Nam Sung Kim, "Data direct I/O characterization for future I/O system exploration," ISPASS 2020 [paper][slides]
Mohammad Alian and Nam Sung Kim, "NetDIMM: Low-latency, near-memory network interface architecture," MICRO 2019 [paper][slides]
Mohammad Alian, Seung Won Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai Wang, Thomas Roewer, Adam McPadden, Oliver O'Halloran, Deming Chen, Jinjun Xiong, Daehoon Kim, Wen-mei Hwu, and Nam Sung Kim, "Application-transparent near-memory processing architecture with memory channel network," MICRO 2018 [best paper nominee][industry product] [paper][slides]
Youjie Li, Jongsea Park, Mohammad Alian, Yifan Yuan, Qu Zheng, Petian Pan, Ren Wang, Alexander Gerhard Schwing, Hadi Esmaeilzadeh, and Nam Sung Kim, "A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks," MICRO 2018 [hardware prototype demonstration] [paper][slides]
Mohammad Alian, Krishna Parasuram Srinivasan, and Nam Sung Kim, "Simulating PCI-Express interconnect for future system exploration," IISWC 2018 [best paper nominee] [paper][slides]
Mohammad Alian, Gabor Dozsa, Umur Darbaz, Stephan Diestelhorst, Daehoon Kim, and Nam Sung Kim, "dist-gem5: Distributed simulation of computer clusters," ISPASS 2017 [best paper nominee][open source] [paper][slides]
Mohammad Alian, Ahmed Abulila, Lokesh Jindal, Daehoon Kim, and Nam Sung Kim, "NCAP: Network-driven, packet context-aware power management for client-server architecture," HPCA 2017 [best paper nominee][IEEE Micro honorable mention] [paper][slides]

Current Members

Mohammad Alian		Principal Investigator, Assistant Professor
Jinkwon Kim		PostDoc
Neel Maulik Patel		PhD Student
Derrick Quinn		PhD Student
Mohammad Nouri		PhD Student
Egor Glukhov		PhD Student
Ethan Berkley		Undergrad Research Assistant
Kian Mahmoodi		Undergrad Research Assistant
Nikhil Sampath		Undergrad Research Assistant

Past Members

Alex Manley		M.S. Graduate (KU)		2025		First Employment: NVIDIA
Johnson Umeike		M.S. Graduate (KU)		2024		First Employment: Ph.D. student at the University of Maryland

Tools

Research Sponsors

National Science Foundation
Semiconductor Research Corporation
Samsung Electronics
NVIDIA (Equipment Donation)
Ampere Computing (Equipment Donation)