

Harvard University August 2016 - Present

PhD Computer Science

I am a PhD student in Computer Science at Harvard University, where I work with Professor David Brooks and Professor Gu-Yeon Wei in the Harvard Architecture, Circuits, and Compilers group.

My research focuses on hardware specialization for deep learning by co-designing solutions across the computing stack. This includes designing hardware accelerators for sparse RNNs, building and integrating a DNN accelerator into a 16nm mobile SoC, developing lossy compression techniques for DNNs, and quantifying the fault tolerance of DNNs.

More recently, my research focuses on optimizing the at-scale deployment of deep learning-based personalized recommendation. This work includes in-depth analysis and characterization of the architectural implications of recommendation models across production-scale datacenters. Building on this characterization, we optimize at-scale recommendation model inference.

Cornell University September 2012 - May 2016

Bachelor of Science, GPA 4.00
Major: Electrical and Computer Engineering
Minor: Computer Science

As an undergraduate student at Cornell University, I worked with Professor Zhiru Zhang on improving the programmability, performance and energy efficiency of heterogeneous systems. My research explored software-programmable FPGAs by leveraging intelligent design-automation tools and evaluating high-level synthesis compilers targeting FPGAs. I was also an active member of Cornell's Eta Kappa Nu (HKN) and president of Cornell's IEEE chapter.

Select publications

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

Architectural Implications of Facebook’s DNN-based Personalized Recommendation
Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang
IEEE International Symposium on High-Performance Computer Architecture (HPCA 2020)

Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy

MASR: A Modular Accelerator for Sparse RNNs
Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander Rush, Gu-Yeon Wei, David Brooks
Parallel Architectures and Compilation Techniques (PACT 2019)
Best Paper Nominee

Other publications

Conference Publications

MLPerf Training Benchmark
Peter Mattson, et al.

MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu-Yeon Wei, and David Brooks
IEEE/ACM International Symposium on Microarchitecture (MICRO 2019)

A 16nm 25mm² SoC with a 54.5× Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators
Paul Whatmough, Sae Kyu Lee, Marco Donato, Hsea-Ching Hseuh, Sam Xi, Udit Gupta, Lillian Pentecost, Glenn Ko, David Brooks, and Gu-Yeon Wei.
Symposia on VLSI Technology and Circuits (VLSI 2019)

SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices
Paul Whatmough, Sae Kyu Lee, Sam Xi, Udit Gupta, Lillian Pentecost, Marco Donato, Hsea-Ching Hseuh, David Brooks, and Gu-Yeon Wei.
30th Hot Chips Symposium (Hot Chips 2018)

Weightless: Lossy Weight Encoding for Deep Neural Network Compression
Brandon Reagen, Udit Gupta, Robert Adolf, Michael Mitzenmacher, Alexander Rush, Gu-Yeon Wei, David Brooks
35th International Conference on Machine Learning (ICML 2018)

Ares: A Framework for Quantifying the Resilience of Deep Neural Networks
Brandon Reagen, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu Lee, Niamh Mulholland, Gu-Yeon Wei, David Brooks
55th Design Automation Conference (DAC 2018)
Best Paper Nominee

On-chip Deep Neural Network Storage with Multi-level eNVM
Marco Donato, Brandon Reagen, Udit Gupta, Lillian Pentecost, David Brooks, Gu-Yeon Wei
55th Design Automation Conference (DAC 2018)

Rosetta: A Realistic Benchmark Suite for Software Programmable FPGAs
Yuan Zhou, Udit Gupta, Steve Dai, Ritchie Zhao, Nitish Srivastava, Hanchen Jin, Joseph Featherston, Yi-Hsiang Lai, Gai Liu, Gustavo Velasquez, Wenping Wang, Zhiru Zhang
International Symposium on Field-Programmable Gate Arrays (FPGA 2018).

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis
Steve Dai, Ritchie Zhao, Gai Liu, Shreesha Srinath, Udit Gupta, Christopher Batten and Zhiru Zhang.
International Symposium on Field-Programmable Gate Arrays (FPGA 2017).

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs
Mingxing Tan, Steve Dai, Udit Gupta, and Zhiru Zhang.
International Symposium on Field-Programmable Gate Arrays (FPGA 2015).

Technical Articles

Deep Learning: It’s Not All About Recognizing Cats and Dogs
Carole-Jean Wu, David Brooks, Udit Gupta, Hsien-Hsin Lee, and Kim Hazelwood
ACM SIGARCH, Computer Architecture Today

Designing AI-Enabled Technology for Society
Udit Gupta, Lillian Pentecost
Harvard SITN, October 2018

Software-Programmable FPGAs
Udit Gupta
Circuit Cellar ("Tech the Future" series), July 2016

Professional Experience

Facebook, Inc. September 2018 - Present

AI Infrastructure Research Intern

Analyzing, characterizing, and optimizing at-scale, deep learning-based personalized recommendation systems.

Algo Logic Systems Inc. May 2015 - August 2015

Hardware Design and Verification Engineer

Designed an OpenCL board support package that lets clients develop software kernels and integrate them with existing low-latency network IP for the Tick-to-Trade system. Developed a software interface for configuring the FPGA, OpenCL financial data parsers, and trading algorithms.

Teaching Experience

Harvard CS 141 Spring 2019
edX MOOC: The Computing Inside Your Smart Phone Summer 2014
Cornell ECE 2300: Introduction to Digital Logic and Computer Organization Spring & Fall 2014, Spring 2015
Cornell CS 3420 / ECE 3140: Embedded Systems Spring 2016

Honors and Awards

Harvard Smith Family Fellowship 2017
NSF GRFP Honorable Mention 2016
Richard A. Newton Young Fellow Scholarship 2015
Cornell ECE Early Research Career Scholarship 2013
Cornell Eta Kappa Nu - Electrical Engineering Honor Society 2013 - 2016

Professional Activities

Harvard SITN Blog Editor 2018 - 2019
Harvard SITN Lecture Director 2018 - 2019
Cornell IEEE Corporate Director 2013 - 2015
Cornell IEEE President 2015 - 2016
Cornell Eta Kappa Nu (HKN) 2013 - 2016
