Harvard University August 2016 - Present
PhD Computer Science
I am a PhD student in Computer Science at Harvard University, working with Professor David Brooks and Professor Gu-Yeon Wei in the Harvard Architecture, Circuits and Compilers group.
My research focuses on hardware specialization for deep learning by co-designing solutions across the computing stack. This includes designing hardware accelerators for sparse RNNs, building and integrating a DNN accelerator in a 16nm mobile SoC, developing lossy compression techniques for DNNs, and quantifying the fault tolerance of DNNs.
More recently, my research focuses on optimizing the at-scale deployment of deep-learning-based personalized recommendation. This work includes in-depth analysis and characterization of the architectural implications of recommendation models across production-scale datacenters. Building on this characterization, we optimize at-scale recommendation model inference.
Cornell University September 2012 - May 2016
Bachelor of Science, GPA 4.00
Major: Electrical and Computer Engineering
Minor: Computer Science
As an undergraduate student at Cornell University, I worked with Professor Zhiru Zhang on improving the programmability, performance and energy efficiency of heterogeneous systems. My research explored software-programmable FPGAs by leveraging intelligent design-automation tools and evaluating high-level synthesis compilers targeting FPGAs. I was also an active member of Cornell's Eta Kappa Nu (HKN) and president of Cornell's IEEE chapter.
DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang
Architectural Implications of Facebook’s DNN-based Personalized Recommendation
Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang
To appear in IEEE International Symposium on High-Performance Computer Architecture (HPCA 2019)
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy
MASR: A Modular Accelerator for Sparse RNNs
Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander Rush, Gu-Yeon Wei, David Brooks
Parallel Architectures and Compilation Techniques (PACT 2019)
Best Paper Nominee
[PDF], [Slides], [ArXiv]
MLPerf training benchmark
Peter Mattson, et al.
MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu-Yeon Wei, and David Brooks
IEEE/ACM International Symposium on Microarchitecture (MICRO 2019)
A 16nm 25mm² SoC with a 54.5× Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53, to eFPGA, and Cache-Coherent Accelerators
Paul Whatmough, Sae Kyu Lee, Marco Donato, Hsea-Ching Hseuh, Sam Xi, Udit Gupta, Lillian Pentecost, Glenn Ko, David Brooks, and Gu-Yeon Wei.
Symposia on VLSI Technology and Circuits (VLSI 2019)
SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices.
Paul Whatmough, Sae Kyu Lee, Sam Xi, Udit Gupta, Lillian Pentecost, Marco Donato, Hsea-Ching Hseuh, David Brooks, and Gu-Yeon Wei.
30th Hot Chips (Hot Chips 2018)
Weightless: Lossy Weight Encoding for Deep Neural Network Compression
Brandon Reagen, Udit Gupta, Robert Adolf, Michael Mitzenmacher, Alexander Rush, Gu-Yeon Wei, David Brooks
35th International Conference on Machine Learning (ICML 2018)
Ares: A Framework for Quantifying the Resilience of Deep Neural Networks
Brandon Reagen, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu Lee, Niamh Mulholland, Gu-Yeon Wei, David Brooks
55th Design Automation Conference (DAC 2018)
Best Paper Nominee
On-chip Deep Neural Network Storage with Multi-level eNVM
Marco Donato, Brandon Reagen, Udit Gupta, Lillian Pentecost, David Brooks, Gu-Yeon Wei
55th Design Automation Conference (DAC 2018)
Rosetta: A Realistic Benchmark Suite for Software Programmable FPGAs
Yuan Zhou, Udit Gupta, Steve Dai, Ritchie Zhao, Nitish Srivastava, Hanchen Jin, Joseph Featherston, Yi-Hsiang Lai, Gai Liu, Gustavo Velasquez, Wenping Wang, Zhiru Zhang
International Symposium on Field-Programmable Gate Arrays (FPGA 2018).
Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis
Steve Dai, Ritchie Zhao, Gai Liu, Shreesha Srinath, Udit Gupta, Christopher Batten and Zhiru Zhang.
International Symposium on Field-Programmable Gate Arrays (FPGA 2017).
Mapping-Aware Constrained Scheduling for LUT-Based FPGAs
Mingxing Tan, Steve Dai, Udit Gupta, and Zhiru Zhang.
International Symposium on Field-Programmable Gate Arrays (FPGA 2015).
Deep Learning: It’s Not All About Recognizing Cats and Dogs
Carole-Jean Wu, David Brooks, Udit Gupta, Hsien-Hsin Lee, and Kim Hazelwood
ACM SIGARCH, Computer Architecture Today
Designing AI-Enabled Technology for Society
Udit Gupta, Lillian Pentecost
Harvard SITN, October 2018
Circuit Cellar ("Tech the Future" series), July 2016
Facebook, Inc. September 2018 - Present
AI Infrastructure Research Intern
Analyzing, characterizing, and optimizing at-scale deep-learning-based personalized recommendation systems.
Algo Logic Systems Inc. May 2015 - August 2015
Hardware Design and Verification Engineer
Designed an OpenCL board support package for clients to develop and integrate software kernels with existing low-latency network IP for the Tick-to-Trade system. Developed a software interface for configuring FPGA and OpenCL financial data parsers and trading algorithms.
|Harvard CS 141||Spring 2019|
|EdX MOOC: The Computing Inside Your Smart Phone||Summer 2014|
|Cornell ECE 2300: Introduction to Digital Logic and Computer Organization||Spring & Fall 2014, Spring 2015|
|Cornell CS 3420 / ECE 3140: Embedded Systems||Spring 2016|
Honors and Awards
|Harvard Smith Family Fellowship||2017|
|NSF GRFP Honorable Mention||2016|
|Richard A. Newton Young Fellow Scholarship||2015|
|Cornell ECE Early Research Career Scholarship||2013|
|Cornell Eta Kappa Nu - Electrical Engineering Honor Society||2013 - 2016|
|Harvard SITN Blog Editor||2018 - 2019|
|Harvard SITN Lecture Director||2018 - 2019|
|Cornell IEEE Corporate Director||2013 - 2015|
|Cornell IEEE President||2015 - 2016|
|Cornell Eta Kappa Nu (HKN)||2013 - 2016|