NPU and Domain-Specific Hardware

Designing Neural Processing Units (NPUs) and domain-specific processors

Research Description

We design high-performance, energy-efficient Neural Processing Units (NPUs): a new class of processor dedicated to a wide range of AI workloads. An NPU exploits the high degree of parallelism inherent in deep learning algorithms. We are also looking for opportunities to design domain-specific processors for the latest state-of-the-art algorithms.

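To make the parallelism claim concrete, here is a minimal toy sketch (not our hardware, and the function name is illustrative): every output element of a deep learning layer, such as a convolution, is an independent multiply-accumulate chain, which is exactly what an NPU's array of processing elements (PEs) computes in parallel.

```python
# Toy sketch: each output of a 1-D convolution is an independent
# dot product, so a PE array can compute all outputs simultaneously.
def conv1d(inputs, weights):
    """Valid 1-D convolution over plain Python lists."""
    k = len(weights)
    # No iteration of this comprehension depends on any other,
    # which is the parallelism an NPU exploits in hardware.
    return [
        sum(inputs[i + j] * weights[j] for j in range(k))
        for i in range(len(inputs) - k + 1)
    ]

print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
```

In real silicon, each list-comprehension iteration would map to its own PE, so the whole output appears in roughly the time of one dot product rather than many.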

Your Job

  • Don’t worry! We don’t fabricate real silicon (that is the realm of EE, not CS).
  • Understanding basic computer architecture and digital logic.
  • Evaluating existing NPUs and improving them.
  • Designing novel microarchitectures for NPUs or domain-specific processors.
  • Studying the software stack (compilers, firmware, device drivers) for accelerators.

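As a taste of what "designing a microarchitecture" means at the dataflow level, here is a minimal software sketch of an output-stationary dataflow, the style of loop nest studied in some of the papers below (e.g. the TVLSI processor in paper 6); the function name is illustrative, not from any real codebase. Each conceptual PE owns one output accumulator that stays put ("stationary") while inputs and weights stream past it.

```python
# Sketch of an output-stationary matrix multiply: the accumulator for
# C[i][j] never moves, modeling a PE that keeps its partial sum in a
# local register while operands stream through.
def matmul_output_stationary(A, B):
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):          # each (i, j) pair is one PE's output
        for j in range(n):
            acc = 0             # stationary partial sum (PE-local register)
            for t in range(k):  # operands stream past the PE
                acc += A[i][t] * B[t][j]
            C[i][j] = acc       # written back exactly once
    return C

print(matmul_output_stationary([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The design question an NPU architect asks is which loop to keep stationary: holding the accumulator in place (as above) minimizes expensive partial-sum traffic, at the cost of re-streaming inputs and weights.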

Related Papers:

  1. An Energy-Efficient Hardware Accelerator for On-Device Inference of YOLOX
    Kyungmi Kim, Soeun Choi, Eunkyeol Hong, Yoonseo Jang, and Jaehyeong Sim
    In 2024 21st International SoC Design Conference (ISOCC)
  2. BS2: Bit-Serial Architecture Exploiting Weight Bit Sparsity for Efficient Deep Learning Acceleration
    Eunseo Kim, Subean Lee, Chaeyun Kim, HaYoung Lim, Jimin Nam, and Jaehyeong Sim
    In 2024 21st International SoC Design Conference (ISOCC)
  3. [SCIE] CREMON: Cryptography Embedded on the Convolutional Neural Network Accelerator
    Yeongjae Choi, Jaehyeong Sim, and Lee-Sup Kim
    IEEE Transactions on Circuits and Systems II: Express Briefs, vol.67, num.12, pp.3337–3341, 2020
  4. [SCIE] An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices
    Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, and Lee-Sup Kim
    IEEE Journal of Solid-State Circuits, vol.55, num.10, pp.2691–2702, 2020
  5. [Major] A 47.4 µJ/epoch Trainable Deep Convolutional Neural Network Accelerator for In-Situ Personalization on Smart Devices
    Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, and Lee-Sup Kim
    In 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC)
  6. [SCIE] An Energy-Efficient Deep Convolutional Neural Network Inference Processor with Enhanced Output Stationary Dataflow in 65-nm CMOS
    Jaehyeong Sim, Somin Lee, and Lee-Sup Kim
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.28, num.1, pp.87–100, 2019
  7. [Major] TrainWare: A Memory Optimized Weight Update Architecture for On-Device Convolutional Neural Network Training
    Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, and Lee-Sup Kim
    In 2018 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)
  8. [SCIE] Energy-Efficient Design of Processing Element for Convolutional Neural Network
    Yeongjae Choi, Dongmyung Bae, Jaehyeong Sim, Seungkyu Choi, Minhye Kim, and Lee-Sup Kim
    IEEE Transactions on Circuits and Systems II: Express Briefs, vol.64, num.11, pp.1332–1336, 2017
  9. [Top-Tier] A Kernel Decomposition Architecture for Binary-Weight Convolutional Neural Networks
    Hyeonuk Kim, Jaehyeong Sim, Yeongjae Choi, and Lee-Sup Kim
    In 2017 IEEE/ACM 54th Annual Design Automation Conference (DAC)
  10. [Top-Tier] A 1.42 TOPS/W Deep Convolutional Neural Network Recognition Processor for Intelligent IoE Systems
    Jaehyeong Sim, Jun-Seok Park, Minhye Kim, Dongmyung Bae, Yeongjae Choi, and Lee-Sup Kim
    In 2016 IEEE International Solid-State Circuits Conference (ISSCC)