Efficient AI

Quantization, pruning, and knowledge distillation

Research Description

Quantization is a technique used to reduce the precision of the numbers used to represent a model’s parameters or input activations. In neural networks, weights and activations are typically stored as 32-bit floating-point numbers. Quantization reduces this precision to 16-bit, 8-bit, or even lower, thereby reducing the model’s size and increasing its inference speed. We also seek to make quantization hardware-friendly, since many existing quantization methods require specialized hardware support to realize their full benefit.
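As a minimal sketch of the idea (not our method, just a standard illustration), the snippet below performs symmetric per-tensor quantization of float32 weights to int8, mapping the largest magnitude to 127:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 -> int8 plus a scale."""
    scale = np.max(np.abs(w)) / 127.0                      # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-to-nearest keeps the per-element error within half a quantization step.
print(np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6)
```

Storing `q` instead of `w` cuts memory 4x; integer matrix multiplies on `q` are also much cheaper on most accelerators than float32 ones.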

Pruning involves removing less important or redundant weights from a neural network. The goal is to reduce the size of the model and improve its efficiency without significantly sacrificing accuracy. Pruning reduces the model’s computational requirements and memory footprint, making it faster and more efficient. Structured pruning is of particular interest to us because its regular sparsity patterns match the underlying hardware architecture, which improves pruning efficacy in practice.
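A minimal sketch of structured, magnitude-based pruning (a common baseline, used here only for illustration): entire rows of a weight matrix, e.g. output channels, are zeroed by L2 norm, leaving a regular pattern that hardware can skip efficiently:

```python
import numpy as np

def prune_rows(w, keep_ratio=0.5):
    """Structured pruning: zero out whole rows with the smallest L2 norms."""
    norms = np.linalg.norm(w, axis=1)          # importance score per row
    k = int(len(norms) * keep_ratio)           # number of rows to keep
    keep = np.argsort(norms)[-k:]              # indices of the largest-norm rows
    mask = np.zeros_like(w)
    mask[keep] = 1.0
    return w * mask, mask

w = np.random.randn(8, 16)
w_pruned, mask = prune_rows(w, keep_ratio=0.25)
```

Unlike unstructured (per-weight) pruning, the surviving rows here form a dense submatrix, so the pruned layer maps directly to a smaller dense matrix multiply without specialized sparse kernels.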

Knowledge distillation is a process where a smaller, simpler model (student model) is trained to mimic the behavior of a larger, more complex model (teacher model). This technique allows the student model to achieve performance close to that of the teacher model while being much more compact and efficient, making it suitable for deployment in scenarios with limited computational resources.
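The standard soft-target loss (Hinton et al.) can be sketched as follows: teacher and student logits are softened with a temperature T, and the student is trained to minimize the KL divergence between the two distributions (shown here in plain numpy for clarity):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T spreads probability mass."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student outputs,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    p = softmax(teacher_logits, T)             # soft teacher targets
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
```

In practice this term is combined with the usual cross-entropy against the ground-truth labels; the soft targets carry the teacher's inter-class similarity information that hard labels discard.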

Your Job

  • Understanding the above concepts.
  • Understanding how these methods interact with computing hardware.
  • Devising a hardware-friendly methodology.

Related Papers:

  1. SCIE
    Q-LAtte: An Efficient and Versatile LSTM Model for Quantized Attention-Based Time Series Forecasting in Building Energy Applications
    Jieui Kang, Jihye Park, Soeun Choi, and Jaehyeong Sim
    IEEE Access, vol. 12, pp. 69325-69341, 2024
  2. Top-Tier
    eSRCNN: A Framework for Optimizing Super-Resolution Tasks on Diverse Embedded CNN Accelerators
    Youngbeom Jung, Yeongjae Choi, Jaehyeong Sim, and Lee-Sup Kim
    In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)