As AI models continue to grow in size, they require vast amounts of energy, making sustainable AI infeasible if current trends persist. Consequently, robust computing HW/SW infrastructure underpinning AI will become increasingly critical.

Our main research goal is to run AI models faster and more energy-efficiently through HW/SW co-design. Specifically, our research interests include:

  • Neural Processing Unit (NPU), domain-specific hardware, FPGA
  • Quantization, pruning, and knowledge distillation
  • Hardware-aware neural architecture search (HW-aware NAS) and neural accelerator architecture search (NAAS)
  • Processing-in-memory (PIM)
  • Efficient LLM serving, including KV caching and other optimizations
  • On-Device AI
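
As a minimal illustration of one of these directions, the sketch below shows symmetric per-tensor INT8 post-training quantization, a common starting point for model compression. The weight values here are hypothetical, chosen only for demonstration; real deployments typically use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the INT8 codes."""
    return q.astype(np.float32) * scale

# Hypothetical weights for illustration
w = np.array([0.5, -1.0, 0.25, 0.8], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by about half the quantization step (s / 2)
```

Storing `q` instead of `w` cuts weight memory by 4x versus FP32, and integer arithmetic on NPUs is typically both faster and more energy-efficient than floating point, which is the motivation behind this line of work.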

Latest Updates

  • Paper

    Our paper "SHARP: Structured Hierarchical Attention Rank Projection for Efficient Language Model Distillation" has been accepted at IEEE Access.

    2026-03-22

  • News

    Yejin Lee continues research in our group as a Ph.D. student (advancing from the M.S. program).

    2026-03-01

  • News

    Jaelin Lee has joined our group as an undergraduate intern. Welcome!

    2026-03-01

  • News

    Jaeyoung Choi continues research in our group as an M.S. student (advancing from the undergraduate program).

    2026-03-01

  • News

    Eunkyeol Hong and Yejin Lee have graduated. Congratulations!

    2026-02-23

Recent Publications

  • SHARP: Structured Hierarchical Attention Rank Projection for Efficient Language Model Distillation

    Jieui Kang, Eunjoung Yoo, Soeun Choi, Yeonhui Kim, Jaehyeong Sim

    IEEE Access

  • ProgressiveServe: Progressive Model Loading and Recovery for Mitigating Cold Starts in Serverless LLM Serving

    박나담, 이나경, 이주원, 심재형

    KSC2025

  • DS-CAE: a Dual-Stream Cross-Attentive Autoencoder for Robust and Cluster-Aware Retrieval-Augmented Generation

    Soeun Choi, Yejin Lee, Juhee Kim, Minji Kim, Jaehyeong Sim

    CCCI2025

  • GATHER: A Gated-Attention Accelerator for Efficient LLM Inference

    Eunjin Lee, Eunseo Kim, Eunjoung Yoo, Jaehyeong Sim

    ISOCC2025

  • LoRA-PIM: In-Memory Delta-Weight Injection for Multi-Adapter LLM Serving

    Soeun Choi, Jaehyeong Sim

    ISOCC2025