Article
Mitigating Quantization Errors Due to Activation Spikes in Gated Linear Unit-Based Large Language Models
Jaewoo Yang, Hayun Kim, Junyung Ji and Younghoon Kim
Modern large language models (LLMs) achieve state-of-the-art performance through architectural advancements but require high computational costs for inference. Post-training quantization is a widely adopted approach to reduce these costs by quantizing...
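To make the core idea concrete, the following is a minimal sketch of round-to-nearest symmetric int8 quantization, a common baseline in post-training quantization. It is illustrative only, not the paper's method: the function names (`quantize_absmax_int8`, `dequantize`), the per-row absmax scaling, and the injected outlier are all assumptions chosen to show how a single activation spike inflates the quantization scale and degrades precision for the remaining values.

```python
import torch


def quantize_absmax_int8(w: torch.Tensor):
    """Symmetric round-to-nearest int8 quantization, per row (absmax scaling)."""
    # Scale each row so its largest magnitude maps to 127.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp(min=1e-8)  # guard against all-zero rows
    q = torch.round(w / scale).clamp(-128, 127).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale


# Hypothetical demo: one outlier (an "activation spike") dominates the
# absmax scale, leaving little int8 resolution for the other values.
x = torch.randn(4, 8)
x[0, 0] = 500.0  # injected spike
q, s = quantize_absmax_int8(x)
err = (dequantize(q, s) - x).abs().max()
print(f"max reconstruction error: {err:.4f}")
```

Running this, the row containing the spike shows a much larger reconstruction error than the others, which is the quantization-error phenomenon the article addresses for GLU-based LLMs.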

