NN-LUT: neural approximation of non-linear operations for efficient transformer inference J Yu, J Park, S Park, M Kim, S Lee, DH Lee, J Choi Proceedings of the 59th ACM/IEEE Design Automation Conference, 577-582, 2022 | 31 | 2022 |
Token-scaled logit distillation for ternary weight generative language models M Kim, S Lee, J Lee, S Hong, DS Chang, W Sung, J Choi Advances in Neural Information Processing Systems 36, 2024 | 9 | 2024 |