LLM Pruning with Elastic Net Enhanced Wanda Strategy
DOI:
https://doi.org/10.32473/flairs.39.1.141693
Keywords:
Large Language Models, Model Pruning, Training-Free Pruning, Wanda, Elastic Net, Sparsity, Perplexity
Abstract
Large language models (LLMs) are enormous, and gradient-based updates or fine-tuning carry high computational costs. Motivated by these costs, this study suggests that LLMs can be pruned without retraining while retaining performance comparable to that of the unpruned model. It builds on Wanda, a one-shot, row-wise pruning method that ranks weights by the product of weight magnitude and the L2 norm of the corresponding activations, and replaces Wanda's activation scale with an Elastic Net (EN) combination of L1 and L2 norms when computing weight importance scores. Two geometries are examined: (i) EN-Original (squared L2) and (ii) EN-Modified (unsquared L2). LLaMA-2-7B is pruned using WikiText-103 activations and evaluated on validation perplexity (seven fixed slices) and zero-shot accuracy on BoolQ, HellaSwag, and PhishingDetect. At 50% sparsity with a small calibration set (CALIB_MULT=1), EN-Modified yields perplexity lower than or equal to that of EN-Original for all α ∈ {0, 0.25, 0.5, 0.75, 1} (e.g., 7.24 vs. 8.29 at α=0; parity at α=1) while preserving comparable macro-mean zero-shot accuracy. The unpruned baseline attains 64.0% accuracy; the best pruned models reach 62.67% at sparsity s=0.50. Overall, replacing Wanda's pure L2 scale with either EN expression improves robustness to activation statistics and delivers a better perplexity–accuracy trade-off at moderate sparsity, with no gradients or retraining.
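The scoring rule described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the code assumes an importance score of the form |W_ij| · (α·‖X_j‖₁ + (1−α)·‖X_j‖₂^p), with p=2 for EN-Original and p=1 for EN-Modified. That α weights the L1 term is inferred from the reported parity at α=1, and is an assumption. The function names and the plain-list data layout are hypothetical and chosen for readability, not taken from the authors' implementation.

```python
import math


def activation_scale(column, alpha, squared_l2):
    """Elastic Net scale for one input feature's calibration activations.

    Assumed form: alpha * L1 + (1 - alpha) * L2-term, where the L2 term is
    squared (EN-Original) or unsquared (EN-Modified).
    """
    l1 = sum(abs(x) for x in column)
    l2_sq = sum(x * x for x in column)
    l2_term = l2_sq if squared_l2 else math.sqrt(l2_sq)
    return alpha * l1 + (1.0 - alpha) * l2_term


def en_wanda_mask(weights, activations, alpha=0.5, sparsity=0.5, squared_l2=False):
    """Return a keep-mask (True = keep) using row-wise, one-shot pruning.

    weights:     rows of a weight matrix, shape [out_features][in_features]
    activations: calibration samples,      shape [n_samples][in_features]
    """
    n_in = len(weights[0])
    # One scale per input feature, shared by every output row.
    scales = [
        activation_scale([sample[j] for sample in activations], alpha, squared_l2)
        for j in range(n_in)
    ]
    n_prune = int(n_in * sparsity)
    mask = []
    for row in weights:
        # Importance = weight magnitude times the activation scale.
        scores = [abs(w) * s for w, s in zip(row, scales)]
        # Prune the lowest-scoring weights within this row only.
        lowest = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        mask.append([j not in lowest for j in range(n_in)])
    return mask


def apply_mask(weights, mask):
    """Zero out pruned weights; no gradients or retraining are involved."""
    return [[w if keep else 0.0 for w, keep in zip(row, keep_row)]
            for row, keep_row in zip(weights, mask)]
```

Under these assumptions, `alpha=0` with `squared_l2=False` reduces to the standard Wanda score, while `alpha=1` gives a pure L1 scale for which the two geometries coincide, consistent with the parity the abstract reports at α=1.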
License
Copyright (c) 2026 Ayman Gharaibeh, William B. Glisson, Xiyuan Liu, Majd Z. Tahat

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.