Hopscotch: Discovering and Skipping Redundancies in Language Models

EMNLP 2025
Mustafa Eyceoz¹, Nikhil Shivakumar Nayak¹, Hao Wang¹, Ligong Han¹, Akash Srivastava²
¹Red Hat AI Innovation, ²IBM Core AI

Abstract

Modern causal language models stack many attention blocks to improve performance, but not all blocks are necessary for every task. We propose Hopscotch, a simple yet effective method that identifies and skips the attention blocks that contribute least to a task, then adapts the model to preserve output quality. Hopscotch jointly optimizes which blocks to skip and how to scale the outputs of the remaining layers. By introducing lightweight, trainable scaling parameters to the attention and MLP blocks, it mitigates the distribution shifts in hidden states caused by removing attention blocks. Hopscotch does not modify model weights or require access to pretraining or instruction-tuning data, and it is compatible with existing model compression techniques. When applied to Llama-3.1-8B and Qwen2.5-7B, Hopscotch achieves less than a 2% drop in performance even after skipping four attention blocks.

Method Overview

[Figure: Hopscotch method overview]

Hopscotch employs an iterative block selection process to identify redundant attention blocks. The method introduces lightweight scaling parameters to both attention and MLP components, enabling the model to adapt after block removal without modifying the original weights. This approach ensures compatibility with existing compression techniques and requires no access to original pretraining data.
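To make the two ingredients above concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: this page does not spell out the selection criterion or how the scales are trained, so the sketch assumes a greedy search (each round skips the attention block whose removal raises a task loss the least) and a single scalar scale per sub-layer. The names `ScaledBlock`, `run`, `greedy_select`, and `loss_fn` are hypothetical, and the sub-layers are arbitrary frozen functions standing in for real attention and MLP modules.

```python
from typing import Callable, List

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


class ScaledBlock:
    """Toy residual block: frozen sub-layers plus two scalar output scales (assumed form)."""

    def __init__(self, attn: SubLayer, mlp: SubLayer):
        self.attn = attn        # frozen stand-in for the attention sub-layer
        self.mlp = mlp          # frozen stand-in for the MLP sub-layer
        self.attn_scale = 1.0   # would be trainable; 1.0 reproduces the original block
        self.mlp_scale = 1.0    # would be trainable; 1.0 reproduces the original block
        self.skip_attn = False  # True means the attention sub-layer is bypassed

    def forward(self, h: Vector) -> Vector:
        if not self.skip_attn:
            # Residual update with a scaled attention output.
            h = [x + self.attn_scale * a for x, a in zip(h, self.attn(h))]
        # Residual update with a scaled MLP output (the MLP is never skipped here).
        return [x + self.mlp_scale * m for x, m in zip(h, self.mlp(h))]


def run(blocks: List[ScaledBlock], h: Vector) -> Vector:
    """Pass a hidden state through the stack of blocks in order."""
    for block in blocks:
        h = block.forward(h)
    return h


def greedy_select(
    blocks: List[ScaledBlock],
    loss_fn: Callable[[List[ScaledBlock]], float],
    num_to_skip: int,
) -> List[int]:
    """Assumed greedy criterion: each round, skip the block that hurts loss_fn least."""
    skipped: List[int] = []
    for _ in range(num_to_skip):
        best_idx, best_loss = -1, float("inf")
        for i, block in enumerate(blocks):
            if block.skip_attn:
                continue
            block.skip_attn = True    # trial removal
            loss = loss_fn(blocks)
            block.skip_attn = False   # undo the trial
            if loss < best_loss:
                best_idx, best_loss = i, loss
        if best_idx < 0:
            break                     # every attention block is already skipped
        blocks[best_idx].skip_attn = True
        skipped.append(best_idx)
        # A real pipeline would re-fit attn_scale / mlp_scale on task data here,
        # leaving the underlying model weights untouched.
    return skipped
```

In this sketch the original weights are never edited: skipping is a flag, and adaptation would happen only through the two scales per block, which is what keeps the approach compatible with other compression techniques applied to the same weights.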

Citation

@misc{eyceoz2025hopscotchdiscoveringskippingredundancies,
    title={Hopscotch: Discovering and Skipping Redundancies in Language Models}, 
    author={Mustafa Eyceoz and Nikhil Shivakumar Nayak and Hao Wang and Ligong Han and Akash Srivastava},
    year={2025},
    eprint={2506.03303},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2506.03303}
}