ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Paper: arXiv:2402.13516
How to use Tiiny/prosparse-llama-2-13b-gguf with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Tiiny/prosparse-llama-2-13b-gguf", trust_remote_code=True)

# Or load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Tiiny/prosparse-llama-2-13b-gguf", trust_remote_code=True, torch_dtype="auto")
```

This model is the downstream distribution of SparseLLM/ProSparse-LLaMA-2-13B in PowerInfer GGUF format, consisting of the LLM model weights and the predictor weights.
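Since the repository ships GGUF files, it can help to sanity-check a downloaded file before loading it. The sketch below reads the GGUF container header; `inspect_gguf` is a hypothetical helper (not part of PowerInfer, llama.cpp, or Transformers) and assumes the standard GGUF version-2+ header layout.

```python
# Minimal sketch of inspecting a GGUF file header (hypothetical helper).
# A GGUF file (version >= 2) begins with the magic bytes b"GGUF", then a
# little-endian uint32 version, a uint64 tensor count, and a uint64
# metadata key-value count.
import struct

def inspect_gguf(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count
```

A quick check like this catches truncated or mislabeled downloads before handing the file to an inference runtime.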
Please cite this work using the following BibTeX:
@article{song2024prosparse,
  title={{ProSparse}: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models},
  author={Song, Chenyang and Han, Xu and Zhang, Zhengyan and Hu, Shengding and Shi, Xiyu and Li, Kuai and Chen, Chen and Liu, Zhiyuan and Li, Guangli and Yang, Tao and Sun, Maosong},
  year={2024},
  journal={arXiv preprint arXiv:2402.13516},
  url={https://arxiv.org/pdf/2402.13516.pdf}
}