Based on the paper *SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot* (arXiv:2301.00774).
This repo contains model files for the llama2.c 15M TinyStories model, optimized for NM-vLLM, a high-throughput serving engine for compressed LLMs.
This model was pruned with SparseGPT using llm-compressor.
Install llm-compressor:
```shell
pip install llmcompressor
```
```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

hf_model_stub = "Xenova/llama2.c-stories15M"
calibration_dataset = "open_platypus"
output_directory = f"{hf_model_stub.split('/')[-1]}-pruned_50.2of4-uncompressed"

# Load the dense base model.
model = SparseAutoModelForCausalLM.from_pretrained(
    hf_model_stub, torch_dtype="auto", device_map="auto"
)

# SparseGPT recipe: prune the decoder layers to 50% sparsity in a 2:4 pattern,
# updating layers sequentially to compensate for accumulated pruning error.
recipe = """
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      sequential_update: true
      mask_structure: "2:4"
      targets: ['re:model.layers.\d*$']
"""

# Apply the recipe in one shot, calibrating on the open_platypus dataset.
oneshot(
    model=model,
    dataset=calibration_dataset,
    recipe=recipe,
    output_dir=output_directory,
)

# Save uncompressed weights; pruned values are stored as explicit zeros.
model.save_pretrained(output_directory, save_compressed=False)
```
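The recipe's `mask_structure: "2:4"` requests semi-structured sparsity: in every contiguous group of four weights, at least two are zeroed, which matches the 50% sparsity target in a pattern that sparse GPU kernels can accelerate. The following is a minimal illustrative sketch of that mask pattern using plain magnitude pruning with a hypothetical `prune_2_of_4` helper; it is not llm-compressor's implementation, which additionally uses SparseGPT's second-order weight updates to correct for pruning error.

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude weights in each group of four.

    Illustrates the 2:4 mask pattern only (hypothetical helper);
    SparseGPT also adjusts the surviving weights after masking.
    """
    w = weights.reshape(-1, 4).copy()
    # Column indices of the two smallest |w| within each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.01, 0.6]])
pruned = prune_2_of_4(w)
# Each group of four keeps only its two largest-magnitude entries:
# [[0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, 0.6]]
```

Every row of a 2:4-pruned weight matrix therefore carries exactly 50% zeros, regardless of the weight values, which is what distinguishes it from unstructured 50% sparsity.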
For further support, and discussion of these models and AI in general, join Neural Magic's Slack community.