
Enhance model card: Metadata, links, and usage example

#1 opened by nielsr (HF Staff)

This PR significantly improves the model card for SpatialThinker-7B by adding crucial metadata, relevant external links, and a practical usage example.

Specifically, it addresses the following:

  • Adds metadata: Sets license: apache-2.0, library_name: transformers (enabling automated code snippets), and pipeline_tag: image-text-to-text (improving discoverability for multimodal tasks).
  • Updates content: Replaces placeholder text with the paper's abstract, a detailed model description, and relevant sections from the GitHub README (Updates, Requirements, Installation, Training, Evaluation, Acknowledgements).
  • Includes links: Adds direct links to the Hugging Face paper page (SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards), the project page (https://hunarbatra.com/SpatialThinker/), and the GitHub repository (https://github.com/hunarbatra/SpatialThinker).
  • Provides a usage example: Adds a clear Python code snippet demonstrating how to load and use the model with the transformers library for image-text inference, derived from common transformers patterns for Qwen2.5-VL models (a sketch of such a snippet is shown below).
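
For illustration, here is a minimal sketch of the kind of usage example added by this PR, assuming the checkpoint follows standard Qwen2.5-VL conventions in transformers. The repository id, image URL, and prompt below are placeholders rather than values taken from the actual model card.

```python
# Minimal sketch of image-text inference, assuming a standard Qwen2.5-VL checkpoint.
# The repo id, image URL, and question are illustrative placeholders.
import requests
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "hunarbatra/SpatialThinker-7B"  # placeholder repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Load an example image (placeholder URL).
image = Image.open(requests.get("https://example.com/scene.jpg", stream=True).raw)

# Build a chat-style prompt with one image and one text turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Which object is closer to the camera, the chair or the table?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(answer)
```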

These enhancements will make the model more accessible, discoverable, and easier to use for the Hugging Face community.

