gary-boon (with Claude Opus 4.5) committed on
Commit 5333b21 · 1 Parent(s): ba27c0c

fix: Use eager attention for output_attentions support


Devstral defaults to SDPA (PyTorch's scaled dot-product attention), which doesn't
support output_attentions=True. Force eager attention so attention weights can be
captured for visualization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Files changed (1)
  1. backend/model_service.py +3 -1
backend/model_service.py CHANGED
@@ -189,11 +189,13 @@ class ModelManager:
         logger.info(f" Max context: {self.max_context}, Batch size: {self.batch_size}")
 
         # Load model with configured dtype
+        # Use eager attention to support output_attentions=True for visualization
         self.model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
             torch_dtype=self.dtype,
             low_cpu_mem_usage=True,
-            trust_remote_code=True
+            trust_remote_code=True,
+            attn_implementation="eager"
         ).to(self.device)
 
         # Load tokenizer
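
For context, a minimal sketch of how this setting is exercised downstream; the checkpoint name, prompt, dtype, and device handling below are illustrative assumptions, not code from backend/model_service.py, which configures these through ModelManager.

```python
# Sketch only: assumes a Devstral checkpoint name; ModelManager supplies its own
# model_name, dtype, and device in the actual service.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Devstral-Small-2505"  # assumed checkpoint for illustration
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    attn_implementation="eager",  # needed so per-head attention weights are materialized
).to(device)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer, each shaped
# (batch, num_heads, seq_len, seq_len); these weights drive the visualization.
print(len(outputs.attentions), outputs.attentions[0].shape)
```

The SDPA and flash-attention kernels fuse the softmax and never expose the attention matrix, so output_attentions=True only works on the eager path; forcing it at load time avoids a per-call fallback or a runtime error.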