Commit History

Store matrices as numpy arrays instead of Python lists
f2e89c2

gary-boon Claude Opus 4.5 committed on

Add per-step memory cleanup for large model support
a94eb19

gary-boon Claude Opus 4.5 committed on

Fix RAM exhaustion for large token generation
959074d

gary-boon Claude Opus 4.5 committed on

feat: add auto_complete parameter for token generation
bb689ce

gary-boon Claude Opus 4.5 committed on

fix: add QKV extraction support for Mistral/Devstral architecture
d1d37a8

gary-boon Claude Opus 4.5 committed on

feat: implement lazy-loading for attention matrices
929ba88

gary-boon Claude Opus 4.5 committed on

Add avg_entropy calculation for attention heads
66a46b6

gary-boon Claude Opus 4.5 committed on

Revert QKV visualization fixes - need better approach for data streaming
d0b7e29

gary-boon Claude Opus 4.5 committed on

Add safety checks for missing QKV keys
a79cb83

gary-boon Claude Opus 4.5 committed on

Limit QKV matrices to top 5 heads per layer to reduce response size
decb5ab

gary-boon Claude Opus 4.5 committed on

Fix QKV matrix extraction for Mistral/Devstral architecture
9056859

gary-boon Claude Opus 4.5 committed on

Fix QKV visualization for Mistral/Devstral architecture
4ec134b

gary-boon Claude Opus 4.5 committed on

Add future considerations doc for response size optimization
3e67ea2

gary-boon Claude Opus 4.5 committed on

Fix: Import time module at top level for SSE events
15a862b

gary-boon Claude Opus 4.5 committed on

Add SSE streaming endpoint for real-time analysis progress
172a186

gary-boon Claude Opus 4.5 committed on

feat: Include token metadata in analysis response
ee0f6c9

gary-boon Claude Opus 4.5 committed on

feat: Implement tier-based model filtering by device type
6bf9f5c

gary-boon Claude Opus 4.5 committed on

Fix: Add attn_implementation="eager" to model switch function
f94a7ae

gary-boon Claude Opus 4.5 committed on

Add Phase 5: Performance optimizations to phased plan
383a328

gary-boon Claude Opus 4.5 committed on

Add tokenSections boundaries and update system prompt
c6f4cc5

gary-boon Claude Opus 4.5 committed on

Fix: Handle MistralCommonTokenizer pad_token setter
e20ccaf

gary-boon Claude Opus 4.5 committed on

Integrate mistral-common for correct Devstral tokenization
ed06dcb

gary-boon Claude Opus 4.5 committed on

Remove mistral_common to fix dependency conflict
3d9d9ee

gary-boon Claude Opus 4.5 committed on

Use mistral_common for proper Devstral prompt formatting
3e80769

gary-boon Claude Opus 4.5 committed on

Add system prompt support for instruction-tuned models
2860768

gary-boon Claude Opus 4.5 committed on

fix: Simpler prompt format and temperature=0 for Devstral
76020ee

gary-boon Claude Opus 4.5 committed on

fix: Sanitize JSON response for NaN/Inf float values
99f6209

gary-boon Claude Opus 4.5 committed on

fix: Check chat_template is set before using apply_chat_template
474927d

gary-boon Claude Opus 4.5 committed on

fix: Add chat template support for Devstral instruct model
8d85da8

gary-boon Claude Opus 4.5 committed on

fix: Convert bfloat16 to float32 for numpy compatibility
cb6f39c

gary-boon Claude Opus 4.5 committed on

fix: Use eager attention for output_attentions support
5333b21

gary-boon Claude Opus 4.5 committed on

fix: Skip heavy ML deps in CI security checks
ba27c0c

gary-boon Claude Opus 4.5 committed on

fix: Update torch to 2.3+ for transformers compatibility
1b73605

gary-boon Claude Opus 4.5 committed on

fix: Update transformers for Devstral support
b788304

gary-boon Claude Opus 4.5 committed on

docs: Mark GPU HF Space Devstral deployment complete
65c6e2e

gary-boon Claude Opus 4.5 committed on

docs: Update phased plan with Phase 2/2b/2c completion status
688efad

gary-boon Claude Opus 4.5 committed on

Add vocabSize to modelInfo response
499afba

gary-boon Claude Opus 4.5 committed on

Update .env.spark.example: TORCH_DTYPE now auto-detected
543454f

gary-boon Claude Opus 4.5 committed on

Add recommended_dtype to model configs
62525b2

gary-boon Claude Opus 4.5 committed on

Phase 2: Add Devstral backend support
9080f28

gary-boon Claude Opus 4.5 committed on

Update plan: Phase 1 paused due to GB10 GPU support
e694533

gary-boon Claude Opus 4.5 committed on

Add DEVICE env var to force CPU mode on DGX Spark
5f122aa

gary-boon Claude Opus 4.5 committed on

Use NGC PyTorch 24.08 for Python 3.10 compatibility
a2875a2

gary-boon Claude Opus 4.5 committed on

Use NVIDIA NGC PyTorch container for GB10 support
a4cfbff

gary-boon Claude Opus 4.5 committed on

Try PyTorch nightly for GB10/sm_121 GPU support
a009a49

gary-boon Claude Opus 4.5 committed on

Make zarr/numcodecs imports optional for ARM64 compatibility
6435a75

gary-boon Claude Opus 4.5 committed on

Skip zarr/numcodecs in Spark build (ARM64 incompatible)
d129e37

gary-boon Claude Opus 4.5 committed on

Fix numcodecs ARM64 compatibility in Dockerfile.spark
772fc80

gary-boon Claude Opus 4.5 committed on

Fix Dockerfile.spark for CUDA 13.0 compatibility
a4927aa

gary-boon Claude Opus 4.5 committed on

Fix Dockerfile.spark for ARM64 architecture (DGX Spark)
9d00d33

gary-boon Claude Opus 4.5 committed on