- #9: Redacted (opened 4 days ago by pathosethoslogos)
- #8: Question regarding quantization hardware and modelopt sharding (opened 19 days ago by Mario12355)
- #7: Request: NVFP4 version of MiniMax-M2.5-REAP-139B (to fit on a single RTX 6000 Pro) (14 comments; opened 26 days ago by mondovero)
- #6: VLLM error for kv weight scaling - workaround (7 comments; opened 29 days ago by ShaunEvansMD)
- #5: Thanks for your effort (5 comments; opened 29 days ago by darkstar3537)
- #4: fp8 kv cache (15 comments; opened 30 days ago by festr2)
- #3: KeyError: '110.w1.input_scale' with TRT (2 comments; opened about 1 month ago by guanwenyu1995)
- #2: "w1_weight_scale_2 must match w3_weight_scale_2. Accuracy may be affected." (20 comments; 👍 1; opened about 1 month ago by zenmagnets)
- #1: Here's the vLLM recipe I'm using with 2x RTX Pro 6000 (17 comments; 👍 3; opened about 1 month ago by zenmagnets)