Commit History
84f8c68 remove unnecessary output
e408c7b Unified return format
6c8a230 fix cmd
a1a7aac fix cmd
1b9b118 fix build
d937c3a fix build
03ff3a5 fix dockerfile path
36ff0ea add meta
aa000f7 chore: track binaries with git-lfs
f33d63d chore: track binaries with git-lfs
46ebeba add sync task
6e115ac Handle negative value in padding (#3389) [Treboko]
0b65831 models : update `./models/download-ggml-model.cmd` to allow for tdrz download (#3381)
4321600 talk-llama : sync llama.cpp
a0af6fc sync : ggml
4b3da1d ggml: Add initial WebGPU backend (llama/14521) [Reese Levine]
6dd510c ggml : initial zDNN backend (llama/14975)
fd4c0e1 common : handle mxfp4 enum
a575f57 ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379)
cf24af7 vulkan: disable spirv-opt for bfloat16 shaders (llama/15352)
054584a vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355)
e5406c0 vulkan: support sqrt (llama/15370) [Dong Won Kim]
80a188c vulkan: Optimize argsort (llama/15354)
ad199b1 vulkan: fuse adds (llama/15252)
41a76e6 vulkan: Support mul_mat_id with f32 accumulators (llama/15337)
a6fa78e vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334)
8ece1ee OpenCL: add initial FA support (llama/14987)
1a0281c opencl: add initial mxfp4 support via mv (llama/15270) [lhez, shawngu-quic]
78a1865 vulkan : fix out-of-bounds access in argmax kernel (llama/15342)
e3107ff vulkan : fix compile warnings on macos (llama/15340)
449e1a4 ggml: initial IBM zDNN backend (llama/14975)
6e3a7b6 CUDA: fix negative KV_max values in FA (llama/15321)
7cdf9cd HIP: Cleanup hipification header (llama/15285)
d48d508 vulkan: perf_logger improvements (llama/15246)
4496862 ggml: fix ggml_conv_1d_dw bug (ggml/1323)
59c694d cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300) [Sigbjørn Skjæret]
f585fe7 finetune: SGD optimizer, more CLI args (llama/13873)
58a3802 HIP: bump requirement to rocm 6.1 (llama/15296)
b4896dc ggml : update `ggml_rope_multi` (llama/12665)
db4407f ggml : repack block_iq4_nlx8 (llama/14904)
c768824 CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132)
c8284f2 ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (llama/15188)
8fca6dd HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (llama/15273)
7b868ed sycl: Fix and disable more configurations of mul_mat (llama/15151) [Romain Biessy]
345810b opencl: allow mixed f16/f32 `add` (llama/15140)
008e169 CUDA cmake: add `-lineinfo` for easier debug (llama/15260)
73e90ff CANN: GGML_OP_CPY optimization (llama/15070) [Chenguang Li]