Commit History
vulkan: mutex around vkQueueSubmit (llama/14127)
ef3a7d0
vulkan: Better thread-safety for command pools/buffers (llama/14116)
fdc26e7
vulkan: Track descriptor pools/sets per-context (llama/14109)
855a3bf
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (llama/14099)
dcb106f
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (llama/14001)
e5107fe
vulkan: automatically deduce size of push constants (llama/13936)
00a9e2f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (llama/13813)
32985b0
vulkan: fix warnings in perf logger querypool code (llama/13937)
11bac96
vulkan: use timestamp queries for GGML_VULKAN_PERF (llama/13817)
56ddc5b
vulkan : Remove unexpected ; (ggml/1253)
c4be6fb
Kai Pastor
vulkan: mark IM2COL as supporting non-contig (llama/13783)
09c03ad
vulkan: support CPY from any type to itself (llama/13695)
f5f766b
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (llama/13696)
69679f5
use LOG_WARN to replace `std::cerr` (llama/13657)
6975ec2
Judd
vulkan: fix warnings (llama/13626)
8602d10
Eve
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (llama/13607)
dfa38af
vulkan: use scalar FA rather than coopmat2 when N==1 (llama/13554)
97d9aa6
vulkan: KHR_coopmat flash attention (llama/13506)
4d1bd4f
vulkan: scalar flash attention implementation (llama/13324)
3331abd
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (llama/13326)
53f8fee
vulkan: Additional type support for unary, binary, and copy (llama/13266)
b9cb11e
vulkan: Add bfloat16 support (llama/12554)
b21f8a1
vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (llama/13191)
710fdcf
vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
43d9f3e
vulkan: matmul gcn tuning (llama/13016)
ac537d2
vulkan: support noncontiguous rms_norm (llama/13031)
e4d1f59
graph : make FA compatible with MLA + add initial Metal kernels (llama/12953)
fb0d243
vulkan: enable coopmat2 FA gqa and split_k optimizations more often (llama/12931)
f844153
vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (llama/12833)
4b7a407
ggml : add bilinear upscale support (ggml/1185)
4c5e449
Diego Devesa
vulkan: Use unclamped loads for flash attention mask (llama/12720)
a76ef69
Vulkan: Tune Vulkan mmq int dot shader for performance (llama/12767)
b3bf710
vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (llama/12630)
ee422be
vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)
5ab06d6
vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559)
e7bebe6
vulkan: fix build when glslc doesn't support coopmat (llama/12683)
f91eb88
Wagner Bruna
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (llama/12135)
06ec111
metal : improve FA + improve MoE (llama/12612)
04a3389
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
6868981
Vulkan: RTE rounding for cpy to quant (llama/12480)
8707beb
vulkan: Submit once enough matmul work has been recorded (llama/12406)
ec77b2c
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434)
55088d3
llama: Add support for RWKV v7 architecture (llama/12412)
727de7e
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)
c9f86c1
vulkan: subgroup size tuning (llama/12087)
af63c3d
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273)
5d51f1c
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)
3cc6539
vulkan : sync (llama/0)
4c17fa1
ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
d6b6852
William Tambellini, slaren