ggml : fix fallback to CPU for unsupported ops (llama/15118) 2b7ae5e Diego Devesa committed on Aug 6
vulkan: fix build when using glslang that does not support coopmat2 (llama/15062) 863e083 jeffbolznv committed on Aug 4
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035) 9e85264 JohannesGaessler committed on Aug 2
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038) cc3a2ed ggerganov committed on Aug 2
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (llama/15015) d4c4115 jeffbolznv committed on Aug 2
vulkan: optimizations for direct convolution (llama/14933) 215f463 jeffbolznv OccamRazor committed on Aug 2
CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014) fbc3cd1 JohannesGaessler committed on Aug 1
ggml : Q2k interleaving implementation - x86/x64 SIMD (llama/14373) e2965b0 Srihari-mcw Manognasree committed on Aug 1
docker : add cann build pipeline (llama/14591) 2d993ad diannao ggerganov Xuan-Son Nguyen committed on Aug 1
CANN: Improve loading efficiency after converting weights to NZ format. (llama/14985) 7612978 hipudding committed on Jul 31
opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (llama/14809) 05577c3 lhez committed on Jul 30
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949) 149f5a5 uvos committed on Jul 30
CUDA: skip masked KV slices for all FA kernels (llama/14924) 0c60f80 JohannesGaessler committed on Jul 30
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945) e37eff3 uvos committed on Jul 29
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930) f9dbd96 uvos committed on Jul 29
HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931) 8e133f7 uvos committed on Jul 29
SYCL: Add set_rows support for quantized types (llama/14883) c55b72b Akarshan Biswas committed on Jul 28
whisper : fixed crash in GPU device selection on multi-GPU systems (#3372) 0869200 Dw9 committed on Aug 12
whisper : reset conv scheduler when CoreML is used (#3350) f425556 ggerganov committed on Jul 30
vulkan : add fp16 support for the conv_2d kernel (llama/14872) 48e92ad Erik Scholz committed on Jul 27
vulkan: skip empty set_rows to avoid invalid API usage (llama/14860) 22fb24a jeffbolznv committed on Jul 27
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624) 5422b31 deepsek committed on Jul 26