Commit History
CUDA: deduplicate mmq code (llama/7397) e7b20b1
rpc : track allocated buffers (llama/7411) 925eb7a
Update SYCL upscale operation (llama/7321) 3984ba6
AidanBeltonS committed on
ggml-opencl, llama: using reserve() if count already known (llama/7272) 8325ed5
ggml : add loongarch lsx and lasx support (llama/6454) 9794ea7
junchao-loongson Jinyang He committed on
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (llama/7258) cf52931
Srihari-mcw committed on
Vulkan Embedding Fix (llama/7360) 2bfeba3
ggml : fix another case of quants nans (llama/7387) 645c367
slaren committed on
ggml: implement quantized KV cache for FA (llama/7372) aef1b4b
cuda : clear error after buffer allocation failure (llama/7376) b7f6691
slaren committed on
Capture CUDA logging output (llama/7298) 3519475
fraxy-v slaren committed on
android : use "ci-android" branch for CI (llama/7341) ff9d573
CUDA: deduplicate FlashAttention code (llama/7352) 65ab3e8
cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263) ad83dfd
Engininja2 committed on
Update and fix Vulkan soft_max and argsort implementations (llama/7237) a0218a3
ggml : fix quants nans when all the group weights are very close to zero (llama/7313) b57bcbc
slaren committed on
CUDA: faster large batch FA without tensor cores (llama/7314) a6d9f2d
rpc : set SO_REUSEADDR for the server socket (llama/7320) 195fe29
ggml-quants, llama : removed excess checks (llama/7274) 142d95e
ggml : rewrite silu and softmax for cpu (llama/7154) c78b872
Justine Tunney committed on
rpc : add command line arg for specifying backend memory b441739
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (llama/7191) c917076
ggml : use dynamic thread scheduling for matrix multiplication (llama/6915) 6f8daf7
kunnis committed on
Avoid unnecessarily disabling CUDA graphs (llama/7302) 4816f6a
agray3 committed on
ggml : tag ggml_tensor::backend as deprecated (llama/7290) 1a5606e
slaren committed on
Add missing " (llama/7303) 2c417da
AidanBeltonS committed on
ggml : add `ggml_upscale_ext` (ggml/814) 04a5333
scripts : update sync 9e35f6d unverified
whisper : use ggml-cuda in mel calc, set appropriate device (#2236) 93af41a unverified
cuda : fix HIPBLAS build (#2234) a8eb666 unverified
cuda : fix bounds check for src0 rows in MMVQ kernel (#2231) 4fdb9d2 unverified
ci : fix CUDA builds (#2232) 41b22d2 unverified
whisper : auto-grow working areas for mel_calc_cuda (#2227) 6282f63 unverified
whisper : free whisper_mel instances (#2220) 9373d6b unverified
whisper : whisper_state/backend fixes (#2217) adde036 unverified
whisper : calculate mel spectrogram directly into a ggml_tensor (#2208) 521186a unverified
whisper : add CUDA-specific computation mel spectrograms (#2206) c6894d3 unverified
whisper : remove `speed_up` and `phase_vocoder*` functions (#2198) 7ef0c95 unverified
readme : add conan badge (#2196) f08dc65 unverified
Martin Delille committed on
readme : add install instructions for Conan (#2189) fb4f721 unverified
Carlos Zoido committed on
whisper : use global cache for sin/cos vals and Hann window (#2194) 3a04f56 unverified
release : v1.6.2 3e54141 unverified
Revert "whisper : remove extra backend instance (huh?)" (#2182) b708d81 unverified
server : fix typo (#2181) 18c60fc unverified
Daniel Valdivia committed on
ruby : update bindings (#2154) a2bce18 unverified
Todd committed on
release : v1.6.1 ca6f4b2 unverified
examples : add support for decoding input with ffmpeg (Linux) (#2133) c160b58 unverified
William Tambellini committed on
node : add flash_attn param (#2170) b4d05df unverified
ci : update build.yml to suppress warnings about node.js versions (#2166) e9954d9 unverified
Tamotsu Takahashi committed on