Commit History
ggml : remove obsolete alibi code (skipme) (#0)
d25c1e3
talk-llama : sync llama.cpp
f5f68d6
sync : ggml
3ea4549
ggml : optimize for ppc64le using VSX intrinsics (ggml/784)
05d3824
metal : fix indent (ggml/0)
d4f82d5
ggml : restore sigmoid decl order (ggml/0)
67c5387
ggml : resolve merge (ggml/0)
d692b06
ggml : full ALiBi support (llama/7192)
192bda4
metal : fix flash attention kernel requirements (llama/7169)
6cb3028
Minor arithmetic improvement to mmvq wrapper kernel (llama/7172)
ae75124
Ouadie EL FAROUKI
committed on
Vulkan Bugfixes and Improvements (llama/7084)
8dade62
CUDA: generalize FP16 fattn vec kernel (llama/7061)
ca79691
opencl : alignment size converted from bits to bytes (llama/7090)
2692ce5
Introduction of CUDA Graphs to LLama.cpp (llama/6766)
08fc76d
agray3
slaren
committed on
metal : use `vm_allocate` instead of `posix_memalign` on macOS (llama/7078)
eb910b1
Gilad S
committed on
ggml : introduce bfloat16 support (llama/6412)
81ec961
Justine Tunney
committed on
metal : fix unused warning
24e883a
Add an option to build without CUDA VMM (llama/7067)
38b1143
gguf-split: add --no-tensor-first-split (llama/7072)
b9bc04d
Xuan Son Nguyen
committed on
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (llama/7019)
4cf786d
switch to using localizedDescription (llama/7010)
fd25ba6
metal : remove deprecated error code (llama/7008)
42a84fb
metal : log more info on error (llama/6987)
d4dcef9
ggml : add Flash Attention (llama/5021)
34d3b03
ggml : fix __MSC_VER -> _MSC_VER (llama/6977)
a83f2ae
Fix more int overflow during quant (PPL/CUDA). (llama/6563)
531387f
gguf : enforce that tensor names are unique (llama/6905)
22e446d
Xuan Son Nguyen
slaren
committed on
add device version in device list (llama/6959)
c022e9a
Neo Zhang
arthw
committed on
Reset schedule earlier to allow overlap with ggml graph computation on device (llama/6933)
3a8eea8
agray3
committed on
add basic tensor data validation function (llama/6884)
71e001c
slaren
committed on
gguf : fix mismatch between alloc and free functions (llama/6929)
d8fb433
slaren
committed on
Merge pull request from GHSA-p5mv-gjc5-mwqv
72b368d
ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (llama/6906)
f900de6
ggml : fix MIN / MAX macros (llama/6904)
a1c0e2a
ggml : move 32-bit arm compat in ggml-impl.h (llama/6865)
7343760
llamafile : improve sgemm.cpp (llama/6796)
bfe2a5f
Justine Tunney
committed on
ggml : fix calloc argument ordering. (llama/6820)
12af87c
Dave Airlie
committed on
ggml : fix ggml_backend_cpu_supports_op() for CPY (llama/0)
d645791
ggml : group all experts in a single ggml_mul_mat_id (llama/6505)
f0b5c67
ggml : fix llamafile sgemm wdata offsets (llama/6710)
5e756db
ggml : add llamafile sgemm (llama/6414)
093eec4
Justine Tunney
committed on
llama : add qwen2moe (llama/6074)
daae175
fix mul_mat_id() for new input, make the ut pass (llama/6682)
6d1ba81
Neo Zhang Jianyu
committed on
Added support for GGML_OP_CLAMP in Metal (llama/6662)
a06cbc7
Dave
dave-fl
committed on
fix memcpy() crash, add missed cmd in guide, fix softmax (llama/6622)
6901743
Neo Zhang Jianyu
committed on
CUDA: fix matrix multiplication logic for tests (llama/6667)
6ccb5a5
metal : unify mul_mv_id kernels (llama/6556)
e9910b5
slaren
committed on
llama : add gguf_remove_key + remove split meta during quantize (llama/6591)
1706870
jiez
z5269887
committed on
feat: implemented sigmoid function (ggml/806)
cd0c122
Justina Cho
committed on