llama.cpp/ggml/include
Sigbjørn Skjæret b72c20b85c
Fix conversion of unnormalized BF16->BF16 weights (#7843)
* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-08-02 15:11:39 -04:00
..
ggml-alloc.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml-backend.h CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) 2024-07-18 23:48:47 +02:00
ggml-blas.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml-cann.h [CANN] Add Ascend NPU backend (#6035) 2024-07-17 14:23:50 +03:00
ggml-cuda.h feat: Support Moore Threads GPU (#8383) 2024-07-28 01:41:25 +02:00
ggml-kompute.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml-metal.h Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
ggml-rpc.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml-sycl.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml-vulkan.h llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
ggml.h Fix conversion of unnormalized BF16->BF16 weights (#7843) 2024-08-02 15:11:39 -04:00