llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-15 07:19:53 +00:00


Sigbjørn Skjæret b72c20b85c Fix conversion of unnormalized BF16->BF16 weights (#7843)	2024-08-02 15:11:39 -04:00

* add truncate_bf16
* truncate intermediate fp32 if converting bf16 to bf16
* fix masking in __compute_fp32_to_bf16
* np.int16 no longer used
* missing cast and additional numpy 2.x fix
* ggml-impl : do not flush bf16 subnormals to zero
* ggml : add reference fp32 to bf16 conversion

  The fast version is no longer equivalent for all platforms
  because of the handling of subnormal values.
* gguf-py : remove flush to zero for bf16 subnormals
* gguf-py : remove float32 truncation to bf16

  Rounding achieves the same thing in the cases where this was used.
* missed prototype update in merge
* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
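
The commit message above refers to rounding fp32 values to bf16 without flushing subnormals to zero. As a rough illustration only, and not the code from this commit, the following sketch shows one common way to do that conversion: round to nearest, ties to even, on the raw bit pattern. The function names fp32_to_bf16_bits and bf16_bits_to_fp32 are hypothetical and exist only for this example.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bf16 and return the 16-bit pattern.

    Hypothetical helper for illustration; assumes x fits in fp32.
    """
    # Reinterpret the float as its 32-bit IEEE 754 pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: keep the top 16 bits and set the quiet bit.
        return ((bits >> 16) | 0x40) & 0xFFFF
    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit,
    # then drop the low 16 bits. Subnormals are rounded like any other
    # value, so they are not flushed to zero.
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF


def bf16_bits_to_fp32(h: int) -> float:
    """Widen a 16-bit bf16 pattern to fp32 by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

With this scheme, a value that is already exactly representable in bf16 converts back to the same bit pattern, including subnormal patterns, which matches the commit message's remark that rounding achieves the same thing as truncation in the bf16-to-bf16 case.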
cmake	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
include	Fix conversion of unnormalized BF16->BF16 weights (#7843)	2024-08-02 15:11:39 -04:00
src	Fix conversion of unnormalized BF16->BF16 weights (#7843)	2024-08-02 15:11:39 -04:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	cann: update cmake (#8765)	2024-07-30 12:37:35 +02:00