This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and back the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
---------
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
This commit adds the `--pooling` option to the README.md file in the
`examples/embedding` directory.
The motivation for adding this options is that currently if the model
used does not specify a pooling type the embedding example will fail
with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples
section.
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap
Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.
* gguf-py : add generic quantize and dequantize functions
The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
When using CMake to build with Vulkan support, compiling
vulkan-shaders-gen fails due to missing a CMakeLists.txt specification
to link vulkan-shaders-gen with the threading library, resulting in the
following error.
[5/172] Linking CXX executable bin/vulkan-shaders-gen
FAILED: bin/vulkan-shaders-gen
: && /usr/bin/c++ ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o bin/vulkan-shaders-gen && :
ld: error: undefined symbol: pthread_create
>>> referenced by vulkan-shaders-gen.cpp
>>> ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o:(std::__1::__libcpp_thread_create[abi:se180100](pthread**,
>>> void* (*)(void*), void*))
c++: error: linker command failed with exit code 1 (use -v to see invocation)
[6/172] Generating build details from Git
-- Found Git: /usr/local/bin/git (found version "2.45.2")
ninja: build stopped: subcommand failed.
Add the CMakeLists.txt specification to link vulkan-shaders-gen with the
threading library and fix the above error.
Fixes#8834
* Fix compilation issue in `vulkan-shaders-gen`
e31a4f6797 broke compilation on w64devkit. Including `algorithm` seems to fix that.
* Guard it under `#ifdef _WIN32`
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>
* delete llama_init_default_params()
* delete the extra whitespace
ramalama is a repo agnostic boring CLI tool that supports pulling from
ollama, huggingface and oci registries.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
* gguf-py, llama : add constants and methods related to Llama-3.1 <|eom_id|> token
* llama : find Llama-3.1 <|eom_id|> token id during vocab loading
* llama-vocab : add Llama-3.1 <|eom_id|> token to the set of tokens stopping the generation
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
* Fix Vulkan repeat op
* Implement Vulkan concat op
* Delete old Vulkan shader generator
* Implement Vulkan im2col op
* Implement Vulkan unary gelu_quick op
* Implement Vulkan group_norm op
* Implement Vulkan timestep_embedding op
* Implement Vulkan upscale op
* Fix Vulkan vk_context tensor extra index issue
* Fix Vulkan matmul shader parameter bug
* Properly fix Vulkan matmul shader parameter bug
* Add Vulkan ADD f16 + f32 -> f16 operator support
* Implement Vulkan tanh op
* Fix Vulkan group count too large Validation error on non-Nvidia GPUs
* Throw error when too much memory is requested
* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs
* Fix matmul MMQ condition
* Implement Vulkan pad op
* Fix Vulkan crash when tensor is used multiple times in a compute graph
* Add Vulkan CONCAT f16 + f16 -> f16 op
* Add Vulkan LEAKY_RELU op
This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>