Commit Graph

  • 8f9e546b51 trim partial stopping strings when not streaming anon 2023-06-02 08:14:28 -0300
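
The trimming referenced in this commit can be sketched as follows: when the response is returned in one piece rather than streamed, any trailing bytes that form a prefix of a stop string may be a stop string that was cut off mid-generation, so they are dropped. This is a minimal illustration with made-up names, not the server's actual code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Remove a trailing fragment that could be the start of a stop string.
static std::string trim_partial_stop(std::string text,
                                     const std::vector<std::string> & stops) {
    size_t trim_at = std::string::npos;
    for (const std::string & stop : stops) {
        // test every proper prefix of `stop` against the tail of `text`
        for (size_t len = 1; len < stop.size() && len <= text.size(); ++len) {
            if (text.compare(text.size() - len, len, stop, 0, len) == 0) {
                trim_at = std::min(trim_at, text.size() - len);
            }
        }
    }
    if (trim_at != std::string::npos) {
        text.erase(trim_at);
    }
    return text;
}
```
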
  • bebea657cb Merge pull request #13 from anon998/small-fixes Randall Fitzgerald 2023-06-02 06:53:10 -0400
  • abb7782745 Merge branch 'master' into small-fixes anon998 2023-06-02 10:35:06 +0000
  • 88cc7bb6f7 Stuff with logits Henri Vasserman 2023-06-02 13:29:57 +0300
  • 47efbb5cf3 use std::isinf to check if ignore_eos is active anon 2023-06-02 07:19:21 -0300
  • 2932db15a3 avoid creating element in logit_bias accidentally anon 2023-06-02 06:55:38 -0300
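
The two commits above both touch the server's logit_bias map. A rough sketch of the combined idea, with field names assumed rather than taken from the actual code: ignore_eos is encoded as a negative-infinity bias on the EOS token, and lookups must use find() because operator[] inserts a default-constructed entry as a side effect.

```cpp
#include <cmath>
#include <unordered_map>

// True when the EOS token's bias is -infinity, i.e. ignore_eos is active.
// find() is used instead of operator[], which would silently insert a
// {token, 0.0f} entry for a missing key just by reading it.
static bool ignore_eos_active(const std::unordered_map<int, float> & logit_bias,
                              int eos_token) {
    auto it = logit_bias.find(eos_token);
    return it != logit_bias.end() && std::isinf(it->second) && it->second < 0.0f;
}
```
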
  • d38ad41528 Add llama.cpp:full image support for Chinese, and related documentation (#1649) qingfengfenga 2023-06-02 17:40:54 +0800
  • a8a9f19689 small fixes anon 2023-06-02 05:57:20 -0300
  • 49dce94885 make types match gpt_params exactly anon 2023-06-02 05:51:34 -0300
  • 1488a0f528 make functions that never return false void anon 2023-06-02 05:47:00 -0300
  • ebfead6e5a remove unused variables anon 2023-06-02 05:45:57 -0300
  • 731ecc0d1b fix typo anon 2023-06-02 05:45:16 -0300
  • 0bc047730f Apply suggestions from code review Henri Vasserman 2023-06-02 10:29:09 +0300
  • 8d0c81e7cc Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental Concedo 2023-06-02 12:19:59 +0800
  • 144d8a8312 updated lite Concedo 2023-06-02 12:19:51 +0800
  • e55f7b0bdb mtl : add f16 mat x f32 vec multiplication kernel Georgi Gerganov 2023-06-01 23:37:49 +0300
  • f0196a7e7a mtl : optimize rms_norm and soft_max kernels Georgi Gerganov 2023-06-01 22:51:42 +0300
  • d9626743ac add option to use scratch buffers in training or not xaedes 2023-06-01 20:59:19 +0200
  • 9665429e94 mtl : full GPU inference of the computation graph Georgi Gerganov 2023-06-01 21:50:01 +0300
  • fbd3f6258d mtl : add non-broadcast mul kernel Georgi Gerganov 2023-06-01 21:40:53 +0300
  • 42dca4004c mtl : add silu kernel Georgi Gerganov 2023-06-01 21:35:11 +0300
  • a0cc3de59a mtl : add f32 -> f32 cpy kernel Georgi Gerganov 2023-06-01 21:30:33 +0300
  • a266c26de2 mtl : verify V tensor contents Georgi Gerganov 2023-06-01 21:27:24 +0300
  • f67c2d8cab ggml : update ggml_nbytes() to handle non-contiguous tensors Georgi Gerganov 2023-06-01 21:27:03 +0300
  • 0d4b87de3d improve training memory usage with scratch buffers xaedes 2023-06-01 19:50:48 +0200
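
The scratch-buffer commits above rely on the ggml_set_scratch() API that ggml exposed at the time: intermediate tensors are allocated from a caller-supplied region instead of the context's own pool, and passing an empty scratch restores normal allocation. A sketch under that assumption:

```cpp
#include <cstdint>
#include <vector>
#include "ggml.h"

// Allocate an intermediate tensor from a reusable scratch region so the
// training graph does not permanently grow the context's memory pool.
static struct ggml_tensor * square(struct ggml_context * ctx, struct ggml_tensor * x) {
    static std::vector<uint8_t> scratch(16u * 1024 * 1024);  // 16 MiB, reused

    ggml_set_scratch(ctx, { 0, scratch.size(), scratch.data() });
    struct ggml_tensor * t = ggml_mul_mat(ctx, x, x);        // placed in scratch
    ggml_set_scratch(ctx, { 0, 0, nullptr });                // back to normal allocation
    return t;
}
```
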
  • 17930fbcb7 mtl : fix soft_max kernel Georgi Gerganov 2023-06-01 20:48:24 +0300
  • 765b290010 bug fix for ggml_compute_forward_get_rows_back_f32 xaedes 2023-06-01 19:42:51 +0200
  • 3164f93381 fix formulas in comments xaedes 2023-06-01 19:41:55 +0200
  • 17a70362a6 mtl : add diag_mask_inf kernel Georgi Gerganov 2023-06-01 20:41:54 +0300
  • 0e269665cd add ggml_opt_resume_g which accepts forward and backward cgraphs xaedes 2023-06-01 19:41:28 +0200
  • 24239f0df7 Improve implementation 0cc4m 2023-06-01 18:57:08 +0200
  • 0f1c580860 mtl : add scale kernel Georgi Gerganov 2023-06-01 19:52:32 +0300
  • 51efb59437 mtl : confirm f16 x f32 attention mul mat Georgi Gerganov 2023-06-01 19:45:36 +0300
  • 948fcfde7e mtl : add cpy kernel + handle view ops Georgi Gerganov 2023-06-01 19:21:28 +0300
  • 94ea9e7bfe ggml : store offset as opt arg for ggml_view_xd() operators Georgi Gerganov 2023-06-01 19:21:08 +0300
  • a8a22ff93f Local builds will detect the CPU Howard Su 2023-06-01 23:00:12 +0800
  • ac072d7c91 Modify per code review suggestions Howard Su 2023-04-16 22:42:06 +0800
  • 7adce4f64c Only check hardware when option is ON Howard Su 2023-04-07 21:04:47 +0800
  • 5f50d15120 Add detection code for avx Howard Su 2023-04-01 16:32:14 +0800
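
Howard Su's detection series wires CPU-feature checks into the CMake build; the underlying runtime check can be illustrated with the GCC/Clang builtin (illustrative only, not the commits' CMake code):

```cpp
#include <cstdio>

int main() {
    // __builtin_cpu_supports is available on GCC and Clang for x86 targets.
    if (__builtin_cpu_supports("avx"))  std::printf("AVX supported\n");
    if (__builtin_cpu_supports("avx2")) std::printf("AVX2 supported\n");
    return 0;
}
```
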
  • 98552d1e5d cleanup and simplify the code Howard Su 2023-06-01 22:44:36 +0800
  • 37659d2c4e allow blasbatchsize -1, which disables BLAS but keeps benefits like GPU offloads Concedo 2023-06-01 22:33:50 +0800
  • ab0a7d1531 Add normalfloat4 as Q4_2 Howard Su 2023-06-01 22:17:53 +0800
  • d29b6d5f55 Merge pull request #12 from anon998/clear-logit-bias Randall Fitzgerald 2023-06-01 08:58:35 -0400
  • 8cbc4be6c2 clear logit_bias between requests + print anon 2023-06-01 09:49:50 -0300
  • 6025476e39 default penalize_nl back to true anon 2023-06-01 09:49:16 -0300
  • 49a18bdd14 remove unused parameter warning anon 2023-06-01 09:41:35 -0300
  • af711263ae Merge pull request #11 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-01 08:10:55 -0400
  • 797155a0d1 Merge pull request #10 from cirk2/master Randall Fitzgerald 2023-06-01 08:10:26 -0400
  • 49272e3c53 adjusted defaults Concedo 2023-06-01 20:03:44 +0800
  • 9531ae60db Add logit bias support Henri Vasserman 2023-06-01 13:57:47 +0300
  • 8c6a5fc92b last tokens fixes Henri Vasserman 2023-06-01 13:18:12 +0300
  • 5bbc030338 Add OPTIONS endpoints and Access-Control-Allow-Headers to satisfy CORS rules Felix Hellmann 2023-06-01 10:47:53 +0200
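
The server examples are built on cpp-httplib, so the CORS commit plausibly amounts to something like the following sketch; the routes and header values here are illustrative, not the commit's exact code.

```cpp
#include "httplib.h"

int main() {
    httplib::Server svr;

    // Answer browser preflight requests for any route.
    svr.Options(".*", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin",  "*");
        res.set_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
        res.set_header("Access-Control-Allow-Headers", "content-type");
    });

    svr.Post("/completion", [](const httplib::Request &, httplib::Response & res) {
        res.set_header("Access-Control-Allow-Origin", "*");
        res.set_content("{}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```
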
  • 457aaf5bad Reduce code duplication between cuda and opencl branches 0cc4m 2023-06-01 07:33:32 +0200
  • 93278f84cf low_level_api_chat_cpp.py: fix default path_prefix arg value to match class default value Don Mahurin 2023-05-23 06:21:31 -0700
  • 62ddbc6cd9 Update Makefile to add SSSE3 compilation use cases rankaiyx 2023-06-01 08:46:07 +0800
  • f7882e2d69 Fixed a crash caused by erasing from empty last_n_tokens digiwombat 2023-05-31 20:35:28 -0400
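
The crash pattern behind this fix, in general terms: calling erase() on an empty vector's begin() is undefined behavior, so the pop must be guarded. The name below mirrors the commit message but is otherwise assumed.

```cpp
#include <vector>

// Drop the oldest token only if the buffer actually holds one;
// erasing begin() of an empty vector is undefined behavior.
static void pop_oldest(std::vector<int> & last_n_tokens) {
    if (!last_n_tokens.empty()) {
        last_n_tokens.erase(last_n_tokens.begin());
    }
}
```
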
  • 5f6e16da36 Merge pull request #9 from anon998/stopping-strings Randall Fitzgerald 2023-05-31 20:05:18 -0400
  • e9b1f0bf5c fix stopping strings anon 2023-05-31 20:31:58 -0300
  • 342604bb81 Added a super simple CORS header as default for all endpoints. digiwombat 2023-05-31 19:54:05 -0400
  • e5dad2afa0 Look for libllama in parent directory Don Mahurin 2023-05-23 06:21:31 -0700
  • 4ad62c489d fix "missing 1 required positional argument: 'min_keep'" Don Mahurin 2023-05-22 23:54:57 -0700
  • 60a7c76339 Update llama.cpp Andrei Betlen 2023-05-21 17:47:21 -0400
  • fda33ddbd5 Fix llama_cpp and Llama type signatures. Closes #221 Andrei Betlen 2023-05-19 11:59:33 -0400
  • 601b19203f Check for CUDA_PATH before adding Andrei Betlen 2023-05-17 15:26:38 -0400
  • 66c27f3120 Fixed CUBLAS DLL load issue in Windows Aneesh Joy 2023-05-17 18:04:58 +0100
  • aae6c03e94 Update llama.cpp Andrei Betlen 2023-05-14 00:04:22 -0400
  • a83d117507 Add winmode arg only on windows if python version supports it Andrei Betlen 2023-05-15 09:15:01 -0400
  • 7609c73ee6 Update llama.cpp (remove min_keep default value) Andrei Betlen 2023-05-07 00:12:47 -0400
  • 59f80d2a0d Fix mlock_supported and mmap_supported return type Andrei Betlen 2023-05-07 03:04:22 -0400
  • 3808a73751 Fix obscure Windows DLL issue. Closes #208 Andrei Betlen 2023-05-14 22:08:11 -0400
  • 690588410e Fix return type Andrei Betlen 2023-05-07 19:30:14 -0400
  • 4885e55ccd Fix: runtime type errors Andrei Betlen 2023-05-05 14:12:26 -0400
  • 0c2fb05361 Fix: types Andrei Betlen 2023-05-05 14:04:12 -0400
  • ff31330d7f Fix candidates type Andrei Betlen 2023-05-05 14:00:30 -0400
  • 7862b520ec Fix llama_cpp types Andrei Betlen 2023-05-05 13:54:22 -0400
  • f20b34a3be Add return type annotations for embeddings and logits Andrei Betlen 2023-05-05 14:22:55 -0400
  • 731c71255b Add types for all low-level api functions Andrei Betlen 2023-05-05 12:22:27 -0400
  • a439fe1529 Allow model to tokenize strings longer than context length and set add_bos. Closes #92 Andrei Betlen 2023-05-12 14:28:22 -0400
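
The llama.cpp C API of that era reported the negative of the required token count when the output buffer was too small, which allows a grow-and-retry pattern. A sketch assuming the 2023 llama_tokenize signature:

```cpp
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

// Tokenize text of any length by retrying with the size the API reports.
static std::vector<llama_token> tokenize_any_length(llama_context * ctx,
                                                    const std::string & text,
                                                    bool add_bos) {
    std::vector<llama_token> tokens(text.size() + (add_bos ? 1 : 0));
    int n = llama_tokenize(ctx, text.c_str(), tokens.data(),
                           (int) tokens.size(), add_bos);
    if (n < 0) {            // buffer too small: -n is the required count
        tokens.resize((size_t) -n);
        n = llama_tokenize(ctx, text.c_str(), tokens.data(),
                           (int) tokens.size(), add_bos);
    }
    tokens.resize((size_t) std::max(n, 0));
    return tokens;
}
```
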
  • b5531e1435 low_level_api_chat_cpp.py: Fix missing antiprompt output in chat. Don Mahurin 2023-05-26 06:35:15 -0700
  • fb79c567d2 Fix session loading and saving in low level example chat Mug 2023-05-08 15:27:03 +0200
  • 0bf36a77ae Fix mirostat requiring c_float Mug 2023-05-06 13:35:50 +0200
  • f8ba031576 Fix lora Mug 2023-05-08 15:27:42 +0200
  • bbf6848cb0 Wrong logit_bias parsed type Mug 2023-05-06 13:27:52 +0200
  • 335cd8d947 Rename postfix to suffix to match upstream Mug 2023-05-06 13:18:25 +0200
  • 32cf0133c9 Update low level examples Mug 2023-05-04 18:33:08 +0200
  • 9e79465b21 Prefer explicit imports Andrei Betlen 2023-05-05 14:05:31 -0400
  • d15578e63e Update llama.cpp (session version) Andrei Betlen 2023-05-03 09:33:30 -0400
  • c26e9bf1c1 Update sampling api Andrei Betlen 2023-05-01 14:47:55 -0400
  • 78531e5d05 Fix return types and import comments Andrei Betlen 2023-05-01 14:02:06 -0400
  • d0031edbd2 Update llama.cpp Andrei Betlen 2023-05-01 10:44:28 -0400
  • 441d30811a Detect multi-byte responses and wait Mug 2023-04-28 12:50:30 +0200
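
The idea behind this commit: if the emitted bytes end partway through a UTF-8 code point, wait for more output before decoding instead of producing a broken character. Shown in C++ for consistency with the other sketches here; the function name is assumed.

```cpp
#include <cstdint>
#include <string>

// Number of continuation bytes still needed to complete the final UTF-8
// code point of `s`; 0 means it is safe to decode and emit.
static int utf8_missing_bytes(const std::string & s) {
    size_t i = s.size();
    int cont = 0;
    // walk back over continuation bytes (10xxxxxx)
    while (i > 0 && (static_cast<uint8_t>(s[i - 1]) & 0xC0) == 0x80 && cont < 3) {
        --i;
        ++cont;
    }
    if (i == 0) {
        return 0;  // nothing but continuation bytes; treat as complete
    }
    const uint8_t lead = static_cast<uint8_t>(s[i - 1]);
    const int need = (lead & 0x80) == 0x00 ? 1    // ASCII
                   : (lead & 0xE0) == 0xC0 ? 2    // 2-byte sequence
                   : (lead & 0xF0) == 0xE0 ? 3    // 3-byte sequence
                   : (lead & 0xF8) == 0xF0 ? 4    // 4-byte sequence
                   : 1;                           // invalid; emit as-is
    const int have = 1 + cont;
    return need > have ? need - have : 0;
}
```
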
  • 36b3494332 Also ignore errors on input prompts Mug 2023-04-26 14:45:51 +0200
  • c8e6ac366a Update llama.cpp (llama_load_session_file) Andrei Betlen 2023-04-28 15:32:43 -0400
  • 66ad132575 Update llama.cpp Andrei Betlen 2023-04-26 20:00:54 -0400
  • 656190750d Update llama.cpp Andrei Betlen 2023-04-25 19:03:41 -0400
  • 80c18cb665 Update llama.cpp (remove llama_get_kv_cache) Andrei Betlen 2023-04-24 09:30:10 -0400
  • bf9f02d8ee Update llama.cpp Andrei Betlen 2023-04-22 19:50:28 -0400
  • 5bbf40aa47 Update llama.cpp Andrei Betlen 2023-04-21 17:40:27 -0400
  • fd64310276 Fix decode errors permanently Mug 2023-04-26 14:37:06 +0200
  • bdbaf5dc76 Fixed wrong end-of-text type, and fixed n_predict behaviour Mug 2023-04-17 14:45:28 +0200