Commit Graph

  • 793bcc0b94 Fix style Didzis Gosko 2023-06-19 23:43:40 +0300
  • 18b35625c3 ggml : fix bug in LBFGS optimizer (found by ggml tests) master-18b3562 Georgi Gerganov 2023-06-19 20:43:30 +0300
  • 7de2494a0f Add comment mudler 2023-06-19 18:43:27 +0200
  • 7a45a13e3d Move booleans at the bottom of the structure mudler 2023-06-19 18:42:36 +0200
  • 67ba34e88f ggml : minor style + try fix sanitizer build Georgi Gerganov 2023-06-19 18:55:09 +0300
  • 1ca2186189 Update README.md John 2023-06-19 17:53:35 +0200
  • d0e3596350 ggml : minor style changes Georgi Gerganov 2023-06-19 18:45:36 +0300
  • 69fd31d18c Merge branch 'master' into optimize_quants_upstream Concedo 2023-06-19 23:38:59 +0800
  • 5e8e99f206 Merge branch 'master' into concedo_experimental Concedo 2023-06-19 23:37:53 +0800
  • 90a0e65c67 Merge branch 'master' into HEAD Georgi Gerganov 2023-06-19 18:35:49 +0300
  • ba4e85a833 llama : use aligned memory during ggml_init call from loading saved sessions (#1934) master-ba4e85a l3utterfly 2023-06-19 23:20:06 +0800
  • 23fc5c219a cmake : fix trailing whitespaces master-23fc5c2 Georgi Gerganov 2023-06-19 18:18:34 +0300
  • cb40dfca69 llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) master-cb40dfc Kawrakow 2023-06-19 18:17:03 +0300
  • ca7c3f4da5 cuda : faster k-quants on older GPUs (#1930) master-ca7c3f4 Kawrakow 2023-06-19 18:14:09 +0300
  • b97ca431db ggml : sync latest ggml repo (#1924) master-b97ca43 Georgi Gerganov 2023-06-19 18:12:33 +0300
  • 1e3abfcef0 cmake : fix build shared ggml when CUDA is enabled (#1929) master-1e3abfc Howard Su 2023-06-19 23:10:37 +0800
  • c27f708127 Merge branch 'master' into fix_build Georgi Gerganov 2023-06-19 18:10:24 +0300
  • c94a438328 xx + ib0 Concedo 2023-06-19 23:01:49 +0800
  • 266d436746 Added broken new q4k quant Concedo 2023-06-19 22:20:19 +0800
  • 51e834c27b keep duplicate targets for now Concedo 2023-06-19 22:38:23 +0800
  • cf94340dfc Merge branch 'master' into concedo_experimental Concedo 2023-06-19 22:28:38 +0800
  • 911226625a Update README.md John 2023-06-19 15:38:22 +0200
  • 8e2dc19dc6 updated tokenizer, added support for scratch buffers for neox and gpt2 Concedo 2023-06-19 21:29:06 +0800
  • 559e43c447 whitespace Henri Vasserman 2023-06-19 16:12:15 +0300
  • 807d1705db Workaround struct misalignment during value-copy mudler 2023-06-19 14:49:54 +0200
  • 8dea3b78b5 Update README Henri Vasserman 2023-06-19 15:42:13 +0300
  • a63b5dd633 Add back embedding feature Henri Vasserman 2023-06-19 15:32:20 +0300
  • 597074c3f0 Hacky patch for lower VRAM + prints JohannesGaessler 2023-06-19 14:22:31 +0200
  • 1f421dddde added missing #if defined(GGML_USE_CUBLAS) John 2023-06-19 14:13:34 +0200
  • eb22d7e504 was reverted on cuda merge John 2023-06-19 13:43:12 +0200
  • c5399d1cf7 Merge pull request #9 from tomBlueOrange/patch-1 John 2023-06-19 13:40:00 +0200
  • 08972d2aee threading: removed feature wait_on_done to figure out causes of deadlock in windows AVX mqy 2023-06-19 19:15:00 +0800
  • 2f9366be4a removed commented-out old code from fix; updated another instance of the same issue below the original l3utterfly 2023-06-19 18:23:25 +0800
  • aac7f7cc04 threading: try to fix a deadlock, also added critical deadlock detection mqy 2023-06-19 18:15:32 +0800
  • 5183699900 Sync with upstream Didzis Gosko 2023-06-19 12:47:07 +0300
  • 948ce2cf9c Missing model memory release Didzis Gosko 2023-06-19 11:48:43 +0300
  • 7e14cb901c Apply suggestions from code review Didzis Gosko 2023-06-19 11:47:05 +0300
  • 986a56e2b2 Fixed copy/paste mistake Iwan Kawrakow 2023-06-19 11:39:47 +0300
  • 16b9cd1939 Convert vector to f16 for dequantize mul mat vec (#1913) master-16b9cd1 Johannes Gäßler 2023-06-19 10:23:56 +0200
  • cc8a375bc4 threading: fix deadlock by reverting part of changes from commit 286c5b30 mqy 2023-06-19 16:17:48 +0800
  • ced8e8d40d fixed issue: memory is not guaranteed to be aligned properly during ggml_init call from loading saved sessions l3utterfly 2023-06-19 14:46:15 +0800
  • 4d32b4088e threading test: decrease a threshold value to avoid timeout mqy 2023-06-19 14:05:30 +0800
  • 44b831dc59 tune: extract ggml_mulmat_tune_bench_wrapper mqy 2023-06-19 13:54:20 +0800
  • 65fd65e0c1 tune: update readme mqy 2023-06-19 13:50:35 +0800
  • 510b537d42 Only use Q6_K for output weights if tensor size is multiple of 256 Iwan Kawrakow 2023-06-19 08:38:38 +0300
  • aaf3f2476d Update Makefile - minor spelling error Tom Seneviratne 2023-06-19 14:46:22 +1000
  • cb6daa3171 updated lite Concedo 2023-06-19 11:51:23 +0800
  • f0165a5f18 Merge branch 'master' into cuda-integration John 2023-06-19 05:31:46 +0200
  • 932f7f663a Merge branch 'master' of https://github.com/cmp-nct/ggllm.cpp John 2023-06-19 05:30:41 +0200
  • 7c8249ff6b cuda malloc: added functionality to find the smallest fitting buffer instead of the first found buffer that is >= the requested size -- this prevents two buffer allocations in sequence from taking a huge buffer for a small tensor and then requiring a new buffer for the 2nd tensor -- in my test it saved 1GB of VRAM that is now free for more offloading John 2023-06-19 05:03:28 +0200
  • ec253a67bc Merge pull request #8 from alepar/patch-1 John 2023-06-18 22:46:26 +0200
  • 3984d36542 Fixes typo Alexey Parfenov 2023-06-18 11:10:24 -0700
  • 6609c229e8 fixed OP_OUT_PROD and OP_NONE mqy 2023-06-19 01:05:34 +0800
  • 98728632c6 threading test: less loops to avoid timeout mqy 2023-06-19 01:04:32 +0800
  • 4b9458215b add description for --numa zrm 2023-06-18 12:36:35 -0400
  • 2f5bb462fd move numa state to g_state zrm 2023-06-18 11:59:27 -0400
  • 0c6392deff Fix build shared ggml when CUDA is enabled Howard Su 2023-06-18 23:43:24 +0800
  • b24c3049d9 Added tokens per second to info prints (#1928) master-b24c304 Johannes Gäßler 2023-06-18 17:41:26 +0200
  • 4aea489740 k_quants: faster Q5_K on older GPUs Iwan Kawrakow 2023-06-18 18:12:34 +0300
  • d0d3c4f32b Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-06-18 22:53:10 +0800
  • 1dfa3b5687 Added tokens per second to info prints JohannesGaessler 2023-06-18 16:40:19 +0200
  • 72f358150c Update README.md John 2023-06-18 16:37:21 +0200
  • 621d6264e0 Fix cmake compilation issues JohannesGaessler 2023-06-18 16:16:16 +0200
  • 0ede372a51 Fixed incorrectly applying RMS norm twice (#1925) master-0ede372 Johannes Gäßler 2023-06-18 16:07:09 +0200
  • d6daebcb0c k_quants: faster Q2_K on older GPUs Iwan Kawrakow 2023-06-18 17:00:24 +0300
  • be6f8b9ee7 k_quants: hopefully much faster Q3_K on older GPUs Iwan Kawrakow 2023-06-18 16:50:15 +0300
  • a794320fc8 Fix trailing whitespace JohannesGaessler 2023-06-18 15:47:02 +0200
  • 1677059ba1 k_quants: hopefully much faster Q4_K on older GPUs Iwan Kawrakow 2023-06-18 16:37:25 +0300
  • d036161b14 Added compilation option description to README JohannesGaessler 2023-06-18 15:10:42 +0200
  • 7d2a566362 ggml : asserts Georgi Gerganov 2023-06-18 16:10:44 +0300
  • b6adad28ef ggml : remove unused comments Georgi Gerganov 2023-06-18 16:06:16 +0300
  • ca4ae7844d Fixed incorrectly applying RMS norm twice JohannesGaessler 2023-06-18 14:34:52 +0200
  • b4028edb9a a debug line slipped in John 2023-06-18 14:26:35 +0200
  • 76caff6e9f cmake compilation options JohannesGaessler 2023-06-18 14:18:38 +0200
  • 2ced178f18 ggml : sync latest ggml repo Georgi Gerganov 2023-06-18 15:11:55 +0300
  • 286c5b3014 threading: remove unnecessary spin lock/unlock from suspend/resume; add more tests mqy 2023-06-18 20:01:58 +0800
  • 8596af4277 ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918) master-8596af4 l3utterfly 2023-06-18 19:19:16 +0800
  • 5feefb32b3 threading: add suspend/resume APIs, so it's possible to run a thread pool at session level mqy 2023-06-18 18:57:33 +0800
  • 5abb8aefea fix warning mqy 2023-06-18 18:55:44 +0800
  • 4e3b9f2f9c Merge branch 'ggerganov:master' into master Robyn 2023-06-18 20:43:59 +1000
  • a2db989667 cleanup JohannesGaessler 2023-06-18 11:36:09 +0200
  • b08b371983 allow hordeconfig to set a max ctx length too. Concedo 2023-06-18 16:42:32 +0800
  • e6c9b5f567 WIP better data types JohannesGaessler 2023-06-18 10:37:13 +0200
  • e1886cf4fe readme : update Android build instructions (#1922) Mike 2023-06-18 16:28:26 +0800
  • 4cb19833dc Update README.md Mike 2023-06-18 16:16:08 +0800
  • 28e2762324 works JohannesGaessler 2023-06-18 09:52:56 +0200
  • cde81f91ca compile option JohannesGaessler 2023-06-18 09:23:37 +0200
  • 8ac993bd0a dfloat2 JohannesGaessler 2023-06-18 09:04:11 +0200
  • 98b8596b2a Convert vector to f16 for dmmv JohannesGaessler 2023-06-17 16:09:21 +0200
  • 8ab8ba62eb llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) master-8ab8ba6 Kawrakow 2023-06-18 11:13:43 +0300
  • 90cc59d6ab examples : fix examples/metal (#1920) master-90cc59d Kawrakow 2023-06-18 10:52:10 +0300
  • ef1518b21d k-quants: prevent usage when tensor size is not divisible by 256 Iwan Kawrakow 2023-06-18 10:50:58 +0300
  • b3f6e57f5b Fix examples/metal Iwan Kawrakow 2023-06-18 10:44:02 +0300
  • 278427d9a4 Merge branch 'master' into concedo_experimental Concedo 2023-06-18 15:29:44 +0800
  • 8775dd99f4 various debug logging improvements Concedo 2023-06-18 15:24:58 +0800
  • 0ec4dab864 fixed break and assertion from select; try to fix cuda link error mqy 2023-06-18 14:59:44 +0800
  • c31d51d40d statically allocate zrm 2023-06-18 02:34:08 -0400
  • 2193ab6281 fix cuda build error mqy 2023-06-18 14:07:33 +0800
  • 67bb367962 typos mqy 2023-06-18 14:03:09 +0800
  • 06b00827a0 bulk refactoring of task profile and related code to run CL GPU offloading. * removed ggml_task_backend, in favour of ggml_task_profile.runner and the newly added id and name. * extracted mul_mat BLAS code into ggml_compute_forward_mul_mat_blas, thus aligning with CUDA/CL a bit more and making it easier to fix profile and run tune. * rewrote task profile and updated/added some cuda/cl code, finally making CL GPU offloading work. * misc minor fixes/updates to tune; the data format was changed. mqy 2023-06-18 12:29:16 +0800