Commit Graph

4 Commits

Author SHA1 Message Date
mqy
06b00827a0 bulk refactoring of task profile and related code to run CL GPU offloading.
* removed ggml_task_backend in favour of ggml_task_profile.runner and the newly added id and name.
* extracted mul_mat BLAS code into ggml_compute_forward_mul_mat_blas,
  which aligns it a bit more with CUDA/CL and makes it easier to fix profile and run tune.
* rewrote task profile and updated/added some CUDA/CL code; finally made CL GPU offloading work.
* misc minor fixes/updates to tune; the data format was changed.
2023-06-18 14:27:56 +08:00
mqy
6b83a3e16f try to make CL run w/o tuning, but -ngl gets stuck with no output. had to add task runner and profile id; many changes, see the f codes 2023-06-18 14:27:56 +08:00
mqy
48016f685c bulk refactored task profile to support complete fallback; enable tune by default for ease of dev 2023-06-18 14:27:56 +08:00
mqy
213f133701 initial 2023-06-18 14:27:53 +08:00