llama.cpp/examples/llava/llava-surgery.py

import argparse
import glob
import os
import torch


ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", help="Path to LLaVA v1.5 model")
args = ap.parse_args()

# find the model part that includes the multimodal projector weights
path = sorted(glob.glob(f"{args.model}/pytorch_model*.bin"))[-1]
checkpoint = torch.load(path)

# get a list of mm tensor names
mm_tensors = [k for k, v in checkpoint.items() if k.startswith("model.mm_projector")]

# store these tensors in a new dictionary and torch.save them
projector = {name: checkpoint[name].float() for name in mm_tensors}
torch.save(projector, f"{args.model}/llava.projector")

# remove these tensors from the checkpoint and save it again
for name in mm_tensors:
    del checkpoint[name]

# BakLLaVA checkpoints also contain the CLIP tensors; extract them as well
clip_tensors = [k for k, v in checkpoint.items() if k.startswith("model.vision_tower")]
if len(clip_tensors) > 0:
    clip = {name.replace("vision_tower.vision_tower.", ""): checkpoint[name].float() for name in clip_tensors}
    torch.save(clip, f"{args.model}/llava.clip")

    # remove these tensors
    for name in clip_tensors:
        del checkpoint[name]

    # added tokens should be removed to be able to convert Mistral models
    if os.path.exists(f"{args.model}/added_tokens.json"):
        with open(f"{args.model}/added_tokens.json", "w") as f:
            f.write("{}\n")


torch.save(checkpoint, path)

print("Done!")
print(f"Now you can convert {args.model} to a regular LLaMA GGUF file.")
print(f"Also, use {args.model}/llava.projector to prepare a llava-encoder.gguf file.")
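

# Illustrative sketch, not part of the original script: the projector and CLIP
# steps above follow the same pattern (collect keys by prefix, copy them out,
# delete them from the checkpoint). The helper name `split_by_prefix` and the
# sample keys in the usage note are hypothetical. It works on any dict, so it
# can be tried without loading a real model. Unlike the script, it does not
# call .float() on the extracted values.
def split_by_prefix(state, prefix, rename=None):
    """Move every entry whose key starts with `prefix` out of `state`.

    Returns a new dict with the extracted entries; `state` is modified in place.
    `rename` is an optional (old, new) pair applied to the extracted keys.
    """
    names = [k for k in state if k.startswith(prefix)]
    extracted = {}
    for name in names:
        new_name = name.replace(*rename) if rename else name
        extracted[new_name] = state.pop(name)
    return extracted


# Usage with made-up keys and placeholder values:
#   state = {"model.mm_projector.0.weight": 1, "model.layers.0.weight": 2}
#   split_by_prefix(state, "model.mm_projector")
#   # state now holds only "model.layers.0.weight"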