---
license: other
language:
- ja
base_model:
- tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: visual-question-answering
---

# Llama-3.1-70B-Instruct-multimodal-JP-Graph - Built with Llama

Llama-3.1-70B-Instruct-multimodal-JP-Graph is a Japanese Large Vision Language Model.
This model is based on [Llama-3.1-Swallow-70B](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3) and the image encoder of [Qwen2-VL-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct).
# How to use

### 1. Install LLaVA-NeXT

- First, install LLaVA-NeXT by following the instructions in the [LLaVA-NeXT repository](https://github.com/LLaVA-VL/LLaVA-NeXT).

```sh
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```

### 2. Install dependencies
```sh
pip install flash-attn==2.6.3
pip install transformers==4.45.2
```
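Version mismatches between `flash-attn`, `transformers`, and LLaVA-NeXT are a common source of import errors, so it can help to confirm that the pinned versions above are what actually got installed. The snippet below is a small sanity check (not part of the original instructions); the package names and pins mirror the pip commands above.

```python
# Sanity check for the pinned dependency versions from step 2.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

pins = {"flash-attn": "2.6.3", "transformers": "4.45.2"}
for package, pinned in pins.items():
    found = installed_version(package)
    status = "OK" if found == pinned else f"expected {pinned}, found {found}"
    print(f"{package}: {status}")
```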

### 3. Modify LLaVA-NeXT
- Modify the LLaVA-NeXT code as follows.
- Create the LLaVA-NeXT/llava/model/multimodal_encoder/qwen2_vl directory and copy the contents of the attached qwen2_vl directory into it.
- Overwrite LLaVA-NeXT/llava/model/multimodal_encoder/builder.py with the attached "builder.py".
- Copy the attached "qwen2vl_encoder.py" into LLaVA-NeXT/llava/model/multimodal_encoder/.
- Overwrite LLaVA-NeXT/llava/model/language_model/llava_llama.py with the attached "llava_llama.py".
- Overwrite LLaVA-NeXT/llava/model/llava_arch.py with the attached "llava_arch.py".
- Overwrite LLaVA-NeXT/llava/conversation.py with the attached "conversation.py".
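The copy steps above can be scripted. The sketch below assumes the attached files were unpacked into an `attached/` directory next to the LLaVA-NeXT checkout (a hypothetical path — adjust it to wherever you saved them); it skips anything it cannot find rather than failing partway.

```python
# Sketch of step 3: copy the attached files into the LLaVA-NeXT tree.
# "attached" is an assumed location for the downloaded files; adjust as needed.
import shutil
from pathlib import Path

SRC = Path("attached")
DST = Path("LLaVA-NeXT/llava")

# (source, destination) pairs mirroring the list above.
PLAN = [
    (SRC / "qwen2_vl",           DST / "model/multimodal_encoder/qwen2_vl"),
    (SRC / "builder.py",         DST / "model/multimodal_encoder/builder.py"),
    (SRC / "qwen2vl_encoder.py", DST / "model/multimodal_encoder/qwen2vl_encoder.py"),
    (SRC / "llava_llama.py",     DST / "model/language_model/llava_llama.py"),
    (SRC / "llava_arch.py",      DST / "llava_arch.py"),
    (SRC / "conversation.py",    DST / "conversation.py"),
]

def apply_plan(plan):
    """Copy each (src, dst) pair; return the copied destinations and missing sources."""
    copied, missing = [], []
    for src, dst in plan:
        if not src.exists():
            missing.append(src)  # tolerate a partial download instead of failing
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied, missing

copied, missing = apply_plan(PLAN)
print(f"copied {len(copied)} item(s); missing: {[str(p) for p in missing]}")
```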

### 4. Inference
The following script loads the model and runs inference.