---
license: other
language:
- ja
base_model:
- tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: visual-question-answering
---

# Llama-3.1-70B-Instruct-multimodal-JP-Graph - Built with Llama

Llama-3.1-70B-Instruct-multimodal-JP-Graph is a Japanese Large Vision Language Model.
This model is based on [Llama-3.1-Swallow-70B](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3) and the image encoder of [Qwen2-VL-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct).
# How to use

### 1. Install LLaVA-NeXT

- First, install LLaVA-NeXT by following the instructions in the [LLaVA-NeXT repository](https://github.com/LLaVA-VL/LLaVA-NeXT).

```sh
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```

### 2. Install dependencies
```sh
pip install flash-attn==2.6.3
pip install transformers==4.45.2
```
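Version mismatches between `flash-attn`, `transformers`, and LLaVA-NeXT are a common source of import errors, so it can help to confirm that the pinned versions above are what actually got installed. The snippet below is a small sanity check (not part of the original instructions); the package names and pins mirror the pip commands above.

```python
# Sanity check for the pinned dependency versions from step 2.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

pins = {"flash-attn": "2.6.3", "transformers": "4.45.2"}
for package, pinned in pins.items():
    found = installed_version(package)
    status = "OK" if found == pinned else f"expected {pinned}, found {found}"
    print(f"{package}: {status}")
```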

### 3. Modify LLaVA-NeXT
- Modify the LLaVA-NeXT code as follows.
- Create the LLaVA-NeXT/llava/model/multimodal_encoder/qwen2_vl directory and copy the contents of the attached qwen2_vl directory into it.
- Overwrite LLaVA-NeXT/llava/model/multimodal_encoder/builder.py with the attached "builder.py".
- Copy the attached "qwen2vl_encoder.py" into LLaVA-NeXT/llava/model/multimodal_encoder/.
- Overwrite LLaVA-NeXT/llava/model/language_model/llava_llama.py with the attached "llava_llama.py".
- Overwrite LLaVA-NeXT/llava/model/llava_arch.py with the attached "llava_arch.py".
- Overwrite LLaVA-NeXT/llava/conversation.py with the attached "conversation.py".
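The copy steps above can be scripted. The sketch below assumes the attached files were unpacked into an `attached/` directory next to the LLaVA-NeXT checkout (a hypothetical path — adjust it to wherever you saved them); it skips anything it cannot find rather than failing partway.

```python
# Sketch of step 3: copy the attached files into the LLaVA-NeXT tree.
# "attached" is an assumed location for the downloaded files; adjust as needed.
import shutil
from pathlib import Path

SRC = Path("attached")
DST = Path("LLaVA-NeXT/llava")

# (source, destination) pairs mirroring the list above.
PLAN = [
    (SRC / "qwen2_vl",           DST / "model/multimodal_encoder/qwen2_vl"),
    (SRC / "builder.py",         DST / "model/multimodal_encoder/builder.py"),
    (SRC / "qwen2vl_encoder.py", DST / "model/multimodal_encoder/qwen2vl_encoder.py"),
    (SRC / "llava_llama.py",     DST / "model/language_model/llava_llama.py"),
    (SRC / "llava_arch.py",      DST / "llava_arch.py"),
    (SRC / "conversation.py",    DST / "conversation.py"),
]

def apply_plan(plan):
    """Copy each (src, dst) pair; return the copied destinations and missing sources."""
    copied, missing = [], []
    for src, dst in plan:
        if not src.exists():
            missing.append(src)  # tolerate a partial download instead of failing
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied, missing

copied, missing = apply_plan(PLAN)
print(f"copied {len(copied)} item(s); missing: {[str(p) for p in missing]}")
```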

### 4. Inference
The following script loads the model and runs inference.