eacortes committed · Commit 4ef75d1 (verified) · 1 Parent(s): d600b0e

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ model.rknn filter=lfs diff=lfs merge=lfs -text
37
+ model_b1_s256.rknn filter=lfs diff=lfs merge=lfs -text
38
+ model_b4_s256.rknn filter=lfs diff=lfs merge=lfs -text
39
+ model_b4_s512.rknn filter=lfs diff=lfs merge=lfs -text
40
+ rknn/model_o1.rknn filter=lfs diff=lfs merge=lfs -text
41
+ rknn/model_o2.rknn filter=lfs diff=lfs merge=lfs -text
42
+ rknn/model_o3.rknn filter=lfs diff=lfs merge=lfs -text
43
+ rknn/model_w8a8.rknn filter=lfs diff=lfs merge=lfs -text
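
Review note: the eight new `.gitattributes` entries track each `.rknn` file by exact path rather than by a wildcard. The following is a minimal sketch, not part of the commit, for listing which patterns in a `.gitattributes` text are routed through Git LFS. The helper name `lfs_patterns` and the `SAMPLE` text are illustrative assumptions, and the parser is deliberately simplified (it does not handle quoted patterns containing spaces).

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First whitespace-separated field is the pattern, the rest are attributes.
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Hypothetical excerpt mixing entries from this diff with a non-LFS line.
SAMPLE = """\
*.zip filter=lfs diff=lfs merge=lfs -text
model.rknn filter=lfs diff=lfs merge=lfs -text
rknn/model_w8a8.rknn filter=lfs diff=lfs merge=lfs -text
README.md text
"""

tracked = lfs_patterns(SAMPLE)
print(tracked)
```

A single `*.rknn` pattern would likely cover all eight files, but the per-file entries shown in the diff are what the upload tool generated.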
README.md ADDED
@@ -0,0 +1,161 @@
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - sentence-transformers/natural-questions
5
+ language:
6
+ - en
7
+ base_model: cross-encoder/ms-marco-MiniLM-L12-v2
8
+ pipeline_tag: text-ranking
9
+ library_name: rk-transformers
10
+ tags:
11
+ - transformers
12
+ - rknn
13
+ - rockchip
14
+ - npu
15
+ - rk-transformers
16
+ - rk3588
17
+ model_name: ms-marco-MiniLM-L12-v2
18
+ ---
19
+ # ms-marco-MiniLM-L12-v2 (RKNN2)
20
+
21
+ > This is an RKNN-compatible version of the [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2) model. It has been optimized for Rockchip NPUs using the [rk-transformers](https://github.com/emapco/rk-transformers) library.
22
+
23
+ <details><summary>Click to see the RKNN model details and usage examples</summary>
24
+
25
+ ## Model Details
26
+
27
+ - **Original Model:** [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2)
28
+ - **Target Platform:** rk3588
29
+ - **rknn-toolkit2 Version:** 2.3.2
30
+ - **rk-transformers Version:** 0.1.0
31
+
32
+ ### Available Model Files
33
+
34
+ | Model File | Optimization Level | Quantization | File Size |
35
+ | :--------- | :----------------- | :----------- | :-------- |
36
+ | [model.rknn](./model.rknn) | 0 | float16 | 68.8 MB |
37
+ | [model_b1_s256.rknn](./model_b1_s256.rknn) | 0 | float16 | 67.0 MB |
38
+ | [model_b4_s256.rknn](./model_b4_s256.rknn) | 0 | float16 | 75.1 MB |
39
+ | [model_b4_s512.rknn](./model_b4_s512.rknn) | 0 | float16 | 81.7 MB |
40
+ | [rknn/model_o1.rknn](./rknn/model_o1.rknn) | 1 | float16 | 68.8 MB |
41
+ | [rknn/model_o2.rknn](./rknn/model_o2.rknn) | 2 | float16 | 68.8 MB |
42
+ | [rknn/model_o3.rknn](./rknn/model_o3.rknn) | 3 | float16 | 68.8 MB |
43
+ | [rknn/model_w8a8.rknn](./rknn/model_w8a8.rknn) | 0 | w8a8 | 36.5 MB |
44
+
45
+ ## Usage
46
+
47
+ ### Installation
48
+
49
+ Install `rk-transformers` to use this model:
50
+
51
+ ```bash
52
+ pip install rk-transformers
53
+ ```
54
+
55
+ #### RKTransformers API
56
+
57
+ ```python
58
+ from rktransformers import RKRTModelForSequenceClassification
59
+ from transformers import AutoTokenizer
60
+
61
+ # Load tokenizer and model
62
+ tokenizer = AutoTokenizer.from_pretrained("rk-transformers/ms-marco-MiniLM-L12-v2")
63
+ model = RKRTModelForSequenceClassification.from_pretrained(
64
+ "rk-transformers/ms-marco-MiniLM-L12-v2",
65
+ platform="rk3588",
66
+ core_mask="auto",
67
+ )
68
+
69
+ # Tokenize and run inference
70
+ inputs = tokenizer(
71
+ ["Sample text for encoding"],
72
+ padding="max_length",
73
+ max_length=512,
74
+ truncation=True,
75
+ return_tensors="np"
76
+ )
77
+
78
+ outputs = model(**inputs)
79
+ print(outputs.shape)
80
+
81
+ # Load specific optimized/quantized model file
82
+ model = RKRTModelForSequenceClassification.from_pretrained(
83
+ "rk-transformers/ms-marco-MiniLM-L12-v2",
84
+ platform="rk3588",
85
+ file_name="rknn/model_w8a8.rknn"
86
+ )
87
+ ```
88
+
89
+ ## Configuration
90
+
91
+ The full configuration for all exported RKNN models is available in the [rknn.json](./rknn.json) file.
92
+
93
+ </details>
94
+ # Cross-Encoder for MS Marco
95
+
96
+ This model was trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.
97
+
98
+ The model can be used for Information Retrieval: given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch), then sort the passages by score in decreasing order. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details. The training code is available here: [SBERT.net Training MS Marco](https://github.com/UKPLab/sentence-transformers/tree/master/examples/cross_encoder/training/ms_marco)
99
+
100
+
101
+ ## Usage with SentenceTransformers
102
+
103
+ Usage is straightforward once [SentenceTransformers](https://www.sbert.net/) is installed. You can then use the pre-trained models like this:
104
+ ```python
105
+ from sentence_transformers import CrossEncoder
106
+
107
+ model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L12-v2')
108
+ scores = model.predict([
109
+ ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
110
+ ("How many people live in Berlin?", "Berlin is well known for its museums."),
111
+ ])
112
+ print(scores)
113
+ # [ 9.218911 -4.0780287]
114
+ ```
115
+
116
+
117
+ ## Usage with Transformers
118
+
119
+ ```python
120
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
121
+ import torch
122
+
123
+ model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L12-v2')
124
+ tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L12-v2')
125
+
126
+ features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'], padding=True, truncation=True, return_tensors="pt")
127
+
128
+ model.eval()
129
+ with torch.no_grad():
130
+ scores = model(**features).logits
131
+ print(scores)
132
+ ```
133
+
134
+
135
+
136
+ ## Performance
137
+ In the following table, we provide various pre-trained Cross-Encoders together with their performance on the [TREC Deep Learning 2019](https://microsoft.github.io/TREC-2019-Deep-Learning/) and the [MS Marco Passage Reranking](https://github.com/microsoft/MSMARCO-Passage-Ranking/) dataset.
138
+
139
+
140
+ | Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
141
+ | :--- | :--- | :--- | :--- |
142
+ | **Version 2 models** | | | |
143
+ | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
144
+ | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
145
+ | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
146
+ | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
147
+ | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
148
+ | **Version 1 models** | | | |
149
+ | cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 |
150
+ | cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 |
151
+ | cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 |
152
+ | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 |
153
+ | **Other models** | | | |
154
+ | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 |
155
+ | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 |
156
+ | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 |
157
+ | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 |
158
+ | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 |
159
+ | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 |
160
+
161
+ Note: The Docs / Sec figures were measured on a V100 GPU, not on the Rockchip NPU targeted by the RKNN files.
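
Review note: the README describes re-ranking as scoring each (query, passage) pair and then sorting passages by score. The sketch below illustrates only the sorting step and is not part of the commit. It reuses the two scores printed in the SentenceTransformers example rather than running a model, and the helper name `rerank` is an assumption made for illustration.

```python
def rerank(passages, scores):
    """Pair each passage with its score and sort by score, highest first."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


# Passages deliberately listed in the "wrong" order to show the sort.
passages = [
    "Berlin is well known for its museums.",
    "Berlin had a population of 3,520,031 registered inhabitants "
    "in an area of 891.82 square kilometers.",
]
# Scores taken from the README's example output for the same two passages.
scores = [-4.0780287, 9.218911]

ranked = rerank(passages, scores)
for passage, score in ranked:
    print(f"{score:8.3f}  {passage}")
```

In practice the `scores` list would come from `model.predict(...)` or from the logits of the RKNN model, with one score per (query, passage) pair.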
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "classifier_dropout": null,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 384,
11
+ "id2label": {
12
+ "0": "LABEL_0"
13
+ },
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 1536,
16
+ "label2id": {
17
+ "LABEL_0": 0
18
+ },
19
+ "layer_norm_eps": 1e-12,
20
+ "max_position_embeddings": 512,
21
+ "model_type": "bert",
22
+ "num_attention_heads": 12,
23
+ "num_hidden_layers": 12,
24
+ "pad_token_id": 0,
25
+ "position_embedding_type": "absolute",
26
+ "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.55.4",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
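
Review note: `config.json` describes a 12-layer BERT encoder with a hidden size of 384 and a single output label. As a rough sanity check on the `.rknn` file sizes, the sketch below estimates the parameter count from those config values. It is not part of the commit and assumes the standard `BertForSequenceClassification` layout (embeddings with LayerNorm, self-attention and feed-forward blocks with biases, a pooler, and a linear classifier). The function name is hypothetical.

```python
def bert_classifier_param_count(cfg, num_labels=1):
    """Estimate parameters of a standard BERT sequence classifier."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias

    # Word, position, and token-type embeddings plus one LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm

    # Q, K, V, and output projections (each h x h with bias) plus LayerNorm.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> i -> h) with biases plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)

    pooler = h * h + h
    classifier = h * num_labels + num_labels
    return embeddings + encoder + pooler + classifier


# Values copied from the config.json in this diff.
CONFIG = {
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 12,
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}

params = bert_classifier_param_count(CONFIG, num_labels=1)
print(f"{params:,} parameters, about {params * 2 / 2**20:.1f} MiB in float16")
```

The float16 estimate comes out a few MiB below the sizes listed for the unquantized files, which seems plausible since an exported graph carries more than raw weights, though the diff itself does not confirm what accounts for the difference.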
model.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4bada98d5ef1f57199733bceeb9b348a061eb17b77e444b68cca1557ef64b52b
3
+ size 72099070
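
Review note: each `.rknn` entry in this commit is a Git LFS pointer (version, oid, size) rather than the binary itself. The sketch below, which is not part of the commit, parses the pointer shown above and converts its byte size to MiB, which appears to be the unit behind the "MB" column in the README table. The helper name `parse_lfs_pointer` is an assumption for illustration.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the model.rknn entry in this diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4bada98d5ef1f57199733bceeb9b348a061eb17b77e444b68cca1557ef64b52b
size 72099070
"""

info = parse_lfs_pointer(POINTER)
size_mib = info["size_bytes"] / 2**20
print(f"{info['hash_algo']} {info['digest'][:12]}... {size_mib:.1f} MiB")
```

The same conversion applied to the other pointers (70270718, 78763262, 85670846, and 38286411 bytes) gives 67.0, 75.1, 81.7, and 36.5, matching the README table.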
model_b1_s256.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:33291ae8d7d9e04ae32a1c10cb9de3bde30719f33061b48705b223a006a54551
3
+ size 70270718
model_b4_s256.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:872b6e2550a0cd9ed4de28fc86d62b3af6227fdae378f84df5fddb32334d5724
3
+ size 78763262
model_b4_s512.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c640bce5951ca71d22756d003eaccac10159d40e489f15b170c2c781a88fd916
3
+ size 85670846
models--cross-encoder--ms-marco-MiniLM-L12-v2/.no_exist/7b0235231ca2674cb8ca8f022859a6eba2b1c968/modules.json ADDED
File without changes
rknn.json ADDED
@@ -0,0 +1,358 @@
1
+ {
2
+ "model_b1_s256.rknn": {
3
+ "rktransformers_version": "0.1.0",
4
+ "model_input_names": [
5
+ "input_ids",
6
+ "attention_mask",
7
+ "token_type_ids"
8
+ ],
9
+ "batch_size": 1,
10
+ "max_seq_length": 256,
11
+ "float_dtype": "float16",
12
+ "target_platform": "rk3588",
13
+ "single_core_mode": false,
14
+ "mean_values": null,
15
+ "std_values": null,
16
+ "custom_string": null,
17
+ "inputs_yuv_fmt": null,
18
+ "dynamic_input": null,
19
+ "opset": 19,
20
+ "task": "sequence-classification",
21
+ "quantization": {
22
+ "do_quantization": false,
23
+ "dataset_name": null,
24
+ "dataset_subset": null,
25
+ "dataset_size": 128,
26
+ "dataset_split": null,
27
+ "dataset_columns": null,
28
+ "quantized_dtype": "w8a8",
29
+ "quantized_algorithm": "normal",
30
+ "quantized_method": "channel",
31
+ "quantized_hybrid_level": 0,
32
+ "quant_img_RGB2BGR": false,
33
+ "auto_hybrid_cos_thresh": 0.98,
34
+ "auto_hybrid_euc_thresh": null
35
+ },
36
+ "optimization": {
37
+ "optimization_level": 0,
38
+ "enable_flash_attention": true,
39
+ "remove_weight": false,
40
+ "compress_weight": false,
41
+ "remove_reshape": false,
42
+ "sparse_infer": false,
43
+ "model_pruning": false
44
+ }
45
+ },
46
+ "model_b4_s256.rknn": {
47
+ "rktransformers_version": "0.1.0",
48
+ "model_input_names": [
49
+ "input_ids",
50
+ "attention_mask",
51
+ "token_type_ids"
52
+ ],
53
+ "batch_size": 4,
54
+ "max_seq_length": 256,
55
+ "float_dtype": "float16",
56
+ "target_platform": "rk3588",
57
+ "single_core_mode": false,
58
+ "mean_values": null,
59
+ "std_values": null,
60
+ "custom_string": null,
61
+ "inputs_yuv_fmt": null,
62
+ "dynamic_input": null,
63
+ "opset": 19,
64
+ "task": "sequence-classification",
65
+ "quantization": {
66
+ "do_quantization": false,
67
+ "dataset_name": null,
68
+ "dataset_subset": null,
69
+ "dataset_size": 128,
70
+ "dataset_split": null,
71
+ "dataset_columns": null,
72
+ "quantized_dtype": "w8a8",
73
+ "quantized_algorithm": "normal",
74
+ "quantized_method": "channel",
75
+ "quantized_hybrid_level": 0,
76
+ "quant_img_RGB2BGR": false,
77
+ "auto_hybrid_cos_thresh": 0.98,
78
+ "auto_hybrid_euc_thresh": null
79
+ },
80
+ "optimization": {
81
+ "optimization_level": 0,
82
+ "enable_flash_attention": true,
83
+ "remove_weight": false,
84
+ "compress_weight": false,
85
+ "remove_reshape": false,
86
+ "sparse_infer": false,
87
+ "model_pruning": false
88
+ }
89
+ },
90
+ "model.rknn": {
91
+ "rktransformers_version": "0.1.0",
92
+ "model_input_names": [
93
+ "input_ids",
94
+ "attention_mask",
95
+ "token_type_ids"
96
+ ],
97
+ "batch_size": 1,
98
+ "max_seq_length": 512,
99
+ "float_dtype": "float16",
100
+ "target_platform": "rk3588",
101
+ "single_core_mode": false,
102
+ "mean_values": null,
103
+ "std_values": null,
104
+ "custom_string": null,
105
+ "inputs_yuv_fmt": null,
106
+ "dynamic_input": null,
107
+ "opset": 19,
108
+ "task": "sequence-classification",
109
+ "quantization": {
110
+ "do_quantization": false,
111
+ "dataset_name": null,
112
+ "dataset_subset": null,
113
+ "dataset_size": 128,
114
+ "dataset_split": null,
115
+ "dataset_columns": null,
116
+ "quantized_dtype": "w8a8",
117
+ "quantized_algorithm": "normal",
118
+ "quantized_method": "channel",
119
+ "quantized_hybrid_level": 0,
120
+ "quant_img_RGB2BGR": false,
121
+ "auto_hybrid_cos_thresh": 0.98,
122
+ "auto_hybrid_euc_thresh": null
123
+ },
124
+ "optimization": {
125
+ "optimization_level": 0,
126
+ "enable_flash_attention": true,
127
+ "remove_weight": false,
128
+ "compress_weight": false,
129
+ "remove_reshape": false,
130
+ "sparse_infer": false,
131
+ "model_pruning": false
132
+ }
133
+ },
134
+ "model_b4_s512.rknn": {
135
+ "rktransformers_version": "0.1.0",
136
+ "model_input_names": [
137
+ "input_ids",
138
+ "attention_mask",
139
+ "token_type_ids"
140
+ ],
141
+ "batch_size": 4,
142
+ "max_seq_length": 512,
143
+ "float_dtype": "float16",
144
+ "target_platform": "rk3588",
145
+ "single_core_mode": false,
146
+ "mean_values": null,
147
+ "std_values": null,
148
+ "custom_string": null,
149
+ "inputs_yuv_fmt": null,
150
+ "dynamic_input": null,
151
+ "opset": 19,
152
+ "task": "sequence-classification",
153
+ "quantization": {
154
+ "do_quantization": false,
155
+ "dataset_name": null,
156
+ "dataset_subset": null,
157
+ "dataset_size": 128,
158
+ "dataset_split": null,
159
+ "dataset_columns": null,
160
+ "quantized_dtype": "w8a8",
161
+ "quantized_algorithm": "normal",
162
+ "quantized_method": "channel",
163
+ "quantized_hybrid_level": 0,
164
+ "quant_img_RGB2BGR": false,
165
+ "auto_hybrid_cos_thresh": 0.98,
166
+ "auto_hybrid_euc_thresh": null
167
+ },
168
+ "optimization": {
169
+ "optimization_level": 0,
170
+ "enable_flash_attention": true,
171
+ "remove_weight": false,
172
+ "compress_weight": false,
173
+ "remove_reshape": false,
174
+ "sparse_infer": false,
175
+ "model_pruning": false
176
+ }
177
+ },
178
+ "rknn/model_o1.rknn": {
179
+ "rktransformers_version": "0.1.0",
180
+ "model_input_names": [
181
+ "input_ids",
182
+ "attention_mask",
183
+ "token_type_ids"
184
+ ],
185
+ "batch_size": 1,
186
+ "max_seq_length": 512,
187
+ "float_dtype": "float16",
188
+ "target_platform": "rk3588",
189
+ "single_core_mode": false,
190
+ "mean_values": null,
191
+ "std_values": null,
192
+ "custom_string": null,
193
+ "inputs_yuv_fmt": null,
194
+ "dynamic_input": null,
195
+ "opset": 19,
196
+ "task": "sequence-classification",
197
+ "quantization": {
198
+ "do_quantization": false,
199
+ "dataset_name": null,
200
+ "dataset_subset": null,
201
+ "dataset_size": 128,
202
+ "dataset_split": null,
203
+ "dataset_columns": null,
204
+ "quantized_dtype": "w8a8",
205
+ "quantized_algorithm": "normal",
206
+ "quantized_method": "channel",
207
+ "quantized_hybrid_level": 0,
208
+ "quant_img_RGB2BGR": false,
209
+ "auto_hybrid_cos_thresh": 0.98,
210
+ "auto_hybrid_euc_thresh": null
211
+ },
212
+ "optimization": {
213
+ "optimization_level": 1,
214
+ "enable_flash_attention": true,
215
+ "remove_weight": false,
216
+ "compress_weight": false,
217
+ "remove_reshape": false,
218
+ "sparse_infer": false,
219
+ "model_pruning": false
220
+ }
221
+ },
222
+ "rknn/model_o2.rknn": {
223
+ "rktransformers_version": "0.1.0",
224
+ "model_input_names": [
225
+ "input_ids",
226
+ "attention_mask",
227
+ "token_type_ids"
228
+ ],
229
+ "batch_size": 1,
230
+ "max_seq_length": 512,
231
+ "float_dtype": "float16",
232
+ "target_platform": "rk3588",
233
+ "single_core_mode": false,
234
+ "mean_values": null,
235
+ "std_values": null,
236
+ "custom_string": null,
237
+ "inputs_yuv_fmt": null,
238
+ "dynamic_input": null,
239
+ "opset": 19,
240
+ "task": "sequence-classification",
241
+ "quantization": {
242
+ "do_quantization": false,
243
+ "dataset_name": null,
244
+ "dataset_subset": null,
245
+ "dataset_size": 128,
246
+ "dataset_split": null,
247
+ "dataset_columns": null,
248
+ "quantized_dtype": "w8a8",
249
+ "quantized_algorithm": "normal",
250
+ "quantized_method": "channel",
251
+ "quantized_hybrid_level": 0,
252
+ "quant_img_RGB2BGR": false,
253
+ "auto_hybrid_cos_thresh": 0.98,
254
+ "auto_hybrid_euc_thresh": null
255
+ },
256
+ "optimization": {
257
+ "optimization_level": 2,
258
+ "enable_flash_attention": true,
259
+ "remove_weight": false,
260
+ "compress_weight": false,
261
+ "remove_reshape": false,
262
+ "sparse_infer": false,
263
+ "model_pruning": false
264
+ }
265
+ },
266
+ "rknn/model_o3.rknn": {
267
+ "rktransformers_version": "0.1.0",
268
+ "model_input_names": [
269
+ "input_ids",
270
+ "attention_mask",
271
+ "token_type_ids"
272
+ ],
273
+ "batch_size": 1,
274
+ "max_seq_length": 512,
275
+ "float_dtype": "float16",
276
+ "target_platform": "rk3588",
277
+ "single_core_mode": false,
278
+ "mean_values": null,
279
+ "std_values": null,
280
+ "custom_string": null,
281
+ "inputs_yuv_fmt": null,
282
+ "dynamic_input": null,
283
+ "opset": 19,
284
+ "task": "sequence-classification",
285
+ "quantization": {
286
+ "do_quantization": false,
287
+ "dataset_name": null,
288
+ "dataset_subset": null,
289
+ "dataset_size": 128,
290
+ "dataset_split": null,
291
+ "dataset_columns": null,
292
+ "quantized_dtype": "w8a8",
293
+ "quantized_algorithm": "normal",
294
+ "quantized_method": "channel",
295
+ "quantized_hybrid_level": 0,
296
+ "quant_img_RGB2BGR": false,
297
+ "auto_hybrid_cos_thresh": 0.98,
298
+ "auto_hybrid_euc_thresh": null
299
+ },
300
+ "optimization": {
301
+ "optimization_level": 3,
302
+ "enable_flash_attention": true,
303
+ "remove_weight": false,
304
+ "compress_weight": false,
305
+ "remove_reshape": false,
306
+ "sparse_infer": false,
307
+ "model_pruning": false
308
+ }
309
+ },
310
+ "rknn/model_w8a8.rknn": {
311
+ "rktransformers_version": "0.1.0",
312
+ "model_input_names": [
313
+ "input_ids",
314
+ "attention_mask",
315
+ "token_type_ids"
316
+ ],
317
+ "batch_size": 1,
318
+ "max_seq_length": 512,
319
+ "float_dtype": "float16",
320
+ "target_platform": "rk3588",
321
+ "single_core_mode": false,
322
+ "mean_values": null,
323
+ "std_values": null,
324
+ "custom_string": null,
325
+ "inputs_yuv_fmt": null,
326
+ "dynamic_input": null,
327
+ "opset": 19,
328
+ "task": "sequence-classification",
329
+ "quantization": {
330
+ "do_quantization": true,
331
+ "dataset_name": "sentence-transformers/natural-questions",
332
+ "dataset_subset": null,
333
+ "dataset_size": 1024,
334
+ "dataset_split": [
335
+ "train"
336
+ ],
337
+ "dataset_columns": [
338
+ "answer"
339
+ ],
340
+ "quantized_dtype": "w8a8",
341
+ "quantized_algorithm": "normal",
342
+ "quantized_method": "channel",
343
+ "quantized_hybrid_level": 0,
344
+ "quant_img_RGB2BGR": false,
345
+ "auto_hybrid_cos_thresh": 0.98,
346
+ "auto_hybrid_euc_thresh": null
347
+ },
348
+ "optimization": {
349
+ "optimization_level": 0,
350
+ "enable_flash_attention": true,
351
+ "remove_weight": false,
352
+ "compress_weight": false,
353
+ "remove_reshape": false,
354
+ "sparse_infer": false,
355
+ "model_pruning": false
356
+ }
357
+ }
358
+ }
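
Review note: `rknn.json` records, per exported file, the batch size, maximum sequence length, quantization flag, and optimization level. The sketch below is not part of the commit. It shows one way a caller might pick the file that matches a desired shape. The `export` helper builds trimmed stand-ins for the real entries (only the keys used here), and both `export` and `select_model_file` are hypothetical names, not part of rk-transformers.

```python
def export(batch_size, max_seq_length, quantized=False, optimization_level=0):
    """Build a trimmed rknn.json entry holding only the keys used below."""
    return {
        "batch_size": batch_size,
        "max_seq_length": max_seq_length,
        "quantization": {"do_quantization": quantized},
        "optimization": {"optimization_level": optimization_level},
    }


# Mirrors the eight entries of rknn.json in this diff.
EXPORTS = {
    "model.rknn": export(1, 512),
    "model_b1_s256.rknn": export(1, 256),
    "model_b4_s256.rknn": export(4, 256),
    "model_b4_s512.rknn": export(4, 512),
    "rknn/model_o1.rknn": export(1, 512, optimization_level=1),
    "rknn/model_o2.rknn": export(1, 512, optimization_level=2),
    "rknn/model_o3.rknn": export(1, 512, optimization_level=3),
    "rknn/model_w8a8.rknn": export(1, 512, quantized=True),
}


def select_model_file(exports, batch_size, max_seq_length,
                      quantized=False, optimization_level=0):
    """Return the single file name whose export settings match the request."""
    matches = [
        name
        for name, cfg in exports.items()
        if cfg["batch_size"] == batch_size
        and cfg["max_seq_length"] == max_seq_length
        and cfg["quantization"]["do_quantization"] == quantized
        and cfg["optimization"]["optimization_level"] == optimization_level
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {matches}")
    return matches[0]


print(select_model_file(EXPORTS, batch_size=4, max_seq_length=256))
```

The returned name could then be passed as the `file_name` argument shown in the README's loading example.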
rknn/model_o1.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6d90dcbea5b184df2830e4a9a84c0135d335df1b43b4c8b2e36ba26d4f654016
3
+ size 72099070
rknn/model_o2.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3b0813cfb3c5bdbab369477ee781addd58a069246ac34b71b6e8c38255070aeb
3
+ size 72099070
rknn/model_o3.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd402bb19ef3b0a952eafa820aa6b2c9c369668256334b6ec314e7e7436c86ae
3
+ size 72099070
rknn/model_w8a8.rknn ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f39839582190c0b2e6f0c260994f946c8ea973ccc456635111e721f1e6e6843
3
+ size 38286411
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
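
Review note: the README example tokenizes with `padding="max_length"`, and every `rknn.json` entry has `"dynamic_input": null`, which suggests the exported models expect inputs of a fixed length. The sketch below is not part of the commit. It shows what that padding amounts to, using the special-token ids from `tokenizer_config.json` ([PAD] = 0, [CLS] = 101, [SEP] = 102). The function name and the two middle token ids are placeholders, and a real tokenizer would also handle truncation and `token_type_ids`.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # ids from tokenizer_config.json


def pad_to_fixed_length(token_ids, max_length):
    """Right-pad token ids to max_length and build the matching attention mask."""
    if len(token_ids) > max_length:
        # Truncation is left to the real tokenizer in this sketch.
        raise ValueError("sequence is longer than max_length")
    padding = max_length - len(token_ids)
    input_ids = list(token_ids) + [PAD_ID] * padding
    # 1 marks real tokens, 0 marks padding positions.
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask


# Placeholder sequence: [CLS], two arbitrary word-piece ids, [SEP].
tokens = [CLS_ID, 2129, 2116, SEP_ID]
input_ids, attention_mask = pad_to_fixed_length(tokens, max_length=256)
print(len(input_ids), sum(attention_mask))
```

With `max_length=256` this would match the sequence length recorded for `model_b1_s256.rknn` and `model_b4_s256.rknn`; the other files use 512.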
vocab.txt ADDED
The diff for this file is too large to render. See raw diff