Upload folder using huggingface_hub

Files changed (17)

.gitattributes +8 -0
README.md +161 -0
config.json +32 -0
model.rknn +3 -0
model_b1_s256.rknn +3 -0
model_b4_s256.rknn +3 -0
model_b4_s512.rknn +3 -0
models--cross-encoder--ms-marco-MiniLM-L12-v2/.no_exist/7b0235231ca2674cb8ca8f022859a6eba2b1c968/modules.json +0 -0
rknn.json +358 -0
rknn/model_o1.rknn +3 -0
rknn/model_o2.rknn +3 -0
rknn/model_o3.rknn +3 -0
rknn/model_w8a8.rknn +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +58 -0
vocab.txt +0 -0

.gitattributes CHANGED

@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.rknn filter=lfs diff=lfs merge=lfs -text
+model_b1_s256.rknn filter=lfs diff=lfs merge=lfs -text
+model_b4_s256.rknn filter=lfs diff=lfs merge=lfs -text
+model_b4_s512.rknn filter=lfs diff=lfs merge=lfs -text
+rknn/model_o1.rknn filter=lfs diff=lfs merge=lfs -text
+rknn/model_o2.rknn filter=lfs diff=lfs merge=lfs -text
+rknn/model_o3.rknn filter=lfs diff=lfs merge=lfs -text
+rknn/model_w8a8.rknn filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,161 @@

+---
+license: apache-2.0
+datasets:
+- sentence-transformers/natural-questions
+language:
+- en
+base_model: cross-encoder/ms-marco-MiniLM-L12-v2
+pipeline_tag: text-ranking
+library_name: rk-transformers
+tags:
+- transformers
+- rknn
+- rockchip
+- npu
+- rk-transformers
+- rk3588
+model_name: ms-marco-MiniLM-L12-v2
+---
+# ms-marco-MiniLM-L12-v2 (RKNN2)
+> This is an RKNN-compatible version of the [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2) model. It has been optimized for Rockchip NPUs using the [rk-transformers](https://github.com/emapco/rk-transformers) library.
+<details><summary>Click to see the RKNN model details and usage examples</summary>
+## Model Details
+- **Original Model:** [cross-encoder/ms-marco-MiniLM-L12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L12-v2)
+- **Target Platform:** rk3588
+- **rknn-toolkit2 Version:** 2.3.2
+- **rk-transformers Version:** 0.1.0
+### Available Model Files
+| Model File | Optimization Level | Quantization | File Size |
+| :--------- | :----------------- | :----------- | :-------- |
+| [model.rknn](./model.rknn) | 0 | float16 | 68.8 MB |
+| [model_b1_s256.rknn](./model_b1_s256.rknn) | 0 | float16 | 67.0 MB |
+| [model_b4_s256.rknn](./model_b4_s256.rknn) | 0 | float16 | 75.1 MB |
+| [model_b4_s512.rknn](./model_b4_s512.rknn) | 0 | float16 | 81.7 MB |
+| [rknn/model_o1.rknn](./rknn/model_o1.rknn) | 1 | float16 | 68.8 MB |
+| [rknn/model_o2.rknn](./rknn/model_o2.rknn) | 2 | float16 | 68.8 MB |
+| [rknn/model_o3.rknn](./rknn/model_o3.rknn) | 3 | float16 | 68.8 MB |
+| [rknn/model_w8a8.rknn](./rknn/model_w8a8.rknn) | 0 | w8a8 | 36.5 MB |
+## Usage
+### Installation
+Install `rk-transformers` to use this model:
+```bash
+pip install rk-transformers
+```
+#### RKTransformers API
+```python
+from rktransformers import RKRTModelForSequenceClassification
+from transformers import AutoTokenizer
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("rk-transformers/ms-marco-MiniLM-L12-v2")
+model = RKRTModelForSequenceClassification.from_pretrained(
+    "rk-transformers/ms-marco-MiniLM-L12-v2",
+    platform="rk3588",
+    core_mask="auto",
+)
+# Tokenize and run inference
+inputs = tokenizer(
+    ["Sample text for encoding"],
+    padding="max_length",
+    max_length=512,
+    truncation=True,
+    return_tensors="np"
+)
+outputs = model(**inputs)
+print(outputs.shape)
+# Load specific optimized/quantized model file
+model = RKRTModelForSequenceClassification.from_pretrained(
+    "rk-transformers/ms-marco-MiniLM-L12-v2",
+    platform="rk3588",
+    file_name="rknn/model_w8a8.rknn"
+)
+```
+## Configuration
+The full configuration for all exported RKNN models is available in the [rknn.json](./rknn.json) file.
+</details>
+# Cross-Encoder for MS Marco
+This model was trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.
+The model can be used for Information Retrieval: given a query, encode the query with all candidate passages (e.g. retrieved with ElasticSearch), then sort the passages by score in decreasing order. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details. The training code is available here: [SBERT.net Training MS Marco](https://github.com/UKPLab/sentence-transformers/tree/master/examples/cross_encoder/training/ms_marco)
+## Usage with SentenceTransformers
+With [SentenceTransformers](https://www.sbert.net/) installed, you can use the pre-trained models like this:
+```python
+from sentence_transformers import CrossEncoder
+model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L12-v2')
+scores = model.predict([
+    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
+    ("How many people live in Berlin?", "Berlin is well known for its museums."),
+])
+print(scores)
+# [ 9.218911  -4.0780287]
+```
+## Usage with Transformers
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L12-v2')
+tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L12-v2')
+features = tokenizer(
+    ['How many people live in Berlin?', 'How many people live in Berlin?'],
+    ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],
+    padding=True,
+    truncation=True,
+    return_tensors="pt",
+)
+model.eval()
+with torch.no_grad():
+    scores = model(**features).logits
+    print(scores)
+```
+## Performance
+In the following table, we provide various pre-trained Cross-Encoders together with their performance on the [TREC Deep Learning 2019](https://microsoft.github.io/TREC-2019-Deep-Learning/) and the [MS Marco Passage Reranking](https://github.com/microsoft/MSMARCO-Passage-Ranking/) dataset.
+| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
+| ------------- | :------------- | ----- | --- |
+| **Version 2 models** | | | |
+| cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
+| cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
+| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
+| cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
+| cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
+| **Version 1 models** | | | |
+| cross-encoder/ms-marco-TinyBERT-L2 | 67.43 | 30.15 | 9000 |
+| cross-encoder/ms-marco-TinyBERT-L4 | 68.09 | 34.50 | 2900 |
+| cross-encoder/ms-marco-TinyBERT-L6 | 69.57 | 36.13 | 680 |
+| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 |
+| **Other models** | | | |
+| nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 |
+| nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 |
+| nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 |
+| Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 |
+| amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 |
+| sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720 |
+ Note: Runtime was computed on a V100 GPU.
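
The re-ranking step the README describes (score every query-passage pair, then sort the passages by decreasing score) can be sketched as below. This is a minimal illustration, not code from the commit: `rank_passages` is a hypothetical helper, and the scores are the two values from the README's SentenceTransformers example, hard-coded so the sketch runs without downloading a model.

```python
from typing import List, Sequence, Tuple


def rank_passages(
    passages: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each passage with its cross-encoder score, best score first.

    Hypothetical helper for illustration only.
    """
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


# Passages deliberately listed in the "wrong" order; scores copied from the README.
passages = [
    "Berlin is well known for its museums.",
    "Berlin had a population of 3,520,031 registered inhabitants "
    "in an area of 891.82 square kilometers.",
]
scores = [-4.0780287, 9.218911]

ranked = rank_passages(passages, scores)
for passage, score in ranked:
    print(f"{score:8.3f}  {passage}")
```

In a real pipeline the `scores` list would come from `model.predict(...)` or from the logits of the RKNN model, one score per (query, passage) pair.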

config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
+  "torch_dtype": "float32",
+  "transformers_version": "4.55.4",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
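
A few quantities follow directly from this config and are handy when sanity-checking an export. The sketch below uses a trimmed copy of the fields above; the variable names are illustrative and not part of any library.

```python
import json

# Trimmed copy of the config.json fields shown above.
config = json.loads(
    """
    {
      "hidden_size": 384,
      "intermediate_size": 1536,
      "num_attention_heads": 12,
      "num_hidden_layers": 12,
      "max_position_embeddings": 512,
      "id2label": {"0": "LABEL_0"}
    }
    """
)

# Each attention head works on hidden_size / num_attention_heads dimensions.
head_dim, remainder = divmod(config["hidden_size"], config["num_attention_heads"])
if remainder != 0:
    raise ValueError("hidden_size must be divisible by num_attention_heads")

# Feed-forward expansion factor relative to the hidden size.
ffn_ratio = config["intermediate_size"] / config["hidden_size"]

# A single label means the classifier head emits one relevance logit per pair.
num_labels = len(config["id2label"])

print(f"head_dim={head_dim}, ffn_ratio={ffn_ratio}, num_labels={num_labels}")
```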

model.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bada98d5ef1f57199733bceeb9b348a061eb17b77e444b68cca1557ef64b52b
+size 72099070

model_b1_s256.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:33291ae8d7d9e04ae32a1c10cb9de3bde30719f33061b48705b223a006a54551
+size 70270718

model_b4_s256.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:872b6e2550a0cd9ed4de28fc86d62b3af6227fdae378f84df5fddb32334d5724
+size 78763262

model_b4_s512.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c640bce5951ca71d22756d003eaccac10159d40e489f15b170c2c781a88fd916
+size 85670846

models--cross-encoder--ms-marco-MiniLM-L12-v2/.no_exist/7b0235231ca2674cb8ca8f022859a6eba2b1c968/modules.json ADDED

File without changes

rknn.json ADDED

	@@ -0,0 +1,358 @@

+{
+    "model_b1_s256.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 256,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 0,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "model_b4_s256.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 4,
+        "max_seq_length": 256,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 0,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "model.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 0,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "model_b4_s512.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 4,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 0,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "rknn/model_o1.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 1,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "rknn/model_o2.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 2,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "rknn/model_o3.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": false,
+            "dataset_name": null,
+            "dataset_subset": null,
+            "dataset_size": 128,
+            "dataset_split": null,
+            "dataset_columns": null,
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 3,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    },
+    "rknn/model_w8a8.rknn": {
+        "rktransformers_version": "0.1.0",
+        "model_input_names": [
+            "input_ids",
+            "attention_mask",
+            "token_type_ids"
+        ],
+        "batch_size": 1,
+        "max_seq_length": 512,
+        "float_dtype": "float16",
+        "target_platform": "rk3588",
+        "single_core_mode": false,
+        "mean_values": null,
+        "std_values": null,
+        "custom_string": null,
+        "inputs_yuv_fmt": null,
+        "dynamic_input": null,
+        "opset": 19,
+        "task": "sequence-classification",
+        "quantization": {
+            "do_quantization": true,
+            "dataset_name": "sentence-transformers/natural-questions",
+            "dataset_subset": null,
+            "dataset_size": 1024,
+            "dataset_split": [
+                "train"
+            ],
+            "dataset_columns": [
+                "answer"
+            ],
+            "quantized_dtype": "w8a8",
+            "quantized_algorithm": "normal",
+            "quantized_method": "channel",
+            "quantized_hybrid_level": 0,
+            "quant_img_RGB2BGR": false,
+            "auto_hybrid_cos_thresh": 0.98,
+            "auto_hybrid_euc_thresh": null
+        },
+        "optimization": {
+            "optimization_level": 0,
+            "enable_flash_attention": true,
+            "remove_weight": false,
+            "compress_weight": false,
+            "remove_reshape": false,
+            "sparse_infer": false,
+            "model_pruning": false
+        }
+    }
+}
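
Each entry in `rknn.json` records a fixed `batch_size` and `max_seq_length` (with `dynamic_input` set to `null`), so a caller presumably has to pick the file whose shape matches the inputs it will feed. The sketch below shows one way to look that up. It runs against a trimmed, hand-copied subset of the entries above; `find_model_files` is a hypothetical helper and not part of rk-transformers.

```python
from typing import Dict, List

# Trimmed subset of rknn.json: only the fields used for the lookup.
rknn_config: Dict[str, dict] = {
    "model_b1_s256.rknn": {
        "batch_size": 1, "max_seq_length": 256,
        "quantization": {"do_quantization": False},
    },
    "model_b4_s256.rknn": {
        "batch_size": 4, "max_seq_length": 256,
        "quantization": {"do_quantization": False},
    },
    "model.rknn": {
        "batch_size": 1, "max_seq_length": 512,
        "quantization": {"do_quantization": False},
    },
    "model_b4_s512.rknn": {
        "batch_size": 4, "max_seq_length": 512,
        "quantization": {"do_quantization": False},
    },
    "rknn/model_w8a8.rknn": {
        "batch_size": 1, "max_seq_length": 512,
        "quantization": {"do_quantization": True},
    },
}


def find_model_files(
    config: Dict[str, dict],
    batch_size: int,
    max_seq_length: int,
    quantized: bool = False,
) -> List[str]:
    """Return the file names whose exported shape and quantization match."""
    return [
        name
        for name, entry in config.items()
        if entry["batch_size"] == batch_size
        and entry["max_seq_length"] == max_seq_length
        and entry["quantization"]["do_quantization"] == quantized
    ]


print(find_model_files(rknn_config, batch_size=4, max_seq_length=256))
print(find_model_files(rknn_config, batch_size=1, max_seq_length=512, quantized=True))
```

With the full file loaded via `json.load`, the same lookup for batch size 1 and sequence length 512 would also return the `rknn/model_o1.rknn` to `rknn/model_o3.rknn` variants, which differ only in `optimization_level`.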

rknn/model_o1.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d90dcbea5b184df2830e4a9a84c0135d335df1b43b4c8b2e36ba26d4f654016
+size 72099070

rknn/model_o2.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b0813cfb3c5bdbab369477ee781addd58a069246ac34b71b6e8c38255070aeb
+size 72099070

rknn/model_o3.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cd402bb19ef3b0a952eafa820aa6b2c9c369668256334b6ec314e7e7436c86ae
+size 72099070

rknn/model_w8a8.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f39839582190c0b2e6f0c260994f946c8ea973ccc456635111e721f1e6e6843
+size 38286411
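
The `.rknn` entries in this commit are Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. The sketch below parses one pointer and converts the byte count to mebibytes, which appears to be the unit behind the "MB" column in the README table (72099070 bytes comes out as 68.8). `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: Dict[str, Union[str, int]] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is a byte count
    return fields


# Pointer content for model.rknn, copied from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4bada98d5ef1f57199733bceeb9b348a061eb17b77e444b68cca1557ef64b52b
size 72099070
"""

info = parse_lfs_pointer(pointer)
size_mib = info["size"] / (1024 * 1024)
print(f"{info['oid'][:14]}...  {size_mib:.1f} MiB")
```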

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
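
The special-token ids in this file (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102) determine how a query-passage pair is laid out for a BERT cross-encoder: `[CLS] query [SEP] passage [SEP]`, padded to a fixed length. The sketch below builds the three input arrays by hand to make that layout visible. It is an illustration only: `build_pair_inputs` is a hypothetical helper, the word-piece ids are arbitrary placeholders, and in practice the tokenizer produces these arrays.

```python
from typing import Dict, List, Sequence

# Special-token ids taken from added_tokens_decoder above.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def build_pair_inputs(
    query_ids: Sequence[int], passage_ids: Sequence[int], max_length: int
) -> Dict[str, List[int]]:
    """Lay out [CLS] query [SEP] passage [SEP] and pad to max_length."""
    input_ids = [CLS_ID, *query_ids, SEP_ID, *passage_ids, SEP_ID]
    if len(input_ids) > max_length:
        raise ValueError("pair is longer than max_length; truncate it first")
    # Segment 0 covers [CLS] + query + first [SEP]; segment 1 covers passage + last [SEP].
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * pad,
        "attention_mask": attention_mask + [0] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
    }


# Placeholder word-piece ids, not real tokenizer output.
features = build_pair_inputs([2129, 2116], [4068, 2038, 1037], max_length=10)
for name, values in features.items():
    print(f"{name:15s} {values}")
```

For the fixed-shape RKNN exports, `max_length` would be the `max_seq_length` recorded for the chosen file (256 or 512).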

vocab.txt ADDED

The diff for this file is too large to render. See raw diff