---
license: mit
---

# Nano Language Models - Tiny Model Series

Extremely small language models for testing, available in the following configurations.

| Model Name       | BlockSize | VocabSize | Layers | Embd | Q_Heads | KV_Heads | Hidden | NormEps | #Param |
|------------------|-----------|-----------|--------|------|---------|----------|--------|---------|--------|
| Psycho-230k-base | 512       | 4096      | 8      | 32   | 4       | 2        | 96     | 1e-5    | 229920 |
| Nano-230k-base   | 512       | 4096      | 8      | 32   | 4       | 2        | 96     | 1e-5    | 229920 |

Both models use the [4096-token vocabulary](https://github.com/bd4sur/Nano/blob/master/tokenizer/tokenizer_4096.json) built in December 2025.

## Training Parameters

Psycho-230k-base

```
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        [
            "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_train_0.base64",
            "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_val_0.base64"
        ]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",
    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,
    "dropout": 0.0,
    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,
    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,
    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,
    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```

Nano-230k-base

```
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        [
            "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_train.base64",
            "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_valid.base64"
        ]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",
    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,
    "dropout": 0.0,
    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,
    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,
    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,
    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```
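The `#Param` value in the table can be reproduced by hand. The sketch below is a back-of-the-envelope count that *assumes* a Llama-style decoder: grouped-query attention without biases, a SwiGLU feed-forward block with three projection matrices of width `Hidden`, weight-only RMSNorm before attention and before the FFN plus one final norm, rotary position encoding (so `BlockSize` adds no parameters), and input/output embeddings that share one matrix. None of these details is stated in this card; they are simply a set of assumptions that yields 229920 exactly. The function name `count_params` is hypothetical.

```python
def count_params(vocab_size: int, n_layers: int, n_embd: int,
                 n_q_heads: int, n_kv_heads: int, hidden: int,
                 tied_embeddings: bool = True) -> int:
    """Parameter count for a hypothetical Llama-style decoder (assumed, not confirmed):
    GQA attention without biases, SwiGLU FFN, weight-only RMSNorm, RoPE, tied embeddings."""
    head_dim = n_embd // n_q_heads          # 32 / 4 = 8
    kv_dim = n_kv_heads * head_dim          # K/V projections are narrower under GQA
    attn = n_embd * n_embd                  # W_q
    attn += 2 * n_embd * kv_dim             # W_k and W_v
    attn += n_embd * n_embd                 # W_o
    ffn = 3 * n_embd * hidden               # gate, up and down projections
    norms = 2 * n_embd                      # pre-attention and pre-FFN RMSNorm weights
    per_layer = attn + ffn + norms
    embedding = vocab_size * n_embd         # token embedding table
    lm_head = 0 if tied_embeddings else vocab_size * n_embd
    final_norm = n_embd                     # RMSNorm before the output head
    return n_layers * per_layer + embedding + lm_head + final_norm


total = count_params(vocab_size=4096, n_layers=8, n_embd=32,
                     n_q_heads=4, n_kv_heads=2, hidden=96)
print(total)  # 229920
```

Under these assumptions the embedding table alone holds 4096 × 32 = 131072 parameters, roughly 57% of the total, which is typical for models this small where the vocabulary dominates the budget.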
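The schedule-related keys (`learning_rate`, `decay_lr`, `warmup_iters`, `lr_decay_iters`, `min_lr`) resemble those of nanoGPT-style trainers. The sketch below *assumes* such a schedule: linear warmup followed by cosine decay to `min_lr`. The exact warmup formula and the name `get_lr` are assumptions for illustration, not taken from the Nano source.

```python
import math

# Values copied from the training configs above.
LEARNING_RATE = 5e-4
MIN_LR = 6e-5
WARMUP_ITERS = 500
LR_DECAY_ITERS = int(1e9)


def get_lr(it: int, decay_lr: bool = True) -> float:
    """Hypothetical nanoGPT-style schedule: linear warmup, then cosine decay to MIN_LR."""
    if not decay_lr:
        return LEARNING_RATE                      # constant rate when decay is disabled
    if it < WARMUP_ITERS:
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS  # ramp up linearly
    if it > LR_DECAY_ITERS:
        return MIN_LR                             # floor after the decay window
    decay_ratio = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes from 1 down to 0
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


for it in (0, 250, 500, 100_000, 10_000_000):
    print(it, f"{get_lr(it):.6e}")
```

If the trainer does follow a schedule of this shape, `lr_decay_iters = 1e9` makes the cosine phase almost flat: even after 10 million iterations the rate is still above 4.99e-4, so in practice training runs at roughly a constant 5e-4 once warmup ends.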