genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0560
  • Rewards/chosen: -0.0768
  • Rewards/rejected: 0.0
  • Rewards/accuracies: 0.5500
  • Rewards/margins: -0.0768
  • Logps/rejected: -56.2549
  • Logps/chosen: -30.8380
  • Logits/rejected: -3.3810
  • Logits/chosen: -3.2576
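
The reward figures can be cross-checked against the log-probabilities. The sketch below assumes the standard DPO implicit reward, reward = beta * (policy_logp - reference_logp), with beta = 0.1 read from `beta0.1` in the model name. The helper name `implied_ref_logp` and the recovered reference value are illustrative; the card does not report a reference log-probability. Rows are copied from the training results table.

```python
BETA = 0.1  # assumption: taken from "beta0.1" in the model name


def implied_ref_logp(policy_logp: float, reward: float, beta: float = BETA) -> float:
    """Invert reward = beta * (policy_logp - ref_logp) to recover ref_logp."""
    return policy_logp - reward / beta


# (step, Logps/chosen, Rewards/chosen) copied from the training results table
rows = [
    (20, -30.0473, 0.0023),
    (1000, -21.7049, 0.8365),
    (2860, -30.7676, -0.0698),
]

refs = [implied_ref_logp(logp, reward) for _, logp, reward in rows]
for (step, _, _), ref in zip(rows, refs):
    print(f"step {step}: implied reference logp (chosen) = {ref:.3f}")

# Rewards/margins is chosen minus rejected. Rewards/rejected is logged as 0.0,
# so the final margin equals the final chosen reward.
final_margin = -0.0768 - 0.0
```

All three rows imply a reference log-probability of about -30.07 for the chosen responses, which is consistent with the assumed formula. Rewards/rejected stays at 0.0 even though Logps/rejected changes, so the rejected reward evidently does not follow the same formula in this run.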

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 16.0
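
The batch-size and schedule settings above can be worked through numerically. This is a minimal sketch, not the training code: it assumes gradient accumulation of 1 (not listed in the card), about 179 optimizer steps per epoch (inferred from step 20 being logged at epoch 0.1117), warmup steps rounded up, and the usual half-cosine decay to zero after linear warmup. All constant and function names are illustrative.

```python
import math

PER_DEVICE_BATCH = 4    # train_batch_size
NUM_DEVICES = 8         # num_devices
GRAD_ACCUM = 1          # assumption: not listed in the card
BASE_LR = 1e-6          # learning_rate
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
NUM_EPOCHS = 16         # num_epochs
STEPS_PER_EPOCH = 179   # estimate: 20 steps / 0.1117 epochs

effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM
total_steps = NUM_EPOCHS * STEPS_PER_EPOCH
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)  # assumed rounding up


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"estimated total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step}: lr = {lr_at(step):.3e}")
```

The effective batch size of 32 matches total_train_batch_size, and the estimated step count is in line with the last logged step (2860 at epoch 15.9777).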

Training results

Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected
0.6912 0.1117 20 -2.3786 -2.2314 -30.0473 -41.6374 0.6928 0.5 0.0023 0.0023 0.0
0.6775 0.2235 40 -2.3835 -2.2418 -29.7868 -41.3972 0.6781 0.875 0.0283 0.0283 0.0
0.6352 0.3352 60 -2.4506 -2.3140 -28.9355 -40.3581 0.6334 0.9750 0.1135 0.1135 0.0
0.5527 0.4469 80 -2.5961 -2.4815 -26.8917 -38.1349 0.5456 1.0 0.3178 0.3178 0.0
0.4155 0.5587 100 -2.7877 -2.7043 -24.6228 -35.5825 0.4618 1.0 0.5447 0.5447 0.0
0.4148 0.6704 120 -2.9855 -2.9357 -22.7730 -33.9602 0.4124 0.9750 0.7297 0.7297 0.0
0.4048 0.7821 140 -3.0851 -3.0553 -21.8011 -33.0451 0.3880 0.9750 0.8269 0.8269 0.0
0.367 0.8939 160 -3.1732 -3.1657 -20.7670 -32.3537 0.3679 1.0 0.9303 0.9303 0.0
0.3302 1.0056 180 -3.2066 -3.2086 -20.4659 -31.9908 0.3615 0.9750 0.9604 0.9604 0.0
0.3375 1.1173 200 -3.2463 -3.2582 -20.0837 -31.6332 0.3559 0.9750 0.9986 0.9986 0.0
0.3172 1.2291 220 -3.2746 -3.2954 -19.8890 -31.5264 0.3517 0.9750 1.0181 1.0181 0.0
0.3177 1.3408 240 -3.2941 -3.3199 -19.7086 -31.3858 0.3488 0.9750 1.0361 1.0361 0.0
0.318 1.4525 260 -3.3066 -3.3382 -19.4988 -31.3553 0.3468 0.9750 1.0571 1.0571 0.0
0.3004 1.5642 280 -3.3013 -3.3296 -19.4541 -31.2717 0.3439 0.9750 1.0616 1.0616 0.0
0.3579 1.6760 300 -3.3224 -3.3619 -19.2996 -31.1175 0.3442 0.9750 1.0770 1.0770 0.0
0.3457 1.7877 320 -3.3044 -3.3413 -19.3727 -31.2252 0.3402 0.9750 1.0697 1.0697 0.0
0.3362 1.8994 340 -3.3132 -3.3506 -19.2465 -31.0188 0.3390 0.9750 1.0824 1.0824 0.0
0.2611 2.0112 360 -3.3525 -3.3943 -19.0097 -31.0481 0.3390 0.9750 1.1060 1.1060 0.0
0.2496 2.1229 380 -3.3850 -3.4511 -19.0207 -31.5875 0.3513 0.9500 1.1049 1.1049 0.0
0.2294 2.2346 400 -3.4066 -3.4721 -18.7581 -31.4791 0.3479 0.9750 1.1312 1.1312 0.0
0.2657 2.3464 420 -3.4068 -3.4797 -18.9398 -32.0904 0.3536 0.9500 1.1130 1.1130 0.0
0.2225 2.4581 440 -3.4011 -3.4714 -18.8087 -31.7098 0.3474 0.9750 1.1261 1.1261 0.0
0.2554 2.5698 460 -3.4113 -3.4856 -18.9376 -31.7534 0.3510 0.9500 1.1132 1.1132 0.0
0.2282 2.6816 480 -3.4261 -3.4980 -18.8271 -31.7000 0.3481 0.9500 1.1243 1.1243 0.0
0.2073 2.7933 500 -3.4131 -3.4875 -18.7871 -31.8831 0.3500 0.9500 1.1283 1.1283 0.0
0.2345 2.9050 520 -3.4157 -3.4942 -18.8716 -31.9919 0.3530 0.9500 1.1198 1.1198 0.0
0.1987 3.0168 540 -3.4264 -3.5064 -18.6744 -32.0745 0.3496 0.9250 1.1396 1.1396 0.0
0.1838 3.1285 560 -3.4521 -3.5512 -19.5243 -34.0481 0.3871 0.9000 1.0546 1.0546 0.0
0.1988 3.2402 580 -3.4623 -3.5623 -19.6098 -33.7099 0.3881 0.9000 1.0460 1.0460 0.0
0.1733 3.3520 600 -3.4567 -3.5549 -19.3341 -33.2678 0.3793 0.9000 1.0736 1.0736 0.0
0.1543 3.4637 620 -3.4552 -3.5590 -19.7700 -34.0098 0.3932 0.9250 1.0300 1.0300 0.0
0.17 3.5754 640 -3.4528 -3.5567 -19.9679 -34.2455 0.4013 0.9000 1.0102 1.0102 0.0
0.1905 3.6872 660 -3.4496 -3.5554 -20.0944 -34.5604 0.3994 0.9000 0.9976 0.9976 0.0
0.2188 3.7989 680 -3.4679 -3.5720 -19.9418 -34.7166 0.3983 0.9000 1.0128 1.0128 0.0
0.1544 3.9106 700 -3.4570 -3.5612 -19.6301 -34.3280 0.3955 0.9000 1.0440 1.0440 0.0
0.1485 4.0223 720 -3.4488 -3.5566 -19.6168 -34.5071 0.3981 0.9000 1.0453 1.0453 0.0
0.1312 4.1341 740 -3.4462 -3.5609 -20.5134 -37.0993 0.4548 0.9000 0.9557 0.9557 0.0
0.137 4.2458 760 -3.4631 -3.5800 -20.7353 -37.0974 0.4546 0.9000 0.9335 0.9335 0.0
0.1264 4.3575 780 -3.4360 -3.5525 -20.9012 -38.0580 0.4690 0.9000 0.9169 0.9169 0.0
0.1676 4.4693 800 -3.4399 -3.5572 -20.5777 -36.9247 0.4478 0.9000 0.9492 0.9492 0.0
0.1669 4.5810 820 -3.4320 -3.5501 -20.6078 -36.8322 0.4565 0.9000 0.9462 0.9462 0.0
0.2087 4.6927 840 -3.4383 -3.5573 -20.8896 -37.6431 0.4653 0.9000 0.9180 0.9180 0.0
0.1917 4.8045 860 -3.4337 -3.5523 -20.8375 -37.4122 0.4579 0.9000 0.9233 0.9233 0.0
0.1446 4.9162 880 -3.4445 -3.5623 -21.0812 -37.0777 0.4579 0.9000 0.8989 0.8989 0.0
0.1087 5.0279 900 -3.4355 -3.5585 -21.4054 -38.2455 0.4944 0.875 0.8665 0.8665 0.0
0.1107 5.1397 920 -3.4260 -3.5516 -21.8624 -39.4764 0.5151 0.9000 0.8208 0.8208 0.0
0.1205 5.2514 940 -3.4314 -3.5559 -21.9157 -40.4403 0.5230 0.875 0.8154 0.8154 0.0
0.1228 5.3631 960 -3.4247 -3.5500 -21.7546 -40.5583 0.5338 0.875 0.8315 0.8315 0.0
0.1248 5.4749 980 -3.4306 -3.5546 -22.0767 -40.2288 0.5253 0.875 0.7993 0.7993 0.0
0.1221 5.5866 1000 -3.4348 -3.5552 -21.7049 -39.9160 0.5152 0.9000 0.8365 0.8365 0.0
0.113 5.6983 1020 -3.4238 -3.5488 -21.9340 -40.4565 0.5297 0.875 0.8136 0.8136 0.0
0.1521 5.8101 1040 -3.4173 -3.5440 -21.9173 -40.1878 0.5310 0.875 0.8153 0.8153 0.0
0.1454 5.9218 1060 -3.4203 -3.5454 -21.6730 -40.2415 0.5311 0.9000 0.8397 0.8397 0.0
0.0924 6.0335 1080 -3.4155 -3.5412 -22.3914 -40.8432 0.5556 0.9000 0.7679 0.7679 0.0
0.0872 6.1453 1100 -3.4080 -3.5353 -23.4502 -42.1279 0.5976 0.875 0.6620 0.6620 0.0
0.1171 6.2570 1120 -3.4042 -3.5301 -23.1504 -42.3703 0.5868 0.875 0.6920 0.6920 0.0
0.1352 6.3687 1140 -3.3884 -3.5140 -23.5642 -42.2649 0.5905 0.8500 0.6506 0.6506 0.0
0.1121 6.4804 1160 -3.3825 -3.5106 -22.8764 -42.2739 0.5852 0.8500 0.7194 0.7194 0.0
0.1095 6.5922 1180 -3.3981 -3.5221 -23.0010 -42.5219 0.5904 0.875 0.7069 0.7069 0.0
0.1029 6.7039 1200 -3.3989 -3.5260 -23.4520 -43.0338 0.6176 0.8500 0.6618 0.6618 0.0
0.0999 6.8156 1220 -3.4005 -3.5274 -23.5220 -42.7762 0.6177 0.8500 0.6548 0.6548 0.0
0.1516 6.9274 1240 -3.4047 -3.5306 -23.3477 -43.3137 0.6201 0.875 0.6722 0.6722 0.0
0.1004 7.0391 1260 -3.3980 -3.5243 -23.2792 -42.6266 0.6063 0.8500 0.6791 0.6791 0.0
0.1213 7.1508 1280 -3.3665 -3.4955 -25.0291 -45.2048 0.6887 0.75 0.5041 0.5041 0.0
0.0933 7.2626 1300 -3.3802 -3.5086 -24.5921 -44.9486 0.6780 0.8250 0.5478 0.5478 0.0
0.0938 7.3743 1320 -3.3620 -3.4917 -24.9811 -44.8765 0.6838 0.7750 0.5089 0.5089 0.0
0.142 7.4860 1340 -3.3718 -3.5008 -24.9029 -45.3435 0.6902 0.7750 0.5167 0.5167 0.0
0.127 7.5978 1360 -3.3774 -3.5059 -24.6134 -45.3868 0.6831 0.7750 0.5457 0.5457 0.0
0.0918 7.7095 1380 -3.3833 -3.5088 -24.9553 -45.2748 0.6842 0.7750 0.5115 0.5115 0.0
0.1178 7.8212 1400 -3.3662 -3.4939 -25.5868 -46.2983 0.7027 0.7250 0.4483 0.4483 0.0
0.1169 7.9330 1420 -3.3615 -3.4883 -25.5333 -45.5611 0.6962 0.7750 0.4537 0.4537 0.0
0.1522 8.0447 1440 -3.3602 -3.4881 -25.4634 -45.9803 0.7025 0.7250 0.4607 0.4607 0.0
0.1275 8.1564 1460 -3.3589 -3.4881 -26.1552 -48.0564 0.7525 0.7250 0.3915 0.3915 0.0
0.0848 8.2682 1480 -3.3670 -3.4939 -25.9976 -47.4926 0.7414 0.7000 0.4072 0.4072 0.0
0.1039 8.3799 1500 -3.3645 -3.4922 -26.2727 -48.1866 0.7770 0.7250 0.3797 0.3797 0.0
0.1203 8.4916 1520 -3.3499 -3.4808 -26.0705 -47.7090 0.7492 0.7250 0.4000 0.4000 0.0
0.0618 8.6034 1540 -3.3557 -3.4830 -26.3612 -48.1454 0.7531 0.75 0.3709 0.3709 0.0
0.0899 8.7151 1560 -3.3437 -3.4721 -26.0612 -47.8259 0.7493 0.7250 0.4009 0.4009 0.0
0.1591 8.8268 1580 -3.3498 -3.4778 -26.6553 -48.5006 0.7682 0.6750 0.3415 0.3415 0.0
0.0961 8.9385 1600 -3.3444 -3.4703 -27.0557 -48.8106 0.7781 0.6750 0.3014 0.3014 0.0
0.0892 9.0503 1620 -3.3324 -3.4578 -26.8131 -48.4665 0.7767 0.6500 0.3257 0.3257 0.0
0.1226 9.1620 1640 -3.3216 -3.4506 -27.6348 -50.1392 0.8401 0.6500 0.2435 0.2435 0.0
0.1012 9.2737 1660 -3.3205 -3.4494 -27.0937 -50.0571 0.8226 0.6750 0.2976 0.2976 0.0
0.0825 9.3855 1680 -3.3115 -3.4402 -27.6659 -50.3234 0.8500 0.6500 0.2404 0.2404 0.0
0.1345 9.4972 1700 -3.3097 -3.4362 -27.4484 -50.4653 0.8455 0.7000 0.2622 0.2622 0.0
0.1178 9.6089 1720 -3.3052 -3.4332 -27.4479 -50.4414 0.8560 0.7000 0.2622 0.2622 0.0
0.138 9.7207 1740 -3.3118 -3.4380 -27.7125 -50.2650 0.8467 0.625 0.2358 0.2358 0.0
0.1465 9.8324 1760 -3.3077 -3.4372 -27.4658 -49.9790 0.8316 0.6750 0.2604 0.2604 0.0
0.0996 9.9441 1780 -3.3067 -3.4345 -27.8794 -50.4825 0.8560 0.625 0.2191 0.2191 0.0
0.1189 10.0559 1800 -3.3062 -3.4330 -27.8648 -50.3372 0.8531 0.625 0.2205 0.2205 0.0
0.1078 10.1676 1820 -3.3068 -3.4342 -28.0472 -50.6231 0.8598 0.6750 0.2023 0.2023 0.0
0.0897 10.2793 1840 -3.2946 -3.4231 -28.0809 -51.4095 0.8764 0.625 0.1989 0.1989 0.0
0.0831 10.3911 1860 -3.2957 -3.4226 -28.6957 -52.6367 0.9198 0.6000 0.1374 0.1374 0.0
0.0949 10.5028 1880 -3.3039 -3.4304 -28.8144 -53.0358 0.9273 0.5750 0.1256 0.1256 0.0
0.1054 10.6145 1900 -3.3017 -3.4274 -28.8699 -52.3021 0.9307 0.5750 0.1200 0.1200 0.0
0.0955 10.7263 1920 -3.2988 -3.4243 -28.5700 -52.1654 0.9160 0.6000 0.1500 0.1500 0.0
0.1188 10.8380 1940 -3.2948 -3.4213 -28.7836 -52.6606 0.9211 0.5750 0.1286 0.1286 0.0
0.1093 10.9497 1960 -3.2943 -3.4220 -28.8628 -52.4896 0.9208 0.6000 0.1207 0.1207 0.0
0.133 11.0615 1980 -3.2865 -3.4145 -29.1527 -53.5414 0.9443 0.6000 0.0917 0.0917 0.0
0.1008 11.1732 2000 -3.2907 -3.4160 -29.3006 -53.6194 0.9536 0.6000 0.0769 0.0769 0.0
0.0795 11.2849 2020 -3.2707 -3.3993 -29.3426 -53.7331 0.9601 0.5500 0.0727 0.0727 0.0
0.1097 11.3966 2040 -3.2880 -3.4116 -29.6362 -53.5940 0.9723 0.6000 0.0434 0.0434 0.0
0.1379 11.5084 2060 -3.2820 -3.4072 -29.6360 -53.6930 0.9694 0.6000 0.0434 0.0434 0.0
0.0688 11.6201 2080 -3.2765 -3.4013 -29.5118 -54.0503 0.9727 0.5750 0.0558 0.0558 0.0
0.1238 11.7318 2100 -3.2738 -3.3992 -29.3518 -53.9649 0.9643 0.5750 0.0718 0.0718 0.0
0.102 11.8436 2120 -3.2728 -3.3986 -29.6406 -54.0280 0.9608 0.5750 0.0429 0.0429 0.0
0.0781 11.9553 2140 -3.2767 -3.4007 -29.5877 -54.2530 0.9735 0.5750 0.0482 0.0482 0.0
0.0932 12.0670 2160 -3.2663 -3.3913 -29.9586 -54.9684 0.9989 0.6000 0.0111 0.0111 0.0
0.0887 12.1788 2180 -3.2603 -3.3863 -30.0693 -55.1009 1.0059 0.5500 0.0001 0.0001 0.0
0.1067 12.2905 2200 -3.2736 -3.3964 -30.0728 -54.9080 1.0082 0.5500 -0.0003 -0.0003 0.0
0.0828 12.4022 2220 -3.2638 -3.3891 -29.9830 -54.6157 1.0083 0.5750 0.0087 0.0087 0.0
0.1253 12.5140 2240 -3.2658 -3.3908 -30.0711 -55.2232 1.0133 0.5750 -0.0001 -0.0001 0.0
0.106 12.6257 2260 -3.2566 -3.3827 -30.2506 -55.4329 1.0206 0.5750 -0.0181 -0.0181 0.0
0.0846 12.7374 2280 -3.2632 -3.3876 -30.6053 -55.6918 1.0288 0.5750 -0.0535 -0.0535 0.0
0.1123 12.8492 2300 -3.2658 -3.3896 -30.2257 -55.2281 1.0185 0.5500 -0.0156 -0.0156 0.0
0.1002 12.9609 2320 -3.2582 -3.3840 -30.5014 -55.0736 1.0290 0.5250 -0.0431 -0.0431 0.0
0.109 13.0726 2340 -3.2652 -3.3902 -30.2310 -54.9712 1.0176 0.5750 -0.0161 -0.0161 0.0
0.0913 13.1844 2360 -3.2562 -3.3817 -30.5128 -55.4944 1.0323 0.5500 -0.0443 -0.0443 0.0
0.1148 13.2961 2380 -3.2598 -3.3840 -30.4925 -55.8914 1.0291 0.5500 -0.0422 -0.0422 0.0
0.1118 13.4078 2400 -3.2585 -3.3820 -30.6324 -56.0202 1.0348 0.5750 -0.0562 -0.0562 0.0
0.0812 13.5196 2420 -3.2651 -3.3868 -30.6670 -55.8481 1.0371 0.5750 -0.0597 -0.0597 0.0
0.1006 13.6313 2440 -3.2630 -3.3861 -30.3289 -55.7380 1.0399 0.5250 -0.0259 -0.0259 0.0
0.076 13.7430 2460 -3.2569 -3.3818 -30.7570 -55.9962 1.0451 0.5250 -0.0687 -0.0687 0.0
0.0781 13.8547 2480 -3.2592 -3.3826 -30.6637 -55.9950 1.0458 0.5500 -0.0594 -0.0594 0.0
0.0892 13.9665 2500 -3.2666 -3.3896 -30.5092 -55.8917 1.0433 0.5500 -0.0439 -0.0439 0.0
0.1012 14.0782 2520 -3.2542 -3.3787 -30.5755 -56.2111 1.0447 0.5500 -0.0505 -0.0505 0.0
0.1257 14.1899 2540 -3.2573 -3.3817 -30.8714 -56.0508 1.0483 0.5250 -0.0801 -0.0801 0.0
0.1197 14.3017 2560 -3.2555 -3.3789 -30.9074 -56.2701 1.0567 0.5750 -0.0837 -0.0837 0.0
0.1024 14.4134 2580 -3.2508 -3.3756 -30.6748 -56.0792 1.0568 0.5500 -0.0605 -0.0605 0.0
0.0841 14.5251 2600 -3.2517 -3.3768 -30.8211 -56.1266 1.0530 0.5750 -0.0751 -0.0751 0.0
0.1166 14.6369 2620 -3.2521 -3.3776 -30.7581 -56.2008 1.0520 0.5250 -0.0688 -0.0688 0.0
0.0786 14.7486 2640 -3.2588 -3.3815 -30.6962 -56.2018 1.0571 0.5500 -0.0626 -0.0626 0.0
0.1008 14.8603 2660 -3.2547 -3.3795 -30.6797 -55.7110 1.0555 0.5250 -0.0610 -0.0610 0.0
0.1146 14.9721 2680 -3.2503 -3.3754 -30.6332 -55.9605 1.0556 0.5500 -0.0563 -0.0563 0.0
0.0965 15.0838 2700 -3.2535 -3.3774 -30.8946 -56.2887 1.0547 0.5750 -0.0825 -0.0825 0.0
0.0833 15.1955 2720 -3.2535 -3.3774 -30.6622 -55.8942 1.0543 0.5750 -0.0592 -0.0592 0.0
0.1128 15.3073 2740 -3.2487 -3.3741 -30.6323 -55.9934 1.0510 0.5500 -0.0562 -0.0562 0.0
0.1008 15.4190 2760 -3.2560 -3.3804 -30.8279 -56.2560 1.0566 0.5250 -0.0758 -0.0758 0.0
0.1223 15.5307 2780 -3.2573 -3.3805 -30.8725 -56.2568 1.0601 0.5250 -0.0802 -0.0802 0.0
0.1231 15.6425 2800 -3.2499 -3.3753 -30.7777 -56.1515 1.0558 0.5500 -0.0708 -0.0708 0.0
0.1085 15.7542 2820 -3.2575 -3.3811 -30.6396 -56.0537 1.0494 0.5500 -0.0570 -0.0570 0.0
0.0907 15.8659 2840 -3.2582 -3.3814 -30.8428 -56.6040 1.0577 0.5250 -0.0773 -0.0773 0.0
0.1089 15.9777 2860 -3.2576 -3.3810 -30.7676 -56.1844 1.0558 0.5500 -0.0698 -0.0698 0.0
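
In the table above, validation loss reaches its minimum of 0.3390 at steps 340 and 360 and then climbs to 1.0558 by step 2860, while training loss keeps falling. The sketch below shows one way to parse the whitespace-separated rows and pick the evaluation step with the lowest validation loss. Only four rows are embedded for brevity, and the column keys and helper name are illustrative.

```python
# Column order follows the table header; the key names are illustrative.
COLUMNS = [
    "train_loss", "epoch", "step", "logits_chosen", "logits_rejected",
    "logps_chosen", "logps_rejected", "val_loss", "reward_acc",
    "reward_chosen", "reward_margin", "reward_rejected",
]

# A few rows copied from the training results table.
SAMPLE_ROWS = """\
0.6912 0.1117 20 -2.3786 -2.2314 -30.0473 -41.6374 0.6928 0.5 0.0023 0.0023 0.0
0.3362 1.8994 340 -3.3132 -3.3506 -19.2465 -31.0188 0.3390 0.9750 1.0824 1.0824 0.0
0.1221 5.5866 1000 -3.4348 -3.5552 -21.7049 -39.9160 0.5152 0.9000 0.8365 0.8365 0.0
0.1089 15.9777 2860 -3.2576 -3.3810 -30.7676 -56.1844 1.0558 0.5500 -0.0698 -0.0698 0.0
"""


def parse_rows(text: str) -> list[dict[str, float]]:
    """Turn whitespace-separated table rows into dicts keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = parse_rows(SAMPLE_ROWS)
best = min(rows, key=lambda r: r["val_loss"])  # first row with the lowest loss
last = rows[-1]
gap = last["val_loss"] - last["train_loss"]    # train/validation gap at the end

print(f"lowest validation loss {best['val_loss']:.4f} at step {int(best['step'])}")
print(f"final train/validation gap: {gap:.4f}")
```

A widening gap between training and validation loss of this kind is commonly read as overfitting, so an earlier checkpoint may be preferable to the final one if intermediate checkpoints are available.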

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.5.0
  • Tokenizers 0.20.3