Commit 7353728 · yuyuzhang committed · verified · 1 Parent(s): 60664d9

Update README.md

  1. README.md +78 -0
README.md CHANGED
@@ -68,5 +68,83 @@ Seed-Coder-8B-Reasoning has been evaluated extensively on reasoning-intensive co
  - Enhanced ability to **break down complex problems**, **design correct algorithms**, and **produce efficient implementations**.
  - Strong generalization to unseen problems across multiple domains (math, strings, arrays, graphs, DP, etc.).
 
+ <table>
+ <tr>
+ <th rowspan="2">Model</th>
+ <th colspan="3">Hard</th>
+ <th colspan="3">Medium</th>
+ <th colspan="3">Easy</th>
+ <th rowspan="2">Overall</th>
+ </tr>
+ <tr>
+ <th>4-mon</th><th>3-mon</th><th>2-mon</th>
+ <th>4-mon</th><th>3-mon</th><th>2-mon</th>
+ <th>4-mon</th><th>3-mon</th><th>2-mon</th>
+ </tr>
+
+ <!-- ~8B Models -->
+ <tr><td colspan="11"><b>~8B Models</b></td></tr>
+ <tr>
+ <td>DeepSeek-R1-Distill-Qwen-7B</td>
+ <td>11.3</td><td>10.7</td><td>9.6</td>
+ <td>39.6</td><td>37.2</td><td>37.1</td>
+ <td>76.2</td><td>77.1</td><td>67.1</td>
+ <td>36.5</td>
+ </tr>
+ <tr>
+ <td>DeepSeek-R1-Distill-Seed-Coder-8B</td>
+ <td>13.6</td><td>13.9</td><td>13.4</td>
+ <td>39.6</td><td>38.7</td><td>39.3</td>
+ <td>79.8</td><td>80.2</td><td>73.2</td>
+ <td>39.0</td>
+ </tr>
+ <tr>
+ <td>OlympicCoder-7B</td>
+ <td>12.7</td><td>11.8</td><td>12.5</td>
+ <td>40.8</td><td>39.0</td><td>38.7</td>
+ <td>78.0</td><td>77.1</td><td>67.8</td>
+ <td>37.9</td>
+ </tr>
+ <tr>
+ <td>Qwen3-8B-thinking</td>
+ <td>27.5</td><td>23.5</td><td>19.7</td>
+ <td>65.7</td><td>59.7</td><td>58.5</td>
+ <td>98.0</td><td>98.1</td><td>97.3</td>
+ <td>57.4</td>
+ </tr>
+ <tr>
+ <td>Seed-Coder-8B-Reasoning</td>
+ <td>27.6</td><td>28.0</td><td>31.0</td>
+ <td>65.8</td><td>59.2</td><td>57.5</td>
+ <td>87.8</td><td>88.0</td><td>80.1</td>
+ <td>53.6</td>
+ </tr>
+
+ <!-- 13B+ Models -->
+ <tr><td colspan="11"><b>13B+ Models</b></td></tr>
+ <tr>
+ <td>DeepSeek-R1-Distill-Qwen-14B</td>
+ <td>21.3</td><td>20.5</td><td>16.1</td>
+ <td>58.1</td><td>53.4</td><td>51.4</td>
+ <td>93.3</td><td>94.2</td><td>93.7</td>
+ <td>51.9</td>
+ </tr>
+ <tr>
+ <td>Claude-3.7-Sonnet-thinking</td>
+ <td>27.3</td><td>30.8</td><td>31.0</td>
+ <td>54.5</td><td>55.1</td><td>51.4</td>
+ <td>96.2</td><td>100.0</td><td>100.0</td>
+ <td>53.3</td>
+ </tr>
+ <tr>
+ <td>o3-mini-low</td>
+ <td>30.3</td><td>32.3</td><td>28.6</td>
+ <td>69.6</td><td>61.2</td><td>54.1</td>
+ <td>98.7</td><td>100.0</td><td>100.0</td>
+ <td>59.4</td>
+ </tr>
+ </table>
+
+
  For detailed results, please check our paper.
  <!-- For detailed results, please check our [📑 paper](https://arxiv.org/pdf/xxx.xxxxx). -->