Lifting the Curse of Capacity Gap in Distilling Language Models
minimoe-6L-384H (6 layers, hidden size 384), distilled from bert-base-uncased on Wikipedia.
Repository: https://github.com/GeneZC/MiniMoE
arXiv: https://arxiv.org/abs/2305.12129