| | --- |
| | library_name: transformers |
| | model-index: |
| | - name: Explore_Llama-3.1-8B-Inst |
| | results: |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: IFEval (0-Shot) |
| | type: HuggingFaceH4/ifeval |
| | args: |
| | num_few_shot: 0 |
| | metrics: |
| | - type: inst_level_strict_acc and prompt_level_strict_acc |
| | value: 77.95 |
| | name: strict accuracy |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: BBH (3-Shot) |
| | type: BBH |
| | args: |
| | num_few_shot: 3 |
| | metrics: |
| | - type: acc_norm |
| | value: 30.39 |
| | name: normalized accuracy |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: MATH Lvl 5 (4-Shot) |
| | type: hendrycks/competition_math |
| | args: |
| | num_few_shot: 4 |
| | metrics: |
| | - type: exact_match |
| | value: 17.52 |
| | name: exact match |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: GPQA (0-shot) |
| | type: Idavidrein/gpqa |
| | args: |
| | num_few_shot: 0 |
| | metrics: |
| | - type: acc_norm |
| | value: 4.47 |
| | name: acc_norm |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: MuSR (0-shot) |
| | type: TAUR-Lab/MuSR |
| | args: |
| | num_few_shot: 0 |
| | metrics: |
| | - type: acc_norm |
| | value: 9.64 |
| | name: acc_norm |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | - task: |
| | type: text-generation |
| | name: Text Generation |
| | dataset: |
| | name: MMLU-PRO (5-shot) |
| | type: TIGER-Lab/MMLU-Pro |
| | config: main |
| | split: test |
| | args: |
| | num_few_shot: 5 |
| | metrics: |
| | - type: acc |
| | value: 31.02 |
| | name: accuracy |
| | source: |
| | url: >- |
| | https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.1-8B-Inst |
| | name: Open LLM Leaderboard |
| | license: apache-2.0 |
| | language: |
| | - en |
| | base_model: |
| | - meta-llama/Llama-3.1-8B-Instruct |
| | --- |
| | |
| | # Model Card for Model ID |
| |
|
| |
|
| |
|
| | ## Overview |
| |
|
| |
|
| | **DeepAutoAI/Explore_Llama-3.1-8B-Inst** is developed by **deepAuto.ai** by learning the distribution of llama-3.1-8B-instruct. |
| | Our approach leverages the base model’s pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by |
| | training a latent diffusion model on the pretrained weights. specifically , this model is based on learning the distrinution of transformer layers from 16 to 31. |
| | |
| | Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations. |
| | We then sample multiple sets of weights, using the **model-soup averaging technique** to identify the best-performing weights for both datasets. |
| | These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-8B-Inst**. |
| |
|
| | This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training. |
| |
|
| | The work is currently in progress |
| |
|
| |
|
| | ## Model Details |
| |
|
| |
|
| | <!-- Provide a longer summary of what this model is. --> |
| |
|
| | We trained a diffusion model to learn the distribution of subset of llama to enable generation weights that improve the performance. |
| | We generate task specific weights on winogrande and arc_challenge then transfer the best model for leaderboard benchmarking. |
| | |
| | - **Developed by:** DeepAuto.ai |
| | - **Funded by [optional]:** DeepAuto.ai |
| | - **Shared by [optional]:** DeepAuto.ai |
| | - **Model type:** llama-3.1-8B |
| | - **Language(s) (NLP):** English |
| | - **License:** Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in |
| | - compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 |
| | - **Finetuned from model [optional]:** No fine-tuning |
| | |
| | ### Model Sources [optional] |
| | |
| | <!-- Provide the basic links for the model. --> |
| | |
| | - **Repository:** Under construction |
| | - **Paper [optional]:** To be announce |
| | |
| | |
| | ## Uses |
| | |
| | <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
| | |
| | |
| | <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
| | |
| | The direct use case of our work is o improve existing model performance as well as generating task specific weights with no training. |
| | |
| | |
| | <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
| | Performance improvement of existing large models with limited compute |
| | |
| | ### Out-of-Scope Use |
| | |
| | <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
| | |
| | No fine-tuning or architecture generalization |
| | |
| | ## Bias, Risks, and Limitations |
| | |
| | <!-- This section is meant to convey both technical and sociotechnical limitations. --> |
| | |
| | Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content |
| | will still fall within the range of what the base model is inherently capable of producing. |
| | |
| | ## How to Get Started with the Model |
| | The work is under progress |
| | |
| | ## Training Details |
| | We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks. |
| | Remarkably, even within the constraints of one-shot learning, our approach consistently produces a wide range of weight variations, each offering |
| | distinct performance characteristics. These generated weights not only open opportunities for weight averaging and model merging but also have the |
| | potential to significantly enhance model performance. Moreover, they enable the creation of task-specific weights, tailored to optimize performance |
| | for specialized applications |
| | |
| | ### Training Data |
| | The training data used to produced the current model is the base pretrained weights |
| | |
| | <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
| | |
| | |
| | ### Training Procedure |
| | |
| | <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
| | |
| | - We selected a set of layers and combined their pretrained weights, then trained a Variational Autoencoder (VAE) to encode these weights into the layer dimension. |
| | - We conditionally trained a diffusion model on this set of weights, allowing individual sampling of layer-specific weights. |
| | - All selected layers were encoded into a 1024-dimensional space. This model exclusively contained the sampled weights for layer normalization." |
| | |
| | |
| | <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
| | |
| | |
| | ## Evaluation |
| | |
| | <!-- This section describes the evaluation protocols and provides the results. --> |
| | |
| | ### Testing Data, Factors & Metrics |
| | |
| | |
| | <!-- This should link to a Dataset Card if possible. --> |
| | |
| | We test our method on Winogrande and arc_challenge, and hellaswag |
| |
|
| | #### Factors |
| |
|
| | <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
| |
|
| | [More Information Needed] |
| |
|
| | #### Metrics |
| |
|
| | <!-- These are the evaluation metrics being used, ideally with a description of why. --> |
| |
|
| | [More Information Needed] |
| |
|
| | ### Results |
| |
|
| | [More Information Needed] |
| |
|
| | #### Summary |
| |
|
| |
|
| |
|
| | ## Model Examination [optional] |
| |
|
| | <!-- Relevant interpretability work for the model goes here --> |
| |
|
| | [More Information Needed] |
| |
|
| | ## Environmental Impact |
| |
|
| | <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
| |
|
| |
|
| |
|
| | Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
| |
|
| | - **Hardware Type:** Nvidia-A100-80Gb |
| | - **Hours used:** VAE is trained for 4 hour and diffusion process 4 hours |
| | - **Compute Region:** South Korea |
| | - **Carbon Emitted:** 0.96kg |
| |
|
| | ## Technical Specifications [optional] |
| |
|
| | ### Model Architecture and Objective |
| |
|
| | We used Latent diffusion for weights generation, and llama3-1-8B as target architectures. |
| |
|
| | The primary objective of this weight generation process was to demonstrate that by learning only the distribution |
| | of few layers weights9normlaization layers in this case) in an 8-billion-parameter model, it is possible to significantly enhance the |
| | model's capabilities. Notably, this is achieved using a fraction of the computational resources and without the |
| | need for fine-tuning, showcasing the efficiency and potential of this approach. |
| |
|
| | ### Compute Infrastructure |
| |
|
| | Nvidia-A100 cluster |
| |
|
| | #### Hardware |
| |
|
| | A single Nvidia-A100 |
| |
|
| | #### Software |
| |
|
| | Model is tested using lm-harness tool version 0.4.3 |
| |
|
| | ## Citation [optional] |
| |
|
| | <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
| |
|
| | **BibTeX:** |
| |
|
| | [More Information Needed] |
| |
|
| | **APA:** |
| |
|
| | [More Information Needed] |
| |
|
| | ## Glossary [optional] |
| |
|
| | <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
| |
|
| | [More Information Needed] |
| |
|
| | ## More Information [optional] |
| |
|
| | [More Information Needed] |
| |
|
| | ## Model Card Authors [optional] |
| |
|
| | [More Information Needed] |
| |
|
| | ## Model Card Contact |
| |
|
| | For any question contact deepauto.ai |
| | # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
| | Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.1-8B-Inst) |
| |
|
| | | Metric |Value| |
| | |-------------------|----:| |
| | |Avg. |28.50| |
| | |IFEval (0-Shot) |77.95| |
| | |BBH (3-Shot) |30.39| |
| | |MATH Lvl 5 (4-Shot)|17.52| |
| | |GPQA (0-shot) | 4.47| |
| | |MuSR (0-shot) | 9.64| |
| | |MMLU-PRO (5-shot) |31.02| |