---
language:
- en
- fr
license: apache-2.0
library_name: transformers
tags:
- smollm3
- fine-tuned
- causal-lm
- text-generation
- tonic
- legml
{{#if quantized_models}}
- quantized
{{/if}}
pipeline_tag: text-generation
base_model: {{base_model}}
{{#if dataset_name}}
datasets:
- {{dataset_name}}
{{/if}}
{{#if quantized_models}}
model-index:
- name: {{model_name}}
  results:
  - task:
      type: text-generation
    dataset:
      name: {{dataset_name}}
      type: {{dataset_name}}
    metrics:
    - name: Training Loss
      type: loss
      value: "{{training_loss|default:'N/A'}}"
    - name: Validation Loss
      type: loss
      value: "{{validation_loss|default:'N/A'}}"
    - name: Perplexity
      type: perplexity
      value: "{{perplexity|default:'N/A'}}"
- name: {{model_name}} (int8 quantized)
  results:
  - task:
      type: text-generation
    dataset:
      name: {{dataset_name}}
      type: {{dataset_name}}
    metrics:
    - name: Memory Reduction
      type: memory_efficiency
      value: "~50%"
    - name: Inference Speed
      type: speed
      value: "Faster"
- name: {{model_name}} (int4 quantized)
  results:
  - task:
      type: text-generation
    dataset:
      name: {{dataset_name}}
      type: {{dataset_name}}
    metrics:
    - name: Memory Reduction
      type: memory_efficiency
      value: "~75%"
    - name: Inference Speed
      type: speed
      value: "Significantly Faster"
{{else}}
model-index:
- name: {{model_name}}
  results:
  - task:
      type: text-generation
    dataset:
      name: {{dataset_name}}
      type: {{dataset_name}}
    metrics:
    - name: Training Loss
      type: loss
      value: "{{training_loss|default:'N/A'}}"
    - name: Validation Loss
      type: loss
      value: "{{validation_loss|default:'N/A'}}"
    - name: Perplexity
      type: perplexity
      value: "{{perplexity|default:'N/A'}}"
{{/if}}
{{#if author_name}}
author: {{author_name}}
{{/if}}
{{#if experiment_name}}
experiment_name: {{experiment_name}}
{{/if}}
{{#if trackio_url}}
trackio_url: {{trackio_url}}
{{/if}}
{{#if dataset_repo}}
dataset_repo: {{dataset_repo}}
{{/if}}
{{#if hardware_info}}
hardware: "{{hardware_info}}"
{{/if}}
{{#if training_config_type}}
training_config: {{training_config_type}}
{{/if}}
{{#if trainer_type}}
trainer_type: {{trainer_type}}
{{/if}}
{{#if batch_size}}
batch_size: {{batch_size}}
{{/if}}
{{#if learning_rate}}
learning_rate: {{learning_rate}}
{{/if}}
{{#if max_epochs}}
max_epochs: {{max_epochs}}
{{/if}}
{{#if max_seq_length}}
max_seq_length: {{max_seq_length}}
{{/if}}
{{#if dataset_sample_size}}
dataset_sample_size: {{dataset_sample_size}}
{{/if}}
{{#if dataset_size}}
dataset_size: {{dataset_size}}
{{/if}}
{{#if dataset_format}}
dataset_format: {{dataset_format}}
{{/if}}
{{#if gradient_accumulation_steps}}
gradient_accumulation_steps: {{gradient_accumulation_steps}}
{{/if}}
---

# {{model_name}}

{{model_description}}

## Model Details

- **Base Model**: SmolLM3-3B
- **Model Type**: Causal Language Model
- **Languages**: English, French
- **License**: Apache 2.0
- **Fine-tuned**: Yes
{{#if quantized_models}}
- **Quantized Versions**: Available in subdirectories
{{/if}}

## Usage

### Main Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "{{repo_name}}",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("{{repo_name}}")

# Generate text
input_text = "What are we having for dinner?"
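# Note (illustrative, not part of this repository's code): if this checkpoint
# ships a chat template, instruction-style prompts usually work better when
# formatted through the tokenizer's chat template rather than passed as raw text:
#
#     messages = [{"role": "user", "content": input_text}]
#     input_text = tokenizer.apply_chat_template(
#         messages, tokenize=False, add_generation_prompt=True
#     )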
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(**input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Information

### Training Configuration

- **Base Model**: {{base_model}}
- **Dataset**: {{dataset_name}}
- **Training Config**: {{training_config_type}}
- **Trainer Type**: {{trainer_type}}
{{#if dataset_sample_size}}
- **Dataset Sample Size**: {{dataset_sample_size}}
{{/if}}

### Training Parameters

- **Batch Size**: {{batch_size}}
- **Gradient Accumulation**: {{gradient_accumulation_steps}}
- **Learning Rate**: {{learning_rate}}
- **Max Epochs**: {{max_epochs}}
- **Sequence Length**: {{max_seq_length}}

### Training Infrastructure

- **Hardware**: {{hardware_info}}
- **Monitoring**: Trackio integration
- **Experiment**: {{experiment_name}}

## Model Architecture

This is a fine-tuned version of the SmolLM3-3B model with the following specifications:

- **Base Model**: SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: {{max_seq_length}}
- **Languages**: English, French
- **Architecture**: Transformer-based causal language model

## Performance

The model provides:

- **Text Generation**: High-quality text generation capabilities
- **Conversation**: Natural conversation abilities
- **Multilingual**: Support for English and French
{{#if quantized_models}}
- **Quantized Versions**: Optimized for different deployment scenarios
{{/if}}

## Limitations

1. **Context Length**: Limited by the model's maximum sequence length
2. **Bias**: May inherit biases from the training data
3. **Factual Accuracy**: May generate incorrect or outdated information
4. **Safety**: Should be used responsibly with appropriate safeguards
{{#if quantized_models}}
5. **Quantization**: Quantized versions may have slightly reduced accuracy
{{/if}}

## Training Data

The model was fine-tuned on:

- **Dataset**: {{dataset_name}}
- **Size**: {{dataset_size}}
- **Format**: {{dataset_format}}
- **Languages**: English, French

## Evaluation

The model was evaluated using:

- **Metrics**: Loss, perplexity, and qualitative assessment
- **Monitoring**: Real-time tracking via Trackio
- **Validation**: Regular validation during training

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{{{model_name_slug}},
  title={{{{model_name}}}},
  author={{{author_name}}},
  year={2024},
  url={https://huggingface.co/{{repo_name}}}
}
```

## License

This model is licensed under the Apache 2.0 License.

## Acknowledgments

- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Framework**: PyTorch, Transformers, PEFT
- **Monitoring**: Trackio integration
- **Quantization**: torchao library

## Support

For questions and support:

- Open an issue on the Hugging Face repository
- Check the model documentation
- Review the training logs and configuration

## Repository Structure

```
{{repo_name}}/
├── README.md (this file)
├── config.json
├── pytorch_model.bin
├── tokenizer.json
└── tokenizer_config.json
```

## Usage Examples

### Text Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("{{repo_name}}")
tokenizer = AutoTokenizer.from_pretrained("{{repo_name}}")

text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Conversation

```python
# Reuses `model` and `tokenizer` loaded in the Text Generation example above
def chat_with_model(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

response = chat_with_model("Hello, how are you today?")
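# Note (sketch): generate() returns the prompt tokens followed by the
# continuation, so `response` starts by repeating the prompt text. To keep
# only the newly generated text, slice past the prompt length before
# decoding, e.g.:
#
#     new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
#     completion = tokenizer.decode(new_tokens, skip_special_tokens=True)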
print(response)
```

### Advanced Usage

```python
# Sampling-based generation, reusing `model`, `tokenizer`, and `inputs` from above
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
```

## Monitoring and Tracking

This model was trained with comprehensive monitoring:

- **Trackio Space**: {{trackio_url}}
- **Experiment**: {{experiment_name}}
- **Dataset Repository**: https://huggingface.co/datasets/{{dataset_repo}}
- **Training Logs**: Available in the experiment data

## Deployment

### Requirements

```bash
pip install torch transformers accelerate
{{#if quantized_models}}
pip install torchao  # For quantized models
{{/if}}
```

### Hardware Requirements

- **Main Model**: GPU with 8GB+ VRAM recommended

## Changelog

- **v1.0.0**: Initial release with fine-tuned model
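## Notes on Reported Metrics

The perplexity reported in the metadata above is the standard transform of the evaluation loss: the exponential of the mean per-token negative log-likelihood. A minimal sketch of that relationship (the function name is illustrative, not part of this repository):

```python
import math

def perplexity_from_loss(mean_nll: float) -> float:
    # Perplexity = exp(mean per-token negative log-likelihood).
    # A loss of 0 corresponds to perplexity 1.0; higher loss grows
    # exponentially into higher perplexity.
    return math.exp(mean_nll)

print(perplexity_from_loss(2.0))
```

This is why small differences in validation loss can correspond to noticeable differences in perplexity.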