---
license: apache-2.0
tags:
- text-generation
- language-model
- causal-lm
- cosmicfish
- 120m
- transformer
- rope
- gqa
- swiglu
- rmsnorm
language: en
datasets:
- CosmicSet-1.0
- akkiisfrommars/TreeCorpusCleanedmodel
model_type: CosmicFish
pipeline_tag: text-generation
---

# CosmicFish-120M

A 120M-parameter causal language model developed by Mistyoz AI, built on a transformer architecture with RoPE, GQA, SwiGLU, and RMSNorm.

## Quick Start

**The easiest way to chat with CosmicFish is to use our `chat.py` script:**

```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-120M/resolve/main/chat.py

# Install dependencies
pip install torch transformers huggingface-hub termcolor

# Run the chat interface (automatically downloads model)
python chat.py
```

The `chat.py` script handles model loading and generation, and provides the best chat experience, with live streaming output, a repetition penalty, and conversation commands.

## Model Details

- **Parameters**: 121M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens
- **Training Data**: CosmicSet 1.0
- **Developer**: Mistyoz AI
- **Repository**: MistyozAI/CosmicFish-120M

## Usage

### Installation

```bash
pip install torch transformers huggingface-hub termcolor
```

### Quick Chat Interface

```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
import torch
import json
import os

# Download model from Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-120M")

# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load model weights
state_dict = torch.load(os.path.join(cache_dir, "pytorch_model.bin"), map_location="cpu")

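# Note: the full model class is available in the repository. Instantiate it
# from config_dict and load state_dict into it before running generation.
print(f"Model files downloaded to {cache_dir}")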
```

### Advanced Generation with Repetition Penalty

```python
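import torch


def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100,
                                     temperature=0.7, penalty=1.2, context_length=512):
    # Assumes model(input_ids) returns a (logits, loss) tuple with logits of
    # shape (batch, sequence, vocab), as the indexing below requires.
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()

    for _ in range(max_tokens):
        # Feed at most the last `context_length` tokens so the input never
        # exceeds the model's 512-token context.
        with torch.no_grad():
            logits, _ = model(generated[:, -context_length:])

        # Keep only the logits for the final position and apply temperature
        next_token_logits = logits[:, -1, :] / temperature

        # Apply repetition penalty: push down the logit of every token that
        # has already appeared (divide if positive, multiply if negative)
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty

        # Sample the next token from the adjusted distribution
        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Stop at end-of-sequence without appending the EOS token
        if next_token.item() == tokenizer.eos_token_id:
            break

        generated = torch.cat([generated, next_token], dim=1)

    # The decoded text includes the original prompt followed by the completion
    return tokenizer.decode(generated[0], skip_special_tokens=True)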
```
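
To make the penalty step easier to follow, here is a minimal sketch of the same rule in plain Python, without PyTorch. The function name `apply_repetition_penalty` and the four-token vocabulary are illustrative assumptions, not part of the repository.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Positive logits are divided by `penalty` and negative logits are multiplied
    by it, so in both cases the value decreases and the token loses probability.
    """
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted


# Hypothetical 4-token vocabulary; tokens 0 and 2 were already generated.
logits = [2.4, 1.0, -1.5, 0.5]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 2, 0], penalty=1.2)
print(penalized)
```

Token 0 drops from 2.4 to about 2.0 and token 2 drops from -1.5 to about -1.8, while the unseen tokens 1 and 3 are unchanged. Dividing a negative logit would raise it, which is why the two signs are handled differently.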

### Chat Interface

```python
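def chat_with_model():
    # Assumes `model` and `tokenizer` are already loaded and that
    # generate_with_repetition_penalty (defined above) is in scope.
    conversation = []  # list of (user message, assistant reply) pairs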
    
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break
        
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"
        
        # Generate response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context, 
            max_tokens=150, temperature=0.7, penalty=1.2
        )
        
        # Extract just the assistant's response
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")
        
        conversation.append((user_input, response))

chat_with_model()
```
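
The prompt built in the loop above grows with every turn, while the model's context length is 512 tokens. One possible way to handle this, sketched here under stated assumptions, is to drop the oldest turns until the prompt plus the generation budget fits. `build_prompt`, `trim_history`, and the `count_tokens` callable are hypothetical helpers, not part of the repository. In practice `count_tokens` could be something like `lambda s: len(tokenizer.encode(s))`.

```python
SYSTEM_PREFIX = "Below is a conversation between a human and an AI assistant.\n\n"


def build_prompt(conversation, user_input):
    """Format past (human, ai) turns plus the new user message."""
    prompt = SYSTEM_PREFIX
    for human, ai in conversation:
        prompt += f"Human: {human}\nAssistant: {ai}\n\n"
    return prompt + f"Human: {user_input}\nAssistant:"


def trim_history(conversation, user_input, count_tokens,
                 context_length=512, max_new_tokens=150):
    """Drop the oldest turns until prompt + generation budget fits the context."""
    budget = context_length - max_new_tokens
    turns = list(conversation)
    while turns and count_tokens(build_prompt(turns, user_input)) > budget:
        turns.pop(0)  # discard the oldest exchange first
    return turns


def word_count(text):
    """Stand-in token counter for demonstration only: counts whitespace-separated words."""
    return len(text.split())


history = [("hi " * 400, "hello"), ("how are you?", "fine")]
kept = trim_history(history, "tell me a joke", word_count)
print(len(kept))  # 1: the long first turn was dropped
```

Dropping whole turns keeps the `Human:`/`Assistant:` structure intact, at the cost of the model forgetting earlier parts of the conversation.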

## Architecture

CosmicFish uses several modern improvements over standard transformers:

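- **RoPE (Rotary Position Embeddings)**: Encodes position by rotating query and key vectors, so attention scores depend on relative rather than absolute positions
- **GQA (Grouped-Query Attention)**: Shares key/value heads across groups of query heads (4 groups), which reduces attention memory usage
- **SwiGLU**: A gated feed-forward activation that is generally more effective than plain ReLU or GELU
- **RMSNorm**: Normalizes by the root mean square only, without mean-centering, making it simpler than LayerNorm while remaining stable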
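
The four components above can be sketched in a few lines of plain Python. This is an illustrative sketch of the general techniques, not the CosmicFish implementation: the function names, the interleaved pairing used in `rope`, the `base` value of 10000, and the head counts in the GQA example (12 query heads sharing 4 key/value heads) are all assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU of the gate projection times the up projection, elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]


def rope(x, position, base=10000.0):
    """Rotate consecutive pairs of x by angles that depend on the token position."""
    dim = len(x)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out += [x[i] * cos_t - x[i + 1] * sin_t,
                x[i] * sin_t + x[i + 1] * cos_t]
    return out


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """GQA: map a query head to the key/value head shared by its group."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# RoPE property: the query-key score depends only on the position offset.
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
score_a = dot(rope(q, 5), rope(k, 3))  # positions 5 and 3, offset 2
score_b = dot(rope(q, 2), rope(k, 0))  # positions 2 and 0, offset 2
print(abs(score_a - score_b) < 1e-9)   # True

# GQA with hypothetical head counts: 12 query heads share 4 key/value heads.
print([kv_head_for_query_head(h, 12, 4) for h in range(12)])
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

In a real model `gate` and `up` would come from two separate linear projections of the same input, and the rotation in `rope` would be applied to every query and key vector before attention scores are computed.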

## Training

- **Dataset**: CosmicSet 1.0
- **Sequence Length**: 512 tokens
- **Training Steps**: ~300K iterations
- **Hardware**: 1× NVIDIA A40

## Performance

- **Speed**: Varies by hardware (not benchmarked)
- **Memory**: ~500MB RAM (FP16)
- **File Size**: 243MB
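
As a rough sanity check on these figures, the memory needed for the weights alone can be estimated from the parameter count. The sketch below assumes roughly 121M parameters (from Model Details) and counts only weights. Activations, framework overhead, and any caches are not included, and the source does not break down the ~500MB figure.

```python
def weight_memory_mb(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 121_000_000  # approximate count from Model Details

print(f"FP16: {weight_memory_mb(N_PARAMS, 2):.0f} MB")  # FP16: 242 MB
print(f"FP32: {weight_memory_mb(N_PARAMS, 4):.0f} MB")  # FP32: 484 MB
```

The FP16 estimate of about 242 MB is close to the listed 243MB file size, which suggests the checkpoint stores 16-bit weights.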

## Limitations

- Small model (120M parameters): responses may be less accurate than those of larger models
- Context is limited to 512 tokens
- Knowledge is limited to the training data (a training cutoff applies)
- May generate incorrect information
- Cannot browse the internet or access real-time data

## License

Apache 2.0 - see LICENSE file.

## Credit

If you use CosmicFish-120M, please credit Mistyoz AI.