pika

pika is a simple tokenizer released under public-domain-like terms.

Special Tokens

Unknown token: [UNK]
End-of-Sequence token: [EOS]
Padding token: [PAD]
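
As a rough illustration of what these three tokens are typically used for, the sketch below encodes two word lists against a toy vocabulary. The vocabulary, the IDs, and the helper names `encode` and `pad_batch` are invented for the example; they are not pika's real vocabulary or API.

```python
# Toy illustration only: this vocabulary is invented, not pika's real one.
UNK, EOS, PAD = "[UNK]", "[EOS]", "[PAD]"

toy_vocab = {PAD: 0, UNK: 1, EOS: 2, "hello": 3, "world": 4}


def encode(words, vocab):
    """Map words to IDs, fall back to [UNK] for missing words, then append [EOS]."""
    ids = [vocab.get(word, vocab[UNK]) for word in words]
    ids.append(vocab[EOS])
    return ids


def pad_batch(batch, vocab):
    """Right-pad every sequence with [PAD] to the length of the longest one."""
    width = max(len(seq) for seq in batch)
    return [seq + [vocab[PAD]] * (width - len(seq)) for seq in batch]


batch = pad_batch(
    [encode(["hello", "world"], toy_vocab), encode(["pika"], toy_vocab)],
    toy_vocab,
)
print(batch)  # [[3, 4, 2], [1, 2, 0]]
```

Here "pika" is not in the toy vocabulary, so it maps to [UNK], and the shorter sequence is padded with [PAD] after its [EOS].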

Training

pika was trained on the first 1000 rows of each language in the agentlans/multilingual-text dataset.
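
One possible reading of "the first 1000 rows of each language" is sketched below as a per-language cap over a stream of rows. The column names `language` and `text`, the helper `first_rows_per_language`, and the sample rows are assumptions made for illustration; the dataset's actual schema and the author's actual selection script are not described here.

```python
from collections import defaultdict


def first_rows_per_language(rows, limit=1000):
    """Yield rows in order, keeping at most `limit` rows for each language."""
    counts = defaultdict(int)
    for row in rows:
        lang = row["language"]  # assumed column name
        if counts[lang] < limit:
            counts[lang] += 1
            yield row


# Hypothetical rows standing in for the real dataset.
rows = [
    {"language": "en", "text": "a"},
    {"language": "fr", "text": "b"},
    {"language": "en", "text": "c"},
    {"language": "en", "text": "d"},
]

sample = list(first_rows_per_language(rows, limit=2))
print([row["text"] for row in sample])  # ['a', 'b', 'c']
```

Because the helper is a generator, it can consume a streamed dataset without loading every row into memory.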

Limitations

Because its training corpus is small, pika may split words into several smaller pieces. Some less common special tokens are also missing; add them manually if you need them.
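
Adding a missing special token usually amounts to giving it an unused ID in the vocabulary. The sketch below shows that idea on a plain dictionary; the helper `add_special_tokens`, the starting IDs, and the tokens [BOS] and [MASK] are hypothetical examples, not a statement about which tokens pika lacks or how its files are laid out. Tokenizer libraries generally offer their own method for this step.

```python
def add_special_tokens(vocab, new_tokens):
    """Assign the next free ID to each token not yet in vocab; return those added."""
    added = []
    next_id = max(vocab.values(), default=-1) + 1
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = next_id
            next_id += 1
            added.append(token)
    return added


# Toy vocabulary holding only the three documented special tokens (IDs invented).
vocab = {"[PAD]": 0, "[UNK]": 1, "[EOS]": 2}
added = add_special_tokens(vocab, ["[EOS]", "[BOS]", "[MASK]"])
print(added)  # ['[BOS]', '[MASK]']
print(vocab)  # {'[PAD]': 0, '[UNK]': 1, '[EOS]': 2, '[BOS]': 3, '[MASK]': 4}
```

Tokens that already exist, such as [EOS] here, keep their original ID.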


