fixergroup/polish-reranker-base-mse-quantized-onnx

This model is an optimized ONNX version of sdadas/polish-reranker-base-mse, designed for efficient CPU inference.
It maintains full compatibility with the original model and can be used directly with the CrossEncoder API from sentence-transformers.
Thanks to quantization and ONNX Runtime, it delivers nearly identical output quality while running up to 2× faster on CPU.


Model Overview

This is a Polish text reranking model trained using the mean squared error (MSE) distillation approach on a large dataset of 1.4 million queries and 10 million documents.

Training Data Sources

  1. Polish MS MARCO training split (≈800k queries)
  2. ELI5 dataset translated into Polish (≈500k queries)
  3. Polish medical Q&A collection (≈100k queries)

The teacher model used was unicamp-dl/mt5-13b-mmarco-100k, a large multilingual reranker based on the MT5-XXL architecture.
The student model was sdadas/polish-roberta-base-v2.

In the MSE method, the student learns to directly reproduce the scores predicted by the teacher.
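
The objective can be sketched as the mean squared error between student and teacher relevance scores. The NumPy snippet below is a minimal illustration of that loss, not the actual training code:

```python
import numpy as np

def mse_distillation_loss(student_scores, teacher_scores):
    """Mean squared error between student and teacher relevance scores."""
    student = np.asarray(student_scores, dtype=np.float64)
    teacher = np.asarray(teacher_scores, dtype=np.float64)
    return float(np.mean((student - teacher) ** 2))

# Toy example: the loss is zero only when the student reproduces the teacher exactly.
teacher = [2.5, -1.0, 0.3]
student = [2.0, -1.5, 0.8]
print(mse_distillation_loss(student, teacher))  # 0.25
```

Minimizing this loss pushes the student to reproduce the teacher's raw scores directly, which is why the model outputs unbounded logits rather than probabilities.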


Requirements

sentence-transformers>=2.2.0
optimum[onnxruntime]>=1.10.0
onnxruntime>=1.15.0
transformers>=4.20.0
torch>=1.9.0

Usage (Sentence-Transformers)

You can use this ONNX model directly with the CrossEncoder class from sentence-transformers:

from sentence_transformers import CrossEncoder
import torch.nn

query = "Jak dożyć 100 lat?"
answers = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",
    "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

model = CrossEncoder(
    "fixergroup/polish-reranker-base-mse-quantized-onnx",
    default_activation_function=torch.nn.Identity(),
    max_length=512,
    device="cpu",
    trust_remote_code=True,  # important: required to load this model
)
pairs = [[query, answer] for answer in answers]
results = model.predict(pairs)
print(results.tolist())

This version runs entirely on CPU and provides approximately 2× faster inference than the original PyTorch model.
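
To verify the speed-up on your own hardware, you can time `model.predict` with a small helper like the one below. This is a generic timing sketch; the `model` and `pairs` names refer to the example above:

```python
import time

def mean_latency(fn, n_runs=10, warmup=2):
    """Average wall-clock latency of fn() in seconds, after warmup calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Usage (assuming `model` and `pairs` from the example above):
# print(f"{mean_latency(lambda: model.predict(pairs)) * 1000:.1f} ms per batch")
```

Running the same helper against the original PyTorch model gives a like-for-like comparison on your CPU.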


Usage (Hugging Face Transformers + ONNX Runtime)

You can also use the model with ONNX Runtime directly:

from transformers import AutoTokenizer
from onnxruntime import InferenceSession
from huggingface_hub import hf_hub_download
import numpy as np

model_name = "fixergroup/polish-reranker-base-mse-quantized-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# InferenceSession needs a local file path, so fetch the ONNX file from the Hub first
onnx_path = hf_hub_download(model_name, "model.onnx")
session = InferenceSession(onnx_path)

query = "Jak dożyć 100 lat?"
answers = [
    "Trzeba zdrowo się odżywiać i uprawiać sport.",
    "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

texts = [f"{query}</s></s>{answer}" for answer in answers]
tokens = tokenizer(texts, padding="longest", truncation=True, max_length=512, return_tensors="np")

inputs = dict(tokens)  # plain dict of NumPy arrays for ONNX Runtime
outputs = session.run(None, inputs)
results = np.squeeze(outputs[0])
print(results.tolist())
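
The model outputs raw relevance logits, so in a retrieval pipeline you typically use them to reorder candidates. A minimal sketch with NumPy, using dummy scores in place of real model output:

```python
import numpy as np

def rank_by_score(answers, scores):
    """Return answers sorted from most to least relevant."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return [answers[i] for i in order]

answers = ["healthy diet", "fast cars", "off-topic news"]
scores = np.array([4.2, -1.3, -3.7])  # dummy reranker logits
print(rank_by_score(answers, scores))
# ['healthy diet', 'fast cars', 'off-topic news']
```

In practice you would pass the `results` array from the example above as `scores`.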

Evaluation Results

The base model sdadas/polish-reranker-base-mse achieves an NDCG@10 score of 57.50 in the Rerankers category of the Polish Information Retrieval Benchmark (PIRB).
The ONNX version preserves equivalent reranking quality while offering improved inference performance.


Citation

If you use this model in your research or application, please cite:

@article{dadas2024assessing,
  title={Assessing generalization capability of text ranking models in Polish}, 
  author={Sławomir Dadas and Małgorzata Grębowiec},
  year={2024},
  eprint={2402.14318},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Notes

  • This ONNX model was quantized and optimized by FIXER Group.
  • Compatible with sentence-transformers CrossEncoder API.
  • Ideal for CPU-based reranking in production environments.