BERT Base Spoiler Detection

Model Description

This model is a fine-tuned version of bert-base-uncased for detecting spoilers in movie and TV show reviews. It classifies reviews as either containing spoilers or being spoiler-free.

Developed by: Tyler Jordan
Model type: Text Classification
Language: English
License: MIT
Base model: bert-base-uncased

Intended Use

Primary Use Case

Automatically detect spoilers in user-generated movie and TV show reviews to warn readers before they encounter plot-revealing content.

Intended Users

  • Movie review platforms
  • Content moderation systems
  • Personal projects for filtering spoilers

Out-of-Scope Uses

  • Reviews in languages other than English
  • Non-entertainment content (news, academic papers, etc.)
  • High-stakes applications (e.g., legal or medical text) where misclassification is costly

Training Data

Dataset: IMDB Review Dataset by Enam Biswas (2021)

Preprocessing:

  • Sampled 200,000 balanced reviews (100k spoilers, 100k non-spoilers) from 5.5M total reviews
  • Train/Validation/Test split: 140k/30k/30k (70%/15%/15%)
  • Text cleaning: HTML tag removal, whitespace normalization
  • Minimum review length: 30 characters
  • Maximum sequence length: 512 tokens
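
The cleaning and length-filtering steps above can be sketched as follows (the function name and regexes here are illustrative, not the original preprocessing code):

```python
import re

MIN_CHARS = 30  # minimum review length from the preprocessing rules above

def clean_review(text):
    """HTML tag removal, whitespace normalization, and minimum-length filter.

    Returns the cleaned review, or None if it is shorter than 30 characters.
    """
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text if len(text) >= MIN_CHARS else None
```

Reviews that come back as `None` would be dropped before tokenization; the 512-token cap is then enforced by the tokenizer's truncation, not by this step.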

Class Distribution:

  • Spoiler: 50%
  • Non-spoiler: 50%

Training Procedure

Training Hyperparameters

  • Optimizer: AdamW
  • Learning rate: 1e-5
  • Batch size: 32
  • Epochs: 5
  • Max sequence length: 512
  • Dropout: 0.3
  • Weight decay: 0.01
  • Warmup steps: 10% of total steps
  • Learning rate schedule: Linear warmup with decay
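
With 140k training examples and batch size 32, one epoch is 4,375 steps, so 5 epochs give roughly 21,875 total steps and about 2,187 warmup steps (10%). The linear warmup-with-decay schedule can be sketched as (a minimal illustration; the actual training code likely uses the scheduler from the Transformers library):

```python
def lr_at_step(step, total_steps, base_lr=1e-5, warmup_frac=0.10):
    """Linear warmup to base_lr over the first 10% of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)            # warmup phase
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))  # decay phase
```

For example, `lr_at_step(0, 21875)` is 0, the rate peaks at 1e-5 when warmup ends, and it reaches 0 at the final step.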

Training Hardware

  • GPU: NVIDIA T4 (Google Colab)
  • Training time: ~2-3 hours

Framework

  • PyTorch 2.5.1
  • Transformers 4.x
  • CUDA 12.1

Evaluation

Metrics

  • Test Accuracy: 76.0%
  • Validation Accuracy: 76.3%

Evaluation Data

  • 30,000 held-out reviews from the IMDB dataset
  • Balanced split (50% spoilers, 50% non-spoilers)
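
Since the evaluation set is balanced, accuracy is an informative metric here (a 50% score corresponds to chance). A minimal sketch of the computation:

```python
def accuracy(preds, labels):
    """Fraction of spoiler/non-spoiler predictions matching the true labels."""
    assert len(preds) == len(labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

On the 30,000 held-out reviews, this yields the 76.0% test accuracy reported above.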