Prophet

experiment current experiments

Custom Polish BPE encodes Polish 1.58× more densely than Llama-3

Observations

base: polish-32k (ours, byte-level BPE) vs Llama-3 / GPT-2
method: fertility = tokens/word on 2000 held-out Polish documents (962k words); identical text for all tokenizers. BPE trained on a ~1 GB balanced sample of the corpus (legalese capped because legal text makes up 71% of it), 51 s on 16 cores.
status: clean
date: 2026-06-15
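
The fertility metric named in the method field can be sketched as follows. This is a minimal illustration and not the experiment's actual script: it assumes words are whitespace-delimited, pools counts over all documents, and uses two toy tokenizers (char_tokenizer, word_tokenizer) as hypothetical stand-ins for the real ones. It also assumes the density figure is the ratio of the baseline's fertility to ours; the record does not state this explicitly.

```python
from typing import Callable, Iterable, List

# A tokenizer here is any function mapping text to a list of tokens (assumption).
Tokenizer = Callable[[str], List[str]]


def fertility(tokenize: Tokenizer, docs: Iterable[str]) -> float:
    """Tokens per whitespace-delimited word, pooled over all documents."""
    total_tokens = 0
    total_words = 0
    for doc in docs:
        total_tokens += len(tokenize(doc))
        total_words += len(doc.split())  # assumption: a word is a whitespace-separated chunk
    if total_words == 0:
        raise ValueError("no words in the evaluation set")
    return total_tokens / total_words


def density_ratio(baseline: Tokenizer, ours: Tokenizer, docs: List[str]) -> float:
    """How many times fewer tokens per word `ours` needs than `baseline`."""
    return fertility(baseline, docs) / fertility(ours, docs)


# Hypothetical toy tokenizers, only to make the sketch runnable.
def char_tokenizer(text: str) -> List[str]:
    return [c for c in text if not c.isspace()]


def word_tokenizer(text: str) -> List[str]:
    return text.split()


# The same documents are passed to every tokenizer, as in the method field.
docs = ["ala ma kota", "to jest test"]
ratio = density_ratio(char_tokenizer, word_tokenizer, docs)
```

To compare real tokenizers, each one would be wrapped in a function that returns its token list for a string, and all of them would be evaluated on the same held-out documents.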

Referenced by

GPT (mentions)
prawo (mentions)
held-out (mentions)
BPE (mentions)
BPE (defined-by)

Provenance

slayer@882fb52:public/results/experiments.json#polish-bpe-fertility