News

ModernBERT, like BERT, is an encoder-only model. Encoder-only models output a list of numbers (an embedding vector) for their input, which means that they literally encode human language into a numerical representation.
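
As a minimal sketch of what this looks like in practice, the snippet below pulls embeddings out of an encoder-only model with the Hugging Face transformers library; the checkpoint name and the mean-pooling step are illustrative choices, not the only way to do it:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; any encoder-only model works the same way.
model_name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("Encoders turn text into vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding vector per input token: shape (batch, seq_len, hidden_dim).
token_embeddings = outputs.last_hidden_state
# Mean-pooling over tokens is one common way to get a single sentence vector.
sentence_embedding = token_embeddings.mean(dim=1)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 768])
```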
BERT was pre-trained on a large corpus of unlabelled text, including the entire Wikipedia (that's 2,500 million words!) and Book Corpus (800 million words).
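
Pre-training on unlabelled text works because the objective is self-supervised: words are masked out and the model learns to predict them from the surrounding context (masked language modelling). A minimal sketch of that objective in action, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
from transformers import pipeline

# The fill-mask pipeline exposes BERT's masked-language-modelling head.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks the most likely tokens for the [MASK] position.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```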