American Stories
A billion-scale dataset of structured texts and layouts from U.S. public domain newspapers.
Huggingface · Paper · Github
A massive-scale semantic similarity dataset of historical newspaper headlines.
Huggingface · Paper · Github
A topic-tagged, entity-tagged, and georeferenced dataset of 2.7 million unique public domain U.S. newswire articles, written between 1878 and 1977.
Huggingface · Paper · Github