Packages

Layout Parser

document processing · layout analysis

A unified toolkit for deep learning-based document image analysis.
Description · Website · Package · Paper · Github

LinkTransformer

record linkage · data wrangling · classification

Link dataframes, deduplicate, cluster, and perform multilingual merges using deep learning.
Description · Website · Package · Paper · Github

EffOCR

document processing · OCR

An optical character recognition (OCR) engine that is accurate, extremely cheap to deploy, and sample-efficient to customize to novel collections, languages, and character sets.
Description · Website · Package · Paper · Github

Homoglyphs

record linkage · multimodal

An easy-to-use Python package for deep learning-assisted string matching. It supports multimodal record linkage using both textual and image information.
Description · Website · Package · Paper · Github

News Déjà Vu

retrieval · semantic similarity

Use a language model to select a historical news article whose use of language is similar to that of a modern news article.
Description · Website · Package · Paper · Github

Entity Linking

entity linking · disambiguation

Disambiguate entities, resolve coreference, and link them to Wikipedia.
Description · Website · Package · Paper · Github