Tools and Web services
The project has developed a semi-automatic text annotation tool. It takes PDF documents as input and processes them automatically in three steps:
- layout detection,
- optical character recognition (with PERO OCR),
- named entity recognition (with a fine-tuned CamemBERT model).
Users can then check and manually correct each automatically detected and processed text section.
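The three automatic steps and the manual review that follows can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the `Region` structure, the `annotate` function and the three callables `detect_layout`, `run_ocr` and `tag_entities` are hypothetical names introduced here, not the tool's actual code or the APIs of PERO OCR or CamemBERT.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed to be in pixels


@dataclass
class Region:
    """One text region of a page (hypothetical structure for illustration)."""
    page: int
    bbox: BBox
    text: str = ""
    entities: List[Tuple[str, str]] = field(default_factory=list)  # (label, text span)
    validated: bool = False  # set to True once a user has checked the region


def annotate(
    pages: Iterable[Any],
    detect_layout: Callable[[Any], List[BBox]],
    run_ocr: Callable[[Any, BBox], str],
    tag_entities: Callable[[str], List[Tuple[str, str]]],
) -> List[Region]:
    """Run layout detection, OCR and NER on every page, in that order.

    The three stages are passed in as plain callables so that any layout
    model, OCR engine or NER model can be plugged in.
    """
    regions: List[Region] = []
    for page_number, image in enumerate(pages):
        for bbox in detect_layout(image):                  # step 1: layout detection
            region = Region(page=page_number, bbox=bbox)
            region.text = run_ocr(image, bbox)             # step 2: OCR
            region.entities = tag_entities(region.text)    # step 3: NER
            regions.append(region)
    return regions
```

Every region starts with `validated=False`, which mirrors the semi-automatic workflow: the automatic output is a proposal, and a user confirms or corrects each region afterwards.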
The trade directories from the 19th century are a challenging dataset with very heterogeneous layouts, fonts, and contents. Source: gallica.bnf.fr / Bibliothèque nationale de France
SODUCO text annotation tool
The historical geocoder takes both addresses and dates into account
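To illustrate what taking both addresses and dates into account can mean, here is a minimal sketch, assuming a gazetteer in which each address record carries a validity period. The `GazetteerEntry` structure, the `normalise` and `geocode` functions and the "closest period" fallback are assumptions made for this example; they do not describe the project's actual geocoder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GazetteerEntry:
    """An address with the period during which it is known to be valid."""
    address: str
    valid_from: int  # first year of validity (inclusive)
    valid_to: int    # last year of validity (inclusive)
    lon: float
    lat: float


def normalise(address: str) -> str:
    """Lower-case the address, drop commas and collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def geocode(address: str, year: int,
            gazetteer: List[GazetteerEntry]) -> Optional[GazetteerEntry]:
    """Return the entry matching the address whose period is closest to `year`.

    An entry whose validity period contains `year` has distance 0; otherwise
    the distance is the number of years to the nearest end of its period.
    Returns None when no entry matches the address text.
    """
    key = normalise(address)
    candidates = [e for e in gazetteer if normalise(e.address) == key]
    if not candidates:
        return None

    def temporal_distance(entry: GazetteerEntry) -> int:
        if entry.valid_from <= year <= entry.valid_to:
            return 0
        return min(abs(year - entry.valid_from), abs(year - entry.valid_to))

    return min(candidates, key=temporal_distance)
```

With such a structure, the same address string can resolve to different coordinates depending on the date of the source, which is useful when streets are renumbered or renamed over time.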
A sample vectorisation output
A collaborative tool for validating and editing geospatial data has been developed to improve data quality through human validation of any type of geospatial data. Users can create, remove, modify or validate any feature, including its geometry and attributes.
General view of the tool with uploaded data
Edit mode, creation of the geometry of a new feature
Edit mode, change of attributes of an existing feature
Status mode to see what features were created, removed or modified
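The editing operations and the status view of the collaborative tool can be modelled by attaching a status to each feature. The following is a rough sketch under assumptions: the `Feature` and `EditSession` classes, the status names and the GeoJSON-like geometry dictionaries are hypothetical and only illustrate the idea, not the tool's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Feature:
    """A geospatial feature: a geometry, attributes and an edit status."""
    geometry: dict                                   # GeoJSON-like geometry (assumed)
    attributes: Dict[str, str] = field(default_factory=dict)
    status: str = "unchanged"                        # unchanged, created, modified, removed, validated


class EditSession:
    """Tracks which features were created, modified, removed or validated."""

    def __init__(self, features: Dict[str, Feature]):
        # Copy the mapping only; the Feature objects themselves are edited in place.
        self.features = dict(features)

    def create(self, fid: str, geometry: dict,
               attributes: Optional[Dict[str, str]] = None) -> None:
        self.features[fid] = Feature(geometry, dict(attributes or {}), status="created")

    def modify(self, fid: str, geometry: Optional[dict] = None,
               attributes: Optional[Dict[str, str]] = None) -> None:
        feature = self.features[fid]
        if geometry is not None:
            feature.geometry = geometry
        if attributes:
            feature.attributes.update(attributes)
        if feature.status != "created":              # a new feature stays "created"
            feature.status = "modified"

    def remove(self, fid: str) -> None:
        # Keep the feature and flag it, so the removal stays visible in the status view.
        self.features[fid].status = "removed"

    def validate(self, fid: str) -> None:
        feature = self.features[fid]
        if feature.status == "unchanged":
            feature.status = "validated"

    def summary(self) -> Dict[str, int]:
        """Count features per status, as a status view might display them."""
        counts: Dict[str, int] = {}
        for feature in self.features.values():
            counts[feature.status] = counts.get(feature.status, 0) + 1
        return counts
```

Flagging removed features instead of deleting them is a design choice of this sketch: it keeps every change reviewable by other contributors.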
Data and historical sources catalogue
A catalogue has been developed to store, reference and retrieve the archival records and digital data used and produced throughout the project.
SODUCO catalog