Commit 4fd22c7c authored by Bernard Stephan

Update README.md

parent 92d05a7a
# PDF2Blocs
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4067965.svg)](https://doi.org/10.5281/zenodo.4067965)
## Abstract
*This Python script converts a PDF file written in French into an HTML file.*
*The conversion organizes the textual content of the PDF file into separate blocks. Each block is then transformed into an HTML section: H1, H2, P, FigCaption, Footer, Header.*
*This program uses pdftohtml and pdftotext, two tools from the Poppler library (https://poppler.freedesktop.org/).*
*It is run from the command line:*
`python pdf2blocks.py /link/to/file.pdf`
*The result is written to standard output.*
*The algorithm is described in French in the README.md file of the archive.*
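
Because the result goes to standard output, saving it to a file means redirecting or capturing that output. The following is a minimal sketch of one way to do so from Python, assuming the script is invoked exactly as shown above and sits in the current directory. The helper names `build_command` and `convert_to_file` are hypothetical and are not part of PDF2Blocs.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, script="pdf2blocks.py"):
    """Return the argument list for the documented invocation.

    Assumption: the script takes the PDF path as its only argument,
    as in `python pdf2blocks.py /link/to/file.pdf`.
    """
    return [sys.executable, str(script), str(pdf_path)]


def convert_to_file(pdf_path, html_path, script="pdf2blocks.py"):
    """Run the script and save whatever it prints on standard output."""
    result = subprocess.run(
        build_command(pdf_path, script),
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # Write the raw bytes so no output encoding has to be guessed.
    Path(html_path).write_bytes(result.stdout)
    return html_path
```

For example, `convert_to_file("/link/to/file.pdf", "file.html")` would write the generated HTML to `file.html`. In a shell, redirecting the output with `>` gives the same result.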
## Summary
A Python script that segments digital documents in PDF format
whose textual content is in French.
......