Commit 4fd22c7c authored by Bernard Stephan
parent 92d05a7a
# PDF2Blocs
## Abstract
*This Python script converts a PDF file written in French into an HTML file.*
*The conversion organizes the textual content of the PDF file into separate blocks. Each block is then transformed into an HTML section: H1, H2, P, FigCaption, Footer, or Header.*
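As a rough illustration of the block-to-section step, here is a minimal sketch. It assumes each block arrives as a `(kind, text)` pair; the names `BLOCK_TAGS` and `blocks_to_html` are hypothetical and the tag choices are a guess, not taken from the script itself.

```python
from html import escape

# Hypothetical mapping from block kind to HTML tag. The kinds are the ones
# listed in the abstract; the tag chosen for each is an assumption.
BLOCK_TAGS = {
    "H1": "h1",
    "H2": "h2",
    "P": "p",
    "FigCaption": "figcaption",
    "Footer": "footer",
    "Header": "header",
}


def blocks_to_html(blocks):
    """Render (kind, text) pairs as one HTML element per block."""
    lines = []
    for kind, text in blocks:
        # Unknown kinds fall back to a plain paragraph.
        tag = BLOCK_TAGS.get(kind, "p")
        # Escape the text so characters such as "<" stay valid HTML.
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)
```

Calling `blocks_to_html([("H1", "Titre"), ("P", "Un paragraphe.")])` would yield one `<h1>` element followed by one `<p>` element, each on its own line.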
*This program uses pdftohtml and pdftotext, two tools from the Poppler library (*
*It is run from the command line:*
python /link/to/file.pdf
*The result is written to standard output.*
*The algorithm is described in French in the file of the archive.*
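For readers unfamiliar with Poppler, the sketch below shows one way a Python script might call `pdftotext` and capture its output. It is an assumption about how such a call could look, not the script's actual code; the helper names `pdftotext_command` and `extract_text` are hypothetical, and `pdftotext` must be installed and on the `PATH`.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext command line for one PDF file."""
    # "-layout" keeps the physical layout of the page,
    # "-enc UTF-8" selects the output encoding,
    # and a final "-" makes pdftotext write to standard output.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

A caller could then print `extract_text("/link/to/file.pdf")` to send the text to standard output, in line with the behaviour described above.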
## Summary
A Python script that segments digital documents in PDF format
whose textual content is in French.