alexanderwdark
Posted: Tue Oct 15, 2019 15:30 Post subject: WCX_RedTess - Text recognition for PDFs and images (scans)
This WCX plugin lets you work with images and PDFs as if they were archives containing text files (each named after a language code).
To enter the "archive", press Ctrl+PgDown.
Code: | WCX_TESS - C++ image-to-text and PDF-to-text converter in the form of a TC packer plugin.
Based on code from the Tesseract, Poppler, Leptonica and/or OpenCV libraries.
Text recognition uses "trained models" from Tesseract.
Russian and English models are included in the basic archive (*.traineddata files).
If you need any other models, download them and write their language codes into the "redtess.json" config.
You need the "langs" key for this. Mixed records such as "eng, rus" are allowed.
You will see all these values in the TC panel as files with a txt extension inside the virtual archive.
The "Fast" version of the trained models is used by default.
It works quickly, though it can have some problems (nothing too bad!).
You can get the "Best" version of the models from this link:
https://github.com/tesseract-ocr/tessdata_best
and replace the contents of the tessdata folder with them.
Or use the normal models:
https://github.com/tesseract-ocr/tessdata
You can also enable support for many other image formats (see the "formats" key in the config).
Any image format supported by Leptonica or OpenCV can be used with this plugin.
Multi-page input is currently enabled for the TIFF format.
PDFs are rasterized in memory before recognition, so try tuning the DPI in the configuration file.
Leptonica is the plugin's default library, but you can switch to OpenCV. |
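To get a feel for why the DPI setting matters when a PDF is rasterized in memory, here is a rough back-of-the-envelope sketch in Python. It is not taken from the plugin: the function names, the A4 page size and the 3 bytes per pixel (8-bit RGB) are assumptions for illustration, and the plugin may well render to a different pixel format or round sizes differently. The only fixed fact used is that a PDF point is 1/72 inch.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are measured in points, 1 pt = 1/72 inch


def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at the given DPI (rounded up)."""
    scale = dpi / POINTS_PER_INCH
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def raster_bytes(width_pt, height_pt, dpi, bytes_per_pixel=3):
    """Approximate size of the uncompressed bitmap.

    bytes_per_pixel=3 assumes 8-bit RGB; this is an illustrative
    assumption, not something the plugin documents.
    """
    width_px, height_px = raster_size(width_pt, height_pt, dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    a4 = (595.0, 842.0)  # A4 in points, rounded to whole points
    for dpi in (150, 300, 600):
        w, h = raster_size(*a4, dpi)
        mib = raster_bytes(*a4, dpi) / 2**20
        print(f"{dpi} DPI: {w} x {h} px, about {mib:.1f} MiB per page")
```

The takeaway is that memory use grows with the square of the DPI: doubling the DPI roughly quadruples the bitmap size, so a higher value may help with small print but costs memory and recognition time on every page.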
The current version is here
Last edited by alexanderwdark on Tue Oct 22, 2019 13:03; edited 3 times in total |