Ubuntu PDF image to text (OCR) - Extract all text from a PDF
sudo apt-get install ghostscript
gs -dNOPAUSE -sDEVICE=tiffg4 -r600x600 -dBATCH -sPAPERSIZE=a4 -sOutputFile=file-name.tif file-name.pdf
# Install Tesseract plus the language pack for the language of your document (English shown here):
sudo apt-get install tesseract-ocr tesseract-ocr-eng
tesseract file-name.tif file-name-txt-without-extension -l eng
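If you do this often, the two steps can be wrapped in a small script. This is only a rough sketch: the function names ocr_base and pdf_to_text are made up for this example, it assumes ghostscript and tesseract are installed as shown above, and it reuses the same gs options. The language defaults to eng, and the text should end up in a .txt file next to the PDF, since tesseract appends the extension to the output base itself.

```shell
#!/usr/bin/env bash
# Sketch only: OCR a scanned PDF into a text file with gs + tesseract.
# ocr_base and pdf_to_text are hypothetical helper names for this example.

# Strip a trailing .pdf so the result can serve as tesseract's output base
# (tesseract adds the .txt extension on its own).
ocr_base() {
    printf '%s\n' "${1%.pdf}"
}

# Usage: pdf_to_text file-name.pdf [lang]
pdf_to_text() {
    local pdf="$1"
    local lang="${2:-eng}"   # assumption: default to English
    local base
    base="$(ocr_base "$pdf")"

    # Step 1: render the PDF to a TIFF, same options as the command above.
    gs -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r600x600 -sPAPERSIZE=a4 \
       -sOutputFile="$base.tif" "$pdf" || return 1

    # Step 2: run OCR on the TIFF; output goes to "$base.txt".
    tesseract "$base.tif" "$base" -l "$lang" || return 1

    echo "Wrote $base.txt"
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    pdf_to_text "$@"
fi
```

Saved as, say, pdf2txt.sh, it could be called with `bash pdf2txt.sh file-name.pdf eng`. For another language, install the matching tesseract-ocr language package first and pass its code as the second argument.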