Tesseract for iOS

Introduction
Tesseract is an open-source optical character recognition (OCR) engine that runs on various operating systems. It is free software released under the Apache License, and its development has been sponsored by Google since 2006 (as of this writing in 2020). It is one of the most popular and accurate open-source OCR libraries. It uses artificial intelligence (AI) to find and recognize text in images. It supports macOS, Windows, and Linux, and it can also be compiled for iOS and Android.
This is the source code repository:
https://github.com/tesseract-ocr/tesseract
More than 100 languages are supported. Each .traineddata file in the following repository is the trained model for one language:
https://github.com/tesseract-ocr/tessdata
The rest of this article describes how to use Tesseract in an iOS app.
Development Environment
- macOS Catalina 10.15.2
- Xcode 11.3, Swift 5
- Tesseract 4.1.1
- Leptonica 1.79.0
- OpenCV 4.2.0
Download and include dependencies
First, create a new Xcode project using the Single View App template.

Download Tesseract 4.1.1 for iOS
(This is the compiled version from https://github.com/tesseract-ocr/tesseract for iOS ONLY)
https://github.com/kang298/Tesseract-builds-for-iOS/tree/tesseract-4.1.1
After downloading and unzipping, you will have two folders, “include” and “lib”. Drag and drop both into your Xcode project.

Download the OpenCV iOS framework from
https://opencv.org/releases/ and then drag and drop it into the Xcode project.

Press Command + R to build and run the project and make sure there are no errors.
Download languages’ trained model files
https://github.com/tesseract-ocr/tessdata
In this tutorial, we will test with three languages: English, Japanese, and Vietnamese. Download the following model files and save them in a folder named tessdata:
- eng.traineddata
- jpn.traineddata
- vie.traineddata
Then drag and drop that folder into the Xcode project. NOTE: choose “Create folder references” instead of “Create groups” when adding the folder to the project.
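A missing or misnamed model file is an easy mistake at this step, so it can help to check that every expected .traineddata file is present before initializing the engine. The following is a minimal sketch in standard C++ only; modelPath and missingModels are hypothetical helper names and are not part of Tesseract.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Build the path of the model file for one language code,
// e.g. ("tessdata", "eng") -> "tessdata/eng.traineddata".
std::string modelPath(const std::string& tessdataDir, const std::string& lang) {
    if (tessdataDir.empty()) return lang + ".traineddata";
    const bool hasSlash = tessdataDir.back() == '/';
    return tessdataDir + (hasSlash ? "" : "/") + lang + ".traineddata";
}

// Return the language codes whose .traineddata file cannot be opened.
std::vector<std::string> missingModels(const std::string& tessdataDir,
                                       const std::vector<std::string>& langs) {
    std::vector<std::string> missing;
    for (const std::string& lang : langs) {
        std::ifstream file(modelPath(tessdataDir, lang), std::ios::binary);
        if (!file.good()) missing.push_back(lang);
    }
    return missing;
}
```

Calling missingModels with the tessdata folder path and {"eng", "jpn", "vie"} should return an empty list when the folder was added correctly.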

Coding
Because Tesseract is written in C++, the code that calls it must also be C++. Create a C++ file named tesseract_wrapper.cpp in the project as follows:


Remember to check “Also create a header file” so that Xcode creates a header file (tesseract_wrapper.hpp) for your C++ file.
tesseract_wrapper.hpp
//
// tesseract_wrapper.hpp
// TestTesseract
//
// Created by Briswell on 1/13/20.
// Copyright © 2020 Briswell. All rights reserved.
//
#ifndef tesseract_wrapper_hpp
#define tesseract_wrapper_hpp
#include "opencv2/imgproc.hpp"
#include "stdio.h"
using namespace cv;
String ocrUsingTesseractCPP(String image_path,String data_path,String language);
#endif /* tesseract_wrapper_hpp */
tesseract_wrapper.cpp
//
// tesseract_wrapper.cpp
// TestTesseract
//
// Created by Briswell on 1/13/20.
// Copyright © 2020 Briswell. All rights reserved.
//
#include "allheaders.h"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include "baseapi.h"
#include "tesseract_wrapper.hpp"
using namespace cv;
using namespace tesseract;
/*
matToPix():
convert from OpenCV Image Container to Leptonica's Pix Struct
Params:
mat: OpenCV Mat image Container
Output
Leptonica's Pix Struct
*/
Pix* matToPix(Mat *mat){
int image_depth = 8;
//create a Leptonica's Pix Struct with width, height of OpenCV Image Container
Pix *pixd = pixCreate(mat->size().width, mat->size().height, image_depth);
for(int y=0; y<mat->rows; y++) {
for(int x=0; x<mat->cols; x++) {
pixSetPixel(pixd, x, y, (l_uint32) mat->at<uchar>(y,x));
}
}
return pixd;
}
/*
ocrUsingTesseractCPP():
Using Tesseract engine to read text from image
Params:
image_path: path to image
data_path: path to folder containing .traineddata files
language: expected language to detect (eng,jpn,..)
Output:
String detected from image
*/
String ocrUsingTesseractCPP(String image_path,String data_path,String language){
//load a Mat Image Container from the image's path in grayscale mode
Mat image = imread(image_path,IMREAD_GRAYSCALE);
//imread() returns an empty Mat if the file cannot be read
if(image.empty()) return String();
TessBaseAPI* tessEngine = new TessBaseAPI();
//Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition.
//It still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns.
//This tutorial uses the LSTM engine only.
OcrEngineMode mode = tesseract::OEM_LSTM_ONLY;
//init Tesseract engine; Init() returns 0 on success
if(tessEngine->Init(data_path.c_str(), language.c_str(), mode) != 0){
delete tessEngine;
return String();
}
//Set the page segmentation mode used for layout analysis; for all supported modes, refer to
//https://tesseract.patagames.com/help/html/T_Patagames_Ocr_Enums_PageSegMode.htm
PageSegMode pageSegMode = tesseract::PSM_SINGLE_BLOCK;
tessEngine->SetPageSegMode(pageSegMode);
//increase accuracy for japanese
if(language.compare("jpn") == 0){
tessEngine->SetVariable("chop_enable", "true");
tessEngine->SetVariable("use_new_state_cost", "false");
tessEngine->SetVariable("segment_segcost_rating", "false");
tessEngine->SetVariable("enable_new_segsearch", "0");
tessEngine->SetVariable("language_model_ngram_on", "0");
tessEngine->SetVariable("textord_force_make_prop_words", "false");
tessEngine->SetVariable("edges_max_children_per_outline", "40");
}
//convert from OpenCV Image Container to Leptonica's Pix Struct
Pix *pixImage = matToPix(&image);
//set Leptonica's Pix Struct to Tesseract engine
tessEngine->SetImage(pixImage);
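//get recognized text in UTF8 encoding; the caller owns the returned buffer
char *text = tessEngine->GetUTF8Text();
//copy the text into a String so the buffer can be freed
String result = (text != NULL) ? String(text) : String();
delete [] text;
//release Tesseract's cache, then the engine and the Pix
tessEngine->End();
delete tessEngine;
pixDestroy(&pixImage);
return result;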
}
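matToPix() above copies pixels one at a time through pixSetPixel() instead of copying the Mat buffer in one block. One reason this is the safe choice is that the two memory layouts differ: Leptonica stores image data in 32-bit words, pads every row to a word boundary, and for 8-bit images packs four pixels per word starting from the most significant byte. The sketch below illustrates that layout in standard C++ without calling Leptonica; wordsPerLine and packGrayRow are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 32-bit words needed for one row of 8-bit pixels,
// rounded up so that each row starts on a word boundary.
int wordsPerLine(int width) {
    assert(width > 0);
    return (width * 8 + 31) / 32;
}

// Pack one row of 8-bit gray values into 32-bit words, placing the
// leftmost pixel of each group of four in the most significant byte.
std::vector<uint32_t> packGrayRow(const std::vector<uint8_t>& row) {
    std::vector<uint32_t> words(wordsPerLine(static_cast<int>(row.size())), 0);
    for (std::size_t x = 0; x < row.size(); ++x) {
        const int shift = 8 * (3 - static_cast<int>(x % 4));
        words[x / 4] |= static_cast<uint32_t>(row[x]) << shift;
    }
    return words;
}
```

A row of five pixels therefore occupies two words, with the last three bytes of the second word left as padding. pixSetPixel() hides these details, at the cost of one function call per pixel.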
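Because Swift cannot call C++ functions directly, we add an Objective-C++ wrapper to bridge the two. Create the following two files, and import TesseractWrapper.h in the project's bridging header so that Swift can see the class.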
- TesseractWrapper.h
- TesseractWrapper.mm (not .m because this file is for C++ compilation)
TesseractWrapper.h
//
// TesseractWrapper.h
// TestTesseract
//
// Created by Briswell on 1/13/20.
// Copyright © 2020 Briswell. All rights reserved.
//
#import "Foundation/Foundation.h"
#import "UIKit/UIKit.h"
@interface TesseractWrapper : NSObject
+(NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language;
@end
TesseractWrapper.mm
//
// TesseractWrapper.mm
// TestTesseract
//
// Created by Briswell on 1/13/20.
// Copyright © 2020 Briswell. All rights reserved.
//
#import "TesseractWrapper.h"
#include "tesseract_wrapper.hpp"
@implementation TesseractWrapper
/*
ocrUsingTesseractObjectiveC()
call ocrUsingTesseractCPP() to recognize text from image
params:
image: image to recognize text
language: eng/jpn/vie
output:
recognized string
*/
+(NSString*)ocrUsingTesseractObjectiveC:(UIImage*)image language:(NSString*)language{
//get path of folder containing .traineddata files
NSString* data_path = [NSString stringWithFormat:@"%@/tessdata/",[[NSBundle mainBundle] bundlePath]];
//save image to app's cache directory
NSString* cache_dir = [NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) lastObject];
NSString* image_path = [NSString stringWithFormat:@"%@/image.jpeg",cache_dir];
NSData* data = UIImageJPEGRepresentation(image, 0.5);
NSURL* url = [NSURL fileURLWithPath:image_path];
[data writeToURL:url atomically:true];
//get text from image using ocrUsingTesseractCPP() from file tesseract_wrapper.hpp
String str = ocrUsingTesseractCPP([image_path UTF8String], [data_path UTF8String], [language UTF8String]);
NSString* result_string = [NSString stringWithCString:str.c_str()
encoding:NSUTF8StringEncoding];
//remove cached image
[[NSFileManager defaultManager] removeItemAtURL:url error:nil];
return result_string;
}
@end
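Create a simple screen containing only a text view and a button, and connect them to ViewController.swift. The code below also imports CropViewController, a third-party cropping component that must be added to the project separately.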

ViewController.swift
//
// ViewController.swift
// TestTesseract
//
// Created by Briswell on 1/13/20.
// Copyright © 2020 Briswell. All rights reserved.
//
import UIKit
import CropViewController
class ViewController: UIViewController {
@IBOutlet weak var txt: UITextView!
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view.
}
@IBAction func ocr(_ sender: Any) {
//if camera not supported
if !UIImagePickerController.isSourceTypeAvailable(.camera){
return
}
//present camera to take image
let pickerController = UIImagePickerController()
pickerController.delegate = self as UIImagePickerControllerDelegate & UINavigationControllerDelegate
pickerController.sourceType = .camera
self.present(pickerController, animated: true, completion: nil)
}
}
extension ViewController: UIImagePickerControllerDelegate,UINavigationControllerDelegate{
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
picker.dismiss(animated: true) {
guard let image = info[.originalImage] as? UIImage else { return }
//present a crop image frame to focus on text content
let cropViewController = CropViewController.init(image: image)
cropViewController.delegate = self
self.present(cropViewController, animated: true, completion: nil)
}
}
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
picker.dismiss(animated: true, completion: nil)
}
}
extension ViewController:CropViewControllerDelegate{
func cropViewController(_ cropViewController: CropViewController, didCropToImage image: UIImage, withRect cropRect: CGRect, angle: Int) {
cropViewController.dismiss(animated: true) {
//call objective-c wrapper with expected language
let str = TesseractWrapper.ocr(usingTesseract: image, language: "jpn")
self.txt.text = str
}
}
func cropViewController(_ cropViewController: CropViewController, didFinishCancelled cancelled: Bool) {
cropViewController.dismiss(animated: true, completion: nil)
}
}
Here is the test result for Japanese. You can test English and Vietnamese in the same way.

Conclusion
Recognizing text in images is achievable, but there are some difficulties. The main problem is image quality (size, lighting, contrast). Because each image has different problems, one option is to add a filter tool so that the user can adjust the image manually before recognition. Refer to the link below for ways to improve image quality:
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
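As one illustration of the kind of filter mentioned above, the sketch below applies min-max contrast stretching to an 8-bit grayscale buffer, so that the darkest value becomes 0 and the brightest becomes 255. This is a generic image-processing technique chosen as an example, not a step taken from Tesseract or from the linked page, and stretchContrast is a hypothetical name. It uses standard C++ only, so it would need to be adapted to work on a cv::Mat.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linearly rescale gray values so that they span the full 0..255 range.
// A flat image (all values equal) or an empty buffer is returned unchanged.
std::vector<uint8_t> stretchContrast(const std::vector<uint8_t>& gray) {
    if (gray.empty()) return gray;
    const auto range = std::minmax_element(gray.begin(), gray.end());
    const int lo = *range.first;
    const int hi = *range.second;
    if (hi == lo) return gray;
    std::vector<uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        // Integer arithmetic; the result is always within 0..255.
        out[i] = static_cast<uint8_t>((gray[i] - lo) * 255 / (hi - lo));
    }
    return out;
}
```

A washed-out image whose values only span 100 to 150 would be spread across the whole range, which may make the text easier to separate from the background. Whether this actually improves recognition depends on the image, so it is best offered as an optional adjustment.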