What parameters does Tesseract-OCR have?

An introduction to the Tesseract-OCR image recognition library
Author: 生命的魅力
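Recently my boss asked me to find out whether HTML5 offers any technology for this kind of image recognition. I was sure HTML5 itself has nothing of the sort, so I searched online and came across an open-source image recognition library.
To recognize images from an HTML5 page, the only option is to upload the image to a server, have the server run the recognition, and return the result to the page for display. After going through a lot of material (leaving third-party image recognition services aside), I found the open-source library Tesseract. To use it, you download and install Tesseract-OCR and then run commands on the command line, so server-side code written in PHP, Java, or a similar language has to wrap that command to perform the recognition.
The library's Chinese recognition rate is not very good, only about 70%, and that is with a sharp image on a solid-color background.
[Figure: sample recognition result]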
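This article represents only the author's views, not the position of Baidu. It is published on Baijiahao with the author's authorization; reproduction without permission is prohibited.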
Our business scenario requires OCR image recognition, so these notes record what I found over several days of investigation.
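1. Project home page
/p/tesseract-ocr/
It covers essentially everything. The download and wiki sections are especially important and contain a great deal of information.
2. Installation
Download the exe installer from /p/tesseract-ocr/downloads/list and run it.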
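3. Training the language data
OCR recognition depends on trained language data. Google provides data for the supported languages on the download page, but the data for Chinese works poorly and its recognition rate is very low, so we need to build and train our own.
The process is as follows:
(3-1) Generate the tif + box templates
The tif file is the training image. This article uses vie.arial.exp0.tif (see the attachment).
The box file describes that image. This article uses vie.arial.exp0.box. Each line marks out a rectangular region of the image and states which character it contains, with coordinates in pixels measured from the bottom-left corner of the image:
[character] [minx] [miny] [maxx] [maxy] [page_num]
The core idea: the tif image supplies the glyphs, the box file describes them, and together they form the training template.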
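To make the box format concrete, here is a minimal Python sketch that parses one box line into its six fields. The names `BoxEntry`, `parse_box_line`, and `box_size` are hypothetical helpers for illustration and are not part of Tesseract. The sketch assumes the fields are separated by single spaces, as in the format line above.

```python
from typing import NamedTuple, Tuple


class BoxEntry(NamedTuple):
    """One line of a box file: a character and its bounding rectangle."""
    char: str
    min_x: int
    min_y: int
    max_x: int
    max_y: int
    page: int


def parse_box_line(line: str) -> BoxEntry:
    # Split from the right so that the character field is taken as-is,
    # whatever text it contains.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"malformed box line: {line!r}")
    char, *numbers = parts
    min_x, min_y, max_x, max_y, page = (int(n) for n in numbers)
    return BoxEntry(char, min_x, min_y, max_x, max_y, page)


def box_size(entry: BoxEntry) -> Tuple[int, int]:
    # Width and height of the rectangle in pixels.
    return entry.max_x - entry.min_x, entry.max_y - entry.min_y
```

A helper like this is handy for sanity checks before training, for example flagging boxes with zero or negative width.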
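(3-2) The JTessBoxEditor tool
As the first step shows, training has a real cost: we have to produce a tif image, use Tesseract to generate the matching box file, and then edit the box file by hand to correct it. JTessBoxEditor was created to help with this. Project home page:
http://vietocr.sourceforge.net/training.html
It has three main functions:
1. Merge multiple tif files;
2. Correct box files in a graphical editor;
3. Generate a tif file and a box file automatically from text;
This article mainly uses the third function. You can run the JTessBoxEditor jar directly, but I recommend calling the API that JTessBoxEditor provides: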
TiffBoxGenerator generator = new TiffBoxGenerator(text, font, width, height);
generator.setOutputFolder(new File("D:\\workspace\\demo\\test2"));
generator.setFileName("vie.arial.exp0.tif");
generator.setTracking(0.1f);
generator.create();
Pass in the text to render, the font, and the image width and height (the width and height variables above), then run it.
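(3-3) Edit vie.font_properties
This file states which style attributes the font has. Each line is the font name followed by five 0/1 flags: italic, bold, fixed, serif, fraktur. In this example none of them is set, so the content is as follows (see the attachment for the file):
arial 0 0 0 0 0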
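As a small illustration, the line above can be produced from named flags. This sketch assumes the column order used by the Tesseract 3.x training documentation (italic, bold, fixed, serif, fraktur). The function name `font_properties_line` is hypothetical.

```python
def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    # Assumed column order: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font] + [str(int(flag)) for flag in flags])
```

Calling `font_properties_line("arial")` gives the line used in this article. The font name has to match the font part of the training file name (arial in vie.arial.exp0.tif).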
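(3-4) Training
There is plenty of material on training scripts, so I will not go through it here. For the principles, see:
/p/tesseract-ocr/wiki/TrainingTesseract3
This example provides a train.bat file. Put vie.arial.exp0.tif, vie.arial.exp0.box, vie.font_properties, and train.bat in the same directory and run train.bat.
Then copy the generated vie.traineddata into the tessdata directory.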
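The contents of train.bat are not shown here. As a rough sketch only, and assuming the Tesseract 3.x command-line tools and arguments described on the TrainingTesseract3 wiki page (`tesseract ... nobatch box.train`, `unicharset_extractor`, `mftraining`, `cntraining`, `combine_tessdata`), the steps could be assembled as below. The function names are hypothetical, the commands are only built and not executed, and the exact set of intermediate files differs between 3.x releases (for example, `shapetable` may be absent).

```python
from typing import Dict, List


def training_commands(lang: str, font: str, exp: int = 0) -> List[List[str]]:
    """Build the command lines for one tif/box pair (assumed 3.x workflow)."""
    base = f"{lang}.{font}.exp{exp}"
    return [
        # 1. Produce the .tr feature file from the tif + box pair.
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        # 2. Collect the character set from the box file.
        ["unicharset_extractor", f"{base}.box"],
        # 3. Train the shape prototypes.
        ["mftraining", "-F", f"{lang}.font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        # 4. Train the character normalization prototypes.
        ["cntraining", f"{base}.tr"],
    ]


def rename_plan(lang: str) -> Dict[str, str]:
    # The training outputs need the language prefix before they are combined.
    names = ("inttemp", "normproto", "pffmtable", "shapetable")
    return {name: f"{lang}.{name}" for name in names}


def combine_command(lang: str) -> List[str]:
    # The trailing dot is part of the argument; the output is <lang>.traineddata.
    return ["combine_tessdata", f"{lang}."]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, renaming the files from `rename_plan` that actually exist before running the combine step.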
(3-5) Recognition
Run: tesseract.exe in_put out_put -l vie
-l vie tells Tesseract to use the vie.traineddata language data.
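Since the server side has to wrap this command, here is a minimal sketch of such a wrapper in Python rather than PHP or Java. It assumes `tesseract` is on the PATH and that the recognized text is written to `<out_put>.txt`. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ocr_command(image: str, out_base: str, lang: str = "vie",
                      exe: str = "tesseract") -> List[str]:
    # Mirrors: tesseract in_put out_put -l vie
    return [exe, str(image), str(out_base), "-l", lang]


def recognize(image: str, out_base: str, lang: str = "vie",
              exe: str = "tesseract") -> str:
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_ocr_command(image, out_base, lang, exe), check=True)
    # Assumption: tesseract appends ".txt" to the output base name.
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")
```

Passing the arguments as a list, not as one shell string, avoids quoting problems when the uploaded file name contains spaces.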
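4. Parameter configuration
tesseract.exe in_put out_put -l vie my_config
This loads the my_config configuration file. Tesseract provides more than 600 configuration options. What each one does is listed at:
http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
The descriptions there are far from detailed. Following the advice in:
/p/tesseract-ocr/wiki/ControlParams
add the following to my_config:
enable_new_segsearch 0
This fixes the problem of a single Chinese character being recognized as two.
tessedit_write_images 1
After recognition runs, you can inspect the binarized image, tessinput.tif.
The effect of the other parameters is not yet clear to me.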
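A config file is plain text with one "name value" pair per line, so it is easy to generate. The sketch below writes the two settings above. The helper names are hypothetical, and where Tesseract looks for the config file (for example under tessdata/configs) depends on the installation, so the location is left to the caller.

```python
from pathlib import Path
from typing import Mapping, Union


def format_config(params: Mapping[str, Union[int, float, str]]) -> str:
    # One "name value" pair per line, in the order given.
    return "".join(f"{name} {value}\n" for name, value in params.items())


def write_config(path: str, params: Mapping[str, Union[int, float, str]]) -> None:
    Path(path).write_text(format_config(params), encoding="utf-8")
```

For example, `write_config("my_config", {"enable_new_segsearch": 0, "tessedit_write_images": 1})` reproduces the file described above.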
5. ViewerDebugging
For what it does, see:
/p/tesseract-ocr/wiki/ViewerDebugging
On win32 it kept reporting "waiting for server". The fix is as follows:
1. Download piccolox-1.2.jar and piccolo-1.2.jar. Check carefully: it must be exactly piccolox-1.2.jar and piccolo-1.2.jar.
2. Run new ScrollView().main(new String[]{"8461"}); in IntelliJ, with piccolox-1.2.jar and piccolo-1.2.jar added to the project.
3. Then run "tesseract phototest.tif test1 segdemo inter", the ...
(All parameters printed by running tesseract --print-parameters on the command line)
Tesseract parameters:
(parameter name, default value, short description)
editor_image_xpos	590	Editor image X Pos
editor_image_ypos	10	Editor image Y Pos
editor_image_menuheight	50	Add to image height for menu bar
editor_image_word_bb_color	7	Word bounding box colour
editor_image_blob_bb_color	4	Blob bounding box colour
editor_image_text_color	2	Correct text colour
editor_dbwin_xpos	50	Editor debug window X Pos
editor_dbwin_ypos	500	Editor debug window Y Pos
editor_dbwin_height	24	Editor debug window height
editor_dbwin_width	80	Editor debug window width
editor_word_xpos	60	Word window X Pos
editor_word_ypos	510	Word window Y Pos
editor_word_height	240	Word window height
editor_word_width	655	Word window width
classify_num_cp_levels	3	Number of Class Pruner Levels
textord_debug_tabfind	0	Debug tab finding
textord_debug_bugs	0	Turn on output related to bugs in tab finding
textord_testregion_left	-1	Left edge of debug reporting rectangle
textord_testregion_top	-1	Top edge of debug reporting rectangle
textord_testregion_right		Right edge of debug rectangle
textord_testregion_bottom		Bottom edge of debug rectangle
textord_tabfind_show_partitions	0	Show partition bounds, waiting if >1
devanagari_split_debuglevel	0	Debug level for split shiro-rekha process.
edges_max_children_per_outline	10	Max number of children inside a character outline
edges_max_children_layers	5	Max layers of nested children inside a character outline
edges_children_per_grandchild	10	Importance ratio for chucking outlines
edges_children_count_limit	45	Max holes allowed in blob
edges_min_nonhole	12	Min pixels for potential char in box
edges_patharea_ratio	40	Max lensq/area for acceptable child outline
textord_fp_chop_error	2	Max allowed bending of chop cells
textord_tabfind_show_images	0	Show image blobs
textord_skewsmooth_offset	4	For smooth factor
textord_skewsmooth_offset2	1	For smooth factor
textord_test_x	-	coord of test pt
textord_test_y	-	coord of test pt
textord_min_blobs_in_row	4	Min blobs before gradient counted
textord_spline_minblobs	8	Min blobs in each spline segment
textord_spline_medianwin	6	Size of window for spline segmentation
textord_max_blob_overlaps	4	Max number of blobs a big blob can overlap
textord_min_xheight	10	Min credible pixel xheight
textord_lms_line_trials	12	Number of linew fits to do
oldbl_holed_losscount	10	Max lost before fallback line used
pitsync_linear_version	6	Use new fast algorithm
pitsync_fake_depth	1	Max advance fake generation
textord_tabfind_show_strokewidths	0	Show stroke widths
textord_dotmatrix_gap	3	Max pixel gap for broken pixed pitch
textord_debug_block	0	Block to do debug on
textord_pitch_range	2	Max range test on pitch
textord_words_veto_power	5	Rows required to outvote a veto
equationdetect_save_bi_image	0	Save input bi image
equationdetect_save_spt_image	0	Save special character image
equationdetect_save_seed_image	0	Save the seed image
equationdetect_save_merged_image	0	Save the merged image
poly_debug	0	Debug old poly
poly_wide_objects_better	1	More accurate approx on wide things
wordrec_display_splits	0	Display splits
textord_debug_printable	0	Make debug windows printable
textord_space_size_is_variable	0	If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.
textord_tabfind_show_initial_partitions	0	Show partition bounds
textord_tabfind_show_reject_blobs	0	Show blobs rejected as noise
textord_tabfind_show_columns	0	Show column bounds
textord_tabfind_show_blocks	0	Show final block bounds
textord_tabfind_find_tables	1	run table detection
textord_tabfind_show_color_fit	0	Show stroke widths
devanagari_split_debugimage	0	Whether to create a debug image for split shiro-rekha process.
textord_show_fixed_cuts	0	Draw fixed pitch cell boundaries
edges_use_new_outline_complexity	0	Use the new outline complexity module
edges_debug	0	turn on debugging for this module
edges_children_fix	0	Remove boxy parents of char-like children
gapmap_debug	0	Say which blocks have tables
gapmap_use_ends	0	Use large space at start and end of rows
gapmap_no_isolated_quanta	0	Ensure gaps not less than 2quanta wide
textord_heavy_nr	0	Vigorously remove noise
textord_show_initial_rows	0	Display row accumulation
textord_show_parallel_rows	0	Display page correlated rows
textord_show_expanded_rows	0	Display rows after expanding
textord_show_final_rows	0	Display rows after final fitting
textord_show_final_blobs	0	Display blob bounds after pre-ass
textord_test_landscape	0	Tests refer to land/port
textord_parallel_baselines	1	Force parallel baselines
textord_straight_baselines	0	Force straight baselines
textord_old_baselines	1	Use old baseline algorithm
textord_old_xheight	0	Use old xheight algorithm
textord_fix_xheight_bug	1	Use spline baseline
textord_fix_makerow_bug	1	Prevent multiple baselines
textord_debug_xheights	0	Test xheight algorithms
textord_biased_skewcalc	1	Bias skew estimates with line length
textord_interpolating_skew	1	Interpolate across gaps
textord_new_initial_xheight	1	Use test xheight mechanism
textord_debug_blob	0	Print test blob information
textord_really_old_xheight	0	Use original wiseowl xheight
textord_oldbl_debug	0	Debug old baseline generation
textord_debug_baselines	0	Debug baseline generation
textord_oldbl_paradef	1	Use para default mechanism
textord_oldbl_split_splines	1	Split stepped splines
textord_oldbl_merge_parts	1	Merge suspect partitions
oldbl_corrfix	1	Improve correlation of heights
oldbl_xhfix	0	Fix bug in modes threshold for xheights
textord_ocropus_mode	0	Make baselines for ocropus
textord_tabfind_only_strokewidths	0	Only run stroke widths
textord_tabfind_show_initialtabs	0	Show tab candidates
textord_tabfind_show_finaltabs	0	Show tab vectors
textord_show_tables	0	Show table regions
textord_tablefind_show_mark	0	Debug table marking steps in detail
textord_tablefind_show_stats	0	Show page stats used in table finding
textord_tablefind_recognize_tables	0	Enables the table recognizer for table layout and filtering.
textord_all_prop	0	All doc is proportial text
textord_debug_pitch_test	0	Debug on fixed pitch test
textord_disable_pitch_test	0	Turn off dp fixed pitch algorithm
textord_fast_pitch_test	0	Do even faster pitch algorithm
textord_debug_pitch_metric	0	Write full metric stuff
textord_show_row_cuts	0	Draw row-level cuts
textord_show_page_cuts	0	Draw page-level cuts
textord_pitch_cheat	0	Use correct answer for fixed/prop
textord_blockndoc_fixed	0	Attempt whole doc/block fixed pitch
textord_show_initial_words	0	Display separate words
textord_show_new_words	0	Display separate words
textord_show_fixed_words	0	Display forced fixed pitch words
textord_blocksall_fixed	0	Moan about prop blocks
textord_blocksall_prop	0	Moan about fixed pitch blocks
textord_blocksall_testing	0	Dump stats when moaning
textord_test_mode	0	Do current test
textord_pitch_scalebigwords	0	Scale scores on big words
textord_restore_underlines	1	Chop underlines & put back
textord_fp_chopping	1	Do fixed pitch chopping
textord_force_make_prop_words	0	Force proportional word segmentation on all rows
textord_chopper_test	0	Chopper is being tested.
wordrec_display_all_blobs	0	Display Blobs
wordrec_display_all_words	0	Display Words
wordrec_blob_pause	0	Blob pause
stream_filelist	0	Stream a filelist from stdin
editor_image_win_name	EditorImage	Editor image window name
editor_dbwin_name	EditorDBWin	Editor debug window name
editor_word_name	BlnWords	BL normalized word window
editor_debug_config_file		Config file to apply to single words
debug_file		File to send tprintf output to
classify_font_name	UnknownFont	Default font name to be used in training
classify_training_file	MicroFeatures	Training file
fx_debugfile	FXDebug	Name of debugfile
classify_cp_angle_pad_loose	45	Class Pruner Angle Pad Loose
classify_cp_angle_pad_medium	20	Class Pruner Angle Pad Medium
classify_cp_angle_pad_tight	10	CLass Pruner Angle Pad Tight
classify_cp_end_pad_loose	0.5	Class Pruner End Pad Loose
classify_cp_end_pad_medium	0.5	Class Pruner End Pad Medium
classify_cp_end_pad_tight	0.5	Class Pruner End Pad Tight
classify_cp_side_pad_loose	2.5	Class Pruner Side Pad Loose
classify_cp_side_pad_medium	1.2	Class Pruner Side Pad Medium
classify_cp_side_pad_tight	0.6	Class Pruner Side Pad Tight
classify_pp_angle_pad	45	Proto Pruner Angle Pad
classify_pp_end_pad	0.5	Proto Prune End Pad
classify_pp_side_pad	2.5	Proto Pruner Side Pad
classify_min_slope	0.414214	Slope below which lines are called horizontal
classify_max_slope	2.41421	Slope above which lines are called vertical
classify_norm_adj_midpoint	32	Norm adjust midpoint ...
classify_norm_adj_curl	2	Norm adjust curl ...
classify_pico_feature_length	0.05	Pico Feature Length
textord_underline_threshold	0.5	Fraction of width occupied
edges_childarea	0.5	Min area fraction of child outline
edges_boxarea	0.875	Min area fraction of grandchild for box
textord_fp_chop_snap	0.5	Max distance of chop pt from vertex
gapmap_big_gaps	1.75	xht multiplier
textord_spline_shift_fraction	0.02	Fraction of line spacing for quad
textord_spline_outlier_fraction	0.1	Fraction of line spacing for outlier
textord_skew_ile	0.5	Ile of gradients for page skew
textord_skew_lag	0.02	Lag for skew on row accumulation
textord_linespace_iqrlimit	0.2	Max iqr/median for linespace
textord_width_limit	8	Max width of blobs to make rows
textord_chop_width	1.5	Max width before chopping
textord_expansion_factor	1	Factor to expand rows by in expand_rows
textord_overlap_x	0.375	Fraction of linespace for good overlap
textord_minxh	0.25	fraction of linesize for min xheight
textord_min_linesize	1.25	* blob height for initial linesize
textord_excess_blobsize	1.3	New row made if blob makes row this big
textord_occupancy_threshold	0.4	Fraction of neighbourhood
textord_underline_width	2	Multiple of line_size for underline
textord_min_blob_height_fraction	0.75	Min blob height/top to include blob top into xheight stats
textord_xheight_mode_fraction	0.4	Min pile height to make xheight
textord_ascheight_mode_fraction	0.08	Min pile height to make ascheight
textord_descheight_mode_fraction	0.08	Min pile height to make descheight
textord_ascx_ratio_min	1.25	Min cap/xheight
textord_ascx_ratio_max	1.8	Max cap/xheight
textord_descx_ratio_min	0.25	Min desc/xheight
textord_descx_ratio_max	0.6	Max desc/xheight
textord_xheight_error_margin	0.1	Accepted variation
oldbl_xhfract	0.4	Fraction of est allowed in calc
oldbl_dot_error_size	1.26	Max aspect ratio of a dot
textord_oldbl_jumplimit	0.15	X fraction for new partition
pitsync_joined_edge	0.75	Dist inside big blob for chopping
pitsync_offset_freecut_fraction	0.25	Fraction of cut for free cuts
textord_tabvector_vertical_gap_fraction	0.5	max fraction of mean blob width allowed for vertical gaps in vertical text
textord_tabvector_vertical_box_ratio	0.5	Fraction of box matches required to declare a line vertical
textord_projection_scale	0.2	Ding rate for mid-cuts
textord_balance_factor	1	Ding rate for unbalanced char cells
textord_wordstats_smooth_factor	0.05	Smoothing gap stats
textord_width_smooth_factor	0.1	Smoothing width stats
textord_words_width_ile	0.4	Ile of blob widths for space est
textord_words_maxspace	4	Multiple of xheight
textord_words_default_maxspace	3.5	Max believable third space
textord_words_default_minspace	0.6	Fraction of xheight
textord_words_min_minspace	0.3	Fraction of xheight
textord_words_default_nonspace	0.2	Fraction of xheight
textord_words_initial_lower	0.25	Max initial cluster size
textord_words_initial_upper	0.15	Min initial cluster spacing
textord_words_minlarge	0.75	Fraction of valid gaps needed
textord_words_pitchsd_threshold	0.04	Pitch sync threshold
textord_words_def_fixed	0.016	Threshold for definite fixed
textord_words_def_prop	0.09	Threshold for definite prop
textord_pitch_rowsimilarity	0.08	Fraction of xheight for sameness
words_initial_lower	0.5	Max initial cluster size
words_initial_upper	0.15	Min initial cluster spacing
words_default_prop_nonspace	0.25	Fraction of xheight
words_default_fixed_space	0.75	Fraction of xheight
words_default_fixed_limit	0.6	Allowed size variance
textord_words_definite_spread	0.3	Non-fuzzy spacing region
textord_spacesize_ratiofp	2.8	Min ratio space/nonspace
textord_spacesize_ratioprop	2	Min ratio space/nonspace
textord_fpiqr_ratio	1.5	Pitch IQR/Gap IQR threshold
textord_max_pitch_iqr	0.2	Xh fraction noise in pitch
textord_fp_min_width	0.5	Min width of decent blobs
textord_underline_offset	0.1	Fraction of x to ignore
ambigs_debug_level	0	Debug level for unichar ambiguities
tessedit_single_match	0	Top choice only from CP
classify_debug_level	0	Classify debug level
classify_norm_method	1	Normalization Method ...
matcher_debug_level	0	Matcher Debug Level
matcher_debug_flags	0	Matcher Debug Flags
classify_learning_debug_level	0	Learning Debug Level:
matcher_permanent_classes_min	1	Min # of permanent classes
matcher_min_examples_for_prototyping	3	Reliable Config Threshold
matcher_sufficient_examples_for_prototyping	5	Enable adaption even if the ambiguities have not been seen
classify_adapt_proto_threshold	230	Threshold for good protos during adaptive 0-255
classify_adapt_feature_threshold	230	Threshold for good features during adaptive 0-255
classify_class_pruner_threshold	229	Class Pruner Threshold 0-255
classify_class_pruner_multiplier	15	Class Pruner Multiplier 0-255:
classify_cp_cutoff_strength	7	Class Pruner CutoffStrength:
classify_integer_matcher_multiplier	10	Integer Matcher Multiplier 0-255:
il1_adaption_test	0	Don't adapt to i/I at beginning of word
dawg_debug_level	0	Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages
hyphen_debug_level	0	Debug level for hyphenated words.
max_viterbi_list_size	10	Maximum size of viterbi list.
stopper_smallword_size	2	Size of dict word to be treated as non-dict word
stopper_debug_level	0	Stopper debug level
tessedit_truncate_wordchoice_log	10	Max words to keep in list
fragments_debug	0	Debug character fragments
max_permuter_attempts	10000	Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.
repair_unchopped_blobs	1	Fix blobs that aren't chopped
chop_debug	0	Chop debug
chop_split_length	10000	Split Length
chop_same_distance	2	Same distance
chop_min_outline_points	6	Min Number of Points on Outline
chop_seam_pile_size	150	Max number of seams in seam_pile
chop_inside_angle	-50	Min Inside Angle Bend
chop_min_outline_area	2000	Min Outline Area
chop_centered_maxwidth	90	Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.
chop_x_y_weight	3	X / Y length weight
segment_adjust_debug	0	Segmentation adjustment debug
wordrec_debug_level	0	Debug level for wordrec
wordrec_max_join_chunks	4	Max number of broken pieces to associate
segsearch_debug_level	0	SegSearch debug level
segsearch_max_pain_points	2000	Maximum number of pain points stored in the queue
segsearch_max_futile_classifications	20	Maximum number of pain point classifications per chunk that did not result in finding a better word choice.
language_model_debug_level	0	Language model debug level
language_model_ngram_order	8	Maximum order of the character ngram model
language_model_viterbi_list_max_num_prunable	10	Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs
language_model_viterbi_list_max_size	500	Maximum size of viterbi lists recorded in BLOB_CHOICEs
language_model_min_compound_length	3	Minimum length of compound words
wordrec_display_segmentations	0	Display Segmentations
tessedit_pageseg_mode	6	Pageseg mode: 0=osd only, 1=auto+osd, 2=auto, 3=col, 4=block, 5=line, 6=word, 7=char (Values from PageSegMode enum in publictypes.h)
tessedit_ocr_engine_mode	2	Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available.
pageseg_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.
ocr_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing ocr.
bidi_debug	0	Debug level for BiDi
applybox_debug	1	Debug level
applybox_page	0	Page number to apply boxes from
tessedit_bigram_debug	0	Amount of debug output for bigram correction.
debug_noise_removal	0	Debug reassignment of small outlines
noise_maxperblob	8	Max diacritics to apply to a blob
noise_maxperword	16	Max diacritics to apply to a word
debug_x_ht_level	0	Reestimate debug
quality_min_initial_alphas_reqd	2	alphas in a good word
tessedit_tess_adaption_mode	39	Adaptation decision algorithm for tess
tessedit_test_adaption_mode	3	Adaptation decision algorithm for tess
multilang_debug_level	0	Print multilang debug info.
paragraph_debug_level	0	Print paragraph debug info.
tessedit_preserve_min_wd_len	2	Only preserve wds longer than this
crunch_rating_max	10	For adj length in rating per ch
crunch_pot_indicators	1	How many potential indicators needed
crunch_leave_lc_strings	4	Don't crunch words with long lower case strings
crunch_leave_uc_strings	4	Don't crunch words with long lower case strings
crunch_long_repetitions	3	Crunch words with long repetitions
crunch_debug	0	As it says
fixsp_non_noise_limit	1	How many non-noise blbs either side?
fixsp_done_mode	1	What constitues done for spacing
debug_fix_space_level	0	Contextual fixspace debug
x_ht_acceptance_tolerance	8	Max allowed deviation of blob top outside of font data
x_ht_min_change	8	Min change in xht before actually trying it
superscript_debug	0	Debug level for sub & superscript fixer
suspect_level	99	Suspect marker level
suspect_space_level	100	Min suspect level for rejecting spaces
suspect_short_words	2	Don't suspect dict wds longer than this
tessedit_reject_mode	0	Rejection algorithm
tessedit_image_border	2	Rej blbs near image edge limit
min_sane_x_ht_pixels	8	Reject any x-ht lt or eq than this
tessedit_page_number	-1	-1 -> All pages, else specific page to process
tessdata_manager_debug_level	0	Debug level for TessdataManager functions.
tessedit_parallelize	0	Run in parallel where possible
tessedit_ok_mode	5	Acceptance decision algorithm
segment_debug	0	Debug the whole segmentation process
language_model_fixed_length_choices_depth	3	Depth of blob choice lists to explore when fixed length dawgs are on
tosp_debug_level	0	Debug data
tosp_enough_space_samples_for_median	3	or should we use mean
tosp_redo_kern_limit	10	No.samples reqd to reestimate for row
tosp_few_samples	40	No.gaps reqd with 1 large gap to treat as a table
tosp_short_row	20	No.gaps reqd with few cert spaces to use certs
tosp_sanity_method	1	How to avoid being silly
textord_max_noise_size	7	Pixel size of noise
textord_baseline_debug	0	Baseline debug level
textord_noise_sizefraction	10	Fraction of size for maxima
textord_noise_translimit	16	Transitions for normal blob
textord_noise_sncount	1	super norm blobs to save row
use_definite_ambigs_for_classifier	0	Use definite ambiguities when running character classifier
use_ambigs_for_adaption	0	Use ambigs for deciding whether to adapt to a character
allow_blob_division	1	Use divisible blobs chopping
prioritize_division	0	Prioritize blob division over chopping
classify_enable_learning	1	Enable adaptive classifier
tess_cn_matching	0	Character Normalized Matching
tess_bn_matching	0	Baseline Normalized Matching
classify_enable_adaptive_matcher	1	Enable adaptive classifier
classify_use_pre_adapted_templates	0	Use pre-adapted classifier templates
classify_save_adapted_templates	0	Save adapted templates to a file
classify_enable_adaptive_debugger	0	Enable match debugger
classify_nonlinear_norm	0	Non-linear stroke-density normalization
disable_character_fragments	1	Do not include character fragments in the results of the classifier
classify_debug_character_fragments	0	Bring up graphical debugging windows for fragments training
matcher_debug_separate_windows	0	Use two different windows for debugging the matching: One for the protos and one for the features.
classify_bln_numeric_mode	0	Assume the input is numbers [0-9].
load_system_dawg	1	Load system word dawg.
load_freq_dawg	1	Load frequent word dawg.
load_unambig_dawg	1	Load unambiguous word dawg.
load_punc_dawg	1	Load dawg with punctuation patterns.
load_number_dawg	1	Load dawg with number patterns.
load_bigram_dawg	1	Load dawg with special word bigrams.
use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
stopper_no_acceptable_choices	0	Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations
save_raw_choices	0	Deprecated- backward compatibility only
segment_nonalphabetic_script	0	Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch
save_doc_words	0	Save Document Words
merge_fragments_in_matrix	1	Merge the fragments in the ratings matrix and delete them after merging
wordrec_no_block	0	Don't output block information
wordrec_enable_assoc	1	Associator Enable
force_word_assoc	0	force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.
fragments_guide_chopper	0	Use information from fragments to guide chopping process
chop_enable	1	Chop enable
chop_vertical_creep	0	Vertical creep
chop_new_seam_pile	1	Use new seam_pile
assume_fixed_pitch_char_segment	0	include fixed-pitch heuristics in char segmentation
wordrec_skip_no_truth_words	0	Only run OCR for words that had truth recorded in BlamerBundle
wordrec_debug_blamer	0	Print blamer debug messages
wordrec_run_blamer	0	Try to set the blame for errors
save_alt_choices	1	Save alternative paths found during chopping and segmentation search
language_model_ngram_on	0	Turn on/off the use of character ngram model
language_model_ngram_use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
language_model_ngram_space_delimited_language	1	Words are delimited by space
language_model_use_sigmoidal_certainty	0	Use sigmoidal score for certainty
tessedit_resegment_from_boxes	0	Take segmentation and labeling from box file
tessedit_resegment_from_line_boxes	0	Conversion of word/line box file to char box file
tessedit_train_from_boxes	0	Generate training data from boxed chars
tessedit_make_boxes_from_boxes	0	Generate more boxes from boxed chars
tessedit_train_line_recognizer	0	Break input into lines and remap boxes if present
tessedit_dump_pageseg_images	0	Dump intermediate images made during page segmentation
tessedit_ambigs_training	0	Perform training for ambiguities
tessedit_adaption_debug	0	Generate and print debug information for adaption
applybox_learn_chars_and_char_frags_mode	0	Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.
applybox_learn_ngrams_mode	0	Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.
tessedit_display_outwords	0	Draw output words
tessedit_dump_choices	0	Dump char choices
tessedit_timing_debug	0	Print timing stats
tessedit_fix_fuzzy_spaces	1	Try to improve fuzzy spaces
tessedit_unrej_any_wd	0	Don't bother with word plausibility
tessedit_fix_hyphens	1	Crunch double hyphens?
tessedit_redo_xheight	1	Check/Correct x-height
tessedit_enable_doc_dict	1	Add words to the document dictionary
tessedit_debug_fonts	0	Output font info per char
tessedit_debug_block_rejection	0	Block and Row stats
tessedit_enable_bigram_correction	1	Enable correction based on the word bigram dictionary.
tessedit_enable_dict_correction	0	Enable single word correction based on the dictionary.
enable_noise_removal	1	Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise
debug_acceptable_wds	0	Dump word pass/fail chk
tessedit_minimal_rej_pass1	0	Do minimal rejection on pass 1 output
tessedit_test_adaption	0	Test adaption criteria
tessedit_matcher_log	0	Log matcher activity
test_pt	0	Test for point
paragraph_text_based	1	Run paragraph detection on the post-text-recognition (more accurate)
lstm_use_matrix	1	Use ratings matrix/beam search with lstm
docqual_excuse_outline_errs	0	Allow outline errs in unrejection?
tessedit_good_quality_unrej& 1&&& Reducerejection on good docs
tessedit_use_reject_spaces&& 1&&& Rejectspaces?
tessedit_preserve_blk_rej_perfect_wds&&& 1&&& Onlyrej partially rejected words in block rejection
tessedit_preserve_row_rej_perfect_wds&& 1&&& Onlyrej partially rejected words in row rejection
tessedit_dont_blkrej_good_wds&& 0&&& Useword segmentation quality metric
tessedit_dont_rowrej_good_wds& 0&&& Useword segmentation quality metric
tessedit_row_rej_good_docs 1&&& Applyrow rejection to good docs
tessedit_reject_bad_qual_wds&&&& 1&&& Rejectall bad quality wds
tessedit_debug_doc_rejection&&&& 0&&& Pagestats
tessedit_debug_quality_metrics&& 0&&& Outputdata to debug file
bland_unrej 0&&& unrejpotential with no checks
unlv_tilde_crunching    1    Mark v. bad words for tilde crunch
hocr_font_info    0    Add font info to hocr output
crunch_early_merge_tess_fails    1    Before word crunch?
crunch_early_convert_bad_unlv_chs    0    Take out ~^ early?
crunch_terrible_garbage    1    As it says
crunch_pot_garbage    1    POTENTIAL crunch garbage
crunch_leave_ok_strings    1    Don't touch sensible strings
crunch_accept_ok    1    Use acceptability in okstring
crunch_leave_accept_strings    0    Don't pot crunch sensible strings
crunch_include_numerals    0    Fiddle alpha figures
tessedit_prefer_joined_punct    0    Reward punctuation joins
tessedit_write_block_separators    0    Write block separators in output
tessedit_write_rep_codes    0    Write repetition char code
tessedit_write_unlv    0    Write .unlv output file
tessedit_create_txt    0    Write .txt output file
tessedit_create_hocr    0    Write .html hOCR output file
tessedit_create_tsv    0    Write .tsv output file
tessedit_create_pdf    0    Write .pdf output file
textonly_pdf    0    Create PDF with only one invisible text layer
suspect_constrain_1Il    0    UNLV keep 1Il chars rejected
tessedit_minimal_rejection    0    Only reject tess failures
tessedit_zero_rejection    0    Don't reject ANYTHING
tessedit_word_for_word    0    Make output have exactly one word per WERD
tessedit_zero_kelvin_rejection    0    Don't reject ANYTHING AT ALL
tessedit_consistent_reps    1    Force all rep chars the same
tessedit_rejection_debug    0    Adaption debug
tessedit_flip_0O    1    Contextual 0O O0 flips
rej_trust_doc_dawg    0    Use DOC dawg in 11l conf. detector
rej_1Il_use_dict_word    0    Use dictword test
rej_1Il_trust_permuter_type    1    Don't double check
rej_use_tess_accepted    1    Individual rejection control
rej_use_tess_blanks    1    Individual rejection control
rej_use_good_perm    1    Individual rejection control
rej_use_sensible_wd    0    Extend permuter check
rej_alphas_in_number_perm    0    Extend permuter check
tessedit_create_boxfile    0    Output text with boxes
tessedit_write_images    1    Capture the image from the IPE
interactive_display_mode    0    Run interactively?
tessedit_override_permuter    1    According to dict_word
tessedit_use_primary_params_model    0    In multilingual mode use params model of the primary language
textord_tabfind_show_vlines    0    Debug line finding
textord_use_cjk_fp_model    0    Use CJK fixed pitch model
poly_allow_detailed_fx    0    Allow feature extractors to see the original outline
tessedit_init_config_only    0    Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.
textord_equation_detect    0    Turn on equation detector
textord_tabfind_vertical_text    1    Enable vertical detection
textord_tabfind_force_vertical_text    0    Force using vertical text page mode
preserve_interword_spaces    0    Preserve multiple interword spaces
include_page_breaks    0    Include page separator string in output text after each image/page.
textord_tabfind_vertical_horizontal_mix    1    find horizontal lines such as headers in vertical page mode
load_fixed_length_dawgs    1    Load fixed length dawgs (e.g. for non-space delimited languages)
permute_debug    0    Debug char permutation process
permute_script_word    0    Turn on word script consistency permuter
segment_segcost_rating    0    incorporate segmentation cost in word rating?
permute_fixed_length_dawg    0    Turn on fixed-length phrasebook search permuter
permute_chartype_word    0    Turn on character type (property) consistency permuter
ngram_permuter_activated    0    Activate character-level n-gram-based permuter
permute_only_top    0    Run only the top choice permuter
use_new_state_cost    0    use new state cost heuristics for segmentation state evaluation
enable_new_segsearch    1    Enable new segmentation search path.
textord_single_height_mode    0    Script has no xheight, so use a single mode
tosp_old_to_method    0    Space stats use prechopping?
tosp_old_to_constrain_sp_kn    0    Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_only_use_prop_rows    1    Block stats to use fixed pitch rows?
tosp_force_wordbreak_on_punct    0    Force word breaks on punct to break long lines in non-space delimited langs
tosp_use_pre_chopping    0    Space stats use prechopping?
tosp_old_to_bug_fix    0    Fix suspected bug in old code
tosp_block_use_cert_spaces    1    Only stat OBVIOUS spaces
tosp_row_use_cert_spaces    1    Only stat OBVIOUS spaces
tosp_narrow_blobs_not_cert    1    Only stat OBVIOUS spaces
tosp_row_use_cert_spaces1    1    Only stat OBVIOUS spaces
tosp_recovery_isolated_row_stats    1    Use row alone when inadequate cert spaces
tosp_only_small_gaps_for_kern    0    Better guess
tosp_all_flips_fuzzy    0    Pass ANY flip to context?
tosp_fuzzy_limit_all    1    Don't restrict kn->sp fuzzy limit to tables
tosp_stats_use_xht_gaps    1    Use within xht gap for wd breaks
tosp_use_xht_gaps    1    Use within xht gap for wd breaks
tosp_only_use_xht_gaps    0    Only use within xht gap for wd breaks
tosp_rule_9_test_punct    0    Don't chng kn to space next to punct
tosp_flip_fuzz_kn_to_sp    1    Default flip
tosp_flip_fuzz_sp_to_kn    1    Default flip
tosp_improve_thresh    0    Enable improvement heuristic
textord_no_rejects    0    Don't remove noise blobs
textord_show_blobs    0    Display unsorted blobs
textord_show_boxes    0    Display unsorted blobs
textord_noise_rejwords    1    Reject noise-like words
textord_noise_rejrows    1    Reject noise-like rows
textord_noise_debug    0    Debug row garbage detector
m_data_sub_dir    tessdata/    Directory for data files
tessedit_module_name    libtesseract400.dll    Module colocated with tessdata dir
classify_learn_debug_str        Class str to debug learning
user_words_file        A filename of user-provided words.
user_words_suffix        A suffix of user-provided words located in tessdata.
user_patterns_file        A filename of user-provided patterns.
user_patterns_suffix        A suffix of user-provided patterns located in tessdata.
output_ambig_words_file        Output file for ambiguities found in the dictionary
word_to_debug        Word for which stopper debug information should be printed to stdout
word_to_debug_lengths        Lengths of unichars in word_to_debug
tessedit_char_blacklist        Blacklist of chars not to recognize
tessedit_char_whitelist        Whitelist of chars to recognize
tessedit_char_unblacklist        List of chars to override tessedit_char_blacklist
tessedit_write_params_to_file        Write all parameters to the given file.
applybox_exposure_pattern    .exp    Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif
chs_leading_punct    ('`"    Leading punctuation
chs_trailing_punct1    ).,;:?!    1st Trailing punctuation
chs_trailing_punct2    )'`"    2nd Trailing punctuation
outlines_odd    %|     Non standard number of outlines
outlines_2    ij!?%":;    Non standard number of outlines
numeric_punctuation    .,    Punct. chs expected WITHIN numbers
unrecognised_char    |    Output char for unidentified blobs
ok_repeated_ch_non_alphanum_wds    -?*=    Allow NN to unrej
conflict_set_I_l_1    Il1[]    Il1 conflict set
file_type    .tif    Filename extension
tessedit_load_sublangs        List of languages to load with this one
page_separator        Page separator (default is form feed control character)
classify_char_norm_range    0.2    Character Normalization Range ...
classify_min_norm_scale_x    0    Min char x-norm scale ...
classify_max_norm_scale_x    0.325    Max char x-norm scale ...
classify_min_norm_scale_y    0    Min char y-norm scale ...
classify_max_norm_scale_y    0.325    Max char y-norm scale ...
classify_max_rating_ratio    1.5    Veto ratio between classifier ratings
classify_max_certainty_margin    5.5    Veto difference between classifier certainties
matcher_good_threshold    0.125    Good Match (0-1)
matcher_reliable_adaptive_result    0    Great Match (0-1)
matcher_perfect_threshold    0.02    Perfect Match (0-1)
matcher_bad_match_pad    0.15    Bad Match Pad (0-1)
matcher_rating_margin    0.1    New template margin (0-1)
matcher_avg_noise_size    12    Avg. noise blob length
matcher_clustering_max_angle_delta    0.015    Maximum angle delta for prototype clustering
classify_misfit_junk_penalty    0    Penalty to apply when a non-alnum is vertically out of its expected textline position
rating_scale    1.5    Rating scaling factor
certainty_scale    20    Certainty scaling factor
tessedit_class_miss_scale    0.    Scale factor for features not used
classify_adapted_pruning_factor    2.5    Prune poor adapted results this much worse than best result
classify_adapted_pruning_threshold    -1    Threshold at which classify_adapted_pruning_factor starts
classify_character_fragments_garbage_certainty_threshold    -3    Exclude fragments that do not look like whole characters from training and adaption
speckle_large_max_size    0.3    Max large speckle size
speckle_rating_penalty    10    Penalty to add to worst rating for noise
xheight_penalty_subscripts    0.125    Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.
xheight_penalty_inconsistent    0.25    Score penalty (0.1 = 10%) added if an xheight is inconsistent.
segment_penalty_dict_frequent_word    1    Score multiplier for word matches which have good case and are frequent in the given language (lower is better).
segment_penalty_dict_case_ok    1.1    Score multiplier for word matches that have good case (lower is better).
segment_penalty_dict_case_bad    1.3125    Default score multiplier for word matches, which may have case issues (lower is better).
segment_penalty_ngram_best_choice    1.24    Multiplier for the best choice from the ngram model.
segment_penalty_dict_nonword    1.25    Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
segment_penalty_garbage    1.5    Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better).
certainty_scale    20    Certainty scaling factor
stopper_nondict_certainty_base    -2.5    Certainty threshold for non-dict words
stopper_phase2_certainty_rejection_offset    1    Reject certainty offset
stopper_certainty_per_char    -0.5    Certainty to add for each dict char above small word size.
stopper_allowable_character_badness    3    Max certainty variation allowed in a word (in sigma)
doc_dict_pending_threshold    0    Worst certainty for using pending dictionary
doc_dict_certainty_threshold    -2.25    Worst certainty for words that can be inserted into the document dictionary
wordrec_worst_state    1    Worst segmentation state
tessedit_certainty_threshold    -2.25    Good blob limit
chop_split_dist_knob    0.5    Split length adjustment
chop_overlap_knob    0.9    Split overlap adjustment
chop_center_knob    0.15    Split center adjustment
chop_sharpness_knob    0.06    Split sharpness adjustment
chop_width_change_knob    5    Width change adjustment
chop_ok_split    100    OK split limit
chop_good_split    50    Good split limit
segsearch_max_char_wh_ratio    2    Maximum character width-to-height ratio
language_model_ngram_small_prob    1e-06    To avoid overly small denominators use this as the floor of the probability returned by the ngram model.
language_model_ngram_nonmatch_score    -40    Average classifier score of a non-matching unichar.
language_model_ngram_scale_factor    0.03    Strength of the character ngram model relative to the character classifier
language_model_ngram_rating_factor    16    Factor to bring log-probs into the same range as ratings when multiplied by outline length
language_model_penalty_non_freq_dict_word    0.1    Penalty for words not in the frequent word dictionary
language_model_penalty_non_dict_word    0.15    Penalty for non-dictionary words
language_model_penalty_punc    0.2    Penalty for inconsistent punctuation
language_model_penalty_case    0.1    Penalty for inconsistent case
language_model_penalty_script    0.5    Penalty for inconsistent script
language_model_penalty_chartype    0.3    Penalty for inconsistent character type
language_model_penalty_font    0    Penalty for inconsistent font
language_model_penalty_spacing    0.05    Penalty for inconsistent spacing
language_model_penalty_increment    0.01    Penalty increment
noise_cert_basechar    -8    Hingepoint for base char certainty
noise_cert_disjoint    -1    Hingepoint for disjoint certainty
noise_cert_punc    -3    Threshold for new punc char certainty
noise_cert_factor    0.375    Scaling on certainty diff from Hingepoint
quality_rej_pc    0.08    good_quality_doc lte rejection limit
quality_blob_pc    0    good_quality_doc gte good blobs limit
quality_outline_pc    1    good_quality_doc lte outline error limit
quality_char_pc    0.95    good_quality_doc gte good char limit
test_pt_x    100000    xcoord
test_pt_y    100000    ycoord
tessedit_reject_doc_percent    65    %rej allowed before rej whole doc
tessedit_reject_block_percent    45    %rej allowed before rej whole block
tessedit_reject_row_percent    40    %rej allowed before rej whole row
tessedit_whole_wd_rej_row_percent    70    Number of row rejects in whole word rejects which prevents whole row rejection
tessedit_good_doc_still_rowrej_wd    1.1    rej good doc wd if more than this fraction rejected
quality_rowrej_pc    1.1    good_quality_doc gte good char limit
crunch_terrible_rating    80    crunch rating lt this
crunch_poor_garbage_cert    -9    crunch garbage cert lt this
crunch_poor_garbage_rate    60    crunch garbage rating lt this
crunch_pot_poor_rate    40    POTENTIAL crunch rating lt this
crunch_pot_poor_cert    -8    POTENTIAL crunch cert lt this
crunch_del_rating    60    POTENTIAL crunch rating lt this
crunch_del_cert    -10    POTENTIAL crunch cert lt this
crunch_del_min_ht    0.7    Del if word ht lt xht x this
crunch_del_max_ht    3    Del if word ht gt xht x this
crunch_del_min_width    3    Del if word width lt xht x this
crunch_del_high_word    1.5    Del if word gt xht x this above bl
crunch_del_low_word    0.5    Del if word gt xht x this below bl
crunch_small_outlines_size    0.6    Small if lt xht x this
fixsp_small_outlines_size    0.28    Small if lt xht x this
superscript_worse_certainty    2    How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline?
superscript_bettered_certainty    0.97    What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40%
superscript_scaledown_ratio    0.4    A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size.
subscript_max_y_top    0.5    Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript.
superscript_min_y_bottom    0.3    Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript.
suspect_rating_per_ch    999.9    Don't touch bad rating limit
suspect_accept_rating    -999.9    Accept good rating limit
tessedit_lower_flip_hyphen    1.5    Aspect ratio dot/hyphen test
tessedit_upper_flip_hyphen    1.8    Aspect ratio dot/hyphen test
rej_whole_of_mostly_reject_word_fract    0.85    if >this fract
min_orientation_margin    7    Min acceptable orientation margin
textord_tabfind_vertical_text_ratio    0.5    Fraction of textlines deemed vertical to use vertical page mode
textord_tabfind_aligned_gap_fraction    0.75    Fraction of height used as a minimum gap for aligned blobs.
bestrate_pruning_factor    2    Multiplying factor of current best rate to prune other hypotheses
segment_reward_script    0.95    Score multiplier for script consistency within a word. Being a 'reward' factor, it should be <= 1. Smaller value implies bigger reward.
segment_reward_chartype    0.97    Score multiplier for char type consistency within a word.
segment_reward_ngram_best_choice    0.99    Score multiplier for ngram permuter's best choice (only used in the Han script path).
heuristic_segcost_rating_base    1.25    base factor for adding segmentation cost into word rating. It's a multiplying factor, the larger the value above 1, the bigger the effect of segmentation cost.
heuristic_weight_rating    1    weight associated with char rating in combined cost of state
heuristic_weight_width    1000    weight associated with width evidence in combined cost of state
heuristic_weight_seamcut    0    weight associated with seam cut in combined cost of state
heuristic_max_char_wh_ratio    2    max char width-to-height ratio allowed in segmentation
segsearch_max_fixed_pitch_char_wh_ratio    2    Maximum character width-to-height ratio for fixed-pitch fonts
tosp_old_sp_kn_th_factor    2    Factor for defining space threshold in terms of space and kern sizes
tosp_threshold_bias1    0    how far between kern and space?
tosp_threshold_bias2    0    how far between kern and space?
tosp_narrow_fraction    0.3    Fract of xheight for narrow
tosp_narrow_aspect_ratio    0.48    narrow if w/h less than this
tosp_wide_fraction    0.52    Fract of xheight for wide
tosp_wide_aspect_ratio    0    wide if w/h less than this
tosp_fuzzy_space_factor    0.6    Fract of xheight for fuzz sp
tosp_fuzzy_space_factor1    0.5    Fract of xheight for fuzz sp
tosp_fuzzy_space_factor2    0.72    Fract of xheight for fuzz sp
tosp_gap_factor    0.83    gap ratio to flip sp->kern
tosp_kern_gap_factor1    2    gap ratio to flip kern->sp
tosp_kern_gap_factor2    1.3    gap ratio to flip kern->sp
tosp_kern_gap_factor3    2.5    gap ratio to flip kern->sp
tosp_ignore_big_gaps    -1    xht multiplier
tosp_ignore_very_big_gaps    3.5    xht multiplier
tosp_rep_space    1.6    rep gap multiplier for space
tosp_enough_small_gaps    0.65    Fract of kerns reqd for isolated row stats
tosp_table_kn_sp_ratio    2.25    Min difference of kn & sp in table
tosp_table_xht_sp_ratio    0.33    Expect spaces bigger than this
tosp_table_fuzzy_kn_sp_ratio    3    Fuzzy if less than this
tosp_fuzzy_kn_fraction    0.5    New fuzzy kn alg
tosp_fuzzy_sp_fraction    0.5    New fuzzy sp alg
tosp_min_sane_kn_sp    1.5    Don't trust spaces less than this time kn
tosp_init_guess_kn_mult    2.2    Thresh guess - mult kn by this
tosp_init_guess_xht_mult    0.28    Thresh guess - mult xht by this
tosp_max_sane_kn_thresh    5    Multiplier on kn to limit thresh
tosp_flip_caution    0    Don't autoflip kn to sp when large separation
tosp_large_kerning    0.19    Limit use of xht gap with large kns
tosp_dont_fool_with_small_kerns    -1    Limit use of xht gap with odd small kns
tosp_near_lh_edge    0    Don't reduce box if the top left is non blank
tosp_silly_kn_sp_gap    0.2    Don't let sp minus kn get too small
tosp_pass_wide_fuzz_sp_to_context    0.75    How wide fuzzies need context
textord_blob_size_bigile    95    Percentile for large blobs
textord_noise_area_ratio    0.7    Fraction of bounding box for noise
textord_blob_size_smallile    20    Percentile for small blobs
textord_initialx_ile    0.75    Ile of sizes for xheight guess
textord_initialasc_ile    0.9    Ile of sizes for xheight guess
textord_noise_sizelimit    0.5    Fraction of x for big t count
textord_noise_normratio    2    Dot to norm ratio for deletion
textord_noise_syfract    0.2    xh fract height error for norm blobs
textord_noise_sxfract    0.4    xh fract width error for norm blobs
textord_noise_hfract    0.015625    Height fraction to discard outlines as speckle noise
textord_noise_rowratio    6    Dot to norm ratio for deletion
textord_blshift_maxshift    0    Max baseline shift
textord_blshift_xfraction    9.99    Min size of baseline shift
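
Any of the parameters listed here can be overridden through a config file such as the my_config passed in section 4. Since this post wraps the tesseract command from server-side code, the following is a minimal Java sketch of that idea, not a definitive implementation. It assumes the config file is plain text with one "name value" pair per line. The class name TessConfigSketch, its method names, the file name in_put.tif and the three chosen values are illustrative and not from the original post. Where Tesseract looks for the config file (current directory or tessdata/configs) may vary by version, so treat the path as an assumption as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper; the class and method names are hypothetical, not part of Tesseract.
final class TessConfigSketch {

    // Renders overrides as config-file text: one "name value" pair per line (assumed format).
    public static String renderConfig(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Writes the rendered config to disk so it can be named on the command line.
    public static void writeConfig(Path file, Map<String, String> params) throws IOException {
        Files.write(file, renderConfig(params).getBytes(StandardCharsets.UTF_8));
    }

    // Builds the argument list: tesseract <image> <outputBase> -l <lang> <configFile>
    public static List<String> buildCommand(String image, String outputBase,
                                            String lang, String configFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add(configFile);
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the parameters in insertion order in the written file.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("tessedit_char_whitelist", "0123456789"); // recognize digits only
        params.put("tessedit_create_hocr", "1");             // also write hOCR output
        params.put("preserve_interword_spaces", "1");        // keep multiple spaces

        Path config = Paths.get("my_config");
        writeConfig(config, params);

        // Print the command instead of running it; hand the list to ProcessBuilder to execute.
        List<String> cmd = buildCommand("in_put.tif", "out_put", "vie", config.toString());
        System.out.println(String.join(" ", cmd));
    }
}
```

Building the command as a list rather than a single string lets it be passed directly to ProcessBuilder without shell quoting problems, which matters when a whitelist value contains punctuation.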