Character level recognition gives the same results as the word level recognition. #877

Kishlay-notabot · 2024-01-24T16:34:30Z

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
Latest release version 5.0.4

Describe the bug

Running Tesseract.js in two different PSM modes gives the same output.
Is Tesseract configured to give word-level output only?
Am I right that PSMs only refine the recognition scope and do not affect the output format, which will always be in words?
Running in SINGLE_CHAR and PSM_SINGLE_WORD gives the same output for the same sample.
I want to sort the result character by character. To do that, I need to extract the bbox data of each detected character and use it further. Is this possible?

Device Version:

OS + Version: [e.g. iOS8.1, Windows 10]
Windows 11
Browser [e.g. chrome, safari] or Node version [e.g. Node v18]
Edge

Balearica · 2024-01-24T19:52:09Z

Page segmentation mode (PSM) has no impact on the format or level of granularity of the output. Running with PSM SINGLE_WORD tells Tesseract "I believe the input image contains a single word," and running with SINGLE_CHAR tells Tesseract "I believe the input image contains a single character."

If you want more granular output with character-level bounding boxes, look at the blocks output format.
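As a rough illustration of that suggestion, here is a minimal sketch that flattens the `blocks` output into a list of character boxes and sorts them left to right. It assumes the nesting `blocks[].paragraphs[].lines[].words[].symbols[]`, with each symbol carrying `text`, `confidence`, and `bbox: { x0, y0, x1, y1 }`. The helper names (`collectSymbols`, `sortLeftToRight`, `recognizeCharacters`) are made up for this example and are not part of Tesseract.js. Passing `{ blocks: true }` as the third argument to `recognize` is meant to request the blocks output explicitly; check this against the API docs for your version.

```javascript
// Flatten the nested `blocks` output into one array of character boxes.
// Assumed shape: blocks[].paragraphs[].lines[].words[].symbols[],
// each symbol having `text`, `confidence`, and `bbox: { x0, y0, x1, y1 }`.
function collectSymbols(blocks) {
  const out = [];
  for (const block of blocks ?? []) {
    for (const para of block.paragraphs ?? []) {
      for (const line of para.lines ?? []) {
        for (const word of line.words ?? []) {
          for (const sym of word.symbols ?? []) {
            out.push({ text: sym.text, confidence: sym.confidence, bbox: sym.bbox });
          }
        }
      }
    }
  }
  return out;
}

// Hypothetical ordering helper: left to right by the box's left edge.
// Only meaningful for a single line of text. Returns a new array.
function sortLeftToRight(symbols) {
  return [...symbols].sort((a, b) => a.bbox.x0 - b.bbox.x0);
}

// Usage sketch (not called here); requires tesseract.js to be installed.
async function recognizeCharacters(image) {
  const mod = await import('tesseract.js');
  const { createWorker } = mod.default ?? mod;
  const worker = await createWorker('eng');
  try {
    // Third argument asks for the `blocks` output explicitly (assumption: see lead-in).
    const { data } = await worker.recognize(image, {}, { blocks: true });
    return sortLeftToRight(collectSymbols(data.blocks));
  } finally {
    await worker.terminate();
  }
}
```

The PSM can stay at whatever value matches the image content. The character boxes come from the output structure, not from the segmentation mode.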

Kishlay-notabot · 2024-01-25T17:49:07Z

Thank you for the insight. I will close this after experimenting.

o7

Kishlay-notabot · 2024-01-26T15:35:13Z

@Balearica While working on my project, I created some beginner-friendly and advanced programs that could be added to the example docs. Shall I add them and open a pull request?
Thank you.

Kishlay-notabot closed this as completed Jan 26, 2024
