
Got wrong result in Vietnamese language #869

Closed
sonht1109 opened this issue Jan 10, 2024 · 1 comment

sonht1109 commented Jan 10, 2024

Tesseract.js version ^5.0.4

Describe the bug
I ran OCR on an image containing Vietnamese text. Most of the output is correct, but one digit in a date is misrecognized.

To Reproduce
My code

import { createWorker } from 'tesseract.js';

(async () => {
  const worker = await createWorker('vie');
  const ret = await worker.recognize('https://vov.vn/sites/default/files/inline-images/bai1sapo1.jpg');
  console.log(ret.data.text);
  await worker.terminate();
})();

Output:
[image: screenshot attached in the original issue]

Expected behavior
The date recognized as 21/0/1973 should be 21/9/1973.

Device Version:

  • macOS: 14.1
  • Node version: 18.19.0

@Balearica (Collaborator)

Tesseract.js is the JavaScript/WebAssembly port of Tesseract. We do not modify the recognition engine, so accuracy issues that originate in the Tesseract engine are outside the scope of this project. If you would like to pursue this further, you should (1) check whether the issue also occurs with the main (CLI) Tesseract project and (2) if it does, and you believe it constitutes a bug, raise the issue with that project.
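
For anyone following step (1), here is a minimal sketch of how the comparison could be done from Node. It assumes the `tesseract` CLI and its `vie` language data are installed locally and that the image has already been downloaded to a local file, since the CLI is given a file path here rather than a URL. `runCliTesseract` and `diffTokens` are hypothetical helper names, not part of Tesseract.js.

```javascript
// Hypothetical helper: run the Tesseract CLI on a local image and return its text.
// Assumes `tesseract` is on PATH and the requested language data is installed.
async function runCliTesseract(imagePath, lang = 'vie') {
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');
  // `tesseract <image> stdout -l <lang>` prints the recognized text to stdout.
  const { stdout } = await promisify(execFile)('tesseract', [imagePath, 'stdout', '-l', lang]);
  return stdout;
}

// Hypothetical helper: list the positions where two OCR outputs disagree,
// comparing whitespace-separated tokens in order.
function diffTokens(jsText, cliText) {
  const a = jsText.trim().split(/\s+/);
  const b = cliText.trim().split(/\s+/);
  const diffs = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) diffs.push({ index: i, js: a[i], cli: b[i] });
  }
  return diffs;
}

// Usage, with jsText being ret.data.text from the Tesseract.js run
// (the local file path is an assumption):
// const cliText = await runCliTesseract('./bai1sapo1.jpg');
// console.log(diffTokens(jsText, cliText));
```

If the CLI output shows the same wrong digit, the problem most likely lies in the engine or the `vie` model and belongs in the main Tesseract project. If only the Tesseract.js output is wrong, that difference is worth reporting here with both outputs attached.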
