tessedit_char_blacklist is not a valid type, but seems to be working #831

Closed
RaenonX opened this issue Oct 1, 2023 · 3 comments · Fixed by #845


RaenonX commented Oct 1, 2023

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
v5.0.0

Describe the bug

await worker.setParameters({
  // TypeScript rejects `tessedit_char_blacklist` as an invalid parameter, but at runtime it does blacklist characters
  tessedit_char_blacklist: '$',
});

To Reproduce
Run the script below on the attached image.

import { createWorker, OEM } from 'tesseract.js';

const worker = await createWorker(
  'jpn',
  OEM.DEFAULT,
);
await worker.setParameters({
  // Try including or excluding this parameter
  tessedit_char_blacklist: '$',
});

const {data: {text}} = await worker.recognize(canvasRef.current.toDataURL('image/jpeg'));

JP-Subskill

Expected behavior
If the blacklist is active, the result is

【 o ro / し o ozs )
ス キ ル レ ベ ル ア ッ プ M ス キ ル レ ベ ル ア ッ プ S

CTCE FT
お て つ だ い ス ピ ー ド M 最 大 所 持 数 ア ッ プ L
お て つ だ い ス ピ ー ド S

If not, the result is

【 o ro / し o ozs )
ス キ ル レ ベ ル ア ッ プ M ス キ ル レ ベ ル ア ッ プ $

CTCE FT
お て つ だ い ス ピ ー ド M 最 大 所 持 数 ア ッ プ L
お て つ だ い ス ピ ー ド $S

Device Version:

  • OS + Version: Windows 10
  • Edge 117.0.2045.47 (Official build) (64-bit) + Node v18
@Balearica
Collaborator

There are hundreds of Tesseract parameters, and the vast majority are not defined in the type file. I agree this is an issue as it is annoying to get warnings/errors when you have done nothing wrong.

I do not believe that maintaining a list of all possible parameters is within the scope of this project, as Tesseract is constantly adding and removing options, and half of them don't function as described. I think the solution here is to edit the types to accept any object with name/value pairs, without checking the exact parameters being set.

@Balearica
Collaborator

I resolved in #845 by (1) adding tessedit_char_blacklist explicitly and (2) adding a line that allows any parameters (even if not defined). This is resolved in the master branch currently, and will be reflected in the next npm release.
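
For readers wondering what a change like that can look like, here is a minimal sketch. It is not the actual content of the Tesseract.js type file: `WorkerParamsSketch`, `setParametersSketch`, and `some_undeclared_parameter` are hypothetical names used only for illustration. It shows the two parts described above: a parameter declared explicitly, plus an index signature that lets any other name/value pair through.

```typescript
// Hypothetical stand-in for the library's parameter type (an assumption,
// not the real type definitions shipped with Tesseract.js).
interface WorkerParamsSketch {
  // (1) Parameters declared explicitly keep autocomplete and type checking.
  tessedit_char_whitelist?: string;
  tessedit_char_blacklist?: string;
  // (2) Index signature: any other name/value pair is accepted as well.
  [key: string]: unknown;
}

// Hypothetical stand-in for worker.setParameters; it only echoes its input
// so the example can run without loading the OCR engine.
function setParametersSketch(params: WorkerParamsSketch): WorkerParamsSketch {
  return { ...params };
}

const applied = setParametersSketch({
  tessedit_char_blacklist: '$',
  // Placeholder name that is not declared above; the index signature allows it.
  some_undeclared_parameter: '1',
});
```

The trade-off with an index signature is that a misspelled parameter name no longer produces a compile-time error, which seems acceptable here given that the full parameter list is not maintained by this project.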

@RaenonX
Author

RaenonX commented Oct 24, 2023

> I resolved in #845 by (1) adding tessedit_char_blacklist explicitly and (2) adding a line that allows any parameters (even if not defined). This is resolved in the master branch currently, and will be reflected in the next npm release.

Just saw that, thanks!

Balearica added a commit that referenced this issue Oct 29, 2023
* Updated types to support all parameters per #831

* Updated require statements to reduce file size per #847