Tesseract.js fails on nodejs when trying to package it as standalone #882

Closed
reisalin-stout opened this issue Jan 31, 2024 · 3 comments
@reisalin-stout

reisalin-stout commented Jan 31, 2024

Tesseract.js version: 5.0.4

Describe the bug
When packaging with pkg, tesseract.js silently fails on createWorker, even when the two folders (tesseract.js and tesseract-core) are included separately.
I can't log any error with logger/errorHandler, nor in a try/catch block.

I would also add that it probably fails because of this error: "TypeError: r.g.addEventListener is not a function".
But it could also be that I am making a mistake when including the worker, tesseract, and lang files separately.
If I serve a webpage containing a tesseract script from my packaged app, it works; I just can't seem to use it inside the packaged app itself.

Device Version:

  • Windows 10
  • Node 18.19.0
  • pkg 10.2.3
@Balearica
Collaborator

Issues related to packaging are generally caused by:

  1. Paths (workerPath and/or corePath) needing to be set manually
    1. See this comment
  2. Using the browser code when targeting Node.js (or vice versa)
    1. If you are using Node.js, make sure you are using the Node.js code. If you are targeting a browser, be sure to use the browser code.
    2. This sounds like the likely cause here: addEventListener is only used in the browser version of Tesseract.js, so if you are building for Node.js, that error indicates you are using the wrong version.

If the above does not answer your question, we would need a reproducible example repo to troubleshoot further.
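
A minimal sketch of point 1 above, assuming the worker script, core files, and language data are shipped in folders next to the executable. The helper names (`tesseractPaths`, `runtimeBaseDir`) and the `tesseract/...` folder layout are hypothetical and not part of Tesseract.js. The `process.pkg` check relies on pkg setting that property inside a packaged app; treat that as an assumption and verify it against the pkg version in use.

```javascript
const path = require("node:path");

// Hypothetical helper: builds absolute paths from a base directory.
// The folder names are placeholders; use whatever layout you actually ship.
function tesseractPaths(baseDir) {
  return {
    workerPath: path.join(baseDir, "tesseract", "worker-script", "node", "index.js"),
    corePath: path.join(baseDir, "tesseract", "core"),
    cachePath: path.join(baseDir, "tesseract", "lang"),
  };
}

// Assumption: inside a pkg-built executable process.pkg is defined, and the
// folder containing the executable is the base for files shipped alongside it.
// Outside pkg, fall back to the current working directory.
function runtimeBaseDir() {
  return process.pkg ? path.dirname(process.execPath) : process.cwd();
}

const options = tesseractPaths(runtimeBaseDir());
console.log(options);

// Usage (requires tesseract.js to be installed, inside an async function):
//   const { createWorker } = require("tesseract.js");
//   const worker = await createWorker("eng", 1, options);
```

Absolute paths avoid depending on the directory the executable happens to be launched from, which is one way relative paths such as "dist/core" can behave differently between `node` and a packaged build.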

@reisalin-stout
Author

reisalin-stout commented Jan 31, 2024

I am not a professional coder, so my code is pretty messy at the moment, and I am working with material I can't share.
I can say that my only code for tesseract is:


    const { createWorker } = require("tesseract.js");
    console.log("creating worker");
    const worker = await createWorker("eng");
    console.log("worker loaded");
    await worker
      .recognize("https://tesseract.projectnaptha.com/img/eng_bw.png")
      .then((result) => {
        console.log("result :");
        console.log(result.data.text);
      });

I also tried creating the worker with:

    const worker = await createWorker("eng", 1, {
      corePath: "dist/core",
      langPath: "dist/lang",
      cachePath: "dist/lang",
      workerPath: "dist/worker.min.js",
      gzip: false, // also tried omitting this
    });

[Image: tesseract-filepath]

I tried requiring "tesseract.min.js" (which is why it appears in the screenshot) as I would in a browser, but I get "TypeError: r.g.addEventListener is not a function". That is somewhat expected, since I'm not running it in a browser context.
What puzzles me is that neither a try/catch nor a logger/errorHandler function on the worker produces any output whatsoever.

In both cases it works when run with node (I did this to check the syntax and that the paths were right), but it doesn't after packaging. Note that I run my exe from the same location as when I run it with node, and I tried both packaging and excluding the path folders when running pkg.

Just to be sure, I will add a screenshot of what I include in the external folders (maybe I misread the "Local Installation" section of the FAQ?).
I greatly appreciate your interest and help, thank you.
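
When nothing is logged at all, one thing that can be ruled out before calling createWorker is a path that does not resolve where expected at runtime. The sketch below is a hypothetical preflight check (`findMissingPaths` is not a Tesseract.js function) that uses only Node's standard library; the example paths are the ones from the snippet above. It only checks the real filesystem, so it is meant for files shipped next to the executable rather than bundled into it.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: given { name: path }, returns the entries whose path
// does not exist on disk, resolved against the current working directory.
function findMissingPaths(paths) {
  return Object.entries(paths)
    .map(([name, p]) => ({ name, resolved: path.resolve(p) }))
    .filter((entry) => !fs.existsSync(entry.resolved));
}

// Paths taken from the createWorker options quoted above.
const missing = findMissingPaths({
  corePath: "dist/core",
  langPath: "dist/lang",
  workerPath: "dist/worker.min.js",
});

for (const entry of missing) {
  console.error(`${entry.name} not found at ${entry.resolved}`);
}
console.log(`cwd: ${process.cwd()}, missing: ${missing.length}`);
```

Printing the resolved absolute path makes it visible when a relative path is being resolved against a different directory in the packaged build than under `node`.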

@reisalin-stout
Author

Thank you very much! With your help and the post you linked, I was able to solve this. I'm leaving a detailed response in case anyone else ever needs it.

My code and paths:

    const { createWorker } = require("tesseract.js");
    logmsg("creating worker");
    const worker = await createWorker("eng", 1, {
      workerPath: "./app/web/dist/src/worker-script/node/index.js",
      corePath: "./app/web/dist/core/",
      cachePath: "./app/web/dist/lang/",
    });
    logmsg("worker loaded");
    await worker
      .recognize("https://tesseract.projectnaptha.com/img/eng_bw.png")
      .then((result) => {
        logmsg("result :");
        logmsg(result.data.text);
      });

    logmsg("moving on");

I copied the whole folder from ./node_modules/tesseract.js/src to another directory and pointed workerPath there. I also included the core files as described in the Local Installation FAQ, along with a pre-downloaded eng.traineddata. The picture below shows the code and the relative folder structure.

[Image: tesseract-local-fix]
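
The manual copy step described above could be scripted roughly as follows. This is a sketch, not part of Tesseract.js: `copyDir` is a hypothetical helper, the destination mirrors the `./app/web/dist/src` path used in the snippet above, and `fs.cpSync` needs Node 16.7 or later. The core files and eng.traineddata would still have to be placed as described in the Local Installation FAQ.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: recursively copies srcDir into destDir.
function copyDir(srcDir, destDir) {
  if (!fs.existsSync(srcDir)) {
    throw new Error(`source folder not found: ${srcDir}`);
  }
  fs.cpSync(srcDir, destDir, { recursive: true });
  return destDir;
}

// Source and destination taken from the comment above; adjust to your layout.
const src = path.join("node_modules", "tesseract.js", "src");
const dest = path.join("app", "web", "dist", "src");

if (fs.existsSync(src)) {
  copyDir(src, dest);
  console.log(`copied ${src} -> ${dest}`);
} else {
  console.log(`skipped: ${src} not found (is tesseract.js installed?)`);
}
```

Running this as a build step before packaging keeps the copied worker script in sync with the installed tesseract.js version.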
