Worker loading language traineddata progress 0 #414

Closed
IAndreaGiuseppe opened this issue Feb 24, 2020 · 11 comments

@IAndreaGiuseppe

IAndreaGiuseppe commented Feb 24, 2020

Describe the bug
Using the basic example code, I'm unable to extract any text from an image. The log output is:

Object { status: "loading tesseract core", progress: 0 }
Object { status: "loading tesseract core", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "initializing tesseract", progress: 0 }
Object { workerId: "Worker-0-ac418", status: "initialized tesseract", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }

After this point nothing happens.

To Reproduce

<template>
    <div>
        <button v-on:click="recognize">recognize</button>
    </div>
</template>

<script>
import { createWorker } from "tesseract.js";

const worker = createWorker({
    logger: m => console.log(m)
});

export default {
    name: "ocr-reader",

    methods: {
        // The handler must be async so that `await` is valid inside it.
        recognize: async function() {
            await worker.load();
            await worker.loadLanguage("eng");
            await worker.initialize("eng");
            const {
                data: { text }
            } = await worker.recognize("http://localhost:8000/WEZK8.png");
            console.log(text);
            await worker.terminate();
        }
    }
};
</script>

This is the simplest possible Vue component.

Expected behavior
I expect to see the extracted text in the console.

Additional context
I'm testing on localhost. I checked that everything loads correctly; even the traineddata file is downloaded with status 200.

@IAndreaGiuseppe
Author

So, the problem is in Firefox; with Chrome I can get the text extracted.

@barryZZJ

I'm using Chrome and experiencing the same problem.

@jeromewu
Member

jeromewu commented Mar 9, 2020

I have tried both Chrome and Firefox and it works in both.
@IAndreaGiuseppe May I know which version of Firefox you are using?
@barryZZJ You might be facing a network issue; you can try this offline version to verify: https://github.com/jeromewu/tesseract.js-offline

@IAndreaGiuseppe
Author

@jeromewu I'm actually on FF 73.0.1 (64bit) on Windows

@jeromewu
Member

Hi @IAndreaGiuseppe, I have tried Firefox 74.0 (64bit) on Windows and it still works. Maybe try a Private Window in Firefox to rule out a cache issue, and wait a little longer if your network is not fast.

@IAndreaGiuseppe
Author

Hi @jeromewu, and thank you. I'm almost back on this subject and will be able to test this again soon. Please don't close this issue.

@chungym

chungym commented Dec 22, 2021

I experience the same problem in Firefox.
If langPath is set to a remote path such as 'https://tessdata.projectnaptha.com/4.0.0_fast', it works fine.
However, if langPath is set to a relative path inside the extension, it fails to load the language data.
No issue in Chrome.

@TwoAbove

TwoAbove commented Jan 27, 2022

I'm experiencing the same issue, but with any langPath. I tried the default one, https://tessdata.projectnaptha.com/4.0.0_fast, and a relative path. All variants get stuck on loading language traineddata:

 { workerId: "Worker-0-2c495", status: "loading language traineddata", progress: 0, userJobId: "Job-1-6281d" }

Ubuntu, Firefox 96.0.2 (64-bit)

@TwoAbove

TwoAbove commented Jan 27, 2022

The fix for me was to add

      cacheMethod: 'none'
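
For readers wondering where that setting goes: `cacheMethod` sits alongside `logger` in the options object given to `createWorker`. A minimal sketch, assuming the same worker API as in the original report. The helper name `buildWorkerOptions` and its `disableCache` flag are hypothetical and not part of tesseract.js; the helper only assembles a plain object.

```javascript
// Hypothetical helper (not part of tesseract.js): assembles the options
// object that would be passed to createWorker().
function buildWorkerOptions({ langPath, disableCache = true } = {}) {
  const options = {
    logger: m => console.log(m),
  };
  // 'none' means the traineddata cache is neither read nor written,
  // which is the workaround reported in this comment.
  if (disableCache) options.cacheMethod = "none";
  // langPath is optional; when omitted the library default is used.
  if (langPath) options.langPath = langPath;
  return options;
}

// Usage (assumes tesseract.js is installed):
//   import { createWorker } from "tesseract.js";
//   const worker = createWorker(buildWorkerOptions());
```

Disabling the cache means the traineddata file is fetched again on each load, so this is probably best treated as a workaround rather than a default.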

@Balearica
Collaborator

I think the only thing immediately actionable here is that the promise returned by worker.loadLanguage is neither resolved nor rejected (no error message; it just gets "stuck"). Once that is fixed, users can catch the error and try again with a different cacheMethod, and if there is an underlying bug in Tesseract.js we will have an error message to work from.

It looks like this is the offending part--when a DOMException is encountered the promise is never rejected. I will edit such that all errors lead to a rejected promise.

if (isWebWorker && err instanceof DOMException) {
/*
* For some reason google chrome throw DOMException in loadLang,
* while other browser is OK, for now we ignore this exception
* and hopefully to find the root cause one day.
*/
} else {
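
On versions where the promise can still hang, one way to surface the problem is to race it against a timer. This is a minimal sketch, not something tesseract.js provides: the `withTimeout` helper is hypothetical, and the 30-second value in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`
// milliseconds. It does not cancel the underlying operation.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (assumes the `worker` from the original report):
//   await withTimeout(worker.loadLanguage("eng"), 30000, "loadLanguage");
```

The timeout only turns a silent hang into a catchable error; the stuck worker should still be terminated by the caller.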

@Balearica
Collaborator

I updated the master branch so that every error now leads to a message or a rejected promise. This will be reflected in the next npm release (3.0.4). If anybody using version >=3.0.4 encounters either a non-resolving promise from worker.loadLanguage or an error due to an underlying bug, they should open a new issue.
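
Once failures reject instead of hanging, the "catch and try again with a different cacheMethod" approach mentioned earlier could look roughly like the sketch below. The helper `loadWithFallback` and the `makeWorker` factory are hypothetical names, and the fallback order `["write", "none"]` is an assumption (`"write"` is believed to be the library default).

```javascript
// Hypothetical helper: tries each cacheMethod in order until one loads.
// `makeWorker(cacheMethod)` must return a worker exposing load(),
// loadLanguage(), initialize() and terminate(), as used in this thread.
async function loadWithFallback(makeWorker, lang, cacheMethods = ["write", "none"]) {
  let lastError;
  for (const cacheMethod of cacheMethods) {
    const worker = makeWorker(cacheMethod);
    try {
      await worker.load();
      await worker.loadLanguage(lang);
      await worker.initialize(lang);
      return worker; // ready for worker.recognize(...)
    } catch (err) {
      lastError = err;
      // Release the failed worker before trying the next setting.
      try {
        await worker.terminate();
      } catch (_) {
        // Ignore cleanup errors; the original error is what matters.
      }
    }
  }
  throw lastError || new Error("no cacheMethod values were given");
}

// Usage (assumes tesseract.js is installed):
//   const worker = await loadWithFallback(
//     cacheMethod => createWorker({ cacheMethod, logger: m => console.log(m) }),
//     "eng"
//   );
```

Passing a factory rather than a worker keeps the helper independent of the library, so it can be exercised with a stub in tests.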
