Update README.md

README.md
@@ -12,4 +12,66 @@ license: apache-2.0
short_description: This Space provides a web interface for Optical Character Re
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Asemmezdey Asekdan n Teqbaylit - Kabyle OCR

By Bouaziz Ait Driss

This Space provides a web interface for Optical Character Recognition (OCR) tailored to the Taqbaylit (Kabyle) language. It uses a custom Tesseract model (`kab.traineddata`) that supports the special characters ɣ, ɛ, ḍ, ṭ, ḥ, ṛ, ṣ, ẓ, ǧ, č.

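As a rough illustration of how a custom model such as `kab.traineddata` is usually selected, the sketch below builds a Tesseract command line. The `-l`, `--tessdata-dir`, and `--dpi` options belong to the standard Tesseract CLI; the `tessdata` directory, the `page.png` file name, and both helper names are assumptions made for this example and are not taken from this Space's code.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, tessdata_dir="tessdata", lang="kab", dpi=None):
    """Build a Tesseract CLI command that prints recognized text to stdout.

    Assumes `tessdata_dir` (a hypothetical path) contains `kab.traineddata`.
    """
    command = ["tesseract", image_path, "stdout", "-l", lang, "--tessdata-dir", tessdata_dir]
    if dpi is not None:
        # --dpi tells Tesseract the resolution of the input image.
        command += ["--dpi", str(dpi)]
    return command


def run_ocr(image_path, **options):
    """Run Tesseract and return the recognized text (needs the tesseract binary)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, **options),
        capture_output=True,
        check=True,
    )
    # Decode explicitly so characters such as ɣ, ɛ, ḍ survive regardless of locale.
    return result.stdout.decode("utf-8")


print(build_tesseract_command("page.png"))
```

Keeping the command construction separate from the subprocess call makes it possible to inspect the options without having Tesseract installed.
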
## Features

* Upload PDF, PNG, JPG, or JPEG files.
* Run OCR with the custom `kab` model.
* Preview documents (PDFs only).
* Edit the extracted text.
* Download the final text as a UTF-8 encoded `.txt` file.
* Adjust the display DPI and font size for easier reading.

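The display DPI setting controls how many pixels a rendered PDF page occupies. PDF page sizes are measured in points (1/72 inch), so the pixel size at a given DPI follows from a simple ratio. This is a minimal sketch with a hypothetical helper name; the README does not say how this Space renders its previews.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def page_size_in_pixels(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# An A4 page is about 595 x 842 points.
print(page_size_in_pixels(595, 842, 144))  # (1190, 1684)
```

Doubling the DPI doubles both dimensions, so the preview image holds four times as many pixels.
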
## How to Use

1. Upload a file using the sidebar.
2. For a PDF, click "Sekker PDF (Askan n Yisebtar)" to load the page previews.
3. Click "Sekker OCR" to start recognition.
4. Edit the text in the right panel if needed.
5. Click "Zdem Aḍris" to download the final text.

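The download step matters because Kabyle text contains characters outside ASCII. Below is a minimal sketch of writing and re-reading such text with an explicit UTF-8 encoding. The function name and the file name `adris.txt` are illustrative and are not taken from this Space's code.

```python
import os
import tempfile


def save_text_utf8(text, path):
    """Write `text` to `path` as UTF-8, independent of the system default encoding."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


sample = "ɣ ɛ ḍ ṭ ḥ ṛ ṣ ẓ ǧ č"
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "adris.txt")
    save_text_utf8(sample, target)
    with open(target, encoding="utf-8") as handle:
        assert handle.read() == sample  # the special characters round-trip intact
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which is not UTF-8 on every system.
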
## Known Limitations

* Numbers: recognition is less reliable because the training data for digits was limited.
* Older, rarely used characters are not recognized, such as "Г" (equivalent to "ɣ") and "ţ" (equivalent to "tt").
* Accuracy degrades with poor scan quality.
* Works best on printed text, not handwriting.

==============================================================================

Taqbaylit version (the English version is above).

Annar-a d afecku iteddun deg uẓeṭṭa n internet i usemmezdey aseklan n Teqbaylit (OCR). Yettunefk-d ilmend n tutlayt Taqbaylit.
Yebna ɣef tmudemt Tesseract (`kab.traineddata`) ideg kkin yisekkilen n Teqbaylit / Tamaziɣt (ɣ, ɛ, ḍ, ṭ, ḥ, ṛ, ṣ, ẓ, ǧ, č).

## Tiwura

* Sali afaylu PDF, PNG, JPG, neɣ JPEG.
* Sekker OCR s useqdec n tmudemt `kab`.
* Sekker PDF i uskan n yisebtar.
* Zṛeg aḍris, seɣti tira-s ma ilaq.
* Zdem aḍris s talɣa UTF-8, afaylu `.txt`.
* Beddel DPI n uskan akked tiddi n yisekkilen.

## Amek iteddu

1. Sali afaylu seg ufeggag n yifecka.
2. Tekki ɣef "Sekker PDF (Askan n Yisebtar)" ma d aPDF akken ad d-iban.
3. Tekki ɣef "Sekker OCR" akken ad yebdu usemmezdey n yisekkilen (OCR).
4. Zṛeg aḍris i d-yettkaden deg usfaylu yeffes ma ilaq.
5. Tekki ɣef "Zdem Aḍris" akken ad d-yeḥrez ufaylu.

## Ayen ixuṣṣen

* Amḍan: Ur yemmid ara uselmed ɣef yimḍanen.
* Kra isekkilen iqburen ur ten-yesemmezdey (ɛeqqel) ara am "Г" yettwarun "ɣ" akked "ţ" yettwarun "tt".
* Tamellit tɣelli mi ara yeɣli umerkid n uskan n tugniwin.
* Asufeɣ n usemmezdey n uḍris ad yelhu i yiḍrisen yettḍebɛen (anagar ayen yuran s uɣanib ufus).