ID Card Recognition Based on Arabic OCR System

Overview

This project extracts and processes text from Egyptian and Moroccan ID cards using Arabic OCR (Optical Character Recognition). Deep learning models detect the ID card and segment it into regions, and an OCR engine then recognizes the text within each region.

Features

ID Detection: Identifies whether the uploaded image contains an ID card.
Segmentation: Segments the ID card into different regions (e.g., name, number, etc.).
Text Extraction: Uses OCR to extract text from each segmented region.
Custom Text Processing: Cleans and filters the extracted text to enhance readability.
Ordering of Text Segments: Processes text segments in a specific, user-defined order.
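
The cleaning and ordering steps listed above can be illustrated with a minimal sketch. This is not the code in util.py: the function names `clean_text` and `order_segments`, the segment labels, and the choice of which characters to keep are all assumptions made for illustration.

```python
import re

# Assumption: keep the Arabic Unicode block (U+0600 to U+06FF), ASCII letters,
# ASCII digits, and whitespace; everything else is treated as OCR noise.
_NOT_ALLOWED = re.compile(r"[^\u0600-\u06FFA-Za-z0-9\s]")


def clean_text(raw):
    """Replace disallowed characters with spaces and collapse whitespace."""
    text = _NOT_ALLOWED.sub(" ", raw)
    return " ".join(text.split())


def order_segments(segments, order):
    """Return cleaned texts in the caller's order, skipping missing or empty ones.

    segments: dict mapping a (hypothetical) region label to raw OCR text.
    order:    list of labels in the order the caller wants them returned.
    """
    cleaned = (clean_text(segments[label]) for label in order if label in segments)
    return [text for text in cleaned if text]


raw_segments = {"id": "QW5613784", "first_name": "أحمد*", "noise": "###"}
ordered = order_segments(raw_segments, ["first_name", "noise", "id", "missing"])
```

Here the `noise` segment cleans down to an empty string and is dropped, and the `missing` label is ignored because no such segment was detected.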

Installation

To run this project, you'll need to have Python 3.8 or higher installed. Additionally, you will need to install the following dependencies:

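FastAPI: For creating the API.
Uvicorn: For serving the FastAPI application.
Pillow: For image processing.
PyTesseract: Python wrapper for the Tesseract OCR engine.
NLTK: For text processing.
Tesseract-OCR: The OCR engine itself (not a Python package; installed separately, see below).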

You can install the required Python libraries using pip:

pip install fastapi uvicorn pillow pytesseract nltk

Tesseract-OCR needs to be installed separately. You can download and install it from Tesseract's official repository.

Configuration

Models:
- classification.pt: Model for detecting ID cards.
- segmentation.pt: Model for segmenting the ID card into regions.
Tesseract Configuration:
- Ensure that pytesseract.pytesseract.tesseract_cmd points to the path where Tesseract-OCR is installed.
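
One way to set the Tesseract path is sketched below. The helper `resolve_tesseract_cmd` is a hypothetical name, not part of this project; it prefers an explicit path and otherwise falls back to whatever `tesseract` binary is on the system PATH. Recognizing Arabic text with Tesseract also typically requires the Arabic language data (`ara`) to be installed; whether this project selects that language explicitly is not stated here.

```python
import shutil
from pathlib import Path


def resolve_tesseract_cmd(explicit_path=None):
    """Return a path to the Tesseract binary, or None if it cannot be found.

    explicit_path: optional path supplied by the user (hypothetical parameter).
    """
    if explicit_path and Path(explicit_path).is_file():
        return str(explicit_path)
    # Fall back to searching the system PATH.
    return shutil.which("tesseract")


cmd = resolve_tesseract_cmd()
try:
    import pytesseract

    if cmd:
        # This is the attribute named in the configuration note above.
        pytesseract.pytesseract.tesseract_cmd = cmd
except ImportError:
    # pytesseract is not installed in this environment; nothing to configure.
    pass
```

If `cmd` is None, Tesseract is neither at the given path nor on PATH, and OCR calls will fail until it is installed.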

Usage

Start the API Server

To run the FastAPI server, execute the following command in your terminal:
```
uvicorn main:app --reload
```

This command starts the server and enables automatic reloading for development.

Upload an Image

You can send a POST request to the /predict/ endpoint with an image file. You can use tools like curl or Postman, or create a simple frontend to interact with the API.
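
As a rough illustration of the upload request, here is a small client that uses only the Python standard library. Several details are assumptions: the form field name `file`, the upload filename `id_card.jpg`, the JPEG content type, and the URL `http://127.0.0.1:8000/predict/` (uvicorn's default host and port). Check main.py for the actual field name the endpoint expects.

```python
import json
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url="http://127.0.0.1:8000/predict/", field_name="file"):
    """POST an image to the API and return the decoded JSON response."""
    with open(image_path, "rb") as fh:
        body, ctype = build_multipart(field_name, "id_card.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("path/to/card.jpg")` would return the parsed JSON response as a Python dictionary.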

Example Image

Example Output

The API will process the image and return the extracted text from the ID card in the specified order. The response will be in JSON format.

{
   "id informations": [
       "أحمد",
       "محمد إبراهيم علي",
       "منطقة الليدو",
       "المنتزه",
       "الإسكندرية",
       "7542907 3658463",
       "QW5613784"
   ]
}
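
A client can read this response as sketched below. The function `parse_prediction` and the label names passed to it are hypothetical; the API itself returns only an ordered list, so any labels must come from the caller and match the order configured on the server.

```python
import json

# Key exactly as it appears in the example output above.
RESPONSE_KEY = "id informations"


def parse_prediction(raw_json, labels=None):
    """Return the list of extracted strings, or a dict if labels are given."""
    payload = json.loads(raw_json)
    values = payload.get(RESPONSE_KEY)
    if not isinstance(values, list):
        raise ValueError(f"expected a list under {RESPONSE_KEY!r}")
    if labels is None:
        return values
    if len(labels) != len(values):
        raise ValueError("labels and values differ in length")
    # Pair each caller-supplied label with the value at the same position.
    return dict(zip(labels, values))


sample = '{"id informations": ["أحمد", "QW5613784"]}'
fields = parse_prediction(sample, labels=["first_name", "card_code"])
```

Raising an error on a length mismatch makes it obvious when a region was not detected, rather than silently shifting values onto the wrong labels.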

arij01/ArabID
