Lemontropia Suite - OCR System Implementation Summary

Overview

This summary describes a multi-backend OCR system that detects PyTorch DLL load failures on Windows Store Python and falls back to the backends that still work.

Problem Solved

PyTorch fails to load c10.dll on Windows Store Python 3.13
PaddleOCR, which depends on PaddlePaddle, may also fail to load on Windows Store Python
Need working OCR for game text detection without breaking dependencies

Solution Architecture

1. OCR Backends (Priority Order)

| Backend | File | Speed | Accuracy | Dependencies | Windows Store Python |
|---|---|---|---|---|---|
| OpenCV EAST | `opencv_east_backend.py` | ⚡ Fastest | Detection only | None | ✅ Works |
| EasyOCR | `easyocr_backend.py` | 🚀 Fast | ⭐⭐⭐ Good | PyTorch | ❌ May fail |
| Tesseract | `tesseract_backend.py` | 🐢 Slow | ⭐⭐ Medium | Tesseract binary | ✅ Works |
| PaddleOCR | `paddleocr_backend.py` | 🚀 Fast | ⭐⭐⭐⭐⭐ Best | PaddlePaddle | ❌ May fail |

2. Hardware Detection

File: modules/hardware_detection.py

Detects GPU availability (CUDA, MPS, DirectML)
Detects PyTorch with safe error handling for DLL errors
Detects Windows Store Python
Recommends best OCR backend based on hardware
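
As an illustration of the Windows Store Python check listed above, here is a minimal sketch. It assumes the interpreter path is a usable signal: Store builds normally run from a `WindowsApps` directory. The function name `is_windows_store_python` is hypothetical and is not taken from `modules/hardware_detection.py`.

```python
import sys
from pathlib import PureWindowsPath


def is_windows_store_python(executable: str = sys.executable) -> bool:
    """Heuristic check: Store builds of Python run from a WindowsApps directory.

    Assumption: the interpreter path is a reliable enough signal. This is a
    sketch, not the detection logic used by the project.
    """
    parts = [part.lower() for part in PureWindowsPath(executable).parts]
    return any(
        part == "windowsapps" or part.startswith("pythonsoftwarefoundation.python")
        for part in parts
    )


if __name__ == "__main__":
    print(f"Windows Store Python: {is_windows_store_python()}")
```

Passing the path as an argument keeps the check testable on any platform; a python.org or conda interpreter path returns `False`.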

3. Unified OCR Interface

File: modules/game_vision_ai.py (updated)

UnifiedOCRProcessor - Main OCR interface
Auto-selects best available backend
Graceful fallback chain
Backend switching at runtime

Files Created/Modified

New Files

modules/
├── __init__.py                              # Module exports
├── hardware_detection.py                    # GPU/ML framework detection
└── ocr_backends/
    ├── __init__.py                          # Backend factory and base classes
    ├── opencv_east_backend.py               # OpenCV EAST text detector
    ├── easyocr_backend.py                   # EasyOCR backend
    ├── tesseract_backend.py                 # Tesseract OCR backend
    └── paddleocr_backend.py                 # PaddleOCR backend with DLL handling

test_ocr_system.py                           # Comprehensive test suite
demo_ocr.py                                  # Interactive demo
requirements-ocr.txt                         # OCR dependencies
OCR_SETUP.md                                 # Setup guide

Modified Files

modules/
└── game_vision_ai.py                        # Updated to use unified OCR interface

vision_example.py                            # Updated examples

Key Features

1. PyTorch DLL Error Handling

import logging

logger = logging.getLogger(__name__)

# The system detects and handles PyTorch DLL errors gracefully
try:
    import torch
    pytorch_available = True
except (ImportError, OSError) as e:
    # On Windows Store Python the import can fail while loading c10.dll
    pytorch_available = False
    if 'dll' in str(e).lower() or 'c10' in str(e).lower():
        # Automatically use fallback backends
        logger.warning("PyTorch DLL error - using fallback OCR")
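
The same check can be wrapped in a reusable probe that reports both availability and whether the failure looks like a DLL problem, similar in spirit to the `pytorch_available` and `pytorch_dll_error` fields shown in the API reference. This is a sketch under that assumption; `probe_import` is a hypothetical helper, and the string matching mirrors the snippet above.

```python
import importlib
from typing import Tuple


def probe_import(module_name: str) -> Tuple[bool, bool]:
    """Return (available, dll_error) without letting a failed import propagate."""
    try:
        importlib.import_module(module_name)
        return True, False
    except (ImportError, OSError) as exc:
        message = str(exc).lower()
        # Same heuristic as above: look for DLL-related wording in the error.
        dll_error = "dll" in message or "c10" in message
        return False, dll_error


if __name__ == "__main__":
    available, dll_error = probe_import("torch")
    print(f"torch available: {available}, DLL error: {dll_error}")
```

A missing package returns `(False, False)`, which lets callers distinguish "not installed" from "installed but unloadable".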

2. Auto-Selection Logic

# Priority order (skips PyTorch-based if DLL error detected)
DEFAULT_PRIORITY = [
    'paddleocr',   # Best accuracy (if PyTorch works)
    'easyocr',     # Good balance (if PyTorch works)
    'tesseract',   # Stable fallback
    'opencv_east', # Always works
]
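
One way to read the comment on the priority list is as a filter applied before selection. The sketch below assumes that both `paddleocr` and `easyocr` are skipped once a DLL error has been detected, following the comments in the list; `DLL_SENSITIVE` and `effective_priority` are hypothetical names.

```python
from typing import List

DEFAULT_PRIORITY = ["paddleocr", "easyocr", "tesseract", "opencv_east"]

# Assumption: these are the backends skipped after a DLL error,
# per the comments in the priority list above.
DLL_SENSITIVE = {"paddleocr", "easyocr"}


def effective_priority(priority: List[str], dll_error: bool) -> List[str]:
    """Drop DLL-sensitive backends when a DLL error was detected, keeping order."""
    if not dll_error:
        return list(priority)
    return [name for name in priority if name not in DLL_SENSITIVE]


if __name__ == "__main__":
    print(effective_priority(DEFAULT_PRIORITY, dll_error=True))
```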

3. Simple Usage

from modules.game_vision_ai import GameVisionAI

# Initialize (auto-selects best backend)
vision = GameVisionAI()

# Process screenshot
result = vision.process_screenshot("game_screenshot.png")

print(f"Backend: {result.ocr_backend}")
print(f"Text regions: {len(result.text_regions)}")

4. Backend Diagnostics

from modules.game_vision_ai import GameVisionAI

# Run diagnostics
diag = GameVisionAI.diagnose()

# Check available backends
for backend in diag['ocr_backends']:
    print(f"{backend['name']}: {'Available' if backend['available'] else 'Not available'}")

Testing

Run Test Suite

python test_ocr_system.py

Run Demo

python demo_ocr.py

Run Examples

# Hardware detection
python vision_example.py --hardware

# List OCR backends
python vision_example.py --backends

# Full diagnostics
python vision_example.py --diagnostics

# Test with image
python vision_example.py --full path/to/screenshot.png

Installation

Option 1: Minimal (OpenCV EAST Only)

pip install opencv-python numpy pillow

Option 2: With EasyOCR

pip install torch torchvision  # Importing torch may fail on Windows Store Python
pip install easyocr
pip install opencv-python numpy pillow

Option 3: With Tesseract

# Install Tesseract binary first
choco install tesseract  # Windows
# or download from https://github.com/UB-Mannheim/tesseract/wiki

pip install pytesseract opencv-python numpy pillow

Windows Store Python Compatibility

The Problem

File "torch\__init__.py", line xxx, in <module>
    from torch._C import *  # DLL load failed
OSError: [WinError 126] The specified module could not be found

The Solution

The system automatically:

Detects Windows Store Python
Detects PyTorch DLL errors on import
Excludes PyTorch-based backends from selection
Falls back to OpenCV EAST or Tesseract

Workarounds for Full PyTorch Support

Use Python from python.org instead of Windows Store
Use Anaconda/Miniconda for better compatibility
Use WSL2 (Windows Subsystem for Linux)

API Reference

Hardware Detection

from modules.hardware_detection import (
    HardwareDetector,
    print_hardware_summary,
    recommend_ocr_backend
)

# Get hardware info
info = HardwareDetector.detect_all()
print(f"PyTorch available: {info.pytorch_available}")
print(f"PyTorch DLL error: {info.pytorch_dll_error}")

# Get recommendation
recommended = recommend_ocr_backend()  # Returns: 'opencv_east', 'easyocr', etc.

OCR Backends

from modules.ocr_backends import OCRBackendFactory

# Check all backends
backends = OCRBackendFactory.check_all_backends()

# Create specific backend
backend = OCRBackendFactory.create_backend('opencv_east')

# Get best available
backend = OCRBackendFactory.get_best_backend()
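
For readers unfamiliar with the pattern, a backend factory of this kind is often built as a registry of classes sharing a common base. The sketch below shows that general pattern only; `OCRBackend`, `BackendRegistry`, and `DummyBackend` are hypothetical and do not describe the actual classes in `modules/ocr_backends/__init__.py`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class OCRBackend(ABC):
    """Hypothetical base class for illustration."""

    name = "base"

    @classmethod
    @abstractmethod
    def is_available(cls) -> bool:
        """Report whether the backend's dependencies can be loaded."""

    @abstractmethod
    def extract_text(self, image_path: str) -> List[dict]:
        """Return a list of detected text regions."""


class BackendRegistry:
    """Hypothetical registry mapping backend names to classes."""

    _backends: Dict[str, Type[OCRBackend]] = {}

    @classmethod
    def register(cls, backend_cls: Type[OCRBackend]) -> Type[OCRBackend]:
        cls._backends[backend_cls.name] = backend_cls
        return backend_cls

    @classmethod
    def check_all(cls) -> Dict[str, bool]:
        return {name: b.is_available() for name, b in cls._backends.items()}

    @classmethod
    def create(cls, name: str) -> OCRBackend:
        if name not in cls._backends:
            raise ValueError(f"Unknown OCR backend: {name}")
        return cls._backends[name]()


@BackendRegistry.register
class DummyBackend(OCRBackend):
    """Placeholder backend that always reports itself as available."""

    name = "dummy"

    @classmethod
    def is_available(cls) -> bool:
        return True

    def extract_text(self, image_path: str) -> List[dict]:
        return [{"text": "", "source": image_path}]
```

Keeping `is_available` a class method lets the registry report status without constructing a backend, which matters when construction itself triggers the failing import.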

Unified OCR

from modules.game_vision_ai import UnifiedOCRProcessor

# Auto-select best backend
ocr = UnifiedOCRProcessor()

# Restrict selection to specific backends, in priority order
ocr = UnifiedOCRProcessor(backend_priority=['tesseract', 'opencv_east'])

# Extract text
regions = ocr.extract_text("image.png")

# Switch backend
ocr.set_backend('tesseract')

Game Vision AI

from modules.game_vision_ai import GameVisionAI

# Initialize
vision = GameVisionAI()

# Or with specific backend
vision = GameVisionAI(ocr_backend='tesseract')

# Process screenshot
result = vision.process_screenshot("screenshot.png")

# Switch backend at runtime
vision.switch_ocr_backend('opencv_east')

Performance Notes

OpenCV EAST: ~97 FPS on GPU, ~23 FPS on CPU (text detection only, no recognition)
EasyOCR: ~10 FPS on CPU, faster on GPU
Tesseract: Slower but very stable
PaddleOCR: Fastest with GPU, best accuracy
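
Throughput depends heavily on hardware and image size, so it is worth measuring on the target machine. The helper below is a minimal timing sketch; `measure_fps` is a hypothetical name, and the callable passed in stands for whatever per-frame OCR call is being benchmarked.

```python
import time
from typing import Callable


def measure_fps(
    process_frame: Callable[[], object],
    frames: int = 50,
    warmup: int = 5,
) -> float:
    """Call process_frame repeatedly and return the average frames per second."""
    # Warm-up runs absorb one-time costs such as model loading.
    for _ in range(warmup):
        process_frame()
    start = time.perf_counter()
    for _ in range(frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload; replace with a real per-frame OCR call.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} FPS")
```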

Troubleshooting

| Issue | Solution |
|---|---|
| "No OCR backend available" | Install opencv-python |
| "PyTorch DLL error" | Use OpenCV EAST or Tesseract |
| "Tesseract not found" | Install Tesseract binary |
| Low accuracy | Use EasyOCR or PaddleOCR |
| Slow performance | Enable GPU or use OpenCV EAST |
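
Several of these issues come down to a missing package or binary, which can be checked before starting the application. The sketch below uses only the standard library; `check_ocr_prerequisites` is a hypothetical helper. It reports whether a package is installed, not whether its native libraries load, so it will not catch the PyTorch DLL error by itself.

```python
import importlib.util
import shutil
from typing import Dict


def check_ocr_prerequisites() -> Dict[str, bool]:
    """Report which OCR prerequisites appear to be installed."""
    return {
        # find_spec locates the package without importing it.
        "opencv-python": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract binary must be on PATH for this to find it.
        "tesseract binary": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for item, present in check_ocr_prerequisites().items():
        print(f"{item}: {'found' if present else 'missing'}")
```

If the Tesseract binary is installed but not on `PATH`, pytesseract also lets the path be set explicitly through `pytesseract.pytesseract.tesseract_cmd`.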

Future Enhancements

ONNX Runtime backend (lighter than PyTorch)
TensorFlow Lite backend
Custom trained models for game UI
YOLO-based UI element detection
Online learning for icon recognition
