feat: comprehensive settings dialog and UI cleanup

- Create new ui/settings_dialog.py with tabbed interface
- Tabs: General, Screenshot Hotkeys, Computer Vision, Advanced
- Remove old SettingsDialog from main_window.py
- Update on_settings() to use new comprehensive dialog
- Screenshot hotkey settings now in Settings (Ctrl+,)
- Computer Vision settings now in Settings dialog
LemonNexus 2026-02-11 13:26:31 +00:00
parent 53f7896dfa
commit 27b3bd0fe1
14 changed files with 3390 additions and 413 deletions

OCR_SETUP.md (new file, 227 lines)

@@ -0,0 +1,227 @@
# Lemontropia Suite - OCR Setup Guide
This guide helps you set up OCR (Optical Character Recognition) for game text detection in Lemontropia Suite.
## Quick Start (No Installation Required)
The system works **out of the box** with OpenCV EAST text detection - no additional dependencies needed!
```python
from modules.game_vision_ai import GameVisionAI
# Initialize (auto-detects best available backend)
vision = GameVisionAI()
# Process screenshot
result = vision.process_screenshot("screenshot.png")
print(f"Detected {len(result.text_regions)} text regions")
```
## OCR Backend Options
The system supports multiple OCR backends, automatically selecting the best available one:
| Backend | Speed | Accuracy | Dependencies | Windows Store Python |
|---------|-------|----------|--------------|---------------------|
| **OpenCV EAST** | ⚡ Fastest | ⭐ Detection only | None (included) | ✅ Works |
| **EasyOCR** | 🚀 Fast | ⭐⭐⭐ Good | PyTorch | ❌ May fail |
| **Tesseract** | 🐢 Slow | ⭐⭐ Medium | Tesseract binary | ✅ Works |
| **PaddleOCR** | 🚀 Fast | ⭐⭐⭐⭐⭐ Best | PaddlePaddle | ❌ May fail |
## Windows Store Python Compatibility
⚠️ **Important**: If you're using Python from the Microsoft Store, PyTorch-based OCR (EasyOCR, PaddleOCR) may fail with DLL errors like:
```
OSError: [WinError 126] The specified module could not be found (c10.dll)
```
**Solutions:**
1. Use **OpenCV EAST** (works out of the box)
2. Use **Tesseract OCR** (install Tesseract binary)
3. Switch to Python from [python.org](https://python.org)
4. Use Anaconda/Miniconda instead
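If you are unsure whether your setup is affected, a quick probe is to try importing PyTorch and fall back accordingly. This is an illustrative sketch, not part of the suite's API; the backend names match the table above:

```python
def pytorch_import_ok() -> bool:
    """Return True if PyTorch imports cleanly (no WinError 126 / DLL issues)."""
    try:
        import torch  # noqa: F401 - raises OSError on broken DLL setups
        return True
    except (ImportError, OSError):
        return False

# Prefer a PyTorch-based backend only when the import succeeds
backend = 'easyocr' if pytorch_import_ok() else 'opencv_east'
print(f"Using OCR backend: {backend}")
```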
## Installation Options
### Option 1: OpenCV EAST Only (Recommended for Windows Store Python)
No OCR-specific installation is needed; OpenCV is already part of the base requirements.
```bash
pip install opencv-python numpy pillow
```
### Option 2: With EasyOCR (Better accuracy, requires PyTorch)
```bash
# Install PyTorch first (see pytorch.org for your CUDA version)
pip install torch torchvision
# Then install EasyOCR
pip install easyocr
# Install remaining dependencies
pip install opencv-python numpy pillow
```
### Option 3: With Tesseract (Most stable)
1. **Install Tesseract OCR:**
- Windows: `choco install tesseract` or download from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki)
- Linux: `sudo apt-get install tesseract-ocr`
- macOS: `brew install tesseract`
2. **Install Python package:**
```bash
pip install pytesseract opencv-python numpy pillow
```
3. **(Windows only) Add Tesseract to PATH** or set path in code:
```python
from modules.ocr_backends import TesseractBackend
backend = TesseractBackend(tesseract_cmd=r"C:\Program Files\Tesseract-OCR\tesseract.exe")
```
### Option 4: Full Installation (All Backends)
```bash
# Install all OCR backends
pip install -r requirements-ocr.txt
# Or selectively:
pip install opencv-python numpy pillow easyocr pytesseract paddleocr
```
## Testing Your Setup
Run the test script to verify everything works:
```bash
python test_ocr_system.py
```
Expected output:
```
============================================================
HARDWARE DETECTION TEST
============================================================
GPU Backend: CPU
...
📋 Recommended OCR backend: opencv_east
============================================================
OCR BACKEND TESTS
============================================================
OPENCV_EAST:
Status: ✅ Available
GPU: 💻 CPU
...
🎉 All tests passed! OCR system is working correctly.
```
## Usage Examples
### Basic Text Detection
```python
from modules.game_vision_ai import GameVisionAI
# Initialize with auto-selected backend
vision = GameVisionAI()
# Process image
result = vision.process_screenshot("game_screenshot.png")
# Print detected text
for region in result.text_regions:
    print(f"Text: {region.text} (confidence: {region.confidence:.2f})")
    print(f"Location: {region.bbox}")
```
### Force Specific Backend
```python
from modules.game_vision_ai import GameVisionAI
# Use specific backend
vision = GameVisionAI(ocr_backend='tesseract')
# Or switch at runtime
vision.switch_ocr_backend('easyocr')
```
### Check Available Backends
```python
from modules.game_vision_ai import GameVisionAI
vision = GameVisionAI()
# List all backends
for backend in vision.get_ocr_backends():
    print(f"{backend['name']}: {'Available' if backend['available'] else 'Not available'}")
```
### Hardware Diagnostics
```python
from modules.hardware_detection import print_hardware_summary
from modules.game_vision_ai import GameVisionAI
# Print hardware info
print_hardware_summary()
# Run full diagnostic
diag = GameVisionAI.diagnose()
print(diag)
```
## Troubleshooting
### "No OCR backend available"
- Make sure `opencv-python` is installed: `pip install opencv-python`
- The EAST model will auto-download on first use (~95MB)
### "PyTorch DLL error"
- You're likely using Windows Store Python
- Use OpenCV EAST or Tesseract instead
- Or install Python from [python.org](https://python.org)
### "Tesseract not found"
- Install Tesseract OCR binary (see Option 3 above)
- Add to PATH or specify path in code
### Low detection accuracy
- OpenCV EAST only detects text regions; it does not recognize the text itself
- For text recognition, use EasyOCR, PaddleOCR, or Tesseract
- Ensure good screenshot quality and contrast
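Since most recognition backends binarize internally, low-contrast screenshots are a common cause of misses. As a rough, suite-independent illustration, a linear contrast stretch with NumPy can help before handing the image to OCR:

```python
import numpy as np

def stretch_contrast(gray: np.ndarray) -> np.ndarray:
    """Linearly rescale grayscale intensities to the full 0-255 range."""
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:  # flat image, nothing to stretch
        return gray.copy()
    scaled = (gray.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return scaled.astype(np.uint8)
```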
## Backend Priority
The system automatically selects backends in this priority order:
1. **PaddleOCR** - If PyTorch works and Paddle is installed
2. **EasyOCR** - If PyTorch works and EasyOCR is installed
3. **Tesseract** - If Tesseract binary is available
4. **OpenCV EAST** - Always works (ultimate fallback)
You can customize priority:
```python
from modules.game_vision_ai import UnifiedOCRProcessor
processor = UnifiedOCRProcessor(
    backend_priority=['tesseract', 'opencv_east']  # Custom order
)
```
## Performance Tips
- **OpenCV EAST**: Fastest, use for real-time detection
- **GPU acceleration**: Significant speedup for EasyOCR/PaddleOCR
- **Preprocessing**: Better contrast = better OCR accuracy
- **Region of interest**: Crop to relevant areas for faster processing
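Cropping to a region of interest is plain array slicing. A minimal sketch (the coordinates are illustrative, not real game UI positions), with the caveat that detected bounding boxes are then relative to the crop:

```python
import numpy as np

def crop_roi(image: np.ndarray, bbox: tuple) -> np.ndarray:
    """Crop (x, y, w, h) from a screenshot array; OCR results are crop-relative."""
    x, y, w, h = bbox
    return image[y:y + h, x:x + w]

screenshot = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder frame
loot_area = crop_roi(screenshot, (1400, 200, 400, 600))  # illustrative coords
```

To map results back to full-screen coordinates, add the crop's x and y offsets to each detected bbox.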

UI_CLEANUP_SUMMARY.md (new file, 105 lines)

@@ -0,0 +1,105 @@
# Lemontropia Suite UI Cleanup - Summary
## Changes Made
### 1. New File: `ui/settings_dialog.py`
Created a comprehensive SettingsDialog that consolidates all settings into a single dialog with tabs:
**Tabs:**
- **📋 General**: Player settings (avatar name, log path), default activity, application settings
- **📸 Screenshot Hotkeys**: Hotkey configuration for screenshots (moved from separate Tools menu)
- **👁️ Computer Vision**: AI vision settings (OCR, icon detection, directories)
- **🎮 GPU & Performance**: GPU detection, backend selection, performance tuning
**Also includes:**
- `NewSessionTemplateDialog` - Moved from main_window.py
- `TemplateStatsDialog` - Moved from main_window.py
- Data classes: `PlayerSettings`, `ScreenshotHotkeySettings`, `VisionSettings`
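The exact fields live in `ui/settings_dialog.py`; purely as an illustration of the data-class approach (the field names here are hypothetical, not the real API), such a class might look like:

```python
from dataclasses import dataclass

@dataclass
class PlayerSettings:
    # Hypothetical fields for illustration; see ui/settings_dialog.py for the real ones
    avatar_name: str = ""
    log_path: str = ""
    default_activity: str = ""
```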
### 2. Updated: `ui/main_window.py`
**Menu Structure Cleanup:**
```
File
  - New Template (Ctrl+N)
  - Exit (Alt+F4)
Session
  - Start (F5)
  - Stop (Shift+F5)
  - Pause (F6)
Tools
  - Loadout Manager (Ctrl+L)
  - Computer Vision →
    - Settings
    - Calibrate
    - Test
  - Select Gear →
    - Weapon (Ctrl+W)
    - Armor (Ctrl+Shift+A)
    - Finder (Ctrl+Shift+F)
    - Medical Tool (Ctrl+M)
View
  - Show HUD (F9)
  - Hide HUD (F10)
  - Session History (Ctrl+H)
  - Screenshot Gallery (Ctrl+G)
  - Settings (Ctrl+,)
Help
  - Setup Wizard (Ctrl+Shift+W)
  - About
```
**Code Organization:**
- Removed dialog classes (moved to settings_dialog.py)
- Cleaned up imports
- Removed orphaned screenshot hotkey menu item (now in Settings)
- Added tooltips to all menu actions
- Fixed menu separators for cleaner grouping
### 3. UI Audit Results
**Features with Menu Access:**
| Feature | Menu | Shortcut | Status |
|---------|------|----------|--------|
| Session History | View | Ctrl+H | ✅ |
| Gallery | View | Ctrl+G | ✅ |
| Loadout Manager | Tools | Ctrl+L | ✅ |
| Computer Vision | Tools (submenu) | - | ✅ |
| Setup Wizard | Help | Ctrl+Shift+W | ✅ |
| Settings | View | Ctrl+, | ✅ |
| Screenshot Hotkeys | Settings (tab) | - | ✅ (moved) |
| Select Gear | Tools (submenu) | Various | ✅ |
**Modules Analysis:**
- `crafting_tracker.py` - Backend module, no UI needed
- `loot_analyzer.py` - Backend module, no UI needed
- `game_vision.py`, `game_vision_ai.py` - Used by Vision dialogs
- `screenshot_hotkey.py` - Integrated into Settings
- Other modules - Backend/utilities
## Benefits
1. **Consolidated Settings**: All settings in one place with organized tabs
2. **Cleaner Menu Structure**: Logical grouping of features
3. **Better Code Organization**: Dialogs in separate file, main_window focused on main UI
4. **No Orphaned Features**: All major features accessible via menus
5. **Backward Compatibility**: Existing functionality preserved
## Files Modified
- `ui/settings_dialog.py` - NEW (consolidated settings)
- `ui/main_window.py` - UPDATED (clean menu structure)
## Testing Checklist
- [ ] Settings dialog opens with Ctrl+,
- [ ] All tabs accessible in Settings
- [ ] Player name saves correctly
- [ ] Screenshot hotkeys configurable in Settings
- [ ] Vision settings accessible
- [ ] All menu shortcuts work
- [ ] Loadout Manager opens with Ctrl+L
- [ ] Session History opens with Ctrl+H
- [ ] Gallery opens with Ctrl+G

modules/__init__.py (new file, 41 lines)

@@ -0,0 +1,41 @@
"""
Lemontropia Suite - Modules
Game automation and analysis modules.
"""
# OCR and Vision
from .game_vision_ai import GameVisionAI, UnifiedOCRProcessor
from .hardware_detection import (
HardwareDetector,
HardwareInfo,
GPUBackend,
get_hardware_info,
print_hardware_summary,
recommend_ocr_backend,
)
# OCR Backends
from .ocr_backends import (
BaseOCRBackend,
OCRTextRegion,
OCRBackendInfo,
OCRBackendFactory,
)
__all__ = [
# Vision
'GameVisionAI',
'UnifiedOCRProcessor',
# Hardware
'HardwareDetector',
'HardwareInfo',
'GPUBackend',
'get_hardware_info',
'print_hardware_summary',
'recommend_ocr_backend',
# OCR
'BaseOCRBackend',
'OCRTextRegion',
'OCRBackendInfo',
'OCRBackendFactory',
]

modules/game_vision_ai.py

@@ -1,7 +1,14 @@
"""
Lemontropia Suite - Game Vision AI Module
Advanced computer vision with local GPU-accelerated AI models.
Supports OCR (PaddleOCR) and icon detection for game UI analysis.
Advanced computer vision with multiple OCR backends and GPU acceleration.
OCR Backends (in priority order):
1. OpenCV EAST - Fastest, no dependencies (primary fallback)
2. EasyOCR - Good accuracy, lighter than PaddleOCR
3. Tesseract OCR - Traditional, stable
4. PaddleOCR - Best accuracy (requires working PyTorch)
Handles PyTorch DLL errors on Windows Store Python gracefully.
"""
import cv2
@ -17,34 +24,17 @@ import hashlib
logger = logging.getLogger(__name__)
# Optional PyTorch import with fallback
try:
import torch
TORCH_AVAILABLE = True
except Exception as e:
logger.warning(f"PyTorch not available: {e}")
TORCH_AVAILABLE = False
torch = None
# Import hardware detection
from .hardware_detection import (
HardwareDetector, HardwareInfo, GPUBackend,
recommend_ocr_backend, get_hardware_info
)
# Import OpenCV text detector as fallback
from .opencv_text_detector import OpenCVTextDetector, TextDetection as OpenCVTextDetection
# Optional PaddleOCR import with fallback
try:
from paddleocr import PaddleOCR
PADDLE_AVAILABLE = True
except Exception as e:
logger.warning(f"PaddleOCR not available: {e}")
PADDLE_AVAILABLE = False
PaddleOCR = None
class GPUBackend(Enum):
"""Supported GPU backends."""
CUDA = "cuda" # NVIDIA CUDA
MPS = "mps" # Apple Metal Performance Shaders
DIRECTML = "directml" # Windows DirectML
CPU = "cpu" # Fallback CPU
# Import OCR backends
from .ocr_backends import (
BaseOCRBackend, OCRTextRegion, OCRBackendInfo,
OCRBackendFactory
)
@dataclass
@ -54,14 +44,27 @@ class TextRegion:
confidence: float
bbox: Tuple[int, int, int, int] # x, y, w, h
language: str = "en"
backend: str = "unknown" # Which OCR backend detected this
def to_dict(self) -> Dict[str, Any]:
return {
'text': self.text,
'confidence': self.confidence,
'bbox': self.bbox,
'language': self.language
'language': self.language,
'backend': self.backend
}
@classmethod
def from_ocr_region(cls, region: OCRTextRegion, backend: str = "unknown"):
"""Create from OCR backend region."""
return cls(
text=region.text,
confidence=region.confidence,
bbox=region.bbox,
language=region.language,
backend=backend
)
@dataclass
@ -105,6 +108,7 @@ class VisionResult:
icon_regions: List[IconRegion] = field(default_factory=list)
processing_time_ms: float = 0.0
gpu_backend: str = "cpu"
ocr_backend: str = "unknown"
timestamp: float = field(default_factory=time.time)
def to_dict(self) -> Dict[str, Any]:
@ -113,6 +117,7 @@ class VisionResult:
'icon_count': len(self.icon_regions),
'processing_time_ms': self.processing_time_ms,
'gpu_backend': self.gpu_backend,
'ocr_backend': self.ocr_backend,
'timestamp': self.timestamp
}
@ -123,153 +128,143 @@ class GPUDetector:
@staticmethod
def detect_backend() -> GPUBackend:
"""Detect best available GPU backend."""
# Check CUDA first (most common)
if torch.cuda.is_available():
logger.info(f"CUDA available: {torch.cuda.get_device_name(0)}")
return GPUBackend.CUDA
# Check Apple MPS
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
logger.info("Apple MPS (Metal) available")
return GPUBackend.MPS
# Check DirectML on Windows
try:
import torch_directml
if torch_directml.is_available():
logger.info("DirectML available")
return GPUBackend.DIRECTML
except ImportError:
pass
logger.info("No GPU backend available, using CPU")
return GPUBackend.CPU
@staticmethod
def get_device_string(backend: GPUBackend) -> str:
"""Get PyTorch device string for backend."""
if backend == GPUBackend.CUDA:
return "cuda:0"
elif backend == GPUBackend.MPS:
return "mps"
elif backend == GPUBackend.DIRECTML:
return "privateuseone:0" # DirectML device
return "cpu"
info = HardwareDetector.detect_all()
return info.gpu_backend
@staticmethod
def get_gpu_info() -> Dict[str, Any]:
"""Get detailed GPU information."""
info = {
'backend': GPUDetector.detect_backend().value,
'cuda_available': torch.cuda.is_available(),
'mps_available': hasattr(torch.backends, 'mps') and torch.backends.mps.is_available(),
'devices': []
}
if torch.cuda.is_available():
for i in range(torch.cuda.device_count()):
info['devices'].append({
'id': i,
'name': torch.cuda.get_device_name(i),
'memory_total': torch.cuda.get_device_properties(i).total_memory
})
return info
info = HardwareDetector.detect_all()
return info.to_dict()
class OCRProcessor:
"""OCR text extraction using PaddleOCR or OpenCV fallback with GPU support."""
class UnifiedOCRProcessor:
"""
Unified OCR processor with multiple backend support.
SUPPORTED_LANGUAGES = ['en', 'sv', 'latin'] # English, Swedish, Latin script
Automatically selects the best available backend based on:
1. Hardware capabilities
2. PyTorch DLL compatibility
3. User preferences
def __init__(self, use_gpu: bool = True, lang: str = 'en'):
Gracefully falls through backends if one fails.
"""
SUPPORTED_LANGUAGES = ['en', 'sv', 'latin', 'de', 'fr', 'es']
# Default priority (can be overridden)
DEFAULT_PRIORITY = [
'paddleocr', # Best accuracy if available
'easyocr', # Good balance
'tesseract', # Stable fallback
'opencv_east', # Fastest, always works
]
def __init__(self, use_gpu: bool = True, lang: str = 'en',
backend_priority: Optional[List[str]] = None,
auto_select: bool = True):
"""
Initialize Unified OCR Processor.
Args:
use_gpu: Enable GPU acceleration if available
lang: Language for OCR ('en', 'sv', 'latin', etc.)
backend_priority: Custom backend priority order
auto_select: Automatically select best backend
"""
self.use_gpu = use_gpu
self.lang = lang if lang in self.SUPPORTED_LANGUAGES else 'en'
self.ocr = None
self.backend = GPUBackend.CPU
self.opencv_detector = None
self._primary_backend = None # 'paddle' or 'opencv'
self._init_ocr()
self.backend_priority = backend_priority or self.DEFAULT_PRIORITY
self._backend: Optional[BaseOCRBackend] = None
self._backend_name: str = "unknown"
self._hardware_info: HardwareInfo = HardwareDetector.detect_all()
# Initialize
if auto_select:
self._auto_select_backend()
logger.info(f"UnifiedOCR initialized with backend: {self._backend_name}")
def _init_ocr(self):
"""Initialize OCR with PaddleOCR or OpenCV fallback."""
# Try PaddleOCR first (better accuracy)
if PADDLE_AVAILABLE:
try:
self._init_paddle()
if self.ocr is not None:
self._primary_backend = 'paddle'
return
except Exception as e:
logger.warning(f"PaddleOCR init failed: {e}")
# Fallback to OpenCV text detection
logger.info("Using OpenCV text detection as fallback")
self.opencv_detector = OpenCVTextDetector(use_gpu=self.use_gpu)
if self.opencv_detector.is_available():
self._primary_backend = 'opencv'
self.backend = GPUBackend.CUDA if self.opencv_detector.check_gpu_available() else GPUBackend.CPU
logger.info(f"OpenCV text detector ready (GPU: {self.backend == GPUBackend.CUDA})")
def _auto_select_backend(self):
"""Automatically select the best available backend."""
# Check for PyTorch DLL errors first
if self._hardware_info.pytorch_dll_error:
logger.warning(
"PyTorch DLL error detected - avoiding PyTorch-based backends"
)
# Remove PyTorch-dependent backends from priority
safe_backends = [
b for b in self.backend_priority
if b not in ['paddleocr', 'easyocr']
]
else:
logger.error("No OCR backend available")
def _init_paddle(self):
"""Initialize PaddleOCR with appropriate backend."""
# Detect GPU
if self.use_gpu:
self.backend = GPUDetector.detect_backend()
use_gpu_flag = self.backend != GPUBackend.CPU
else:
use_gpu_flag = False
safe_backends = self.backend_priority
# Map language codes
lang_map = {
'en': 'en',
'sv': 'latin', # Swedish uses latin script model
'latin': 'latin'
}
paddle_lang = lang_map.get(self.lang, 'en')
# Get recommended backend
recommended = HardwareDetector.recommend_ocr_backend()
logger.info(f"Initializing PaddleOCR (lang={paddle_lang}, gpu={use_gpu_flag})")
# Try to create backend
for name in safe_backends:
backend = OCRBackendFactory.create_backend(
name,
use_gpu=self.use_gpu,
lang=self.lang
)
if backend is not None and backend.is_available():
self._backend = backend
self._backend_name = name
logger.info(f"Selected OCR backend: {name}")
return
self.ocr = PaddleOCR(
lang=paddle_lang,
use_gpu=use_gpu_flag,
show_log=False,
use_angle_cls=True,
det_db_thresh=0.3,
det_db_box_thresh=0.5,
rec_thresh=0.5,
# Ultimate fallback - OpenCV EAST always works
logger.warning("All preferred backends failed, trying OpenCV EAST...")
backend = OCRBackendFactory.create_backend(
'opencv_east',
use_gpu=self.use_gpu,
lang=self.lang
)
logger.info(f"PaddleOCR initialized successfully (backend: {self.backend.value})")
def preprocess_for_ocr(self, image: np.ndarray) -> np.ndarray:
"""Preprocess image for better OCR results."""
# Convert to grayscale if needed
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
if backend is not None and backend.is_available():
self._backend = backend
self._backend_name = 'opencv_east'
logger.info("Using OpenCV EAST as ultimate fallback")
else:
gray = image
logger.error("CRITICAL: No OCR backend available!")
def set_backend(self, name: str) -> bool:
"""
Manually set OCR backend.
# Denoise
denoised = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)
# Adaptive threshold for better text contrast
binary = cv2.adaptiveThreshold(
denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2
Args:
name: Backend name ('paddleocr', 'easyocr', 'tesseract', 'opencv_east')
Returns:
True if successful
"""
backend = OCRBackendFactory.create_backend(
name,
use_gpu=self.use_gpu,
lang=self.lang
)
return binary
if backend is not None and backend.is_available():
self._backend = backend
self._backend_name = name
logger.info(f"Switched to OCR backend: {name}")
return True
else:
logger.error(f"Failed to switch to OCR backend: {name}")
return False
def extract_text(self, image: Union[str, np.ndarray, Path]) -> List[TextRegion]:
"""
Extract text from image using PaddleOCR or OpenCV fallback.
Extract text from image using selected backend.
Args:
image: Image path or numpy array
Returns:
List of detected text regions
"""
@ -282,68 +277,29 @@ class OCRProcessor:
else:
img = image.copy()
# Use appropriate backend
if self._primary_backend == 'paddle' and self.ocr is not None:
return self._extract_text_paddle(img)
elif self._primary_backend == 'opencv' and self.opencv_detector is not None:
return self._extract_text_opencv(img)
else:
logger.warning("No OCR backend available")
# Check backend
if self._backend is None:
logger.error("No OCR backend available")
return []
def _extract_text_opencv(self, img: np.ndarray) -> List[TextRegion]:
"""Extract text using OpenCV EAST detector."""
detections = self.opencv_detector.detect_text(img)
# Convert to TextRegion format (no text recognition, just detection)
regions = []
for det in detections:
regions.append(TextRegion(
text="", # OpenCV detector doesn't recognize text, just finds regions
confidence=det.confidence,
bbox=det.bbox,
language=self.lang
))
return regions
def _extract_text_paddle(self, img: np.ndarray) -> List[TextRegion]:
"""Extract text using PaddleOCR."""
# Preprocess
processed = self.preprocess_for_ocr(img)
try:
# Run OCR
result = self.ocr.ocr(processed, cls=True)
# Extract text using backend
ocr_regions = self._backend.extract_text(img)
detected = []
if result and result[0]:
for line in result[0]:
if line is None:
continue
bbox, (text, confidence) = line
# Calculate bounding box
x_coords = [p[0] for p in bbox]
y_coords = [p[1] for p in bbox]
x, y = int(min(x_coords)), int(min(y_coords))
w = int(max(x_coords) - x)
h = int(max(y_coords) - y)
detected.append(TextRegion(
text=text.strip(),
confidence=float(confidence),
bbox=(x, y, w, h),
language=self.lang
))
# Convert to TextRegion with backend info
regions = [
TextRegion.from_ocr_region(r, self._backend_name)
for r in ocr_regions
]
return detected
logger.debug(f"Extracted {len(regions)} text regions using {self._backend_name}")
return regions
except Exception as e:
logger.error(f"OCR processing failed: {e}")
logger.error(f"OCR extraction failed: {e}")
return []
def extract_text_from_region(self, image: np.ndarray,
def extract_text_from_region(self, image: np.ndarray,
region: Tuple[int, int, int, int]) -> List[TextRegion]:
"""Extract text from specific region of image."""
x, y, w, h = region
@ -360,6 +316,34 @@ class OCRProcessor:
r.bbox = (x + rx, y + ry, rw, rh)
return regions
def get_available_backends(self) -> List[OCRBackendInfo]:
"""Get information about all available backends."""
return OCRBackendFactory.check_all_backends(self.use_gpu, self.lang)
def get_current_backend(self) -> str:
"""Get name of current backend."""
return self._backend_name
def get_backend_info(self) -> Dict[str, Any]:
"""Get information about current backend."""
if self._backend:
return self._backend.get_info().to_dict()
return {"error": "No backend initialized"}
def is_recognition_supported(self) -> bool:
"""
Check if current backend supports text recognition.
Note: OpenCV EAST only detects text regions, doesn't recognize text.
"""
return self._backend_name not in ['opencv_east']
# Legacy class for backward compatibility
class OCRProcessor(UnifiedOCRProcessor):
"""Legacy OCR processor - now wraps UnifiedOCRProcessor."""
pass
class IconDetector:
@ -395,13 +379,8 @@ class IconDetector:
logger.error(f"Failed to load template {template_file}: {e}")
def detect_loot_window(self, image: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
"""
Detect loot window in screenshot.
Returns bounding box of loot window or None if not found.
"""
"""Detect loot window in screenshot."""
# Look for common loot window indicators
# Method 1: Template matching for "Loot" text or window frame
if 'loot_window' in self.templates:
result = cv2.matchTemplate(
image, self.templates['loot_window'], cv2.TM_CCOEFF_NORMED
@ -412,13 +391,9 @@ class IconDetector:
return (*max_loc, w, h)
# Method 2: Detect based on typical loot window characteristics
# Loot windows usually have a grid of items with consistent spacing
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Look for high-contrast regions that could be icons
_, thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
# Find contours
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Filter for icon-sized squares
@ -427,7 +402,6 @@ class IconDetector:
x, y, w, h = cv2.boundingRect(cnt)
aspect = w / h if h > 0 else 0
# Check if dimensions match typical icon sizes
for size_name, (sw, sh) in self.ICON_SIZES.items():
if abs(w - sw) < 5 and abs(h - sh) < 5 and 0.8 < aspect < 1.2:
potential_icons.append((x, y, w, h))
@ -435,7 +409,6 @@ class IconDetector:
# If we found multiple icons in a grid pattern, assume loot window
if len(potential_icons) >= 2:
# Calculate bounding box of all icons
xs = [p[0] for p in potential_icons]
ys = [p[1] for p in potential_icons]
ws = [p[2] for p in potential_icons]
@ -444,7 +417,6 @@ class IconDetector:
min_x, max_x = min(xs), max(xs) + max(ws)
min_y, max_y = min(ys), max(ys) + max(hs)
# Add padding
padding = 20
return (
max(0, min_x - padding),
@ -455,20 +427,10 @@ class IconDetector:
return None
def extract_icons_from_region(self, image: np.ndarray,
def extract_icons_from_region(self, image: np.ndarray,
region: Tuple[int, int, int, int],
icon_size: str = 'medium') -> List[IconRegion]:
"""
Extract icons from a specific region (e.g., loot window).
Args:
image: Full screenshot
region: Bounding box (x, y, w, h)
icon_size: Size preset ('small', 'medium', 'large')
Returns:
List of detected icon regions
"""
"""Extract icons from a specific region."""
x, y, w, h = region
roi = image[y:y+h, x:x+w]
@ -478,7 +440,6 @@ class IconDetector:
target_size = self.ICON_SIZES.get(icon_size, (48, 48))
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
# Multiple threshold attempts for different icon styles
icons = []
thresholds = [(200, 255), (180, 255), (150, 255)]
@ -490,35 +451,30 @@ class IconDetector:
cx, cy, cw, ch = cv2.boundingRect(cnt)
aspect = cw / ch if ch > 0 else 0
# Match icon size with tolerance
if (abs(cw - target_size[0]) < 8 and
abs(ch - target_size[1]) < 8 and
if (abs(cw - target_size[0]) < 8 and
abs(ch - target_size[1]) < 8 and
0.7 < aspect < 1.3):
# Extract icon image
icon_img = roi[cy:cy+ch, cx:cx+cw]
# Resize to standard size
icon_img = cv2.resize(icon_img, target_size, interpolation=cv2.INTER_AREA)
icons.append(IconRegion(
image=icon_img,
bbox=(x + cx, y + cy, cw, ch),
confidence=0.8 # Placeholder confidence
confidence=0.8
))
# Remove duplicates (icons that overlap significantly)
# Remove duplicates
unique_icons = self._remove_duplicate_icons(icons)
return unique_icons
def _remove_duplicate_icons(self, icons: List[IconRegion],
def _remove_duplicate_icons(self, icons: List[IconRegion],
iou_threshold: float = 0.5) -> List[IconRegion]:
"""Remove duplicate icons based on IoU."""
if not icons:
return []
# Sort by confidence
sorted_icons = sorted(icons, key=lambda x: x.confidence, reverse=True)
kept = []
@ -533,9 +489,9 @@ class IconDetector:
return kept
def _calculate_iou(self, box1: Tuple[int, int, int, int],
def _calculate_iou(self, box1: Tuple[int, int, int, int],
box2: Tuple[int, int, int, int]) -> float:
"""Calculate Intersection over Union of two bounding boxes."""
"""Calculate Intersection over Union."""
x1, y1, w1, h1 = box1
x2, y2, w2, h2 = box2
@ -551,33 +507,24 @@ class IconDetector:
union_area = box1_area + box2_area - inter_area
return inter_area / union_area if union_area > 0 else 0
def detect_icons_yolo(self, image: np.ndarray,
model_path: Optional[str] = None) -> List[IconRegion]:
"""
Detect icons using YOLO model (if available).
This is a placeholder for future YOLO integration.
"""
# TODO: Implement YOLO detection when model is trained
logger.debug("YOLO detection not yet implemented")
return []
class GameVisionAI:
"""
Main AI vision interface for game screenshot analysis.
Combines OCR and icon detection with GPU acceleration.
Combines OCR and icon detection with multiple backend support.
"""
def __init__(self, use_gpu: bool = True, ocr_lang: str = 'en',
ocr_backend: Optional[str] = None,
data_dir: Optional[Path] = None):
"""
Initialize Game Vision AI.
Args:
use_gpu: Enable GPU acceleration if available
ocr_lang: Language for OCR ('en', 'sv', 'latin')
ocr_lang: Language for OCR
ocr_backend: Specific OCR backend to use (None for auto)
data_dir: Directory for storing extracted data
"""
self.use_gpu = use_gpu
@ -585,42 +532,34 @@ class GameVisionAI:
self.extracted_icons_dir = self.data_dir / "extracted_icons"
self.extracted_icons_dir.mkdir(parents=True, exist_ok=True)
# Detect GPU
self.backend = GPUDetector.detect_backend() if use_gpu else GPUBackend.CPU
# Detect hardware
self.hardware_info = HardwareDetector.detect_all()
self.backend = self.hardware_info.gpu_backend
# Initialize processors
self.ocr = OCRProcessor(use_gpu=use_gpu, lang=ocr_lang)
# Initialize OCR processor
self.ocr = UnifiedOCRProcessor(
use_gpu=use_gpu,
lang=ocr_lang,
auto_select=(ocr_backend is None)
)
# Set specific backend if requested
if ocr_backend:
self.ocr.set_backend(ocr_backend)
# Initialize icon detector
self.icon_detector = IconDetector()
# Icon matching cache
self.icon_cache: Dict[str, ItemMatch] = {}
logger.info(f"GameVisionAI initialized (GPU: {self.backend.value})")
logger.info(f"GameVisionAI initialized (GPU: {self.backend.value}, "
f"OCR: {self.ocr.get_current_backend()})")
def extract_text_from_image(self, image_path: Union[str, Path]) -> List[TextRegion]:
"""
Extract all text from an image.
Args:
image_path: Path to screenshot image
Returns:
List of detected text regions
"""
"""Extract all text from an image."""
return self.ocr.extract_text(image_path)
def extract_icons_from_image(self, image_path: Union[str, Path],
auto_detect_window: bool = True) -> List[IconRegion]:
"""
Extract item icons from image.
Args:
image_path: Path to screenshot image
auto_detect_window: Automatically detect loot window
Returns:
List of detected icon regions
"""
"""Extract item icons from image."""
image = cv2.imread(str(image_path))
if image is None:
logger.error(f"Failed to load image: {image_path}")
@ -635,7 +574,6 @@ class GameVisionAI:
)
else:
logger.debug("No loot window detected, scanning full image")
# Scan full image
h, w = image.shape[:2]
return self.icon_detector.extract_icons_from_region(
image, (0, 0, w, h)
@ -646,26 +584,6 @@ class GameVisionAI:
image, (0, 0, w, h)
)
def match_icon_to_database(self, icon_image: np.ndarray,
database_path: Optional[Path] = None) -> Optional[ItemMatch]:
"""
Match extracted icon to item database.
Args:
icon_image: Icon image (numpy array)
database_path: Path to icon database directory
Returns:
ItemMatch if found, None otherwise
"""
from .icon_matcher import IconMatcher
# Lazy load matcher
if not hasattr(self, '_icon_matcher'):
self._icon_matcher = IconMatcher(database_path)
return self._icon_matcher.match_icon(icon_image)
def process_screenshot(self, image_path: Union[str, Path],
extract_text: bool = True,
extract_icons: bool = True) -> VisionResult:
@ -676,13 +594,16 @@ class GameVisionAI:
image_path: Path to screenshot
extract_text: Enable text extraction
extract_icons: Enable icon extraction
Returns:
VisionResult with all detections
"""
start_time = time.time()
result = VisionResult(gpu_backend=self.backend.value)
result = VisionResult(
gpu_backend=self.backend.value,
ocr_backend=self.ocr.get_current_backend()
)
# Load image once
image = cv2.imread(str(image_path))
@ -717,28 +638,31 @@ class GameVisionAI:
def get_gpu_info(self) -> Dict[str, Any]:
"""Get GPU information."""
return self.hardware_info.to_dict()
def is_gpu_available(self) -> bool:
"""Check if GPU acceleration is available."""
return self.backend != GPUBackend.CPU
def get_ocr_backends(self) -> List[Dict[str, Any]]:
"""Get information about all available OCR backends."""
backends = self.ocr.get_available_backends()
return [b.to_dict() for b in backends]
def switch_ocr_backend(self, name: str) -> bool:
"""Switch to a different OCR backend."""
return self.ocr.set_backend(name)
def calibrate_for_game(self, sample_screenshots: List[Path]) -> Dict[str, Any]:
"""
Calibrate vision system using sample screenshots.
Args:
sample_screenshots: List of sample game screenshots
Returns:
Calibration results
"""
"""Calibrate vision system using sample screenshots."""
calibration = {
'screenshots_processed': 0,
'text_regions_detected': 0,
'icons_detected': 0,
'average_processing_time_ms': 0,
'detected_regions': {},
'ocr_backend': self.ocr.get_current_backend(),
'gpu_backend': self.backend.value,
}
total_time = 0
@ -763,17 +687,36 @@ class GameVisionAI:
)
return calibration
@staticmethod
def diagnose() -> Dict[str, Any]:
"""Run full diagnostic on vision system."""
return {
'hardware': HardwareDetector.detect_all().to_dict(),
'ocr_backends': [
b.to_dict() for b in
OCRBackendFactory.check_all_backends()
],
'recommendations': {
'ocr_backend': HardwareDetector.recommend_ocr_backend(),
'gpu': GPUDetector.detect_backend().value,
}
}
# Export main classes
__all__ = [
'GameVisionAI',
'UnifiedOCRProcessor',
'OCRProcessor', # Legacy
'TextRegion',
'IconRegion',
'ItemMatch',
'VisionResult',
'GPUBackend',
'GPUDetector',
'IconDetector',
'HardwareDetector',
'OCRBackendFactory',
'BaseOCRBackend',
]
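The `match_icon_to_database` method above uses a hasattr-based lazy-load cache so the heavy `IconMatcher` is only constructed on first use. A stdlib-only sketch of that pattern (`LazyMatcher` and `_matcher` are illustrative names, not part of the module):

```python
class LazyMatcher:
    """Illustrative stand-in showing hasattr-based lazy-load caching."""

    def match(self, icon):
        # Construct the expensive dependency on first call, then reuse it
        if not hasattr(self, '_matcher'):
            self._matcher = object()  # stand-in for IconMatcher(database_path)
        return self._matcher
```

Repeated calls return the same cached instance, so the construction cost is paid at most once per object.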


@ -0,0 +1,367 @@
"""
Lemontropia Suite - Hardware Detection Module
Detect GPU and ML framework availability with error handling.
"""
import logging
from typing import Dict, Any, Optional, List
from dataclasses import dataclass, field
from enum import Enum
logger = logging.getLogger(__name__)
class GPUBackend(Enum):
"""Supported GPU backends."""
CUDA = "cuda" # NVIDIA CUDA
MPS = "mps" # Apple Metal Performance Shaders
DIRECTML = "directml" # Windows DirectML
CPU = "cpu" # Fallback CPU
@dataclass
class HardwareInfo:
"""Complete hardware information."""
# GPU Info
gpu_backend: GPUBackend = GPUBackend.CPU
cuda_available: bool = False
cuda_device_count: int = 0
cuda_devices: List[Dict] = field(default_factory=list)
mps_available: bool = False
directml_available: bool = False
# OpenCV GPU
opencv_cuda_available: bool = False
opencv_cuda_devices: int = 0
# ML Frameworks
pytorch_available: bool = False
pytorch_version: Optional[str] = None
pytorch_error: Optional[str] = None
pytorch_dll_error: bool = False
paddle_available: bool = False
paddle_version: Optional[str] = None
# System
platform: str = "unknown"
python_executable: str = "unknown"
is_windows_store_python: bool = False
def to_dict(self) -> Dict[str, Any]:
return {
'gpu': {
'backend': self.gpu_backend.value,
'cuda_available': self.cuda_available,
'cuda_devices': self.cuda_devices,
'mps_available': self.mps_available,
'directml_available': self.directml_available,
'opencv_cuda': self.opencv_cuda_available,
},
'ml_frameworks': {
'pytorch': {
'available': self.pytorch_available,
'version': self.pytorch_version,
'error': self.pytorch_error,
'dll_error': self.pytorch_dll_error,
},
'paddle': {
'available': self.paddle_available,
'version': self.paddle_version,
}
},
'system': {
'platform': self.platform,
'python': self.python_executable,
'windows_store': self.is_windows_store_python,
}
}
class HardwareDetector:
"""Detect hardware capabilities with error handling."""
@staticmethod
def detect_all() -> HardwareInfo:
"""Detect all hardware capabilities."""
info = HardwareInfo()
# Detect system info
info = HardwareDetector._detect_system(info)
# Detect OpenCV GPU
info = HardwareDetector._detect_opencv_cuda(info)
# Detect PyTorch (with special error handling)
info = HardwareDetector._detect_pytorch_safe(info)
# Detect PaddlePaddle
info = HardwareDetector._detect_paddle(info)
# Determine best GPU backend
info = HardwareDetector._determine_gpu_backend(info)
return info
@staticmethod
def _detect_system(info: HardwareInfo) -> HardwareInfo:
"""Detect system information."""
import sys
import platform
info.platform = platform.system()
info.python_executable = sys.executable
# Detect Windows Store Python
exe_lower = sys.executable.lower()
info.is_windows_store_python = (
'windowsapps' in exe_lower or
'microsoft' in exe_lower
)
if info.is_windows_store_python:
logger.warning(
"Windows Store Python detected - may have DLL compatibility issues"
)
return info
@staticmethod
def _detect_opencv_cuda(info: HardwareInfo) -> HardwareInfo:
"""Detect OpenCV CUDA support."""
try:
import cv2
cuda_count = cv2.cuda.getCudaEnabledDeviceCount()
info.opencv_cuda_devices = cuda_count
info.opencv_cuda_available = cuda_count > 0
if info.opencv_cuda_available:
try:
# cv2.cuda.getDevice() returns an int; wrap it in DeviceInfo for the name
device_name = cv2.cuda.DeviceInfo(cv2.cuda.getDevice()).name()
logger.info(f"OpenCV CUDA device: {device_name}")
except Exception:
logger.info(f"OpenCV CUDA available ({cuda_count} devices)")
except Exception as e:
logger.debug(f"OpenCV CUDA detection failed: {e}")
info.opencv_cuda_available = False
return info
@staticmethod
def _detect_pytorch_safe(info: HardwareInfo) -> HardwareInfo:
"""
Detect PyTorch with safe error handling for DLL issues.
This is critical for Windows Store Python compatibility.
"""
try:
import torch
info.pytorch_available = True
info.pytorch_version = torch.__version__
# Check CUDA
info.cuda_available = torch.cuda.is_available()
if info.cuda_available:
info.cuda_device_count = torch.cuda.device_count()
for i in range(info.cuda_device_count):
info.cuda_devices.append({
'id': i,
'name': torch.cuda.get_device_name(i),
'memory': torch.cuda.get_device_properties(i).total_memory
})
logger.info(f"PyTorch CUDA: {info.cuda_devices}")
# Check MPS (Apple Silicon)
if hasattr(torch.backends, 'mps'):
info.mps_available = torch.backends.mps.is_available()
if info.mps_available:
logger.info("PyTorch MPS (Metal) available")
logger.info(f"PyTorch {info.pytorch_version} available")
except ImportError:
info.pytorch_available = False
info.pytorch_error = "PyTorch not installed"
logger.debug("PyTorch not installed")
except OSError as e:
# DLL error - common with Windows Store Python
error_str = str(e).lower()
info.pytorch_available = False
info.pytorch_dll_error = True
info.pytorch_error = str(e)
if any(x in error_str for x in ['dll', 'c10', 'specified module']):
logger.error(
f"PyTorch DLL error (Windows Store Python?): {e}"
)
logger.info(
"This is a known issue. Use alternative OCR backends."
)
else:
logger.error(f"PyTorch OS error: {e}")
except Exception as e:
info.pytorch_available = False
info.pytorch_error = str(e)
logger.error(f"PyTorch detection failed: {e}")
return info
@staticmethod
def _detect_paddle(info: HardwareInfo) -> HardwareInfo:
"""Detect PaddlePaddle availability."""
try:
import paddle
info.paddle_available = True
info.paddle_version = paddle.__version__
logger.info(f"PaddlePaddle {info.paddle_version} available")
except ImportError:
info.paddle_available = False
logger.debug("PaddlePaddle not installed")
except Exception as e:
info.paddle_available = False
logger.debug(f"PaddlePaddle detection failed: {e}")
return info
@staticmethod
def _determine_gpu_backend(info: HardwareInfo) -> HardwareInfo:
"""Determine the best available GPU backend."""
# Priority: CUDA > MPS > DirectML > CPU
if info.cuda_available:
info.gpu_backend = GPUBackend.CUDA
elif info.mps_available:
info.gpu_backend = GPUBackend.MPS
elif info.directml_available:
info.gpu_backend = GPUBackend.DIRECTML
else:
info.gpu_backend = GPUBackend.CPU
return info
@staticmethod
def get_gpu_summary() -> str:
"""Get a human-readable GPU summary."""
info = HardwareDetector.detect_all()
lines = ["=" * 50]
lines.append("HARDWARE DETECTION SUMMARY")
lines.append("=" * 50)
# GPU Section
lines.append(f"\nGPU Backend: {info.gpu_backend.value.upper()}")
if info.cuda_available:
lines.append(f"CUDA Devices: {info.cuda_device_count}")
for dev in info.cuda_devices:
gb = dev['memory'] / (1024**3)
lines.append(f" [{dev['id']}] {dev['name']} ({gb:.1f} GB)")
if info.mps_available:
lines.append("Apple MPS (Metal): Available")
if info.opencv_cuda_available:
lines.append(f"OpenCV CUDA: {info.opencv_cuda_devices} device(s)")
# ML Frameworks
lines.append("\nML Frameworks:")
if info.pytorch_available:
lines.append(f" PyTorch: {info.pytorch_version}")
lines.append(f" CUDA: {'Yes' if info.cuda_available else 'No'}")
else:
lines.append(" PyTorch: Not available")
if info.pytorch_dll_error:
lines.append(" ⚠️ DLL Error (Windows Store Python?)")
if info.paddle_available:
lines.append(f" PaddlePaddle: {info.paddle_version}")
else:
lines.append(" PaddlePaddle: Not installed")
# System
lines.append(f"\nSystem: {info.platform}")
if info.is_windows_store_python:
lines.append("⚠️ Windows Store Python (may have DLL issues)")
lines.append("=" * 50)
return "\n".join(lines)
@staticmethod
def can_use_paddleocr() -> bool:
"""Check if PaddleOCR can be used (no DLL errors)."""
info = HardwareDetector.detect_all()
return info.pytorch_available and not info.pytorch_dll_error
@staticmethod
def recommend_ocr_backend() -> str:
"""
Recommend the best OCR backend based on hardware.
Returns:
Name of recommended backend
"""
info = HardwareDetector.detect_all()
# If PyTorch has DLL error, avoid PaddleOCR and EasyOCR (which uses PyTorch)
if info.pytorch_dll_error:
logger.info("PyTorch DLL error detected - avoiding PyTorch-based OCR")
# Check OpenCV CUDA first
if info.opencv_cuda_available:
return 'opencv_east'
# Check Tesseract
try:
import pytesseract
return 'tesseract'
except ImportError:
pass
# Fall back to OpenCV EAST (CPU)
return 'opencv_east'
# No DLL issues - can use any backend
# Priority: PaddleOCR > EasyOCR > Tesseract > OpenCV EAST
if info.pytorch_available and info.paddle_available:
return 'paddleocr'
if info.pytorch_available:
try:
import easyocr
return 'easyocr'
except ImportError:
pass
try:
import pytesseract
return 'tesseract'
except ImportError:
pass
return 'opencv_east'
# Convenience functions
def get_hardware_info() -> HardwareInfo:
"""Get complete hardware information."""
return HardwareDetector.detect_all()
def print_hardware_summary():
"""Print hardware summary to console."""
print(HardwareDetector.get_gpu_summary())
def recommend_ocr_backend() -> str:
"""Get recommended OCR backend."""
return HardwareDetector.recommend_ocr_backend()
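Both `_detect_pytorch_safe` above and the PaddleOCR backend's `_check_pytorch` recognize the Windows Store Python DLL failure by the same substring heuristic. It can be exercised standalone (the function name here is illustrative):

```python
def looks_like_dll_error(message: str) -> bool:
    # Same substrings checked in _detect_pytorch_safe / _check_pytorch
    message = message.lower()
    return any(x in message for x in ('dll', 'c10', 'specified module'))
```

Any OSError raised during `import torch` whose text matches one of these markers is treated as the known Windows Store Python issue; everything else is reported as a generic PyTorch failure.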


@ -0,0 +1,254 @@
"""
Lemontropia Suite - OCR Backends Base Interface
Unified interface for multiple OCR backends with auto-fallback.
"""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple, Optional, Dict, Any, Union
from pathlib import Path
import numpy as np
import logging
logger = logging.getLogger(__name__)
@dataclass
class OCRTextRegion:
"""Detected text region with metadata."""
text: str
confidence: float
bbox: Tuple[int, int, int, int] # x, y, w, h
language: str = "en"
def to_dict(self) -> Dict[str, Any]:
return {
'text': self.text,
'confidence': self.confidence,
'bbox': self.bbox,
'language': self.language
}
@dataclass
class OCRBackendInfo:
"""Information about an OCR backend."""
name: str
available: bool
gpu_accelerated: bool = False
error_message: Optional[str] = None
version: Optional[str] = None
def to_dict(self) -> Dict[str, Any]:
return {
'name': self.name,
'available': self.available,
'gpu_accelerated': self.gpu_accelerated,
'error_message': self.error_message,
'version': self.version
}
class BaseOCRBackend(ABC):
"""Abstract base class for OCR backends."""
NAME = "base"
SUPPORTS_GPU = False
def __init__(self, use_gpu: bool = True, lang: str = 'en', **kwargs):
self.use_gpu = use_gpu
self.lang = lang
self._available = False
self._error_msg = None
self._version = None
@abstractmethod
def _initialize(self) -> bool:
"""Initialize the backend. Return True if successful."""
pass
@abstractmethod
def extract_text(self, image: np.ndarray) -> List[OCRTextRegion]:
"""Extract text from image."""
pass
def is_available(self) -> bool:
"""Check if backend is available."""
return self._available
def get_info(self) -> OCRBackendInfo:
"""Get backend information."""
return OCRBackendInfo(
name=self.NAME,
available=self._available,
gpu_accelerated=self.SUPPORTS_GPU and self.use_gpu,
error_message=self._error_msg,
version=self._version
)
def preprocess_image(self, image: np.ndarray,
grayscale: bool = True,
denoise: bool = True,
contrast: bool = True) -> np.ndarray:
"""Preprocess image for better OCR results."""
processed = image.copy()
# Convert to grayscale if needed
if grayscale and len(processed.shape) == 3:
processed = self._to_grayscale(processed)
# Denoise
if denoise:
processed = self._denoise(processed)
# Enhance contrast
if contrast:
processed = self._enhance_contrast(processed)
return processed
def _to_grayscale(self, image: np.ndarray) -> np.ndarray:
"""Convert image to grayscale."""
if len(image.shape) == 3:
import cv2
return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
return image
def _denoise(self, image: np.ndarray) -> np.ndarray:
"""Denoise image."""
import cv2
if len(image.shape) == 2:
return cv2.fastNlMeansDenoising(image, None, 10, 7, 21)
return image
def _enhance_contrast(self, image: np.ndarray) -> np.ndarray:
"""Enhance image contrast."""
import cv2
if len(image.shape) == 2:
# CLAHE (Contrast Limited Adaptive Histogram Equalization)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
return clahe.apply(image)
return image
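The `_to_grayscale` helper above delegates to `cv2.cvtColor`. For reference, the same BGR-to-gray conversion can be approximated with plain NumPy (a sketch using OpenCV's luminance weights, not the backend's actual code path):

```python
import numpy as np

def to_grayscale_bgr(image: np.ndarray) -> np.ndarray:
    # Approximates cv2.cvtColor(image, cv2.COLOR_BGR2GRAY):
    # Y = 0.114*B + 0.587*G + 0.299*R (note BGR channel order)
    if image.ndim == 3:
        b, g, r = image[..., 0], image[..., 1], image[..., 2]
        return (0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)
    return image
```

A 2-D input passes through unchanged, matching the guard in `_to_grayscale`.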
class OCRBackendFactory:
"""Factory for creating OCR backends with auto-fallback."""
# Priority order: fastest/most reliable first
BACKEND_PRIORITY = [
'opencv_east', # Fastest, no dependencies, detection only
'easyocr', # Good accuracy, lighter than PaddleOCR
'tesseract', # Traditional, stable
'paddleocr', # Best accuracy but heavy dependencies
]
_backends: Dict[str, Any] = {}
_backend_classes: Dict[str, type] = {}
@classmethod
def register_backend(cls, name: str, backend_class: type):
"""Register a backend class."""
cls._backend_classes[name] = backend_class
logger.debug(f"Registered OCR backend: {name}")
@classmethod
def create_backend(cls, name: str, use_gpu: bool = True,
lang: str = 'en', **kwargs) -> Optional[BaseOCRBackend]:
"""Create a specific backend by name."""
if name not in cls._backend_classes:
logger.error(f"Unknown OCR backend: {name}")
return None
try:
backend = cls._backend_classes[name](use_gpu=use_gpu, lang=lang, **kwargs)
if backend._initialize():
logger.info(f"Created OCR backend: {name}")
return backend
else:
logger.warning(f"Failed to initialize OCR backend: {name}")
return None
except Exception as e:
logger.error(f"Error creating OCR backend {name}: {e}")
return None
@classmethod
def get_best_backend(cls, use_gpu: bool = True, lang: str = 'en',
priority: Optional[List[str]] = None,
**kwargs) -> Optional[BaseOCRBackend]:
"""Get the best available backend based on priority order."""
priority = priority or cls.BACKEND_PRIORITY
logger.info(f"Searching for best OCR backend (priority: {priority})")
for name in priority:
if name not in cls._backend_classes:
continue
backend = cls.create_backend(name, use_gpu=use_gpu, lang=lang, **kwargs)
if backend is not None and backend.is_available():
info = backend.get_info()
logger.info(f"Selected OCR backend: {name} (GPU: {info.gpu_accelerated})")
return backend
logger.error("No OCR backend available!")
return None
@classmethod
def check_all_backends(cls, use_gpu: bool = True, lang: str = 'en') -> List[OCRBackendInfo]:
"""Check availability of all backends."""
results = []
for name in cls.BACKEND_PRIORITY:
if name not in cls._backend_classes:
continue
try:
backend = cls._backend_classes[name](use_gpu=use_gpu, lang=lang)
backend._initialize()
results.append(backend.get_info())
except Exception as e:
results.append(OCRBackendInfo(
name=name,
available=False,
error_message=str(e)
))
return results
@classmethod
def list_available_backends(cls, use_gpu: bool = True, lang: str = 'en') -> List[str]:
"""List names of available backends."""
info_list = cls.check_all_backends(use_gpu, lang)
return [info.name for info in info_list if info.available]
# Import and register backends
def _register_backends():
"""Register all available backends."""
try:
from .opencv_east_backend import OpenCVEASTBackend
OCRBackendFactory.register_backend('opencv_east', OpenCVEASTBackend)
except ImportError as e:
logger.debug(f"OpenCV EAST backend not available: {e}")
try:
from .easyocr_backend import EasyOCRBackend
OCRBackendFactory.register_backend('easyocr', EasyOCRBackend)
except ImportError as e:
logger.debug(f"EasyOCR backend not available: {e}")
try:
from .tesseract_backend import TesseractBackend
OCRBackendFactory.register_backend('tesseract', TesseractBackend)
except ImportError as e:
logger.debug(f"Tesseract backend not available: {e}")
try:
from .paddleocr_backend import PaddleOCRBackend
OCRBackendFactory.register_backend('paddleocr', PaddleOCRBackend)
except ImportError as e:
logger.debug(f"PaddleOCR backend not available: {e}")
# Auto-register on import
_register_backends()
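The priority-ordered fallback in `get_best_backend` boils down to: walk the priority list, instantiate each registered backend, and keep the first one that initializes. A stdlib-only sketch of that selection loop (the two backend classes are illustrative stand-ins, not the real backends):

```python
BACKEND_PRIORITY = ['opencv_east', 'easyocr', 'tesseract', 'paddleocr']

class FailingBackend:
    def _initialize(self):
        return False  # simulates a backend whose dependencies are broken

class WorkingBackend:
    def _initialize(self):
        return True

def best_backend(registry, priority=BACKEND_PRIORITY):
    # Mirrors OCRBackendFactory.get_best_backend: first registered backend
    # in priority order that initializes successfully wins
    for name in priority:
        backend_cls = registry.get(name)
        if backend_cls is None:
            continue
        backend = backend_cls()
        if backend._initialize():
            return name, backend
    return None, None
```

With `opencv_east` registered but failing to initialize, the search falls through to the next available entry in the priority list.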


@ -0,0 +1,184 @@
"""
Lemontropia Suite - EasyOCR Backend
Text recognition using EasyOCR - lighter than PaddleOCR.
"""
import numpy as np
import logging
from typing import List, Optional
from . import BaseOCRBackend, OCRTextRegion
logger = logging.getLogger(__name__)
class EasyOCRBackend(BaseOCRBackend):
"""
OCR backend using EasyOCR.
Pros:
- Lighter than PaddleOCR
- Good accuracy
- Supports many languages
- Can run on CPU reasonably well
Cons:
- First run downloads models (~100MB)
- Slower than OpenCV EAST
Installation: pip install easyocr
"""
NAME = "easyocr"
SUPPORTS_GPU = True
def __init__(self, use_gpu: bool = True, lang: str = 'en', **kwargs):
super().__init__(use_gpu=use_gpu, lang=lang, **kwargs)
self.reader = None
self._gpu_available = False
# Language mapping
self.lang_map = {
'en': 'en',
'sv': 'sv', # Swedish
'de': 'de',
'fr': 'fr',
'es': 'es',
'latin': 'latin',
}
def _initialize(self) -> bool:
"""Initialize EasyOCR reader."""
try:
import easyocr
# Map language code
easyocr_lang = self.lang_map.get(self.lang, 'en')
# Check GPU availability
self._gpu_available = self._check_gpu()
use_gpu_flag = self.use_gpu and self._gpu_available
logger.info(f"Initializing EasyOCR (lang={easyocr_lang}, gpu={use_gpu_flag})")
# Create reader
# EasyOCR downloads models automatically on first run
self.reader = easyocr.Reader(
[easyocr_lang],
gpu=use_gpu_flag,
verbose=False
)
self._available = True
self._version = easyocr.__version__ if hasattr(easyocr, '__version__') else 'unknown'
logger.info(f"EasyOCR initialized successfully (GPU: {use_gpu_flag})")
return True
except ImportError:
self._error_msg = "EasyOCR not installed. Run: pip install easyocr"
logger.warning(self._error_msg)
return False
except Exception as e:
# Handle specific PyTorch/CUDA errors
error_str = str(e).lower()
if 'cuda' in error_str or 'c10' in error_str or 'gpu' in error_str:
self._error_msg = f"EasyOCR GPU initialization failed: {e}"
logger.warning(f"{self._error_msg}. Try with use_gpu=False")
# Try CPU fallback
if self.use_gpu:
logger.info("Attempting EasyOCR CPU fallback...")
self.use_gpu = False
return self._initialize()
else:
self._error_msg = f"EasyOCR initialization failed: {e}"
logger.error(self._error_msg)
return False
def _check_gpu(self) -> bool:
"""Check if GPU is available for EasyOCR."""
try:
import torch
if torch.cuda.is_available():
logger.info(f"CUDA available: {torch.cuda.get_device_name(0)}")
return True
# Check MPS (Apple Silicon)
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
logger.info("Apple MPS available")
return True
return False
except ImportError:
return False
except Exception as e:
logger.debug(f"GPU check failed: {e}")
return False
def extract_text(self, image: np.ndarray) -> List[OCRTextRegion]:
"""
Extract text from image using EasyOCR.
Args:
image: Input image (BGR format from OpenCV)
Returns:
List of detected text regions with recognized text
"""
if not self._available or self.reader is None:
logger.error("EasyOCR backend not initialized")
return []
try:
# EasyOCR expects RGB format
if len(image.shape) == 3:
import cv2
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
else:
image_rgb = image
# Run OCR
results = self.reader.readtext(image_rgb)
regions = []
for detection in results:
# EasyOCR returns: (bbox, text, confidence)
bbox, text, conf = detection
# Calculate bounding box from polygon
# bbox is list of 4 points: [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
x_coords = [p[0] for p in bbox]
y_coords = [p[1] for p in bbox]
x = int(min(x_coords))
y = int(min(y_coords))
w = int(max(x_coords) - x)
h = int(max(y_coords) - y)
regions.append(OCRTextRegion(
text=text.strip(),
confidence=float(conf),
bbox=(x, y, w, h),
language=self.lang
))
logger.debug(f"EasyOCR detected {len(regions)} text regions")
return regions
except Exception as e:
logger.error(f"EasyOCR extraction failed: {e}")
return []
def get_info(self):
"""Get backend information."""
info = super().get_info()
info.gpu_accelerated = self._gpu_available and self.use_gpu
return info
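The polygon-to-axis-aligned-bbox computation in `extract_text` above is worth checking on its own; each EasyOCR detection carries its bbox as four corner points:

```python
def quad_to_bbox(points):
    # points: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] corner polygon,
    # as in each EasyOCR detection; returns (x, y, w, h)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x, y = int(min(xs)), int(min(ys))
    return (x, y, int(max(xs)) - x, int(max(ys)) - y)
```

Slightly skewed quads collapse to the enclosing upright rectangle, which is what `OCRTextRegion.bbox` stores.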


@ -0,0 +1,315 @@
"""
Lemontropia Suite - OpenCV EAST OCR Backend
Fast text detection using OpenCV DNN with EAST model.
No heavy dependencies, works with Windows Store Python.
"""
import cv2
import numpy as np
import logging
from pathlib import Path
from typing import List, Tuple, Optional
import urllib.request
from . import BaseOCRBackend, OCRTextRegion
logger = logging.getLogger(__name__)
class OpenCVEASTBackend(BaseOCRBackend):
"""
Text detector using OpenCV DNN with EAST model.
This is the primary fallback backend because:
- Pure OpenCV, no PyTorch/TensorFlow dependencies
- Fast (CPU: ~23 FPS, GPU: ~97 FPS)
- Works with Windows Store Python
- Detects text regions (does not recognize text)
Based on: https://pyimagesearch.com/2022/03/14/improving-text-detection-speed-with-opencv-and-gpus/
"""
NAME = "opencv_east"
SUPPORTS_GPU = True
# EAST model download URL (frozen inference graph)
EAST_MODEL_URL = "https://github.com/oyyd/frozen_east_text_detection.pb/raw/master/frozen_east_text_detection.pb"
def __init__(self, use_gpu: bool = True, lang: str = 'en', **kwargs):
super().__init__(use_gpu=use_gpu, lang=lang, **kwargs)
self.net = None
self.model_path = kwargs.get('model_path')
# Input size (must be multiple of 32)
self.input_width = kwargs.get('input_width', 320)
self.input_height = kwargs.get('input_height', 320)
# Detection thresholds
self.confidence_threshold = kwargs.get('confidence_threshold', 0.5)
self.nms_threshold = kwargs.get('nms_threshold', 0.4)
# GPU status
self._gpu_enabled = False
def _initialize(self) -> bool:
"""Initialize EAST text detector."""
try:
# Determine model path
if not self.model_path:
model_dir = Path.home() / ".lemontropia" / "models"
model_dir.mkdir(parents=True, exist_ok=True)
self.model_path = str(model_dir / "frozen_east_text_detection.pb")
model_file = Path(self.model_path)
# Download model if needed
if not model_file.exists():
if not self._download_model():
return False
# Load the model
logger.info(f"Loading EAST model from {self.model_path}")
self.net = cv2.dnn.readNet(self.model_path)
# Enable GPU if requested
if self.use_gpu:
self._gpu_enabled = self._enable_gpu()
self._available = True
self._version = cv2.__version__
logger.info(f"OpenCV EAST backend initialized (GPU: {self._gpu_enabled})")
return True
except Exception as e:
self._error_msg = f"Failed to initialize EAST: {e}"
logger.error(self._error_msg)
return False
def _download_model(self) -> bool:
"""Download EAST model if not present."""
try:
logger.info(f"Downloading EAST model from {self.EAST_MODEL_URL}")
logger.info("This is a one-time download (~95 MB)...")
# Create progress callback
def progress_hook(count, block_size, total_size):
if total_size <= 0: # Some servers omit Content-Length
return
percent = min(100, int(count * block_size * 100 / total_size))
if percent % 10 == 0: # Log every 10%
logger.info(f"Download progress: {percent}%")
urllib.request.urlretrieve(
self.EAST_MODEL_URL,
self.model_path,
reporthook=progress_hook
)
logger.info("EAST model downloaded successfully")
return True
except Exception as e:
self._error_msg = f"Failed to download EAST model: {e}"
logger.error(self._error_msg)
return False
def _enable_gpu(self) -> bool:
"""Enable CUDA GPU acceleration."""
try:
# Check CUDA availability
cuda_count = cv2.cuda.getCudaEnabledDeviceCount()
if cuda_count > 0:
self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
# Get device info
try:
# cv2.cuda.getDevice() returns an int; wrap it in DeviceInfo for the name
device_name = cv2.cuda.DeviceInfo(cv2.cuda.getDevice()).name()
logger.info(f"CUDA enabled: {device_name}")
except Exception:
logger.info(f"CUDA enabled ({cuda_count} device(s))")
return True
else:
logger.warning("CUDA not available in OpenCV, using CPU")
return False
except Exception as e:
logger.warning(f"Failed to enable CUDA: {e}, using CPU")
return False
def extract_text(self, image: np.ndarray) -> List[OCRTextRegion]:
"""
Detect text regions in image.
Note: EAST only detects text regions, it does not recognize text.
The 'text' field will be empty, but bbox and confidence are accurate.
Args:
image: Input image (BGR format from OpenCV)
Returns:
List of detected text regions
"""
if not self._available or self.net is None:
logger.error("EAST backend not initialized")
return []
try:
# Get image dimensions
(H, W) = image.shape[:2]
# Resize to input size
resized = cv2.resize(image, (self.input_width, self.input_height))
# Create blob from image
blob = cv2.dnn.blobFromImage(
resized,
scalefactor=1.0,
size=(self.input_width, self.input_height),
mean=(123.68, 116.78, 103.94), # ImageNet means
swapRB=True,
crop=False
)
# Forward pass
self.net.setInput(blob)
layer_names = [
"feature_fusion/Conv_7/Sigmoid", # Scores
"feature_fusion/concat_3" # Geometry
]
scores, geometry = self.net.forward(layer_names)
# Decode predictions
rectangles, confidences = self._decode_predictions(scores, geometry)
# Apply non-maximum suppression
boxes = self._apply_nms(rectangles, confidences)
# Scale boxes back to original image size
ratio_w = W / float(self.input_width)
ratio_h = H / float(self.input_height)
regions = []
for (startX, startY, endX, endY, conf) in boxes:
# Scale coordinates
startX = int(startX * ratio_w)
startY = int(startY * ratio_h)
endX = int(endX * ratio_w)
endY = int(endY * ratio_h)
# Ensure valid coordinates
startX = max(0, startX)
startY = max(0, startY)
endX = min(W, endX)
endY = min(H, endY)
w = endX - startX
h = endY - startY
if w > 0 and h > 0:
regions.append(OCRTextRegion(
text="", # EAST doesn't recognize text
confidence=float(conf),
bbox=(startX, startY, w, h),
language=self.lang
))
logger.debug(f"EAST detected {len(regions)} text regions")
return regions
except Exception as e:
logger.error(f"EAST detection failed: {e}")
return []
def _decode_predictions(self, scores: np.ndarray,
geometry: np.ndarray) -> Tuple[List, List]:
"""Decode EAST model output to bounding boxes."""
(num_rows, num_cols) = scores.shape[2:4]
rectangles = []
confidences = []
for y in range(0, num_rows):
scores_data = scores[0, 0, y]
x0 = geometry[0, 0, y]
x1 = geometry[0, 1, y]
x2 = geometry[0, 2, y]
x3 = geometry[0, 3, y]
angles = geometry[0, 4, y]
for x in range(0, num_cols):
if scores_data[x] < self.confidence_threshold:
continue
# Compute offset
offset_x = x * 4.0
offset_y = y * 4.0
# Extract rotation angle and compute cos/sin
angle = angles[x]
cos = np.cos(angle)
sin = np.sin(angle)
# Compute box dimensions
h = x0[x] + x2[x]
w = x1[x] + x3[x]
# Compute box coordinates
end_x = int(offset_x + (cos * x1[x]) + (sin * x2[x]))
end_y = int(offset_y - (sin * x1[x]) + (cos * x2[x]))
start_x = int(end_x - w)
start_y = int(end_y - h)
rectangles.append((start_x, start_y, end_x, end_y))
confidences.append(scores_data[x])
return rectangles, confidences
def _apply_nms(self, rectangles: List, confidences: List) -> List[Tuple]:
"""Apply non-maximum suppression."""
if not rectangles:
return []
# Convert to float32 for NMS
boxes = np.array(rectangles, dtype=np.float32)
confidences = np.array(confidences, dtype=np.float32)
# OpenCV NMSBoxes expects (x, y, w, h) format
nms_boxes = []
for (x1, y1, x2, y2) in boxes:
nms_boxes.append([x1, y1, x2 - x1, y2 - y1])
# Apply NMS
indices = cv2.dnn.NMSBoxes(
nms_boxes,
confidences,
self.confidence_threshold,
self.nms_threshold
)
results = []
if len(indices) > 0:
# Handle different OpenCV versions
if isinstance(indices, tuple):
indices = indices[0]
for i in indices.flatten() if hasattr(indices, 'flatten') else indices:
x1, y1, x2, y2 = rectangles[i]
results.append((x1, y1, x2, y2, confidences[i]))
return results
def get_info(self):
"""Get backend information."""
info = super().get_info()
info.gpu_accelerated = self._gpu_enabled
return info
@staticmethod
def is_opencv_cuda_available() -> bool:
"""Check if OpenCV was built with CUDA support."""
try:
return cv2.cuda.getCudaEnabledDeviceCount() > 0
except Exception:
return False
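The rescaling from the fixed detector input back to the original frame, including the bounds clamping, can be exercised standalone (a sketch of the arithmetic in `extract_text`; the function name is illustrative):

```python
def scale_and_clamp(box, input_size, image_size):
    # box: (startX, startY, endX, endY) at the detector's input resolution;
    # returns (x, y, w, h) scaled up and clamped to the original image bounds
    sx, sy, ex, ey = box
    in_w, in_h = input_size
    W, H = image_size
    ratio_w, ratio_h = W / float(in_w), H / float(in_h)
    sx = max(0, int(sx * ratio_w))
    sy = max(0, int(sy * ratio_h))
    ex = min(W, int(ex * ratio_w))
    ey = min(H, int(ey * ratio_h))
    return (sx, sy, ex - sx, ey - sy)
```

For a 640×480 screenshot fed through the default 320×320 input, horizontal coordinates scale by 2 and vertical ones by 1.5 before clamping.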


@ -0,0 +1,294 @@
"""
Lemontropia Suite - PaddleOCR Backend
High-accuracy OCR using PaddleOCR - best quality but heavy dependencies.
"""
import numpy as np
import logging
from typing import List, Optional
from . import BaseOCRBackend, OCRTextRegion
logger = logging.getLogger(__name__)
class PaddleOCRBackend(BaseOCRBackend):
"""
OCR backend using PaddleOCR.
Pros:
- Best accuracy among open-source OCR
- Good multilingual support
- Fast with GPU
Cons:
- Heavy dependencies (PyTorch/PaddlePaddle)
- Can fail with DLL errors on Windows Store Python
- Large model download
Installation: pip install paddleocr
Note: This backend has special handling for PyTorch/Paddle DLL errors
that commonly occur with Windows Store Python installations.
"""
NAME = "paddleocr"
SUPPORTS_GPU = True
def __init__(self, use_gpu: bool = True, lang: str = 'en', **kwargs):
super().__init__(use_gpu=use_gpu, lang=lang, **kwargs)
self.ocr = None
self._gpu_available = False
self._dll_error = False # Track if we hit a DLL error
# Language mapping for PaddleOCR
self.lang_map = {
'en': 'en',
'sv': 'latin', # Swedish uses latin script
'de': 'latin',
'fr': 'latin',
'es': 'latin',
'latin': 'latin',
}
# Detection thresholds
self.det_db_thresh = kwargs.get('det_db_thresh', 0.3)
self.det_db_box_thresh = kwargs.get('det_db_box_thresh', 0.5)
self.rec_thresh = kwargs.get('rec_thresh', 0.5)
def _initialize(self) -> bool:
"""Initialize PaddleOCR with PyTorch DLL error handling."""
try:
# First, check if PyTorch is importable without DLL errors
if not self._check_pytorch():
return False
# Import PaddleOCR
from paddleocr import PaddleOCR as PPOCR
# Map language
paddle_lang = self.lang_map.get(self.lang, 'en')
# Check GPU availability
self._gpu_available = self._check_gpu()
use_gpu_flag = self.use_gpu and self._gpu_available
logger.info(f"Initializing PaddleOCR (lang={paddle_lang}, gpu={use_gpu_flag})")
# Initialize PaddleOCR
self.ocr = PPOCR(
lang=paddle_lang,
use_gpu=use_gpu_flag,
show_log=False,
use_angle_cls=True,
det_db_thresh=self.det_db_thresh,
det_db_box_thresh=self.det_db_box_thresh,
rec_thresh=self.rec_thresh,
)
self._available = True
self._version = "2.x" # PaddleOCR doesn't expose version easily
logger.info(f"PaddleOCR initialized successfully (GPU: {use_gpu_flag})")
return True
except ImportError:
self._error_msg = "PaddleOCR not installed. Run: pip install paddleocr"
logger.warning(self._error_msg)
return False
except Exception as e:
error_str = str(e).lower()
# Check for common DLL-related errors
if any(x in error_str for x in ['dll', 'c10', 'torch', 'paddle', 'lib']):
self._dll_error = True
self._error_msg = f"PaddleOCR DLL error (Windows Store Python?): {e}"
logger.warning(self._error_msg)
logger.info("This is a known issue with Windows Store Python. Using fallback OCR.")
else:
self._error_msg = f"PaddleOCR initialization failed: {e}"
logger.error(self._error_msg)
return False
def _check_pytorch(self) -> bool:
"""
Check if PyTorch can be imported without DLL errors.
This is the critical check for Windows Store Python compatibility.
"""
try:
# Try importing torch - this is where DLL errors typically occur
import torch
# Try a simple operation to verify it works
_ = torch.__version__
logger.debug("PyTorch import successful")
return True
except ImportError:
self._error_msg = "PyTorch not installed"
logger.warning(self._error_msg)
return False
except OSError as e:
# This is the Windows Store Python DLL error
error_str = str(e).lower()
if 'dll' in error_str or 'c10' in error_str or 'specified module' in error_str:
self._dll_error = True
self._error_msg = (
f"PyTorch DLL load failed: {e}\n"
"This is a known issue with Windows Store Python.\n"
"Solutions:\n"
"1. Use Python from python.org instead of Windows Store\n"
"2. Install PyTorch with conda instead of pip\n"
"3. Use alternative OCR backend (EasyOCR, Tesseract, or OpenCV EAST)"
)
logger.error(self._error_msg)
else:
self._error_msg = f"PyTorch load failed: {e}"
logger.error(self._error_msg)
return False
except Exception as e:
self._error_msg = f"Unexpected PyTorch error: {e}"
logger.error(self._error_msg)
return False
def _check_gpu(self) -> bool:
"""Check if GPU is available for PaddleOCR."""
try:
import torch
if torch.cuda.is_available():
device_name = torch.cuda.get_device_name(0)
logger.info(f"CUDA available: {device_name}")
return True
# Check for MPS (Apple Silicon)
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
logger.info("Apple MPS available")
return True
return False
except Exception as e:
logger.debug(f"GPU check failed: {e}")
return False
def extract_text(self, image: np.ndarray) -> List[OCRTextRegion]:
"""
Extract text from image using PaddleOCR.
Args:
image: Input image (BGR format from OpenCV)
Returns:
List of detected text regions with recognized text
"""
if not self._available or self.ocr is None:
logger.error("PaddleOCR backend not initialized")
return []
try:
# Preprocess image
processed = self.preprocess_image(image)
# Run OCR
result = self.ocr.ocr(processed, cls=True)
regions = []
if result and result[0]:
for line in result[0]:
if line is None:
continue
# Parse result: [bbox, (text, confidence)]
bbox, (text, conf) = line
# Calculate bounding box from polygon
x_coords = [p[0] for p in bbox]
y_coords = [p[1] for p in bbox]
x = int(min(x_coords))
y = int(min(y_coords))
w = int(max(x_coords) - x)
h = int(max(y_coords) - y)
regions.append(OCRTextRegion(
text=text.strip(),
confidence=float(conf),
bbox=(x, y, w, h),
language=self.lang
))
logger.debug(f"PaddleOCR detected {len(regions)} text regions")
return regions
except Exception as e:
logger.error(f"PaddleOCR extraction failed: {e}")
return []
def get_info(self):
"""Get backend information."""
info = super().get_info()
info.gpu_accelerated = self._gpu_available and self.use_gpu
if self._dll_error:
info.error_message = "PyTorch DLL error - incompatible with Windows Store Python"
return info
def has_dll_error(self) -> bool:
"""Check if this backend failed due to DLL error."""
return self._dll_error
@staticmethod
def diagnose_windows_store_python() -> dict:
"""
Diagnose if running Windows Store Python and potential issues.
Returns:
Dictionary with diagnostic information
"""
import sys
import platform
diag = {
'platform': platform.system(),
'python_version': sys.version,
'executable': sys.executable,
'is_windows_store': False,
'pytorch_importable': False,
'recommendations': []
}
# Check if Windows Store Python
exe_path = sys.executable.lower()
if 'windowsapps' in exe_path or 'microsoft' in exe_path:
diag['is_windows_store'] = True
diag['recommendations'].append(
"You are using Windows Store Python which has known DLL compatibility issues."
)
# Check PyTorch
try:
import torch
diag['pytorch_importable'] = True
diag['pytorch_version'] = torch.__version__
diag['pytorch_cuda'] = torch.cuda.is_available()
except Exception as e:
diag['pytorch_error'] = str(e)
diag['recommendations'].append(
"PyTorch cannot be loaded. Use alternative OCR backends."
)
if not diag['pytorch_importable'] and diag['is_windows_store']:
diag['recommendations'].extend([
"Install Python from https://python.org instead of Windows Store",
"Or use conda/miniconda for better compatibility",
"Recommended OCR backends: opencv_east, easyocr, tesseract"
])
return diag
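The Windows Store detection in `diagnose_windows_store_python()` reduces to a path heuristic on `sys.executable`. A minimal standalone sketch of that check (the helper name and example paths below are hypothetical, for illustration only):

```python
import sys

def is_windows_store_python(executable: str) -> bool:
    # Store-installed Python lives under ...\Microsoft\WindowsApps\,
    # mirroring the substring check used in the diagnostic above.
    exe = executable.lower()
    return "windowsapps" in exe or "microsoft" in exe

# Hypothetical paths for illustration:
store_exe = r"C:\Users\me\AppData\Local\Microsoft\WindowsApps\python.exe"
plain_exe = r"C:\Python311\python.exe"
print(is_windows_store_python(store_exe))  # True
print(is_windows_store_python(plain_exe))  # False
print(is_windows_store_python(sys.executable))
```

Note the heuristic can false-positive on any path containing "microsoft", which is why it only feeds a recommendation rather than hard-disabling backends.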

@ -0,0 +1,289 @@
"""
Lemontropia Suite - Tesseract OCR Backend
Traditional OCR using Tesseract - stable, no ML dependencies.
"""
import numpy as np
import logging
from typing import List, Optional, Tuple
from pathlib import Path
import shutil
from . import BaseOCRBackend, OCRTextRegion
logger = logging.getLogger(__name__)
class TesseractBackend(BaseOCRBackend):
"""
OCR backend using Tesseract OCR.
Pros:
- Very stable and mature
- No PyTorch/TensorFlow dependencies
- Fast on CPU
- Works with Windows Store Python
Cons:
- Lower accuracy on game UI text than neural OCR
- Requires Tesseract binary installation
Installation:
- Windows: choco install tesseract or download from UB Mannheim
- Linux: sudo apt-get install tesseract-ocr
- macOS: brew install tesseract
- Python: pip install pytesseract
"""
NAME = "tesseract"
SUPPORTS_GPU = False # Tesseract is CPU-only
def __init__(self, use_gpu: bool = True, lang: str = 'en', **kwargs):
super().__init__(use_gpu=use_gpu, lang=lang, **kwargs)
self.tesseract_cmd = kwargs.get('tesseract_cmd', None)
self._version = None
# Language mapping for Tesseract
self.lang_map = {
'en': 'eng',
'sv': 'swe', # Swedish
'de': 'deu',
'fr': 'fra',
'es': 'spa',
'latin': 'eng+deu+fra+spa', # Multi-language
}
# Tesseract configuration
self.config = kwargs.get('config', '--psm 6') # Assume single uniform block of text
def _initialize(self) -> bool:
"""Initialize Tesseract OCR."""
try:
import pytesseract
# Set custom path if provided
if self.tesseract_cmd:
pytesseract.pytesseract.tesseract_cmd = self.tesseract_cmd
# Try to get version to verify installation
try:
version = pytesseract.get_tesseract_version()
self._version = str(version)
logger.info(f"Tesseract version: {version}")
except Exception as e:
# Try to find tesseract in PATH
tesseract_path = shutil.which('tesseract')
if tesseract_path:
pytesseract.pytesseract.tesseract_cmd = tesseract_path
version = pytesseract.get_tesseract_version()
self._version = str(version)
logger.info(f"Tesseract found at: {tesseract_path}, version: {version}")
else:
raise e
self._available = True
logger.info("Tesseract OCR initialized successfully")
return True
except ImportError:
self._error_msg = "pytesseract not installed. Run: pip install pytesseract"
logger.warning(self._error_msg)
return False
except Exception as e:
self._error_msg = f"Tesseract not found: {e}. Please install Tesseract OCR."
logger.warning(self._error_msg)
logger.info("Download from: https://github.com/UB-Mannheim/tesseract/wiki")
return False
def extract_text(self, image: np.ndarray) -> List[OCRTextRegion]:
"""
Extract text from image using Tesseract.
Uses a two-step approach:
1. Detect text regions using OpenCV contours
2. Run Tesseract on each region
Args:
image: Input image (BGR format from OpenCV)
Returns:
List of detected text regions with recognized text
"""
if not self._available:
logger.error("Tesseract backend not initialized")
return []
try:
import pytesseract
import cv2
# Preprocess image
gray = self._to_grayscale(image)
processed = self._preprocess_for_tesseract(gray)
# Get data including bounding boxes
tesseract_lang = self.lang_map.get(self.lang, 'eng')
data = pytesseract.image_to_data(
processed,
lang=tesseract_lang,
config=self.config,
output_type=pytesseract.Output.DICT
)
regions = []
n_boxes = len(data['text'])
for i in range(n_boxes):
text = data['text'][i].strip()
conf = int(float(data['conf'][i])) # conf may be a float string in Tesseract 4+
# Filter low confidence and empty text
if conf > 30 and text:
x = data['left'][i]
y = data['top'][i]
w = data['width'][i]
h = data['height'][i]
regions.append(OCRTextRegion(
text=text,
confidence=conf / 100.0, # Normalize to 0-1
bbox=(x, y, w, h),
language=self.lang
))
# Merge overlapping regions that are likely the same text
regions = self._merge_nearby_regions(regions)
logger.debug(f"Tesseract detected {len(regions)} text regions")
return regions
except Exception as e:
logger.error(f"Tesseract extraction failed: {e}")
return []
def _preprocess_for_tesseract(self, gray: np.ndarray) -> np.ndarray:
"""Preprocess image specifically for Tesseract."""
import cv2
# Resize small images (Tesseract works better with larger text)
h, w = gray.shape[:2]
min_height = 100
if h < min_height:
scale = min_height / h
gray = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
# Apply adaptive thresholding
processed = cv2.adaptiveThreshold(
gray, 255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY,
11, 2
)
# Denoise
processed = cv2.fastNlMeansDenoising(processed, None, 10, 7, 21)
return processed
def _merge_nearby_regions(self, regions: List[OCRTextRegion],
max_distance: int = 10) -> List[OCRTextRegion]:
"""Merge text regions that are close to each other."""
if not regions:
return []
# Sort by y position
sorted_regions = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
merged = []
current = sorted_regions[0]
for next_region in sorted_regions[1:]:
# Check if regions are close enough to merge
cx, cy, cw, ch = current.bbox
nx, ny, nw, nh = next_region.bbox
# Calculate distance
distance = abs(ny - cy)
x_overlap = not (cx + cw < nx or nx + nw < cx)
if distance < max_distance and x_overlap:
# Merge regions
min_x = min(cx, nx)
min_y = min(cy, ny)
max_x = max(cx + cw, nx + nw)
max_y = max(cy + ch, ny + nh)
# Combine text
combined_text = current.text + " " + next_region.text
avg_conf = (current.confidence + next_region.confidence) / 2
current = OCRTextRegion(
text=combined_text.strip(),
confidence=avg_conf,
bbox=(min_x, min_y, max_x - min_x, max_y - min_y),
language=self.lang
)
else:
merged.append(current)
current = next_region
merged.append(current)
return merged
def extract_text_simple(self, image: np.ndarray) -> str:
"""
Simple text extraction without region detection.
Returns:
All text found in image as single string
"""
if not self._available:
return ""
try:
import pytesseract
import cv2
# Convert to RGB if needed
if len(image.shape) == 3:
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
tesseract_lang = self.lang_map.get(self.lang, 'eng')
text = pytesseract.image_to_string(
image,
lang=tesseract_lang,
config=self.config
)
return text.strip()
except Exception as e:
logger.error(f"Tesseract simple extraction failed: {e}")
return ""
@staticmethod
def find_tesseract() -> Optional[str]:
"""Find Tesseract installation path."""
path = shutil.which('tesseract')
if path:
return path
# Common Windows paths
common_paths = [
r"C:\Program Files\Tesseract-OCR\tesseract.exe",
r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
r"C:\Users\%USERNAME%\AppData\Local\Tesseract-OCR\tesseract.exe",
r"C:\Tesseract-OCR\tesseract.exe",
]
import os
for p in common_paths:
expanded = os.path.expandvars(p)
if Path(expanded).exists():
return expanded
return None
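The `_merge_nearby_regions` pass above can be exercised on raw `(x, y, w, h)` boxes without any OCR installed. A self-contained sketch of the same merge rule (small vertical gap plus horizontal overlap); the function name `merge_boxes` is ad hoc, not part of the backend API:

```python
def merge_boxes(boxes, max_distance=10):
    """Merge axis-aligned (x, y, w, h) boxes that sit on roughly the
    same line (vertical gap < max_distance) and overlap horizontally,
    mirroring TesseractBackend._merge_nearby_regions."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: (b[1], b[0]))  # sort by y, then x
    merged = []
    cur = boxes[0]
    for nxt in boxes[1:]:
        cx, cy, cw, ch = cur
        nx, ny, nw, nh = nxt
        x_overlap = not (cx + cw < nx or nx + nw < cx)
        if abs(ny - cy) < max_distance and x_overlap:
            # Expand the current box to cover both regions
            x0, y0 = min(cx, nx), min(cy, ny)
            x1, y1 = max(cx + cw, nx + nw), max(cy + ch, ny + nh)
            cur = (x0, y0, x1 - x0, y1 - y0)
        else:
            merged.append(cur)
            cur = nxt
    merged.append(cur)
    return merged

# Two overlapping boxes on one line merge; the third is too far below.
print(merge_boxes([(10, 5, 50, 20), (55, 7, 40, 20), (10, 60, 40, 20)]))
# [(10, 5, 85, 22), (10, 60, 40, 20)]
```

Note the rule requires horizontal overlap, so adjacent but non-overlapping words on the same line stay separate regions.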

requirements-ocr.txt Normal file
@ -0,0 +1,35 @@
# Lemontropia Suite - OCR Dependencies
# Install based on your needs and system capabilities
# ========== REQUIRED ==========
# These are always required for OCR functionality
opencv-python>=4.8.0 # Computer vision and OpenCV EAST text detection
numpy>=1.24.0 # Numerical operations
pillow>=10.0.0 # Image processing
# ========== RECOMMENDED (Choose One) ==========
## Option 1: EasyOCR (Recommended for most users)
## Good accuracy, lighter than PaddleOCR, supports GPU
## Note: Requires PyTorch - may not work with Windows Store Python
# easyocr>=1.7.0
## Option 2: Tesseract OCR (Most Stable)
## Traditional OCR, no ML dependencies, very stable
## Requires system Tesseract installation
# pytesseract>=0.3.10
## Option 3: PaddleOCR (Best Accuracy)
## Highest accuracy but heavy dependencies
## Note: Requires PaddlePaddle - may not work with Windows Store Python
# paddleocr>=2.7.0
# paddlepaddle>=2.5.0 # or paddlepaddle-gpu for CUDA
# ========== OPTIONAL GPU SUPPORT ==========
# Only if you have a compatible NVIDIA GPU
# torch>=2.0.0 # PyTorch with CUDA support
# torchvision>=0.15.0
# ========== DEVELOPMENT ==========
pytest>=7.4.0 # Testing
pytest-cov>=4.1.0 # Coverage

test_ocr_system.py Normal file
@ -0,0 +1,328 @@
"""
Lemontropia Suite - OCR Backend Test Script
Tests all OCR backends and reports on availability.
Run this to verify the OCR system works without PyTorch DLL errors.
"""
import sys
import logging
from pathlib import Path
# Setup logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
def test_hardware_detection():
"""Test hardware detection."""
print("\n" + "=" * 60)
print("HARDWARE DETECTION TEST")
print("=" * 60)
try:
from modules.hardware_detection import (
HardwareDetector,
print_hardware_summary,
recommend_ocr_backend
)
# Print summary
print_hardware_summary()
# Get detailed info
info = HardwareDetector.detect_all()
# Check for Windows Store Python
if info.is_windows_store_python:
print("\n⚠️ WARNING: Windows Store Python detected!")
print(" This may cause DLL compatibility issues with PyTorch.")
# Check for PyTorch DLL errors
if info.pytorch_dll_error:
print("\n❌ PyTorch DLL Error detected!")
print(f" Error: {info.pytorch_error}")
print("\n This is expected with Windows Store Python.")
print(" The system will automatically use alternative OCR backends.")
elif not info.pytorch_available:
print("\n⚠️ PyTorch not installed.")
else:
print(f"\n✅ PyTorch {info.pytorch_version} available")
# Recommendation
recommended = recommend_ocr_backend()
print(f"\n📋 Recommended OCR backend: {recommended}")
return True
except Exception as e:
print(f"❌ Hardware detection failed: {e}")
import traceback
traceback.print_exc()
return False
def test_ocr_backends():
"""Test all OCR backends."""
print("\n" + "=" * 60)
print("OCR BACKEND TESTS")
print("=" * 60)
try:
from modules.ocr_backends import OCRBackendFactory
# Check all backends
backends = OCRBackendFactory.check_all_backends(use_gpu=True)
available_count = 0
for info in backends:
status = "✅ Available" if info.available else "❌ Not Available"
gpu_status = "🚀 GPU" if info.gpu_accelerated else "💻 CPU"
print(f"\n{info.name.upper()}:")
print(f" Status: {status}")
print(f" GPU: {gpu_status}")
if info.version:
print(f" Version: {info.version}")
if info.error_message:
print(f" Error: {info.error_message}")
if info.available:
available_count += 1
print(f"\n📊 Summary: {available_count}/{len(backends)} backends available")
return available_count > 0
except Exception as e:
print(f"❌ OCR backend test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_opencv_east():
"""Test OpenCV EAST backend specifically (should always work)."""
print("\n" + "=" * 60)
print("OPENCV EAST BACKEND TEST")
print("=" * 60)
try:
import numpy as np
from modules.ocr_backends import OCRBackendFactory
# Create test image with text-like regions
print("\nCreating test image...")
test_image = np.ones((400, 600, 3), dtype=np.uint8) * 255
# Draw some rectangles that look like text regions
import cv2
cv2.rectangle(test_image, (50, 50), (200, 80), (0, 0, 0), -1)
cv2.rectangle(test_image, (50, 100), (250, 130), (0, 0, 0), -1)
cv2.rectangle(test_image, (300, 50), (500, 90), (0, 0, 0), -1)
# Create backend
print("Creating OpenCV EAST backend...")
backend = OCRBackendFactory.create_backend('opencv_east', use_gpu=False)
if backend is None:
print("❌ Failed to create OpenCV EAST backend")
return False
print(f"✅ Backend created: {backend.get_info().name}")
print(f" Available: {backend.is_available()}")
print(f" GPU: {backend.get_info().gpu_accelerated}")
# Test detection
print("\nRunning text detection...")
regions = backend.extract_text(test_image)
print(f"✅ Detection complete: {len(regions)} regions found")
for i, region in enumerate(regions[:5]): # Show first 5
x, y, w, h = region.bbox
print(f" Region {i+1}: bbox=({x},{y},{w},{h}), conf={region.confidence:.2f}")
return True
except Exception as e:
print(f"❌ OpenCV EAST test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_unified_ocr():
"""Test unified OCR processor."""
print("\n" + "=" * 60)
print("UNIFIED OCR PROCESSOR TEST")
print("=" * 60)
try:
import numpy as np
import cv2
from modules.game_vision_ai import UnifiedOCRProcessor
# Create processor (auto-selects best backend)
print("\nInitializing Unified OCR Processor...")
processor = UnifiedOCRProcessor(use_gpu=True, auto_select=True)
backend_name = processor.get_current_backend()
print(f"✅ Processor initialized with backend: {backend_name}")
# Get backend info
info = processor.get_backend_info()
print(f" Info: {info}")
# Create test image
print("\nCreating test image...")
test_image = np.ones((400, 600, 3), dtype=np.uint8) * 255
cv2.rectangle(test_image, (50, 50), (200, 80), (0, 0, 0), -1)
cv2.rectangle(test_image, (50, 100), (250, 130), (0, 0, 0), -1)
# Test extraction
print("\nRunning text extraction...")
regions = processor.extract_text(test_image)
print(f"✅ Extraction complete: {len(regions)} regions found")
# List all available backends
print("\n📋 All OCR backends:")
for backend_info in processor.get_available_backends():
status = "" if backend_info.available else ""
print(f" {status} {backend_info.name}")
return True
except Exception as e:
print(f"❌ Unified OCR test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_game_vision_ai():
"""Test GameVisionAI class."""
print("\n" + "=" * 60)
print("GAME VISION AI TEST")
print("=" * 60)
try:
from modules.game_vision_ai import GameVisionAI
print("\nInitializing GameVisionAI...")
vision = GameVisionAI(use_gpu=True)
print(f"✅ GameVisionAI initialized")
print(f" OCR Backend: {vision.ocr.get_current_backend()}")
print(f" GPU Backend: {vision.backend.value}")
# Get diagnostic info
print("\nRunning diagnostics...")
diag = GameVisionAI.diagnose()
print(f" Hardware: {diag['hardware']['gpu']['backend']}")
print(f" Recommended OCR: {diag['recommendations']['ocr_backend']}")
# Test available backends
backends = vision.get_ocr_backends()
available = [b['name'] for b in backends if b['available']]
print(f" Available backends: {', '.join(available) if available else 'None'}")
return True
except Exception as e:
print(f"❌ GameVisionAI test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_pytorch_dll_handling():
"""Test that PyTorch DLL errors are handled gracefully."""
print("\n" + "=" * 60)
print("PYTORCH DLL ERROR HANDLING TEST")
print("=" * 60)
try:
from modules.hardware_detection import HardwareDetector
info = HardwareDetector.detect_all()
if info.pytorch_dll_error:
print("\n⚠️ PyTorch DLL error detected (as expected with Windows Store Python)")
print("✅ System correctly detected the DLL error")
print("✅ System will use fallback OCR backends")
# Verify fallback recommendation
recommended = HardwareDetector.recommend_ocr_backend()
if recommended in ['opencv_east', 'tesseract']:
print(f"✅ Recommended safe backend: {recommended}")
return True
else:
print(f"⚠️ Unexpected recommendation: {recommended}")
return False
else:
print("\n✅ No PyTorch DLL error detected")
print(" PyTorch is working correctly!")
if info.pytorch_available:
print(f" Version: {info.pytorch_version}")
return True
except Exception as e:
print(f"❌ DLL handling test failed: {e}")
import traceback
traceback.print_exc()
return False
def main():
"""Run all tests."""
print("\n" + "=" * 60)
print("LEMONTROPIA SUITE - OCR SYSTEM TEST")
print("=" * 60)
print("\nPython:", sys.version)
print("Platform:", sys.platform)
results = {}
# Run tests
results['hardware'] = test_hardware_detection()
results['backends'] = test_ocr_backends()
results['opencv_east'] = test_opencv_east()
results['unified_ocr'] = test_unified_ocr()
results['game_vision'] = test_game_vision_ai()
results['dll_handling'] = test_pytorch_dll_handling()
# Summary
print("\n" + "=" * 60)
print("TEST SUMMARY")
print("=" * 60)
for name, passed in results.items():
status = "✅ PASS" if passed else "❌ FAIL"
print(f" {status}: {name}")
total = len(results)
passed = sum(results.values())
print(f"\n Total: {passed}/{total} tests passed")
if passed == total:
print("\n🎉 All tests passed! OCR system is working correctly.")
return 0
else:
print("\n⚠️ Some tests failed. Check the output above for details.")
return 1
if __name__ == "__main__":
sys.exit(main())
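The pass/fail bookkeeping in `main()` boils down to a dict of booleans. A minimal version of the same pattern (`summarize` is an illustrative name, not part of the test script):

```python
def summarize(results: dict) -> tuple:
    """Count passing tests in a name -> bool mapping, as main() does."""
    passed = sum(1 for ok in results.values() if ok)
    return passed, len(results)

results = {"hardware": True, "backends": True, "opencv_east": False}
passed, total = summarize(results)
print(f"{passed}/{total} tests passed")  # 2/3 tests passed
```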


@ -119,6 +119,7 @@ from ui.hud_overlay_clean import HUDOverlay
from ui.session_history import SessionHistoryDialog
from ui.gallery_dialog import GalleryDialog, ScreenshotCapture
from ui.settings_dialog import SettingsDialog
# ============================================================================
# Screenshot Hotkey Integration
@ -250,86 +251,6 @@ class TemplateStatsDialog(QDialog):
layout.addWidget(button_box)
class SettingsDialog(QDialog):
"""Dialog for application settings."""
def __init__(self, parent=None, current_player_name: str = ""):
super().__init__(parent)
self.setWindowTitle("Settings")
self.setMinimumWidth(450)
self.player_name = current_player_name
self.setup_ui()
def setup_ui(self):
layout = QVBoxLayout(self)
# Player Settings Group
player_group = QGroupBox("Player Settings")
player_layout = QFormLayout(player_group)
self.player_name_edit = QLineEdit()
self.player_name_edit.setText(self.player_name)
self.player_name_edit.setPlaceholderText("Your avatar name in Entropia Universe")
player_layout.addRow("Avatar Name:", self.player_name_edit)
help_label = QLabel("Set your avatar name to track your globals correctly.")
help_label.setStyleSheet("color: #888; font-size: 11px;")
player_layout.addRow(help_label)
layout.addWidget(player_group)
# Log Settings Group
log_group = QGroupBox("Log File Settings")
log_layout = QFormLayout(log_group)
self.log_path_edit = QLineEdit()
self.log_path_edit.setPlaceholderText("Path to chat.log")
log_layout.addRow("Log Path:", self.log_path_edit)
self.auto_detect_check = QCheckBox("Auto-detect log path on startup")
self.auto_detect_check.setChecked(True)
log_layout.addRow(self.auto_detect_check)
layout.addWidget(log_group)
# Default Activity Group
activity_group = QGroupBox("Default Activity")
activity_layout = QFormLayout(activity_group)
self.default_activity_combo = QComboBox()
for activity in ActivityType:
self.default_activity_combo.addItem(activity.display_name, activity)
activity_layout.addRow("Default:", self.default_activity_combo)
layout.addWidget(activity_group)
layout.addStretch()
button_box = QDialogButtonBox(
QDialogButtonBox.StandardButton.Ok | QDialogButtonBox.StandardButton.Cancel
)
button_box.accepted.connect(self.accept)
button_box.rejected.connect(self.reject)
layout.addWidget(button_box)
def get_player_name(self) -> str:
"""Get the configured player name."""
return self.player_name_edit.text().strip()
def get_log_path(self) -> str:
"""Get the configured log path."""
return self.log_path_edit.text().strip()
def get_auto_detect(self) -> bool:
"""Get auto-detect setting."""
return self.auto_detect_check.isChecked()
def get_default_activity(self) -> str:
"""Get default activity type."""
activity = self.default_activity_combo.currentData()
return activity.value if activity else "hunting"
# ============================================================================
# Main Window
# ============================================================================
@ -831,14 +752,6 @@ class MainWindow(QMainWindow):
vision_test_action.triggered.connect(self.on_vision_test)
vision_menu.addAction(vision_test_action)
tools_menu.addSeparator()
# Screenshot hotkey settings
screenshot_hotkeys_action = QAction("📸 Screenshot &Hotkeys", self)
screenshot_hotkeys_action.setShortcut("Ctrl+Shift+S")
screenshot_hotkeys_action.triggered.connect(self._show_screenshot_hotkey_settings)
tools_menu.addAction(screenshot_hotkeys_action)
# View menu
view_menu = menubar.addMenu("&View")
@ -1887,14 +1800,12 @@ class MainWindow(QMainWindow):
self.log_info("HUD", "HUD overlay hidden")
def on_settings(self):
- """Open settings dialog."""
- dialog = SettingsDialog(self, self.player_name)
+ """Open comprehensive settings dialog."""
+ dialog = SettingsDialog(self, self.db)
if dialog.exec() == QDialog.DialogCode.Accepted:
- self.player_name = dialog.get_player_name()
- self.log_path = dialog.get_log_path()
- self.auto_detect_log = dialog.get_auto_detect()
- self._save_settings()
- self.log_info("Settings", f"Avatar name: {self.player_name}")
+ # Reload settings from QSettings
+ self._load_settings()
+ self.log_info("Settings", "Settings updated successfully")
def on_run_setup_wizard(self):
"""Run the setup wizard again."""

ui/settings_dialog.py Normal file
@ -0,0 +1,684 @@
"""
Lemontropia Suite - Comprehensive Settings Dialog
Unified settings for Player, Screenshot Hotkeys, Computer Vision, and General preferences.
"""
import logging
from pathlib import Path
from typing import Optional, Dict, Any
from PyQt6.QtWidgets import (
QDialog, QVBoxLayout, QHBoxLayout, QFormLayout,
QLabel, QLineEdit, QPushButton, QComboBox,
QCheckBox, QGroupBox, QTabWidget, QDialogButtonBox,
QMessageBox, QFileDialog, QWidget, QGridLayout,
QSpinBox, QDoubleSpinBox, QFrame
)
from PyQt6.QtCore import Qt, QSettings
from PyQt6.QtGui import QKeySequence
logger = logging.getLogger(__name__)
class SettingsDialog(QDialog):
"""
Comprehensive settings dialog with tabbed interface.
Tabs:
- General: Player name, log path, activity defaults
- Screenshot Hotkeys: Configure F12 and other hotkeys
- Computer Vision: OCR backend selection, GPU settings
- Advanced: Performance, logging, database options
"""
def __init__(self, parent=None, db=None):
super().__init__(parent)
self.setWindowTitle("Lemontropia Suite - Settings")
self.setMinimumSize(600, 500)
self.resize(700, 550)
self.db = db
self._settings = QSettings("Lemontropia", "Suite")
# Load current values
self._load_current_values()
self._setup_ui()
self._apply_dark_theme()
def _load_current_values(self):
"""Load current settings values."""
# General
self._player_name = self._settings.value("player/name", "", type=str)
self._log_path = self._settings.value("log/path", "", type=str)
self._auto_detect_log = self._settings.value("log/auto_detect", True, type=bool)
self._default_activity = self._settings.value("activity/default", "hunting", type=str)
# Screenshot hotkeys
self._hotkey_full = self._settings.value("hotkey/screenshot_full", "F12", type=str)
self._hotkey_region = self._settings.value("hotkey/screenshot_region", "Shift+F12", type=str)
self._hotkey_loot = self._settings.value("hotkey/screenshot_loot", "Ctrl+F12", type=str)
self._hotkey_hud = self._settings.value("hotkey/screenshot_hud", "Alt+F12", type=str)
# Computer Vision
self._cv_backend = self._settings.value("cv/backend", "auto", type=str)
self._cv_use_gpu = self._settings.value("cv/use_gpu", True, type=bool)
self._cv_confidence = self._settings.value("cv/confidence", 0.5, type=float)
def _setup_ui(self):
"""Setup the dialog UI with tabs."""
layout = QVBoxLayout(self)
layout.setContentsMargins(15, 15, 15, 15)
layout.setSpacing(10)
# Title
title = QLabel("⚙️ Settings")
title.setStyleSheet("font-size: 18px; font-weight: bold; color: #4caf50;")
layout.addWidget(title)
# Tab widget
self.tabs = QTabWidget()
layout.addWidget(self.tabs)
# Create tabs
self.tabs.addTab(self._create_general_tab(), "📋 General")
self.tabs.addTab(self._create_hotkeys_tab(), "📸 Screenshot Hotkeys")
self.tabs.addTab(self._create_vision_tab(), "👁️ Computer Vision")
self.tabs.addTab(self._create_advanced_tab(), "🔧 Advanced")
# Button box
button_box = QDialogButtonBox(
QDialogButtonBox.StandardButton.Save |
QDialogButtonBox.StandardButton.Cancel |
QDialogButtonBox.StandardButton.Reset
)
button_box.accepted.connect(self._on_save)
button_box.rejected.connect(self.reject)
button_box.button(QDialogButtonBox.StandardButton.Reset).clicked.connect(self._on_reset)
layout.addWidget(button_box)
def _create_general_tab(self) -> QWidget:
"""Create General settings tab."""
tab = QWidget()
layout = QVBoxLayout(tab)
layout.setSpacing(15)
# Player Settings
player_group = QGroupBox("🎮 Player Settings")
player_form = QFormLayout(player_group)
self.player_name_edit = QLineEdit(self._player_name)
self.player_name_edit.setPlaceholderText("Your avatar name in Entropia Universe")
player_form.addRow("Avatar Name:", self.player_name_edit)
player_help = QLabel("This name is used to identify your globals and HoFs in the log.")
player_help.setStyleSheet("color: #888; font-size: 11px;")
player_help.setWordWrap(True)
player_form.addRow(player_help)
layout.addWidget(player_group)
# Log File Settings
log_group = QGroupBox("📄 Log File Settings")
log_layout = QVBoxLayout(log_group)
log_form = QFormLayout()
log_path_layout = QHBoxLayout()
self.log_path_edit = QLineEdit(self._log_path)
self.log_path_edit.setPlaceholderText(r"C:\Users\...\Documents\Entropia Universe\chat.log")
log_path_layout.addWidget(self.log_path_edit)
browse_btn = QPushButton("Browse...")
browse_btn.clicked.connect(self._browse_log_path)
log_path_layout.addWidget(browse_btn)
log_form.addRow("Chat Log Path:", log_path_layout)
self.auto_detect_check = QCheckBox("Auto-detect log path on startup")
self.auto_detect_check.setChecked(self._auto_detect_log)
log_form.addRow(self.auto_detect_check)
log_layout.addLayout(log_form)
# Quick paths
quick_paths_layout = QHBoxLayout()
quick_paths_layout.addWidget(QLabel("Quick select:"))
default_path_btn = QPushButton("Default Location")
default_path_btn.clicked.connect(self._set_default_log_path)
quick_paths_layout.addWidget(default_path_btn)
quick_paths_layout.addStretch()
log_layout.addLayout(quick_paths_layout)
layout.addWidget(log_group)
# Default Activity
activity_group = QGroupBox("🎯 Default Activity")
activity_form = QFormLayout(activity_group)
self.default_activity_combo = QComboBox()
activities = [
("hunting", "🎯 Hunting"),
("mining", "⛏️ Mining"),
("crafting", "⚒️ Crafting")
]
for value, display in activities:
self.default_activity_combo.addItem(display, value)
if value == self._default_activity:
self.default_activity_combo.setCurrentIndex(self.default_activity_combo.count() - 1)
activity_form.addRow("Default Activity:", self.default_activity_combo)
layout.addWidget(activity_group)
layout.addStretch()
return tab
def _create_hotkeys_tab(self) -> QWidget:
"""Create Screenshot Hotkeys tab."""
tab = QWidget()
layout = QVBoxLayout(tab)
layout.setSpacing(15)
# Info header
info = QLabel("📸 Configure screenshot hotkeys. Hotkeys work when the app is focused.")
info.setStyleSheet("color: #888; padding: 5px;")
info.setWordWrap(True)
layout.addWidget(info)
# Status
status_group = QGroupBox("Status")
status_layout = QVBoxLayout(status_group)
try:
import keyboard
self.hotkey_status = QLabel("✅ Global hotkeys available (keyboard library installed)")
self.hotkey_status.setStyleSheet("color: #4caf50;")
except ImportError:
self.hotkey_status = QLabel(" Qt shortcuts only (install 'keyboard' library for global hotkeys)\npip install keyboard")
self.hotkey_status.setStyleSheet("color: #ff9800;")
self.hotkey_status.setWordWrap(True)
status_layout.addWidget(self.hotkey_status)
layout.addWidget(status_group)
# Hotkey configuration
hotkey_group = QGroupBox("Hotkey Configuration")
hotkey_form = QFormLayout(hotkey_group)
# Full screen
full_layout = QHBoxLayout()
self.hotkey_full_edit = QLineEdit(self._hotkey_full)
full_layout.addWidget(self.hotkey_full_edit)
full_test = QPushButton("Test")
full_test.clicked.connect(lambda: self._test_hotkey("full"))
full_layout.addWidget(full_test)
hotkey_form.addRow("Full Screen:", full_layout)
# Region
region_layout = QHBoxLayout()
self.hotkey_region_edit = QLineEdit(self._hotkey_region)
region_layout.addWidget(self.hotkey_region_edit)
region_test = QPushButton("Test")
region_test.clicked.connect(lambda: self._test_hotkey("region"))
region_layout.addWidget(region_test)
hotkey_form.addRow("Center Region (800x600):", region_layout)
# Loot
loot_layout = QHBoxLayout()
self.hotkey_loot_edit = QLineEdit(self._hotkey_loot)
loot_layout.addWidget(self.hotkey_loot_edit)
loot_test = QPushButton("Test")
loot_test.clicked.connect(lambda: self._test_hotkey("loot"))
loot_layout.addWidget(loot_test)
hotkey_form.addRow("Loot Window:", loot_layout)
# HUD
hud_layout = QHBoxLayout()
self.hotkey_hud_edit = QLineEdit(self._hotkey_hud)
hud_layout.addWidget(self.hotkey_hud_edit)
hud_test = QPushButton("Test")
hud_test.clicked.connect(lambda: self._test_hotkey("hud"))
hud_layout.addWidget(hud_test)
hotkey_form.addRow("HUD Area:", hud_layout)
layout.addWidget(hotkey_group)
# Help text
help_group = QGroupBox("Help")
help_layout = QVBoxLayout(help_group)
help_text = QLabel(
"Format examples:\n"
" F12, Ctrl+F12, Shift+F12, Alt+F12\n"
" Ctrl+Shift+S, Alt+Tab (don't use system shortcuts)\n\n"
"Note: Global hotkeys require the 'keyboard' library and may need admin privileges.\n"
"Qt shortcuts (app focused only) work without additional libraries."
)
help_text.setStyleSheet("color: #888; font-family: monospace;")
help_layout.addWidget(help_text)
layout.addWidget(help_group)
layout.addStretch()
return tab
def _create_vision_tab(self) -> QWidget:
"""Create Computer Vision tab."""
tab = QWidget()
layout = QVBoxLayout(tab)
layout.setSpacing(15)
# Info header
info = QLabel("👁️ Computer Vision settings for automatic loot detection and OCR.")
info.setStyleSheet("color: #888; padding: 5px;")
info.setWordWrap(True)
layout.addWidget(info)
# OCR Backend Selection
backend_group = QGroupBox("OCR Backend")
backend_layout = QFormLayout(backend_group)
self.cv_backend_combo = QComboBox()
backends = [
("auto", "🤖 Auto-detect (recommended)"),
("opencv", "⚡ OpenCV EAST (fastest, no extra dependencies)"),
("easyocr", "📖 EasyOCR (good accuracy, lighter than Paddle)"),
("tesseract", "🔍 Tesseract (traditional, stable)"),
("paddle", "🧠 PaddleOCR (best accuracy, requires PyTorch)")
]
for value, display in backends:
self.cv_backend_combo.addItem(display, value)
if value == self._cv_backend:
self.cv_backend_combo.setCurrentIndex(self.cv_backend_combo.count() - 1)
self.cv_backend_combo.currentIndexChanged.connect(self._on_backend_changed)
backend_layout.addRow("OCR Backend:", self.cv_backend_combo)
# Backend status
self.backend_status = QLabel()
self._update_backend_status()
backend_layout.addRow(self.backend_status)
layout.addWidget(backend_group)
# GPU Settings
gpu_group = QGroupBox("GPU Acceleration")
gpu_layout = QFormLayout(gpu_group)
self.cv_use_gpu_check = QCheckBox("Use GPU acceleration if available")
self.cv_use_gpu_check.setChecked(self._cv_use_gpu)
self.cv_use_gpu_check.setToolTip("Faster processing but requires compatible GPU")
gpu_layout.addRow(self.cv_use_gpu_check)
# GPU Info
self.gpu_info = QLabel()
self._update_gpu_info()
gpu_layout.addRow(self.gpu_info)
layout.addWidget(gpu_group)
# Detection Settings
detection_group = QGroupBox("Detection Settings")
detection_layout = QFormLayout(detection_group)
self.cv_confidence_spin = QDoubleSpinBox()
self.cv_confidence_spin.setRange(0.1, 1.0)
self.cv_confidence_spin.setSingleStep(0.05)
self.cv_confidence_spin.setValue(self._cv_confidence)
self.cv_confidence_spin.setDecimals(2)
detection_layout.addRow("Confidence Threshold:", self.cv_confidence_spin)
confidence_help = QLabel("Lower = more sensitive (may detect non-text)\nHigher = stricter (may miss some text)")
confidence_help.setStyleSheet("color: #888; font-size: 11px;")
detection_layout.addRow(confidence_help)
layout.addWidget(detection_group)
# Test buttons
test_group = QGroupBox("Test Computer Vision")
test_layout = QHBoxLayout(test_group)
test_ocr_btn = QPushButton("📝 Test OCR")
test_ocr_btn.clicked.connect(self._test_ocr)
test_layout.addWidget(test_ocr_btn)
test_icon_btn = QPushButton("🎯 Test Icon Detection")
test_icon_btn.clicked.connect(self._test_icon_detection)
test_layout.addWidget(test_icon_btn)
calibrate_btn = QPushButton("📐 Calibrate")
calibrate_btn.clicked.connect(self._calibrate_vision)
test_layout.addWidget(calibrate_btn)
layout.addWidget(test_group)
layout.addStretch()
return tab
def _create_advanced_tab(self) -> QWidget:
"""Create Advanced settings tab."""
tab = QWidget()
layout = QVBoxLayout(tab)
layout.setSpacing(15)
# Performance
perf_group = QGroupBox("Performance")
perf_layout = QFormLayout(perf_group)
self.fps_limit_spin = QSpinBox()
self.fps_limit_spin.setRange(1, 144)
self.fps_limit_spin.setValue(60)
self.fps_limit_spin.setSuffix(" FPS")
perf_layout.addRow("Target FPS:", self.fps_limit_spin)
layout.addWidget(perf_group)
# Database
db_group = QGroupBox("Database")
db_layout = QVBoxLayout(db_group)
db_info = QLabel(f"Database location:\n{self.db.db_path if self.db else 'Not connected'}")
db_info.setStyleSheet("color: #888; font-family: monospace; font-size: 11px;")
db_info.setWordWrap(True)
db_layout.addWidget(db_info)
db_buttons = QHBoxLayout()
backup_btn = QPushButton("💾 Backup Database")
backup_btn.clicked.connect(self._backup_database)
db_buttons.addWidget(backup_btn)
export_btn = QPushButton("📤 Export Data")
export_btn.clicked.connect(self._export_data)
db_buttons.addWidget(export_btn)
db_buttons.addStretch()
db_layout.addLayout(db_buttons)
layout.addWidget(db_group)
# Logging
log_group = QGroupBox("Logging")
log_layout = QFormLayout(log_group)
        self.log_level_combo = QComboBox()
        self.log_level_combo.addItems(["DEBUG", "INFO", "WARNING", "ERROR"])
        self.log_level_combo.setCurrentText("INFO")
log_layout.addRow("Log Level:", self.log_level_combo)
layout.addWidget(log_group)
layout.addStretch()
return tab
def _on_backend_changed(self):
"""Handle OCR backend selection change."""
self._update_backend_status()
def _update_backend_status(self):
"""Update backend status label."""
backend = self.cv_backend_combo.currentData()
status_text = ""
if backend == "auto":
status_text = "Will try: OpenCV → EasyOCR → Tesseract → PaddleOCR"
elif backend == "opencv":
status_text = "✅ Always available - uses OpenCV DNN (EAST model)"
elif backend == "easyocr":
try:
import easyocr
status_text = "✅ EasyOCR installed and ready"
except ImportError:
status_text = "❌ EasyOCR not installed: pip install easyocr"
elif backend == "tesseract":
try:
import pytesseract
status_text = "✅ Tesseract Python module installed"
except ImportError:
status_text = "❌ pytesseract not installed: pip install pytesseract"
elif backend == "paddle":
try:
from paddleocr import PaddleOCR
status_text = "✅ PaddleOCR installed"
except ImportError:
status_text = "❌ PaddleOCR not installed: pip install paddlepaddle paddleocr"
self.backend_status.setText(status_text)
self.backend_status.setStyleSheet(
            "color: #4caf50;" if status_text.startswith("✅") else
            "color: #f44336;" if status_text.startswith("❌") else "color: #888;"
)
def _update_gpu_info(self):
"""Update GPU info label."""
info_parts = []
# Check CUDA
try:
import cv2
if cv2.cuda.getCudaEnabledDeviceCount() > 0:
info_parts.append("✅ OpenCV CUDA")
else:
info_parts.append("❌ OpenCV CUDA")
        except Exception:  # cv2 may be missing or built without the cuda module
info_parts.append("❌ OpenCV CUDA")
# Check PyTorch CUDA
try:
import torch
if torch.cuda.is_available():
info_parts.append(f"✅ PyTorch CUDA ({torch.cuda.get_device_name(0)})")
else:
info_parts.append("❌ PyTorch CUDA")
        except Exception:  # torch may not be installed
info_parts.append("❌ PyTorch CUDA")
self.gpu_info.setText(" | ".join(info_parts))
def _browse_log_path(self):
"""Browse for log file."""
path, _ = QFileDialog.getOpenFileName(
self,
"Select Entropia Universe chat.log",
"",
"Log Files (*.log);;All Files (*)"
)
if path:
self.log_path_edit.setText(path)
def _set_default_log_path(self):
"""Set default log path."""
default_path = Path.home() / "Documents" / "Entropia Universe" / "chat.log"
self.log_path_edit.setText(str(default_path))
def _test_hotkey(self, hotkey_type: str):
"""Test a screenshot hotkey."""
try:
            from datetime import datetime  # local import, mirroring _backup_database; may not be imported at module level
            from modules.auto_screenshot import AutoScreenshot
screenshots_dir = Path(__file__).parent.parent / "data" / "screenshots"
ss = AutoScreenshot(screenshots_dir)
filename = f"test_{hotkey_type}_{datetime.now():%Y%m%d_%H%M%S}.png"
if hotkey_type == "full":
filepath = ss.capture_full_screen(filename)
elif hotkey_type == "region":
import mss
with mss.mss() as sct:
monitor = sct.monitors[1]
x = (monitor['width'] - 800) // 2
y = (monitor['height'] - 600) // 2
filepath = ss.capture_region(x, y, 800, 600, filename)
elif hotkey_type == "loot":
import mss
with mss.mss() as sct:
monitor = sct.monitors[1]
x = monitor['width'] - 350
y = monitor['height'] // 2 - 200
filepath = ss.capture_region(x, y, 300, 400, filename)
elif hotkey_type == "hud":
import mss
with mss.mss() as sct:
monitor = sct.monitors[1]
w, h = 600, 150
x = (monitor['width'] - w) // 2
y = monitor['height'] - h - 50
filepath = ss.capture_region(x, y, w, h, filename)
else:
filepath = None
if filepath:
QMessageBox.information(self, "Screenshot Taken", f"Saved to:\n{filepath}")
else:
QMessageBox.warning(self, "Error", "Failed to capture screenshot")
except Exception as e:
QMessageBox.critical(self, "Error", f"Screenshot failed:\n{e}")
def _test_ocr(self):
"""Test OCR functionality."""
QMessageBox.information(self, "OCR Test", "OCR test will be implemented in the Vision Test dialog.")
# TODO: Open vision test dialog
def _test_icon_detection(self):
"""Test icon detection."""
QMessageBox.information(self, "Icon Detection", "Icon detection test will be implemented in the Vision Test dialog.")
# TODO: Open vision test dialog
def _calibrate_vision(self):
"""Open vision calibration."""
QMessageBox.information(self, "Calibration", "Vision calibration will be implemented in the Calibration dialog.")
# TODO: Open calibration dialog
def _backup_database(self):
"""Backup the database."""
if not self.db:
QMessageBox.warning(self, "Error", "Database not connected")
return
try:
import shutil
from datetime import datetime
backup_path = self.db.db_path.parent / f"lemontropia_backup_{datetime.now():%Y%m%d_%H%M%S}.db"
shutil.copy2(self.db.db_path, backup_path)
QMessageBox.information(self, "Backup Complete", f"Database backed up to:\n{backup_path}")
except Exception as e:
QMessageBox.critical(self, "Backup Failed", str(e))
def _export_data(self):
"""Export data to CSV/JSON."""
QMessageBox.information(self, "Export", "Export functionality coming soon!")
def _on_save(self):
"""Save all settings."""
try:
# General
self._settings.setValue("player/name", self.player_name_edit.text().strip())
self._settings.setValue("log/path", self.log_path_edit.text().strip())
self._settings.setValue("log/auto_detect", self.auto_detect_check.isChecked())
self._settings.setValue("activity/default", self.default_activity_combo.currentData())
# Hotkeys
self._settings.setValue("hotkey/screenshot_full", self.hotkey_full_edit.text().strip())
self._settings.setValue("hotkey/screenshot_region", self.hotkey_region_edit.text().strip())
self._settings.setValue("hotkey/screenshot_loot", self.hotkey_loot_edit.text().strip())
self._settings.setValue("hotkey/screenshot_hud", self.hotkey_hud_edit.text().strip())
# Computer Vision
self._settings.setValue("cv/backend", self.cv_backend_combo.currentData())
self._settings.setValue("cv/use_gpu", self.cv_use_gpu_check.isChecked())
self._settings.setValue("cv/confidence", self.cv_confidence_spin.value())
# Advanced
self._settings.setValue("performance/fps_limit", self.fps_limit_spin.value())
self._settings.setValue("logging/level", self.log_level_combo.currentText())
self._settings.sync()
QMessageBox.information(self, "Settings Saved", "All settings have been saved successfully!")
self.accept()
except Exception as e:
QMessageBox.critical(self, "Error", f"Failed to save settings:\n{e}")
def _on_reset(self):
"""Reset settings to defaults."""
reply = QMessageBox.question(
self,
"Reset Settings",
"Are you sure you want to reset all settings to defaults?",
QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No
)
if reply == QMessageBox.StandardButton.Yes:
# Clear all settings
self._settings.clear()
self._settings.sync()
QMessageBox.information(self, "Settings Reset", "Settings have been reset. Please restart the application.")
self.reject()
def _apply_dark_theme(self):
"""Apply dark theme to the dialog."""
self.setStyleSheet("""
QDialog {
background-color: #1e1e1e;
color: #e0e0e0;
}
QTabWidget::pane {
background-color: #252525;
border: 1px solid #444;
border-radius: 4px;
}
QTabBar::tab {
background-color: #2d2d2d;
padding: 8px 16px;
border: 1px solid #444;
border-bottom: none;
border-top-left-radius: 4px;
border-top-right-radius: 4px;
}
QTabBar::tab:selected {
background-color: #0d47a1;
}
QGroupBox {
font-weight: bold;
border: 1px solid #444;
border-radius: 6px;
margin-top: 10px;
padding-top: 10px;
}
QGroupBox::title {
subcontrol-origin: margin;
left: 10px;
padding: 0 5px;
}
QLineEdit, QComboBox, QSpinBox, QDoubleSpinBox {
background-color: #252525;
border: 1px solid #444;
border-radius: 4px;
padding: 6px;
color: #e0e0e0;
}
QPushButton {
background-color: #0d47a1;
border: 1px solid #1565c0;
border-radius: 4px;
padding: 6px 12px;
color: white;
}
QPushButton:hover {
background-color: #1565c0;
}
QLabel {
color: #e0e0e0;
}
""")