# Vision System Plan - Game UI Reading
## Goal
Read equipped weapon/tool names, mob names, and other UI text from the Entropia Universe game window with high accuracy.
## Two Approaches
### Approach 1: Template Matching + OCR (Recommended)
**Best balance of accuracy and performance.**
**How it works:**
1. Take a screenshot of the game window
2. Use template matching to locate specific UI regions (weapon slot, target window, etc.)
3. Crop to those regions
4. Run OCR on the cropped regions only

Because OCR runs on small, known regions instead of the full screen, results are faster and far more accurate than full-screen OCR.
**Pros:**
- Fast (only OCR small regions)
- Accurate (focused on known text areas)
- Works with game updates (just update templates)
- Low CPU usage
**Cons:**
- Requires creating templates for each UI layout
- UI position changes require template updates
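
Conceptually, template matching slides a small reference image across the screenshot and scores each position; the best-scoring position gives the region. A minimal pure-numpy sketch of the idea (function and variable names here are illustrative; in practice `cv2.matchTemplate` below does this far faster):

```python
import numpy as np

def match_template(image, template):
    """Return (x, y) of the best match via sum of squared differences."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_xy = float('inf'), (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y+th, x:x+tw]
            score = np.sum((patch - template) ** 2)  # 0 = perfect match
            if score < best:
                best, best_xy = score, (x, y)
    return best_xy

# Embed a known patch in a synthetic "screenshot" and recover its position
rng = np.random.default_rng(0)
screen = rng.random((40, 60))
patch = screen[12:20, 30:42].copy()
print(match_template(screen, patch))  # -> (30, 12)
```

OpenCV's normalized correlation (`TM_CCOEFF_NORMED`) additionally tolerates brightness changes, which matters for game UIs with varying backgrounds.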
---
### Approach 2: Pure Computer Vision (Advanced)
**Use object detection to find and read text regions automatically.**
**How it works:**
1. Train YOLO/SSD model to detect UI elements (weapon icons, text boxes, health bars)
2. Run inference on game screenshot
3. Crop detected regions
4. OCR or classifier on each region
**Pros:**
- Adapts to UI changes automatically
- Can detect new elements without templates
- Very robust
**Cons:**
- Requires training data (thousands of labeled screenshots)
- Higher CPU/GPU usage
- Complex to implement
- Overkill for this use case
---
## Recommended: Template Matching + OCR Pipeline
### Architecture
```
Game Window
     ↓
Screenshot (mss or PIL)
     ↓
Template Matching (OpenCV)
     ↓
Crop Regions of Interest
     ↓
OCR (PaddleOCR or EasyOCR)
     ↓
Parse Results
     ↓
Update Loadout/HUD
```
### Implementation Plan
#### Phase 1: Screen Capture
```python
import mss
import numpy as np

def capture_game_window():
    """Capture the Entropia Universe window as a BGR numpy array."""
    # Finding the window by title needs win32gui on Windows
    # (win32gui.FindWindow / GetWindowRect); as a fallback,
    # grab the primary monitor.
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])  # primary monitor
        frame = np.array(shot)            # BGRA
        return frame[:, :, :3]            # drop alpha -> BGR for OpenCV
```
#### Phase 2: Template Matching
```python
import os
import cv2

class UIFinder:
    def __init__(self, template_dir):
        self.templates = self._load_templates(template_dir)

    def _load_templates(self, template_dir):
        """Load every PNG in the directory, keyed by filename stem."""
        templates = {}
        for name in os.listdir(template_dir):
            if name.endswith('.png'):
                templates[name[:-4]] = cv2.imread(os.path.join(template_dir, name))
        return templates

    def _find(self, screenshot, key, threshold=0.8):
        """Locate a template; return (x, y, w, h) or None."""
        template = self.templates[key]
        result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        if max_val > threshold:
            x, y = max_loc
            h, w = template.shape[:2]
            return (x, y, w, h)
        return None

    def find_weapon_slot(self, screenshot):
        """Find the weapon slot in the screenshot."""
        return self._find(screenshot, 'weapon_slot')

    def find_target_window(self, screenshot):
        """Find the mob target window."""
        return self._find(screenshot, 'target_window')
```
#### Phase 3: Region OCR
```python
import cv2
from paddleocr import PaddleOCR

class RegionOCR:
    def __init__(self):
        # English-only model for speed
        self.ocr = PaddleOCR(
            lang='en',
            use_gpu=False,       # CPU only
            show_log=False,
            det_model_dir=None,  # use defaults
            rec_model_dir=None,
        )

    def read_text(self, screenshot, region):
        """OCR text from a specific region; return (text, confidence)."""
        x, y, w, h = region
        crop = screenshot[y:y+h, x:x+w]
        # Preprocess for better OCR: grayscale + hard binarization
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)
        result = self.ocr.ocr(thresh, cls=False)
        if result and result[0]:
            text = result[0][0][1][0]        # first detected line's text
            confidence = result[0][0][1][1]  # and its confidence
            return text, confidence
        return None, 0.0

    def read_weapon_name(self, screenshot, region):
        """OCR the weapon name from its region."""
        return self.read_text(screenshot, region)
```
#### Phase 4: Integration
```python
class GameVision:
    """Main vision system."""
    def __init__(self):
        self.finder = UIFinder('templates/')
        self.ocr = RegionOCR()

    def get_equipped_weapon(self):
        """Read the currently equipped weapon name."""
        screenshot = capture_game_window()
        region = self.finder.find_weapon_slot(screenshot)
        if region:
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None

    def get_target_mob(self):
        """Read the current target mob's name."""
        screenshot = capture_game_window()
        region = self.finder.find_target_window(screenshot)
        if region:
            name, conf = self.ocr.read_text(screenshot, region)
            if conf > 0.8:
                return name
        return None
```
---
## Template Creation Process
### Step 1: Capture Reference Screenshots
```python
import cv2

def capture_templates():
    """Interactive tool to capture UI templates."""
    print("1. Open Entropia Universe")
    print("2. Equip your weapon")
    print("3. Press ENTER when ready to capture the weapon slot template")
    input()
    screenshot = capture_game_window()
    # User drags to select the region (SPACE/ENTER confirms, c cancels)
    x, y, w, h = cv2.selectROI('Select weapon slot', screenshot)
    cv2.destroyAllWindows()
    # Save the template
    template = screenshot[y:y+h, x:x+w]
    cv2.imwrite('templates/weapon_slot.png', template)
```
### Step 2: Create Template Library
```
templates/
├── weapon_slot.png # Weapon/tool equipped area
├── weapon_name_region.png # Just the text part
├── target_window.png # Target mob window
├── target_name_region.png # Mob name text
├── health_bar.png # Player health
├── tool_slot.png # Mining tool/finder
└── README.md # Template info
```
---
## OCR Engine Comparison
| Engine | Speed | Accuracy | Setup | Best For |
|--------|-------|----------|-------|----------|
| **PaddleOCR** | Medium | High | Easy | General text, multi-language |
| **EasyOCR** | Medium | High | Easy | Quick setup, 80+ languages |
| **Tesseract** | Slow | Medium | Medium | Legacy support |
| **PaddleOCR + GPU** | Fast | High | Complex | Real-time if GPU available |
**Recommendation: PaddleOCR** (already used in project)
---
## Performance Optimizations
### 1. Region of Interest Only
```python
# BAD: OCR entire screen
result = ocr.ocr(full_screenshot)
# GOOD: OCR only weapon region
result = ocr.ocr(weapon_region)
```
### 2. Frame Skipping
```python
import time

class VisionPoller:
    def __init__(self):
        self.last_check = 0
        self.check_interval = 2.0  # seconds

    def poll(self):
        if time.time() - self.last_check < self.check_interval:
            return  # skip this frame
        # Do OCR here
        self.last_check = time.time()
```
### 3. Async Processing
```python
import asyncio

async def vision_loop():
    # capture_async / ocr_weapon_async wrap the blocking capture and OCR
    # calls (e.g. via asyncio.to_thread) so the event loop stays responsive
    while True:
        screenshot = await capture_async()
        weapon = await ocr_weapon_async(screenshot)
        if weapon:
            update_loadout(weapon)
        await asyncio.sleep(2)
```
### 4. Confidence Thresholding
```python
name, confidence = ocr.read_weapon(screenshot)
if confidence < 0.7:
    # Too uncertain, skip this reading
    return None
if confidence < 0.9:
    # Flag for manual verification
    log.warning(f"Low confidence reading: {name} ({confidence:.2f})")
return name
```
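
Since the poller only reads every few seconds, occasional bad reads can also be smoothed over time rather than per-frame. A hedged sketch of one way to do that (the class name and window size are illustrative, not part of the plan above): accept a value only once it wins a majority of recent reads.

```python
from collections import Counter, deque

class StableReading:
    """Accept a value only after it wins a majority over recent reads."""
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, value):
        if value is not None:
            self.history.append(value)
        if not self.history:
            return None
        value, count = Counter(self.history).most_common(1)[0]
        # Require a strict majority of the window before trusting it
        return value if count > len(self.history) // 2 else None

s = StableReading(window=3)
s.update("Opalo")         # 1 of 1 reads -> majority
s.update("0pal0")         # OCR glitch: 1 of 2 -> no majority
print(s.update("Opalo"))  # "Opalo" wins 2 of 3 -> Opalo
```

This keeps a single glitched frame from overwriting a correct weapon name in the loadout.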
---
## Implementation Roadmap
### Week 1: Foundation
- [ ] Create screen capture module (Windows window handle)
- [ ] Install PaddleOCR (if not already)
- [ ] Test basic OCR on game screenshots
- [ ] Create template capture tool
### Week 2: Templates
- [ ] Capture weapon slot template
- [ ] Capture target window template
- [ ] Test template matching accuracy
- [ ] Handle different resolutions/UI scales
### Week 3: Integration
- [ ] Create GameVision class
- [ ] Integrate with Loadout Manager
- [ ] Auto-update equipped weapon detection
- [ ] Mob name logging for hunts
### Week 4: Polish
- [ ] Performance optimization
- [ ] Confidence thresholds
- [ ] Error handling
- [ ] Documentation
---
## Expected Accuracy
| UI Element | Expected Accuracy | Notes |
|------------|-------------------|-------|
| Weapon Name | 85-95% | Clear text, fixed position |
| Tool Name | 85-95% | Similar to weapon |
| Mob Name | 70-85% | Can be complex names, smaller text |
| Health Values | 90-98% | Numbers are easier |
| Damage Numbers | 80-90% | Floating text, harder to catch |
**Why not 100%?**
- Font rendering variations
- Transparency/effects
- Screen scaling
- Anti-aliasing
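
Several of these issues can be mitigated by preprocessing the crop before OCR. A minimal sketch, assuming the crop arrives as a grayscale uint8 array (the function name, scale factor, and threshold value are illustrative; `cv2.resize` with interpolation would be the practical choice):

```python
import numpy as np

def preprocess_for_ocr(gray, scale=2, threshold=150):
    """Upscale (helps small fonts) and binarize (removes anti-aliasing fringes)."""
    # Nearest-neighbour upscale by an integer factor via np.kron
    big = np.kron(gray, np.ones((scale, scale), dtype=gray.dtype))
    # Hard threshold: text pixels -> 255, background -> 0
    return np.where(big > threshold, 255, 0).astype(np.uint8)

crop = np.array([[200, 120], [90, 240]], dtype=np.uint8)
out = preprocess_for_ocr(crop)
print(out.shape)  # (4, 4)
```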
---
## Alternative: UI Memory Reading (Advanced)
**WARNING: May violate TOS - Research first!**
Some games expose UI data in memory. This would be:
- Instant (no screenshot/OCR)
- 100% accurate
- Much lower CPU usage
**Research needed:**
- Check Entropia Universe EULA
- Look for public memory maps
- Use tools like Cheat Engine (offline only!)
**Not recommended** unless explicitly allowed.
---
## Summary
**Best approach for Lemontropia:**
1. **Template Matching + OCR** - Good accuracy, reasonable performance
2. Capture templates for weapon slot, target window
3. OCR only those regions
4. Update every 2-5 seconds (not every frame)
5. Use confidence thresholds to filter bad reads
**Next Steps:**
1. I can create the template capture tool
2. Create the vision module structure
3. Integrate with existing loadout system
Want me to implement any part of this?