# Vision System Plan - Game UI Reading
## Goal

Read equipped weapon/tool names, mob names, and other UI text from the Entropia Universe game window with high accuracy.
## Two Approaches

### Approach 1: Template Matching + OCR (Recommended)

The best balance of accuracy and performance.

**How it works:**
- Take a screenshot of the game window
- Use template matching to find specific UI regions (weapon slot, target window, etc.)
- Crop to those regions
- Run OCR on the cropped regions only
- Much more accurate than full-screen OCR

**Pros:**
- Fast (OCR runs only on small regions)
- Accurate (focused on known text areas)
- Survives game updates (just update the templates)
- Low CPU usage

**Cons:**
- Requires creating templates for each UI layout
- UI position changes require template updates
### Approach 2: Pure Computer Vision (Advanced)

Use object detection to find and read text regions automatically.

**How it works:**
- Train a YOLO/SSD model to detect UI elements (weapon icons, text boxes, health bars)
- Run inference on each game screenshot
- Crop the detected regions
- Run OCR or a classifier on each region

**Pros:**
- Adapts to UI changes automatically
- Can detect new elements without templates
- Very robust

**Cons:**
- Requires training data (thousands of labeled screenshots)
- Higher CPU/GPU usage
- Complex to implement
- Overkill for this use case
## Recommended: Template Matching + OCR Pipeline

### Architecture

```
Game Window
    ↓
Screenshot (mss or PIL)
    ↓
Template Matching (OpenCV)
    ↓
Crop Regions of Interest
    ↓
OCR (PaddleOCR or EasyOCR)
    ↓
Parse Results
    ↓
Update Loadout/HUD
```
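The stages above can be wired together as one polling pass. A minimal sketch, with the capture/match/OCR callables passed in as stubs (the real implementations appear in the phases below):

```python
from typing import Callable, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h)

def run_pipeline(
    capture: Callable[[], object],
    find_region: Callable[[object], Optional[Region]],
    read_region: Callable[[object, Region], Tuple[Optional[str], float]],
    min_conf: float = 0.8,
) -> Optional[str]:
    """One pass: screenshot -> locate UI region -> OCR it -> filter by confidence."""
    frame = capture()
    region = find_region(frame)
    if region is None:
        return None  # UI element not on screen
    text, conf = read_region(frame, region)
    if text is None or conf < min_conf:
        return None  # unreadable or too uncertain
    return text
```

Keeping the stages as injected callables also makes the pipeline testable without the game running.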
### Implementation Plan

#### Phase 1: Screen Capture

```python
import mss
import numpy as np

def capture_game_window() -> np.ndarray:
    """Capture the Entropia Universe window.

    Sketch: grabs the primary monitor. Finding the game window by
    title needs a platform API (on Windows: win32gui).
    """
    with mss.mss() as sct:
        monitor = sct.monitors[1]  # primary monitor; swap in window bounds once known
        frame = np.array(sct.grab(monitor))  # BGRA
        return frame[:, :, :3]  # drop alpha -> BGR for OpenCV
```
#### Phase 2: Template Matching

```python
import cv2

class UIFinder:
    def __init__(self, template_dir):
        self.templates = self._load_templates(template_dir)

    def find_weapon_slot(self, screenshot):
        """Find the weapon slot in a screenshot."""
        template = self.templates['weapon_slot']
        result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        if max_val > 0.8:  # match confidence threshold
            x, y = max_loc
            h, w = template.shape[:2]
            return (x, y, w, h)  # region as (x, y, width, height)
        return None

    def find_target_window(self, screenshot):
        """Find the mob target window."""
        # Same matchTemplate flow as above, using self.templates['target_window']
        pass
```
#### Phase 3: Region OCR

```python
import cv2
from paddleocr import PaddleOCR

class RegionOCR:
    def __init__(self):
        # English-only model for speed
        self.ocr = PaddleOCR(
            lang='en',
            use_gpu=False,       # CPU only
            show_log=False,
            det_model_dir=None,  # use default models
            rec_model_dir=None,
        )

    def read_weapon_name(self, screenshot, region):
        """OCR the weapon name from a specific region."""
        x, y, w, h = region
        crop = screenshot[y:y+h, x:x+w]
        # Preprocess for better OCR: grayscale + binarize
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)
        result = self.ocr.ocr(thresh, cls=False)
        if result and result[0]:
            text = result[0][0][1][0]        # recognized string
            confidence = result[0][0][1][1]  # recognition confidence
            return text, confidence
        return None, 0.0
```
#### Phase 4: Integration

```python
class GameVision:
    """Main vision system."""

    def __init__(self):
        self.finder = UIFinder('templates/')
        self.ocr = RegionOCR()

    def get_equipped_weapon(self):
        """Read the currently equipped weapon name."""
        screenshot = capture_game_window()
        region = self.finder.find_weapon_slot(screenshot)
        if region:
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None

    def get_target_mob(self):
        """Read the current target mob's name."""
        screenshot = capture_game_window()
        region = self.finder.find_target_window(screenshot)
        if region:
            # read_weapon_name is a generic region reader; reuse it here
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None
```
## Template Creation Process

### Step 1: Capture Reference Screenshots

```python
def capture_templates():
    """Interactive tool to capture UI templates."""
    print("1. Open Entropia Universe")
    print("2. Equip your weapon")
    print("3. Press ENTER when ready to capture the weapon slot template")
    input()
    screenshot = capture_game_window()
    # User drags to select the region; cv2.selectROI returns (x, y, w, h)
    x, y, w, h = cv2.selectROI('Select weapon slot', screenshot)
    # Save the template crop
    template = screenshot[y:y+h, x:x+w]
    cv2.imwrite('templates/weapon_slot.png', template)
```
### Step 2: Create Template Library

```
templates/
├── weapon_slot.png         # Weapon/tool equipped area
├── weapon_name_region.png  # Just the text part
├── target_window.png       # Target mob window
├── target_name_region.png  # Mob name text
├── health_bar.png          # Player health
├── tool_slot.png           # Mining tool/finder
└── README.md               # Template info
```
## OCR Engine Comparison

| Engine | Speed | Accuracy | Setup | Best For |
|---|---|---|---|---|
| PaddleOCR | Medium | High | Easy | General text, multi-language |
| EasyOCR | Medium | High | Easy | Quick setup, broad language support |
| Tesseract | Slow | Medium | Medium | Legacy support |
| PaddleOCR + GPU | Fast | High | Complex | Real-time, if a GPU is available |

**Recommendation:** PaddleOCR (already used in this project)
## Performance Optimizations

### 1. Region of Interest Only

```python
# BAD: OCR the entire screen
result = ocr.ocr(full_screenshot)

# GOOD: OCR only the weapon region
result = ocr.ocr(weapon_region)
```
### 2. Frame Skipping

```python
import time

class VisionPoller:
    def __init__(self):
        self.last_check = 0.0
        self.check_interval = 2.0  # seconds between OCR passes

    def poll(self):
        if time.time() - self.last_check < self.check_interval:
            return  # skip this frame
        # ... do template matching + OCR here ...
        self.last_check = time.time()
```
### 3. Async Processing

```python
import asyncio

async def vision_loop():
    while True:
        screenshot = await capture_async()           # async wrapper around capture
        weapon = await ocr_weapon_async(screenshot)  # async wrapper around OCR
        if weapon:
            update_loadout(weapon)
        await asyncio.sleep(2)
```
### 4. Confidence Thresholding

```python
name, confidence = ocr.read_weapon_name(screenshot, region)
if confidence < 0.7:
    # Too uncertain, skip this reading
    return None
if confidence < 0.9:
    # Usable, but flag for manual verification
    log.warning(f"Low confidence reading: {name} ({confidence:.2f})")
return name
```
## Implementation Roadmap

### Week 1: Foundation
- Create screen capture module (Windows window handle)
- Install PaddleOCR (if not already installed)
- Test basic OCR on game screenshots
- Create template capture tool

### Week 2: Templates
- Capture weapon slot template
- Capture target window template
- Test template matching accuracy
- Handle different resolutions/UI scales
### Week 3: Integration
- Create GameVision class
- Integrate with the Loadout Manager
- Auto-update equipped weapon detection
- Log mob names for hunts

### Week 4: Polish
- Performance optimization
- Confidence thresholds
- Error handling
- Documentation
## Expected Accuracy

| UI Element | Expected Accuracy | Notes |
|---|---|---|
| Weapon Name | 85-95% | Clear text, fixed position |
| Tool Name | 85-95% | Similar to weapon |
| Mob Name | 70-85% | Can be complex names, smaller text |
| Health Values | 90-98% | Numbers are easier |
| Damage Numbers | 80-90% | Floating text, harder to catch |

**Why not 100%?**
- Font rendering variations
- Transparency/effects
- Screen scaling
- Anti-aliasing
## Alternative: UI Memory Reading (Advanced)

**WARNING: May violate the TOS - research first!**

Some games expose UI data in memory. Reading it directly would be:
- Instant (no screenshot/OCR step)
- 100% accurate
- Much lower CPU usage

Research needed:
- Check the Entropia Universe EULA
- Look for public memory maps
- Use tools like Cheat Engine (offline only!)

Not recommended unless explicitly allowed.
## Summary

Best approach for Lemontropia:
- Template Matching + OCR - good accuracy, reasonable performance
- Capture templates for the weapon slot and target window
- OCR only those regions
- Update every 2-5 seconds (not every frame)
- Use confidence thresholds to filter out bad reads

**Next Steps:**
- Create the template capture tool
- Create the vision module structure
- Integrate with the existing loadout system

Want me to implement any part of this?