Lemontropia-Suite/docs/VISION_PLAN.md

Vision System Plan - Game UI Reading

Goal

Read equipped weapon/tool names, mob names, and other UI text from Entropia Universe game window with high accuracy.

Two Approaches

Approach 1: Template Matching + Targeted OCR (Recommended)

Best balance of accuracy and performance.

How it works:

  1. Take screenshot of game window
  2. Use template matching to find specific UI regions (weapon slot, target window, etc.)
  3. Crop to those regions
  4. Run OCR on cropped regions only
  5. Parse the results and update the loadout/HUD

Pros:

  • Fast (only OCR small regions)
  • Accurate (focused on known text areas)
  • Works with game updates (just update templates)
  • Low CPU usage

Cons:

  • Requires creating templates for each UI layout
  • UI position changes require template updates

Approach 2: Pure Computer Vision (Advanced)

Use object detection to find and read text regions automatically.

How it works:

  1. Train YOLO/SSD model to detect UI elements (weapon icons, text boxes, health bars)
  2. Run inference on game screenshot
  3. Crop detected regions
  4. OCR or classifier on each region

Pros:

  • Adapts to UI changes automatically
  • Can detect new elements without templates
  • Very robust

Cons:

  • Requires training data (thousands of labeled screenshots)
  • Higher CPU/GPU usage
  • Complex to implement
  • Overkill for this use case

Architecture

Game Window
     ↓
Screenshot (mss or PIL)
     ↓
Template Matching (OpenCV)
     ↓
Crop Regions of Interest
     ↓
OCR (PaddleOCR or EasyOCR)
     ↓
Parse Results
     ↓
Update Loadout/HUD

Implementation Plan

Phase 1: Screen Capture

import mss
import numpy as np

def capture_game_window(region=None):
    """Capture the Entropia Universe window as a BGR numpy array.

    `region` is an mss-style dict ({"left", "top", "width", "height"}).
    On Windows, resolve it from the window title via win32gui
    (FindWindow/GetWindowRect); falls back to the primary monitor.
    """
    with mss.mss() as sct:
        grab = sct.grab(region or sct.monitors[1])
        frame = np.asarray(grab)   # BGRA
        return frame[:, :, :3]     # drop alpha -> BGR for OpenCV

Phase 2: Template Matching

import os
import cv2

class UIFinder:
    def __init__(self, template_dir):
        self.templates = self._load_templates(template_dir)

    def _load_templates(self, template_dir):
        """Load every PNG in template_dir, keyed by filename stem."""
        templates = {}
        for name in os.listdir(template_dir):
            if name.endswith('.png'):
                key = os.path.splitext(name)[0]
                templates[key] = cv2.imread(os.path.join(template_dir, name))
        return templates

    def _find(self, screenshot, key, threshold=0.8):
        """Locate a template; return (x, y, w, h) or None below threshold."""
        template = self.templates[key]
        result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)

        if max_val >= threshold:
            x, y = max_loc
            h, w = template.shape[:2]
            return (x, y, w, h)  # Region
        return None

    def find_weapon_slot(self, screenshot):
        """Find weapon slot in screenshot."""
        return self._find(screenshot, 'weapon_slot')

    def find_target_window(self, screenshot):
        """Find mob target window."""
        return self._find(screenshot, 'target_window')

Phase 3: Region OCR

import cv2
from paddleocr import PaddleOCR

class RegionOCR:
    def __init__(self):
        # English only, CPU only, default models -- tuned for speed
        self.ocr = PaddleOCR(
            lang='en',
            use_gpu=False,
            show_log=False,
            det_model_dir=None,  # Use default
            rec_model_dir=None,
        )

    def read_text(self, screenshot, region):
        """OCR text from a specific (x, y, w, h) region."""
        x, y, w, h = region
        crop = screenshot[y:y+h, x:x+w]

        # Preprocess for better OCR: grayscale, then binarize
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

        result = self.ocr.ocr(thresh, cls=False)

        if result and result[0]:
            text = result[0][0][1][0]        # first line's text
            confidence = result[0][0][1][1]  # and its confidence
            return text, confidence
        return None, 0.0

    def read_weapon_name(self, screenshot, region):
        """OCR weapon name from its region."""
        return self.read_text(screenshot, region)

Phase 4: Integration

class GameVision:
    """Main vision system."""
    
    def __init__(self):
        self.finder = UIFinder('templates/')
        self.ocr = RegionOCR()
    
    def get_equipped_weapon(self):
        """Read currently equipped weapon name."""
        screenshot = capture_game_window()
        region = self.finder.find_weapon_slot(screenshot)
        
        if region:
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None
    
    def get_target_mob(self):
        """Read current target mob name."""
        screenshot = capture_game_window()
        region = self.finder.find_target_window(screenshot)
        
        if region:
            name, conf = self.ocr.read_text(screenshot, region)
            if conf > 0.8:
                return name
        return None

Template Creation Process

Step 1: Capture Reference Screenshots

import cv2

def capture_templates():
    """Interactive tool to capture UI templates."""
    print("1. Open Entropia Universe")
    print("2. Equip your weapon")
    print("3. Press ENTER when ready to capture the weapon slot template")
    input()

    screenshot = capture_game_window()

    # User drags to select the region; ENTER/SPACE confirms the selection
    x, y, w, h = cv2.selectROI("Capture template", screenshot)
    cv2.destroyAllWindows()

    # Save template
    template = screenshot[y:y+h, x:x+w]
    cv2.imwrite('templates/weapon_slot.png', template)

Step 2: Create Template Library

templates/
├── weapon_slot.png          # Weapon/tool equipped area
├── weapon_name_region.png   # Just the text part
├── target_window.png        # Target mob window
├── target_name_region.png   # Mob name text
├── health_bar.png           # Player health
├── tool_slot.png            # Mining tool/finder
└── README.md                # Template info
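Anchor templates and the text regions they imply can be tied together with a small offsets map: the matcher locates a stable anchor (icon, window frame), and the crop handed to OCR is derived from a fixed offset. A minimal sketch; the names and pixel offsets below are hypothetical placeholders, not measured values:

```python
# Hypothetical offsets (dx, dy, w, h) from a matched anchor template
# to the text region that should be cropped for OCR.
TEXT_REGIONS = {
    'weapon_slot':   {'anchor': 'weapon_slot.png',   'offset': (48, 4, 220, 24)},
    'target_window': {'anchor': 'target_window.png', 'offset': (12, 6, 180, 20)},
}

def text_region(anchor_xywh, offset):
    """Translate an anchor match (x, y, w, h) into the absolute text region."""
    ax, ay, _, _ = anchor_xywh
    dx, dy, w, h = offset
    return (ax + dx, ay + dy, w, h)
```

This keeps the template library small: only anchors need images, while text regions live in one editable table.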

OCR Engine Comparison

| Engine          | Speed  | Accuracy | Setup   | Best For                     |
|-----------------|--------|----------|---------|------------------------------|
| PaddleOCR       | Medium | High     | Easy    | General text, multi-language |
| EasyOCR         | Medium | High     | Easy    | Quick setup, simple text     |
| Tesseract       | Slow   | Medium   | Medium  | Legacy support               |
| PaddleOCR + GPU | Fast   | High     | Complex | Real-time if GPU available   |

Recommendation: PaddleOCR (already used in project)


Performance Optimizations

1. Region of Interest Only

# BAD: OCR entire screen
result = ocr.ocr(full_screenshot)

# GOOD: OCR only weapon region
result = ocr.ocr(weapon_region)

2. Frame Skipping

import time

class VisionPoller:
    def __init__(self):
        self.last_check = 0.0
        self.check_interval = 2.0  # seconds between OCR passes

    def poll(self):
        if time.time() - self.last_check < self.check_interval:
            return  # Skip this frame

        # Do OCR here
        self.last_check = time.time()

3. Async Processing

import asyncio

async def vision_loop(vision):
    """Poll OCR without blocking the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        # capture + OCR are blocking calls; run them in a worker thread
        weapon = await loop.run_in_executor(None, vision.get_equipped_weapon)
        if weapon:
            update_loadout(weapon)
        await asyncio.sleep(2)

4. Confidence Thresholding

def read_weapon_validated(ocr, screenshot, region):
    name, confidence = ocr.read_text(screenshot, region)

    if confidence < 0.7:
        # Too uncertain, skip this reading
        return None

    if confidence < 0.9:
        # Usable, but flag for manual verification
        log.warning(f"Low confidence reading: {name} ({confidence:.2f})")

    return name
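Confidence thresholds can be paired with a stability filter: accept a value only after it repeats across consecutive reads, which filters one-off misreads cheaply. A sketch; requiring two identical reads is an assumption to tune:

```python
from collections import deque

class StableReading:
    """Accept a value only once it repeats; filters one-off OCR misreads."""

    def __init__(self, needed=2):
        self.needed = needed                 # consecutive identical reads required
        self.history = deque(maxlen=needed)  # sliding window of recent reads

    def update(self, value):
        """Feed one reading; return it once it is stable, else None."""
        self.history.append(value)
        if len(self.history) == self.needed and len(set(self.history)) == 1:
            return value
        return None
```

At a 2-second poll interval, `needed=2` adds at most one interval of latency before a weapon swap is recognized.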

Implementation Roadmap

Week 1: Foundation

  • Create screen capture module (Windows window handle)
  • Install PaddleOCR (if not already)
  • Test basic OCR on game screenshots
  • Create template capture tool

Week 2: Templates

  • Capture weapon slot template
  • Capture target window template
  • Test template matching accuracy
  • Handle different resolutions/UI scales

Week 3: Integration

  • Create GameVision class
  • Integrate with Loadout Manager
  • Auto-update equipped weapon detection
  • Mob name logging for hunts

Week 4: Polish

  • Performance optimization
  • Confidence thresholds
  • Error handling
  • Documentation

Expected Accuracy

| UI Element     | Expected Accuracy | Notes                              |
|----------------|-------------------|------------------------------------|
| Weapon Name    | 85-95%            | Clear text, fixed position         |
| Tool Name      | 85-95%            | Similar to weapon                  |
| Mob Name       | 70-85%            | Can be complex names, smaller text |
| Health Values  | 90-98%            | Numbers are easier                 |
| Damage Numbers | 80-90%            | Floating text, harder to catch     |

Why not 100%?

  • Font rendering variations
  • Transparency/effects
  • Screen scaling
  • Anti-aliasing
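Because readings will never hit 100%, noisy OCR output can be snapped to a list of known item names with fuzzy matching. A sketch using stdlib difflib; the weapon names below are illustrative placeholders (in practice, load them from the loadout database), and the cutoff needs tuning:

```python
import difflib

# Illustrative known-item list; replace with the real loadout database.
KNOWN_NAMES = [
    "Sollomate Opalo",
    "Herman ASI-20 Scorpion",
    "Castorian EnKnuckles-A",
]

def snap_to_known(ocr_text, known=KNOWN_NAMES, cutoff=0.6):
    """Map a noisy OCR reading to the closest known name, or None."""
    if not ocr_text:
        return None
    matches = difflib.get_close_matches(ocr_text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

This turns an 85% character-level read into a near-exact item match whenever the item is already known, at the cost of missing genuinely new names.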

Alternative: UI Memory Reading (Advanced)

WARNING: May violate TOS - Research first!

Some games expose UI data in memory. This would be:

  • Instant (no screenshot/OCR)
  • 100% accurate
  • Much lower CPU usage

Research needed:

  • Check Entropia Universe EULA
  • Look for public memory maps
  • Use tools like Cheat Engine (offline only!)

Not recommended unless explicitly allowed.


Summary

Best approach for Lemontropia:

  1. Template Matching + OCR - Good accuracy, reasonable performance
  2. Capture templates for weapon slot, target window
  3. OCR only those regions
  4. Update every 2-5 seconds (not every frame)
  5. Use confidence thresholds to filter bad reads

Next Steps:

  1. Create the template capture tool
  2. Create the vision module structure
  3. Integrate with the existing loadout system