Lemontropia-Suite/docs/VISION_PLAN.md

Vision System Plan - Game UI Reading

Goal

Read equipped weapon/tool names, mob names, and other UI text from Entropia Universe game window with high accuracy.

Two Approaches

Approach 1: Template Matching + Targeted OCR (Recommended)

Best balance of accuracy and performance.

How it works:

  1. Take screenshot of game window
  2. Use template matching to find specific UI regions (weapon slot, target window, etc.)
  3. Crop to those regions
  4. Run OCR on cropped regions only
  5. Parse the results and update the loadout/HUD

Pros:

  • Fast (only OCR small regions)
  • Accurate (focused on known text areas)
  • Works with game updates (just update templates)
  • Low CPU usage

Cons:

  • Requires creating templates for each UI layout
  • UI position changes require template updates

Approach 2: Pure Computer Vision (Advanced)

Use object detection to find and read text regions automatically.

How it works:

  1. Train YOLO/SSD model to detect UI elements (weapon icons, text boxes, health bars)
  2. Run inference on game screenshot
  3. Crop detected regions
  4. OCR or classifier on each region

Pros:

  • Adapts to UI changes automatically
  • Can detect new elements without templates
  • Very robust

Cons:

  • Requires training data (thousands of labeled screenshots)
  • Higher CPU/GPU usage
  • Complex to implement
  • Overkill for this use case

Architecture

Game Window
     ↓
Screenshot (mss or PIL)
     ↓
Template Matching (OpenCV)
     ↓
Crop Regions of Interest
     ↓
OCR (PaddleOCR or EasyOCR)
     ↓
Parse Results
     ↓
Update Loadout/HUD

Implementation Plan

Phase 1: Screen Capture

import mss
import numpy as np

def capture_game_window(region=None):
    """Capture the Entropia Universe window as a BGR numpy array.

    `region` is an mss-style dict ({"left", "top", "width", "height"}).
    On Windows, resolve it from the window title via win32gui
    (FindWindow/GetWindowRect); falls back to the primary monitor.
    """
    with mss.mss() as sct:
        grab = sct.grab(region or sct.monitors[1])
        frame = np.asarray(grab)   # BGRA
        return frame[:, :, :3]     # drop alpha -> BGR for OpenCV

Phase 2: Template Matching

import os
import cv2

class UIFinder:
    def __init__(self, template_dir):
        self.templates = self._load_templates(template_dir)

    def _load_templates(self, template_dir):
        """Load every PNG in template_dir, keyed by filename stem."""
        templates = {}
        for name in os.listdir(template_dir):
            if name.endswith('.png'):
                key = os.path.splitext(name)[0]
                templates[key] = cv2.imread(os.path.join(template_dir, name))
        return templates

    def _find(self, screenshot, key, threshold=0.8):
        """Locate a template; return (x, y, w, h) or None below threshold."""
        template = self.templates[key]
        result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)

        if max_val >= threshold:
            x, y = max_loc
            h, w = template.shape[:2]
            return (x, y, w, h)  # Region
        return None

    def find_weapon_slot(self, screenshot):
        """Find weapon slot in screenshot."""
        return self._find(screenshot, 'weapon_slot')

    def find_target_window(self, screenshot):
        """Find mob target window."""
        return self._find(screenshot, 'target_window')

Phase 3: Region OCR

import cv2
from paddleocr import PaddleOCR

class RegionOCR:
    def __init__(self):
        # English only, CPU only, default models -- tuned for speed
        self.ocr = PaddleOCR(
            lang='en',
            use_gpu=False,
            show_log=False,
            det_model_dir=None,  # Use default
            rec_model_dir=None,
        )

    def read_text(self, screenshot, region):
        """OCR text from a specific (x, y, w, h) region."""
        x, y, w, h = region
        crop = screenshot[y:y+h, x:x+w]

        # Preprocess for better OCR: grayscale, then binarize
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

        result = self.ocr.ocr(thresh, cls=False)

        if result and result[0]:
            text = result[0][0][1][0]        # first line's text
            confidence = result[0][0][1][1]  # and its confidence
            return text, confidence
        return None, 0.0

    def read_weapon_name(self, screenshot, region):
        """OCR weapon name from its region."""
        return self.read_text(screenshot, region)

Phase 4: Integration

class GameVision:
    """Main vision system."""
    
    def __init__(self):
        self.finder = UIFinder('templates/')
        self.ocr = RegionOCR()
    
    def get_equipped_weapon(self):
        """Read currently equipped weapon name."""
        screenshot = capture_game_window()
        region = self.finder.find_weapon_slot(screenshot)
        
        if region:
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None
    
    def get_target_mob(self):
        """Read current target mob name."""
        screenshot = capture_game_window()
        region = self.finder.find_target_window(screenshot)
        
        if region:
            name, conf = self.ocr.read_text(screenshot, region)
            if conf > 0.8:
                return name
        return None

Template Creation Process

Step 1: Capture Reference Screenshots

import cv2

def capture_templates():
    """Interactive tool to capture UI templates."""
    print("1. Open Entropia Universe")
    print("2. Equip your weapon")
    print("3. Press ENTER when ready to capture the weapon slot template")
    input()

    screenshot = capture_game_window()

    # User drags to select the region; ENTER/SPACE confirms the selection
    x, y, w, h = cv2.selectROI("Capture template", screenshot)
    cv2.destroyAllWindows()

    # Save template
    template = screenshot[y:y+h, x:x+w]
    cv2.imwrite('templates/weapon_slot.png', template)

Step 2: Create Template Library

templates/
├── weapon_slot.png          # Weapon/tool equipped area
├── weapon_name_region.png   # Just the text part
├── target_window.png        # Target mob window
├── target_name_region.png   # Mob name text
├── health_bar.png           # Player health
├── tool_slot.png            # Mining tool/finder
└── README.md                # Template info
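Anchor templates and the text regions they imply can be tied together with a small offsets map: the matcher locates a stable anchor (icon, window frame), and the crop handed to OCR is derived from a fixed offset. A minimal sketch; the names and pixel offsets below are hypothetical placeholders, not measured values:

```python
# Hypothetical offsets (dx, dy, w, h) from a matched anchor template
# to the text region that should be cropped for OCR.
TEXT_REGIONS = {
    'weapon_slot':   {'anchor': 'weapon_slot.png',   'offset': (48, 4, 220, 24)},
    'target_window': {'anchor': 'target_window.png', 'offset': (12, 6, 180, 20)},
}

def text_region(anchor_xywh, offset):
    """Translate an anchor match (x, y, w, h) into the absolute text region."""
    ax, ay, _, _ = anchor_xywh
    dx, dy, w, h = offset
    return (ax + dx, ay + dy, w, h)
```

This keeps the template library small: only anchors need images, while text regions live in one editable table.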

OCR Engine Comparison

| Engine          | Speed  | Accuracy | Setup   | Best For                     |
|-----------------|--------|----------|---------|------------------------------|
| PaddleOCR       | Medium | High     | Easy    | General text, multi-language |
| EasyOCR         | Medium | High     | Easy    | Quick setup, simple text     |
| Tesseract       | Slow   | Medium   | Medium  | Legacy support               |
| PaddleOCR + GPU | Fast   | High     | Complex | Real-time if GPU available   |

Recommendation: PaddleOCR (already used in project)


Performance Optimizations

1. Region of Interest Only

# BAD: OCR entire screen
result = ocr.ocr(full_screenshot)

# GOOD: OCR only weapon region
result = ocr.ocr(weapon_region)

2. Frame Skipping

import time

class VisionPoller:
    def __init__(self):
        self.last_check = 0.0
        self.check_interval = 2.0  # seconds between OCR passes

    def poll(self):
        if time.time() - self.last_check < self.check_interval:
            return  # Skip this frame

        # Do OCR here
        self.last_check = time.time()

3. Async Processing

import asyncio

async def vision_loop(vision):
    """Poll OCR without blocking the event loop."""
    loop = asyncio.get_running_loop()
    while True:
        # capture + OCR are blocking calls; run them in a worker thread
        weapon = await loop.run_in_executor(None, vision.get_equipped_weapon)
        if weapon:
            update_loadout(weapon)
        await asyncio.sleep(2)

4. Confidence Thresholding

def read_weapon_validated(ocr, screenshot, region):
    name, confidence = ocr.read_text(screenshot, region)

    if confidence < 0.7:
        # Too uncertain, skip this reading
        return None

    if confidence < 0.9:
        # Usable, but flag for manual verification
        log.warning(f"Low confidence reading: {name} ({confidence:.2f})")

    return name
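Confidence thresholds can be paired with a stability filter: accept a value only after it repeats across consecutive reads, which filters one-off misreads cheaply. A sketch; requiring two identical reads is an assumption to tune:

```python
from collections import deque

class StableReading:
    """Accept a value only once it repeats; filters one-off OCR misreads."""

    def __init__(self, needed=2):
        self.needed = needed                 # consecutive identical reads required
        self.history = deque(maxlen=needed)  # sliding window of recent reads

    def update(self, value):
        """Feed one reading; return it once it is stable, else None."""
        self.history.append(value)
        if len(self.history) == self.needed and len(set(self.history)) == 1:
            return value
        return None
```

At a 2-second poll interval, `needed=2` adds at most one interval of latency before a weapon swap is recognized.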

Implementation Roadmap

Week 1: Foundation

  • Create screen capture module (Windows window handle)
  • Install PaddleOCR (if not already)
  • Test basic OCR on game screenshots
  • Create template capture tool

Week 2: Templates

  • Capture weapon slot template
  • Capture target window template
  • Test template matching accuracy
  • Handle different resolutions/UI scales

Week 3: Integration

  • Create GameVision class
  • Integrate with Loadout Manager
  • Auto-update equipped weapon detection
  • Mob name logging for hunts

Week 4: Polish

  • Performance optimization
  • Confidence thresholds
  • Error handling
  • Documentation

Expected Accuracy

| UI Element     | Expected Accuracy | Notes                              |
|----------------|-------------------|------------------------------------|
| Weapon Name    | 85-95%            | Clear text, fixed position         |
| Tool Name      | 85-95%            | Similar to weapon                  |
| Mob Name       | 70-85%            | Can be complex names, smaller text |
| Health Values  | 90-98%            | Numbers are easier                 |
| Damage Numbers | 80-90%            | Floating text, harder to catch     |

Why not 100%?

  • Font rendering variations
  • Transparency/effects
  • Screen scaling
  • Anti-aliasing
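Because readings will never hit 100%, noisy OCR output can be snapped to a list of known item names with fuzzy matching. A sketch using stdlib difflib; the weapon names below are illustrative placeholders (in practice, load them from the loadout database), and the cutoff needs tuning:

```python
import difflib

# Illustrative known-item list; replace with the real loadout database.
KNOWN_NAMES = [
    "Sollomate Opalo",
    "Herman ASI-20 Scorpion",
    "Castorian EnKnuckles-A",
]

def snap_to_known(ocr_text, known=KNOWN_NAMES, cutoff=0.6):
    """Map a noisy OCR reading to the closest known name, or None."""
    if not ocr_text:
        return None
    matches = difflib.get_close_matches(ocr_text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

This turns an 85% character-level read into a near-exact item match whenever the item is already known, at the cost of missing genuinely new names.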

Alternative: UI Memory Reading (Advanced)

WARNING: May violate TOS - Research first!

Some games expose UI data in memory. This would be:

  • Instant (no screenshot/OCR)
  • 100% accurate
  • Much lower CPU usage

Research needed:

  • Check Entropia Universe EULA
  • Look for public memory maps
  • Use tools like Cheat Engine (offline only!)

Not recommended unless explicitly allowed.


Summary

Best approach for Lemontropia:

  1. Template Matching + OCR - Good accuracy, reasonable performance
  2. Capture templates for weapon slot, target window
  3. OCR only those regions
  4. Update every 2-5 seconds (not every frame)
  5. Use confidence thresholds to filter bad reads

Next Steps:

  1. Create the template capture tool
  2. Create the vision module structure
  3. Integrate with the existing loadout system