# Vision System Plan - Game UI Reading

## Goal

Read equipped weapon/tool names, mob names, and other UI text from the Entropia Universe game window with high accuracy.

## Two Approaches

### Approach 1: Template Matching + OCR (Recommended)

**Best balance of accuracy and performance.**

**How it works:**

1. Take a screenshot of the game window
2. Use template matching to find specific UI regions (weapon slot, target window, etc.)
3. Crop to those regions
4. Run OCR on the cropped regions only

This is much more accurate than full-screen OCR.

**Pros:**

- Fast (only OCRs small regions)
- Accurate (focused on known text areas)
- Survives game updates (just update the templates)
- Low CPU usage

**Cons:**

- Requires creating templates for each UI layout
- UI position changes require template updates

---

### Approach 2: Pure Computer Vision (Advanced)

**Use object detection to find and read text regions automatically.**

**How it works:**

1. Train a YOLO/SSD model to detect UI elements (weapon icons, text boxes, health bars)
2. Run inference on a game screenshot
3. Crop the detected regions
4. Run OCR or a classifier on each region

**Pros:**

- Adapts to UI changes automatically
- Can detect new elements without templates
- Very robust

**Cons:**

- Requires training data (thousands of labeled screenshots)
- Higher CPU/GPU usage
- Complex to implement
- Overkill for this use case

---

## Recommended: Template Matching + OCR Pipeline

### Architecture

```
Game Window
    ↓
Screenshot (mss or PIL)
    ↓
Template Matching (OpenCV)
    ↓
Crop Regions of Interest
    ↓
OCR (PaddleOCR or EasyOCR)
    ↓
Parse Results
    ↓
Update Loadout/HUD
```

### Implementation Plan

#### Phase 1: Screen Capture

```python
import mss
import numpy as np

def capture_game_window():
    """Capture the Entropia Universe window."""
    with mss.mss() as sct:
        # TODO: find the game window by title (on Windows: win32gui)
        # and grab its bounding box instead of the primary monitor.
        raw = sct.grab(sct.monitors[1])
        frame = np.array(raw)          # BGRA
        return frame[:, :, :3].copy()  # BGR for OpenCV
```

#### Phase 2: Template Matching

```python
import glob
import os

import cv2

class UIFinder:
    def __init__(self, template_dir):
        self.templates = self._load_templates(template_dir)

    def _load_templates(self, template_dir):
        """Load every PNG in the template directory, keyed by filename."""
        templates = {}
        for path in glob.glob(os.path.join(template_dir, '*.png')):
            name = os.path.splitext(os.path.basename(path))[0]
            templates[name] = cv2.imread(path)
        return templates

    def find_weapon_slot(self, screenshot):
        """Find the weapon slot in the screenshot."""
        template = self.templates['weapon_slot']
        result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        if max_val > 0.8:  # Match threshold
            x, y = max_loc
            h, w = template.shape[:2]
            return (x, y, w, h)  # Region
        return None

    def find_target_window(self, screenshot):
        """Find the mob target window."""
        # Same approach as find_weapon_slot, using the 'target_window' template
        pass
```

#### Phase 3: Region OCR

```python
import cv2
from paddleocr import PaddleOCR

class RegionOCR:
    def __init__(self):
        # English only, for speed
        self.ocr = PaddleOCR(
            lang='en',
            use_gpu=False,       # CPU only
            show_log=False,
            det_model_dir=None,  # Use defaults
            rec_model_dir=None,
        )

    def read_weapon_name(self, screenshot, region):
        """OCR the weapon name from a specific region."""
        x, y, w, h = region
        crop = screenshot[y:y+h, x:x+w]

        # Preprocess for better OCR
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

        result = self.ocr.ocr(thresh, cls=False)
        if result and result[0]:
            text = result[0][0][1][0]        # Extracted text
            confidence = result[0][0][1][1]  # Recognition confidence
            return text, confidence
        return None, 0.0
```

#### Phase 4: Integration

```python
class GameVision:
    """Main vision system."""

    def __init__(self):
        self.finder = UIFinder('templates/')
        self.ocr = RegionOCR()

    def get_equipped_weapon(self):
        """Read the currently equipped weapon name."""
        screenshot = capture_game_window()
        region = self.finder.find_weapon_slot(screenshot)
        if region:
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None

    def get_target_mob(self):
        """Read the current target mob name."""
        screenshot = capture_game_window()
        region = self.finder.find_target_window(screenshot)
        if region:
            # read_weapon_name is generic region OCR, so reuse it here
            name, conf = self.ocr.read_weapon_name(screenshot, region)
            if conf > 0.8:
                return name
        return None
```

---

## Template Creation Process

### Step 1: Capture Reference Screenshots

```python
def capture_templates():
    """Interactive tool to capture UI templates."""
    print("1. Open Entropia Universe")
    print("2. Equip your weapon")
    print("3. Press SPACE when ready to capture the weapon slot template")
    input()

    screenshot = capture_game_window()

    # User drags to select a region
    region = select_region_interactive(screenshot)

    # Save template
    x, y, w, h = region
    template = screenshot[y:y+h, x:x+w]
    cv2.imwrite('templates/weapon_slot.png', template)
```

### Step 2: Create Template Library

```
templates/
├── weapon_slot.png         # Weapon/tool equipped area
├── weapon_name_region.png  # Just the text part
├── target_window.png       # Target mob window
├── target_name_region.png  # Mob name text
├── health_bar.png          # Player health
├── tool_slot.png           # Mining tool/finder
└── README.md               # Template info
```

---

## OCR Engine Comparison

| Engine | Speed | Accuracy | Setup | Best For |
|--------|-------|----------|-------|----------|
| **PaddleOCR** | Medium | High | Easy | General text, multi-language |
| **EasyOCR** | Medium | High | Easy | English-only, simple text |
| **Tesseract** | Slow | Medium | Medium | Legacy support |
| **PaddleOCR + GPU** | Fast | High | Complex | Real-time, if a GPU is available |

**Recommendation: PaddleOCR** (already used in this project)

---

## Performance Optimizations

### 1. Region of Interest Only

```python
# BAD: OCR the entire screen
result = ocr.ocr(full_screenshot)

# GOOD: OCR only the weapon region
result = ocr.ocr(weapon_region)
```

### 2. Frame Skipping

```python
import time

class VisionPoller:
    def __init__(self):
        self.last_check = 0
        self.check_interval = 2.0  # seconds

    def poll(self):
        if time.time() - self.last_check < self.check_interval:
            return  # Skip this frame
        # Do OCR
        self.last_check = time.time()
```

### 3. Async Processing

```python
import asyncio

async def vision_loop():
    while True:
        screenshot = await capture_async()
        weapon = await ocr_weapon_async(screenshot)
        if weapon:
            update_loadout(weapon)
        await asyncio.sleep(2)
```

### 4. Confidence Thresholding

```python
name, confidence = ocr.read_weapon_name(screenshot, region)

if confidence < 0.7:
    # Too uncertain; skip this reading
    return None
if confidence < 0.9:
    # Flag for manual verification
    log.warning(f"Low confidence reading: {name} ({confidence:.2f})")
return name
```

---

## Implementation Roadmap

### Week 1: Foundation

- [ ] Create screen capture module (Windows window handle)
- [ ] Install PaddleOCR (if not already installed)
- [ ] Test basic OCR on game screenshots
- [ ] Create template capture tool

### Week 2: Templates

- [ ] Capture weapon slot template
- [ ] Capture target window template
- [ ] Test template matching accuracy
- [ ] Handle different resolutions/UI scales

### Week 3: Integration

- [ ] Create GameVision class
- [ ] Integrate with the Loadout Manager
- [ ] Auto-update equipped weapon detection
- [ ] Mob name logging for hunts

### Week 4: Polish

- [ ] Performance optimization
- [ ] Confidence thresholds
- [ ] Error handling
- [ ] Documentation

---

## Expected Accuracy

| UI Element | Expected Accuracy | Notes |
|------------|-------------------|-------|
| Weapon Name | 85-95% | Clear text, fixed position |
| Tool Name | 85-95% | Similar to weapon |
| Mob Name | 70-85% | Can be complex names, smaller text |
| Health Values | 90-98% | Numbers are easier |
| Damage Numbers | 80-90% | Floating text, harder to catch |

**Why not 100%?**

- Font rendering variations
- Transparency/effects
- Screen scaling
- Anti-aliasing

---

## Alternative: UI Memory Reading (Advanced)

**WARNING: May violate the TOS - research first!**

Some games expose UI data in memory. This would be:

- Instant (no screenshot/OCR)
- 100% accurate
- Much lower CPU usage

**Research needed:**

- Check the Entropia Universe EULA
- Look for public memory maps
- Use tools like Cheat Engine (offline only!)

**Not recommended** unless explicitly allowed.

---

## Summary

**Best approach for Lemontropia:**

1. **Template Matching + OCR** - good accuracy, reasonable performance
2. Capture templates for the weapon slot and target window
3. OCR only those regions
4. Update every 2-5 seconds (not every frame)
5. Use confidence thresholds to filter bad reads

**Next Steps:**

1. I can create the template capture tool
2. Create the vision module structure
3. Integrate with the existing loadout system

Want me to implement any part of this?
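To make the 0.8 match threshold concrete, here is a dependency-light NumPy sketch of what `cv2.matchTemplate` with `TM_CCOEFF_NORMED` computes (zero-mean normalized cross-correlation), combined with the same find-above-threshold logic used in `UIFinder`. The function names and synthetic test data are illustrative, not part of the project:

```python
import numpy as np

def match_template_ccoeff_normed(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Score every template-sized patch of `image` against `template`
    with zero-mean normalized cross-correlation (what TM_CCOEFF_NORMED does).
    Scores range from -1 to 1; 1.0 means an exact match."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            out[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return out

def find_region(image, template, threshold=0.8):
    """Return (x, y, w, h) of the best match above `threshold`, else None."""
    scores = match_template_ccoeff_normed(image, template)
    y, x = np.unravel_index(scores.argmax(), scores.shape)
    if scores[y, x] < threshold:
        return None
    return (int(x), int(y), template.shape[1], template.shape[0])

# Synthetic check: plant the template inside a noisy grayscale "screenshot"
rng = np.random.default_rng(0)
screen = rng.uniform(0, 255, (60, 80))
template = rng.uniform(0, 255, (10, 12))
screen[20:30, 30:42] = template

region = find_region(screen, template)
print(region)  # → (30, 20, 12, 10)
```

At real screenshot sizes, use `cv2.matchTemplate` instead — it is vastly faster; this loop version only exists to show where the match score, and therefore the threshold, comes from.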