
Create a Universal Task

Build a cross-platform Minesweeper game that works on QEMU, simulated desktop, and Docker

In this tutorial, you'll create a Minesweeper game task using the Universal GUI API. The task is universal: it runs unmodified across QEMU, simulated desktop, and Docker environments.

Time: ~15 minutes
Prerequisites: cua-bench installed

What You'll Build

A fully playable Minesweeper game where agents must click cells to reveal them and flag mines. You'll learn:

  • Using launch_window() to create GUI apps
  • Communicating with your GUI via JavaScript
  • Creating task variants with different difficulty levels
  • Evaluating game state for rewards

Step 1: Create Task Directory

mkdir -p tasks/minesweeper/gui
cd tasks/minesweeper

Step 2: Create the Task Configuration

Create main.py:

import cua_bench as cb
from pathlib import Path

pid = None

@cb.tasks_config(split="train")
def load():
    """Generate Minesweeper variants with different difficulty levels."""
    game_configs = [
        (8, 8, 10),   # Easy
        (10, 10, 15), # Medium
        (12, 12, 20), # Hard
    ]

    return [
        cb.Task(
            description=f'Play Minesweeper on a {rows}x{cols} grid with {mines} mines. Click cells to reveal them, right-click to flag mines. Win by revealing all non-mine cells.',
            metadata={
                "rows": rows,
                "cols": cols,
                "mines": mines,
            },
            computer={
                "provider": "simulated",  # Use simulated desktop for preview
                "setup_config": {
                    "os_type": "win11",
                    "width": 800,
                    "height": 800,
                }
            }
        )
        for rows, cols, mines in game_configs
    ]

@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    """Launch the Minesweeper GUI window."""
    global pid

    # Get game parameters from task config
    rows = task_cfg.metadata["rows"]
    cols = task_cfg.metadata["cols"]
    mines = task_cfg.metadata["mines"]

    # Calculate window size based on grid size
    window_width = cols * 30 + 100
    window_height = rows * 30 + 200

    # Launch window with the game GUI
    pid = await session.launch_window(
        html=(Path(__file__).parent / "gui/index.html").read_text('utf-8'),
        title="Minesweeper",
        width=window_width,
        height=window_height,
    )

    # Initialize the game with the task configuration
    if pid is not None:
        await session.execute_javascript(pid, f"window.initGame({rows}, {cols}, {mines})")

@cb.evaluate_task(split="train")
async def evaluate(task_cfg: cb.Task, session: cb.DesktopSession) -> list[float]:
    """Check if the agent won or provide partial credit."""
    global pid

    if pid is None:
        return [0.0]

    # Check game state via JavaScript
    game_won = await session.execute_javascript(pid, "window.__gameWon")
    game_lost = await session.execute_javascript(pid, "window.__gameLost")

    # Win: 1.0, Loss: 0.0
    if game_won is True:
        return [1.0]
    elif game_lost is True:
        return [0.0]
    else:
        # Partial credit based on revealed cells
        rows = task_cfg.metadata["rows"]
        cols = task_cfg.metadata["cols"]
        mines = task_cfg.metadata["mines"]

        revealed_count = await session.execute_javascript(pid, "window.game.revealedCount")
        total_non_mines = rows * cols - mines

        if revealed_count and total_non_mines > 0:
            return [revealed_count / total_non_mines]
        return [0.0]

@cb.solve_task(split="train")
async def solve(task_cfg: cb.Task, session: cb.DesktopSession):
    """Demonstrate a simple solution strategy."""
    global pid
    import asyncio

    if pid is None:
        return

    rows = task_cfg.metadata["rows"]
    cols = task_cfg.metadata["cols"]

    # Simple strategy: click corner first, then scan left-to-right
    await session.click_element(pid, '[data-row="0"][data-col="0"]')
    await asyncio.sleep(0.5)

    # Continue clicking unrevealed cells
    for r in range(rows):
        for c in range(cols):
            game_state = await session.execute_javascript(pid, "window.__gameState")
            if game_state != "playing":
                break

            is_revealed = await session.execute_javascript(
                pid, f"window.game.revealed[{r}][{c}]"
            )
            is_flagged = await session.execute_javascript(
                pid, f"window.game.flagged[{r}][{c}]"
            )

            if not is_revealed and not is_flagged:
                await session.click_element(pid, f'[data-row="{r}"][data-col="{c}"]')
                await asyncio.sleep(0.2)

        if game_state != "playing":
            break
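
start() builds the window.initGame(...) call with an f-string, which is safe here because every argument is an integer. If task metadata ever carries strings or nested values, JSON-encoding each argument keeps the generated JavaScript valid. The helper below is a minimal sketch of that idea; js_call is a hypothetical name and is not part of cua-bench.

```python
import json


def js_call(function_name: str, *args) -> str:
    """Build a JavaScript call expression with JSON-encoded arguments.

    Hypothetical helper for illustration; not part of cua-bench.
    """
    encoded = ", ".join(json.dumps(arg) for arg in args)
    return f"{function_name}({encoded})"


# Produces the same string that start() builds by hand for the Easy variant
print(js_call("window.initGame", 8, 8, 10))  # window.initGame(8, 8, 10)
```

The resulting string could then be passed as the second argument to session.execute_javascript(pid, ...), exactly as the f-string is in start().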

Step 3: Create the HTML GUI

Create gui/index.html with the Minesweeper game:

<div class="p-4 flex flex-col items-center">
  <div class="mb-4 text-center">
    <h1 class="text-2xl font-bold mb-2">Minesweeper</h1>
    <div class="flex gap-4 justify-center items-center mb-2">
      <div class="text-lg font-mono">🚩 <span id="flag-count">0</span></div>
      <button id="reset-btn" class="px-4 py-2 bg-gray-200 hover:bg-gray-300 rounded">
        😊 New Game
      </button>
      <div class="text-lg font-mono">⏱️ <span id="timer">0</span></div>
    </div>
    <div id="status" class="text-sm font-semibold"></div>
  </div>
  <div id="game-board" class="inline-block border-4 border-gray-400 bg-gray-300"></div>

  <style>
    .cell {
      width: 30px;
      height: 30px;
      border: 2px outset #999;
      background: #c0c0c0;
      display: inline-flex;
      align-items: center;
      justify-content: center;
      font-weight: bold;
      font-size: 16px;
      cursor: pointer;
      user-select: none;
    }
    .cell.revealed {
      border: 1px solid #999;
      background: #e0e0e0;
      cursor: default;
    }
    .cell.mine {
      background: #ff6b6b;
    }
    .cell.num-1 {
      color: blue;
    }
    .cell.num-2 {
      color: green;
    }
    .cell.num-3 {
      color: red;
    }
  </style>

  <script>
    class Minesweeper {
      constructor(rows, cols, mines) {
        this.rows = rows;
        this.cols = cols;
        this.totalMines = mines;
        this.board = [];
        this.revealed = [];
        this.flagged = [];
        this.revealedCount = 0;
        this.firstClickDone = false; // mines are placed on the first reveal
        this.initBoard();
        this.render();
        window.__gameState = 'playing';
        window.__gameWon = false;
        window.__gameLost = false;
      }

      initBoard() {
        for (let r = 0; r < this.rows; r++) {
          this.board[r] = [];
          this.revealed[r] = [];
          this.flagged[r] = [];
          for (let c = 0; c < this.cols; c++) {
            this.board[r][c] = 0;
            this.revealed[r][c] = false;
            this.flagged[r][c] = false;
          }
        }
      }

      placeMines(excludeRow, excludeCol) {
        let placed = 0;
        while (placed < this.totalMines) {
          const r = Math.floor(Math.random() * this.rows);
          const c = Math.floor(Math.random() * this.cols);
          if ((r === excludeRow && c === excludeCol) || this.board[r][c] === -1) {
            continue;
          }
          this.board[r][c] = -1;
          placed++;
        }

        // Calculate numbers
        for (let r = 0; r < this.rows; r++) {
          for (let c = 0; c < this.cols; c++) {
            if (this.board[r][c] !== -1) {
              this.board[r][c] = this.countAdjacentMines(r, c);
            }
          }
        }
      }

      countAdjacentMines(row, col) {
        let count = 0;
        for (let dr = -1; dr <= 1; dr++) {
          for (let dc = -1; dc <= 1; dc++) {
            if (dr === 0 && dc === 0) continue;
            const r = row + dr,
              c = col + dc;
            if (r >= 0 && r < this.rows && c >= 0 && c < this.cols && this.board[r][c] === -1) {
              count++;
            }
          }
        }
        return count;
      }

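      reveal(row, col) {
        // Ignore clicks once the game is over, and on revealed or flagged cells
        if (window.__gameState !== 'playing') return;
        if (this.revealed[row][col] || this.flagged[row][col]) return;

        // Place mines on the first click so the first cell is never a mine
        if (!this.firstClickDone) {
          this.placeMines(row, col);
          this.firstClickDone = true;
        }

        this.revealed[row][col] = true;

        if (this.board[row][col] === -1) {
          this.lose();
          this.render(); // redraw so the detonated mine is visible
          return;
        }

        // Count only safe cells so revealedCount matches the win condition
        this.revealedCount++;

        // An empty cell (no adjacent mines) reveals all of its neighbours
        if (this.board[row][col] === 0) {
          for (let dr = -1; dr <= 1; dr++) {
            for (let dc = -1; dc <= 1; dc++) {
              const r = row + dr,
                c = col + dc;
              if (r >= 0 && r < this.rows && c >= 0 && c < this.cols && !this.revealed[r][c]) {
                this.reveal(r, c);
              }
            }
          }
        }

        this.checkWin();
        this.render();
      }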

      checkWin() {
        if (this.revealedCount === this.rows * this.cols - this.totalMines) {
          window.__gameState = 'won';
          window.__gameWon = true;
          document.getElementById('status').textContent = '🎉 You Won!';
        }
      }

      lose() {
        window.__gameState = 'lost';
        window.__gameLost = true;
        document.getElementById('status').textContent = '💥 Game Over!';
      }

      render() {
        const board = document.getElementById('game-board');
        board.innerHTML = '';
        board.style.display = 'grid';
        board.style.gridTemplateColumns = `repeat(${this.cols}, 30px)`;

        for (let r = 0; r < this.rows; r++) {
          for (let c = 0; c < this.cols; c++) {
            const cell = document.createElement('div');
            cell.className = 'cell';
            cell.dataset.row = r;
            cell.dataset.col = c;

            if (this.revealed[r][c]) {
              cell.classList.add('revealed');
              if (this.board[r][c] === -1) {
                cell.textContent = '💣';
                cell.classList.add('mine');
              } else if (this.board[r][c] > 0) {
                cell.textContent = this.board[r][c];
                cell.classList.add(`num-${this.board[r][c]}`);
              }
            } else if (this.flagged[r][c]) {
              cell.textContent = '🚩';
            }

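            cell.addEventListener('click', () => this.reveal(r, c));
            // Right-click toggles a flag on an unrevealed cell while the game is in progress
            cell.addEventListener('contextmenu', (e) => {
              e.preventDefault();
              if (window.__gameState !== 'playing' || this.revealed[r][c]) return;
              this.flagged[r][c] = !this.flagged[r][c];
              document.getElementById('flag-count').textContent = this.flagged
                .flat()
                .filter(Boolean).length;
              this.render();
            });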
            board.appendChild(cell);
          }
        }
      }
    }

    window.initGame = function (rows, cols, mines) {
      window.game = new Minesweeper(rows, cols, mines);
    };
  </script>
</div>
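
The board rules in placeMines() and countAdjacentMines() can be mirrored in Python to sanity-check them outside the browser. The sketch below is an illustration only, not code the task loads: the function names are hypothetical, and it draws mine positions with random.sample instead of the retry loop used in the JavaScript, which yields the same kind of layout but raises ValueError when more mines are requested than there are free cells.

```python
import random

MINE = -1  # same sentinel the JavaScript board uses


def count_adjacent_mines(board, row, col):
    """Count mines in the up-to-eight cells surrounding (row, col)."""
    rows, cols = len(board), len(board[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and board[r][c] == MINE:
                count += 1
    return count


def place_mines(rows, cols, mines, safe_cell, rng=None):
    """Return a board with `mines` mines, none of them on `safe_cell`."""
    rng = rng or random.Random()
    candidates = [
        (r, c) for r in range(rows) for c in range(cols) if (r, c) != safe_cell
    ]
    board = [[0] * cols for _ in range(rows)]
    for r, c in rng.sample(candidates, mines):
        board[r][c] = MINE
    # Fill in neighbour counts for every non-mine cell
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != MINE:
                board[r][c] = count_adjacent_mines(board, r, c)
    return board


board = place_mines(8, 8, 10, safe_cell=(0, 0), rng=random.Random(0))
mine_total = sum(cell == MINE for row in board for cell in row)
print(board[0][0] != MINE, mine_total)  # True 10
```

Passing a seeded random.Random makes a layout reproducible, which is handy for tests; the JavaScript game itself uses Math.random() and is not seeded.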

Step 4: Run the Task

Preview Interactively

cb interact tasks/minesweeper --variant-id 0

A browser window will open showing the Minesweeper game in a simulated desktop environment.

Run with Oracle

cb interact tasks/minesweeper --oracle --variant-id 0

Run with Agent

export ANTHROPIC_API_KEY=sk-...
cb run task tasks/minesweeper --agent cua-agent --model anthropic/claude-sonnet-4-20250514

Key Concepts

Window Management: launch_window() creates GUI apps in your tasks

JavaScript Communication: Use execute_javascript() to read/write game state

Task Variants: One codebase generates multiple difficulty levels

Partial Rewards: A win scores 1.0 and a loss 0.0; an unfinished game scores the fraction of safe cells revealed
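
The reward rules used in evaluate() can be written as a pure function, which makes them easy to check without a desktop session. This is a sketch that mirrors the logic in main.py; the function name score and its argument list are illustrative and not part of cua-bench.

```python
def score(game_won: bool, game_lost: bool, revealed_count: int,
          rows: int, cols: int, mines: int) -> float:
    """Mirror evaluate(): win is 1.0, loss is 0.0, otherwise progress so far."""
    if game_won:
        return 1.0
    if game_lost:
        return 0.0
    total_non_mines = rows * cols - mines
    if not revealed_count or total_non_mines <= 0:
        return 0.0
    return revealed_count / total_non_mines


# Easy variant: 8 * 8 - 10 = 54 safe cells, 27 revealed so far
print(score(False, False, 27, 8, 8, 10))  # 0.5
```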

Why Universal?

This task works across all platforms:

  • Simulated Desktop: Set provider: "simulated" for lightweight browser preview
  • Docker: Use real Linux desktop environments
  • QEMU: Run on actual Windows/Linux VMs

The same HTML GUI and task code runs everywhere.

