Computer SDK API Reference
Python API reference for controlling virtual machines and computer interfaces
Cua Computer Interface for cross-platform computer control.
Classes
| Class | Description |
|---|---|
Computer | Computer is the main class for interacting with the computer. |
VMProviderType | Enum of supported VM provider types. |
Computer
Computer is the main class for interacting with the computer.
Constructor
Computer(self, display: Union[Display, Dict[str, int], str] = '1024x768', memory: str = '8GB', cpu: str = '4', os_type: OSType = 'macos', name: str = '', image: Optional[str] = None, shared_directories: Optional[List[str]] = None, use_host_computer_server: bool = False, verbosity: Union[int, LogLevel] = logging.INFO, telemetry_enabled: bool = True, provider_type: Union[str, VMProviderType] = VMProviderType.LUME, provider_port: Optional[int] = 7777, noVNC_port: Optional[int] = 8006, api_port: Optional[int] = None, host: str = 'localhost', api_host: Optional[str] = None, storage: Optional[str] = None, ephemeral: bool = False, api_key: Optional[str] = None, experiments: Optional[List[str]] = None, timeout: int = 100, run_opts: Optional[Dict[str, Any]] = None)
Attributes
| Name | Type | Description |
|---|---|---|
logger | Any | |
image | Any | |
host | Any | |
provider_port | Any | |
noVNC_port | Any | |
api_port | Any | |
api_host | Any | |
os_type | Any | |
provider_type | Any | |
ephemeral | Any | |
api_key | Any | |
timeout | Any | |
experiments | Any | |
custom_run_opts | Any | |
storage | Any | |
shared_path | Any | |
verbosity | Any | |
vm_logger | Any | |
interface_logger | Any | |
config | Any | |
shared_directories | Any | |
use_host_computer_server | Any | |
interface | Any | Get the computer interface for interacting with the VM. |
tracing | ComputerTracing | Get the computer tracing instance for recording sessions. |
telemetry_enabled | bool | Check if telemetry is enabled for this computer instance. |
Methods
Computer.create_desktop_from_apps
def create_desktop_from_apps(self, apps)
Create a virtual desktop from a list of app names, returning a DioramaComputer that proxies Diorama.Interface but uses diorama_cmds via the computer interface.
Parameters:
| Name | Type | Description |
|---|---|---|
apps | list[str] | List of application names to include in the desktop. |
Returns: DioramaComputer: A proxy object with the Diorama interface, but using diorama_cmds.
Computer.run
async def run(self) -> Optional[str]
Initialize the VM and computer interface.
Computer.disconnect
async def disconnect(self) -> None
Disconnect from the computer's WebSocket interface.
Computer.stop
async def stop(self) -> None
Disconnect from the computer's WebSocket interface and stop the computer.
Computer.start
async def start(self) -> None
Start the computer.
Computer.restart
async def restart(self) -> None
Restart the computer.
If using a VM provider that supports restart, this will issue a restart without tearing down the provider context, then reconnect the interface. Falls back to stop()+run() when a provider restart is not available.
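The fallback behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: `FakeComputer`, its `supports_restart` flag, the private helper names, and the `calls` log are all hypothetical stand-ins that record what would happen instead of driving a real VM.

```python
import asyncio


class FakeComputer:
    """Hypothetical stand-in for Computer; records calls instead of driving a VM."""

    def __init__(self, supports_restart: bool) -> None:
        self.supports_restart = supports_restart  # assumed capability flag
        self.calls: list[str] = []

    async def run(self) -> None:
        self.calls.append("run")

    async def stop(self) -> None:
        self.calls.append("stop")

    async def _provider_restart(self) -> None:
        self.calls.append("provider_restart")

    async def _reconnect_interface(self) -> None:
        self.calls.append("reconnect")

    async def restart(self) -> None:
        if self.supports_restart:
            # Restart in place, then reconnect the interface.
            await self._provider_restart()
            await self._reconnect_interface()
        else:
            # Fallback: full stop followed by a fresh run.
            await self.stop()
            await self.run()


async def demo(supports_restart: bool) -> list[str]:
    computer = FakeComputer(supports_restart)
    await computer.restart()
    return computer.calls
```

Calling `asyncio.run(demo(False))` exercises the stop()+run() path, while `demo(True)` exercises the in-place restart path.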
Computer.get_ip
async def get_ip(self, max_retries: int = 15, retry_delay: int = 3) -> str
Get the IP address of the VM or localhost if using host computer server.
This method delegates to the provider's get_ip method, which waits indefinitely until the VM has a valid IP address.
Parameters:
| Name | Type | Description |
|---|---|---|
max_retries | Any | Unused parameter, kept for backward compatibility |
retry_delay | Any | Delay between retries in seconds (default: 3) |
Returns: IP address of the VM or localhost if using host computer server
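Because the call waits with no retry limit, it can help to see what an unbounded polling loop looks like. The sketch below is an assumption about the general pattern, not the provider's code: `wait_for_ip` and `make_fake_fetcher` are hypothetical helpers, and the fetcher simulates a VM that only reports an address after a few polls.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_for_ip(
    fetch_ip: Callable[[], Awaitable[Optional[str]]],
    retry_delay: float = 3,
) -> str:
    """Poll fetch_ip until it returns a non-empty address (no retry limit)."""
    while True:
        ip = await fetch_ip()
        if ip:
            return ip
        await asyncio.sleep(retry_delay)


def make_fake_fetcher(ready_after: int, address: str):
    """Hypothetical fetcher that returns None until called ready_after times."""
    state = {"calls": 0}

    async def fetch_ip() -> Optional[str]:
        state["calls"] += 1
        return address if state["calls"] >= ready_after else None

    return fetch_ip
```

Callers who need an upper bound on the wait can wrap such a call in `asyncio.wait_for(...)` with their own timeout.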
Computer.wait_vm_ready
async def wait_vm_ready(self) -> Optional[Dict[str, Any]]
Wait for VM to be ready with an IP address.
Returns: VM status information or None if using host computer server.
Computer.update
async def update(self, cpu: Optional[int] = None, memory: Optional[str] = None)
Update VM settings.
Computer.get_screenshot_size
def get_screenshot_size(self, screenshot: bytes) -> Dict[str, int]
Get the dimensions of a screenshot.
Parameters:
| Name | Type | Description |
|---|---|---|
screenshot | Any | The screenshot bytes |
Returns: Dict[str, int]: Dictionary containing 'width' and 'height' of the image
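To illustrate how dimensions can be recovered from raw image bytes, here is a minimal sketch that assumes the screenshot is PNG-encoded (the reference does not state the format, and the SDK may well use an imaging library instead). `png_size` is a hypothetical helper that reads the width and height from the PNG IHDR chunk.

```python
import struct
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Dict[str, int]:
    """Read width and height from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are two big-endian 32-bit integers right after 'IHDR'.
    width, height = struct.unpack(">II", data[16:24])
    return {"width": width, "height": height}


# Build just enough of a PNG header to exercise the parser.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 768)
size = png_size(header)
```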
Computer.to_screen_coordinates
async def to_screen_coordinates(self, x: float, y: float) -> tuple[float, float]
Convert normalized coordinates to screen coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate between 0 and 1 |
y | Any | Y coordinate between 0 and 1 |
Returns: tuple[float, float]: Screen coordinates (x, y)
Computer.to_screenshot_coordinates
async def to_screenshot_coordinates(self, x: float, y: float) -> tuple[float, float]
Convert screen coordinates to screenshot coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate in screen space |
y | Any | Y coordinate in screen space |
Returns: tuple[float, float]: (x, y) coordinates in screenshot space
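The two conversions above can be pictured with simple arithmetic. The sketch below assumes linear scaling, which the reference does not spell out: normalized values are multiplied by the screen size, and screen values are scaled by the ratio of screenshot size to screen size. Both function names and the explicit size parameters are hypothetical; the real methods obtain sizes from the VM.

```python
from typing import Tuple


def normalized_to_screen(
    x: float, y: float, screen_w: int, screen_h: int
) -> Tuple[float, float]:
    """Map (x, y) in the range 0..1 onto pixel coordinates of the screen."""
    return (x * screen_w, y * screen_h)


def screen_to_screenshot(
    x: float,
    y: float,
    screen_size: Tuple[int, int],
    screenshot_size: Tuple[int, int],
) -> Tuple[float, float]:
    """Scale screen coordinates by the screenshot/screen size ratio."""
    screen_w, screen_h = screen_size
    shot_w, shot_h = screenshot_size
    return (x * (shot_w / screen_w), y * (shot_h / screen_h))


center = normalized_to_screen(0.5, 0.25, 1024, 768)
in_screenshot = screen_to_screenshot(512, 192, (1024, 768), (2048, 1536))
```

A ratio other than 1 typically arises on high-density displays, where the screenshot has more pixels than the logical screen.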
Computer.playwright_exec
async def playwright_exec(self, command: str, params: Optional[Dict] = None) -> Dict[str, Any]
Execute a Playwright browser command.
Parameters:
| Name | Type | Description |
|---|---|---|
command | Any | The browser command to execute (visit_url, click, type, scroll, web_search) |
params | Any | Command parameters |
Returns: Dict containing the command result
Example:
# Navigate to a URL
await computer.playwright_exec("visit_url", {"url": "https://example.com"})
# Click at coordinates
await computer.playwright_exec("click", {"x": 100, "y": 200})
# Type text
await computer.playwright_exec("type", {"text": "Hello, world!"})
# Scroll
await computer.playwright_exec("scroll", {"delta_x": 0, "delta_y": -100})
# Web search
await computer.playwright_exec("web_search", {"query": "computer use agent"})
Computer.venv_install
async def venv_install(self, venv_name: str, requirements: list[str])
Install packages in a UV project.
Parameters:
| Name | Type | Description |
|---|---|---|
venv_name | Any | Name of the UV project |
requirements | Any | List of package requirements to install |
Returns: Tuple of (stdout, stderr) from the installation command
Computer.pip_install
async def pip_install(self, requirements: list[str])
Install packages using the system Python with UV (no venv).
Parameters:
| Name | Type | Description |
|---|---|---|
requirements | Any | List of package requirements to install globally/user site. |
Returns: Tuple of (stdout, stderr) from the installation command
Computer.venv_cmd
async def venv_cmd(self, venv_name: str, command: str)
Execute a shell command in a UV project.
Parameters:
| Name | Type | Description |
|---|---|---|
venv_name | Any | Name of the UV project |
command | Any | Shell command to execute in the UV project |
Returns: Tuple of (stdout, stderr) from the command execution
Computer.venv_exec
async def venv_exec(self, venv_name: str, python_func, args = (), kwargs = {})
Execute a Python function in a virtual environment using source code extraction.
Parameters:
| Name | Type | Description |
|---|---|---|
venv_name | Any | Name of the virtual environment |
python_func | Any | A callable function to execute |
*args | Any | Positional arguments to pass to the function |
**kwargs | Any | Keyword arguments to pass to the function |
Returns: The result of the function execution, or raises any exception that occurred
Computer.venv_exec_background
async def venv_exec_background(self, venv_name: str, python_func, args = (), requirements: Optional[List[str]] = None, kwargs = {}) -> int
Run the Python function in the venv in the background and return the PID.
Uses a short Python launcher that spawns a detached child and exits immediately.
Computer.python_exec
async def python_exec(self, python_func, args = (), kwargs = {})
Execute a Python function using the system Python (no venv).
Uses source extraction and base64 transport, mirroring venv_exec but without virtual environment activation.
Returns the function result or raises a reconstructed exception with remote traceback context appended.
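The idea of shipping function source over a base64 transport can be sketched in a few lines. This is an illustration only: the payload layout (JSON with `source`, `name`, `args`, `kwargs`) and the helpers `pack_call` and `run_packed` are assumptions, and the source is supplied as a literal string here, whereas the SDK extracts it from the function object.

```python
import base64
import json

# Source of the function to run "remotely"; given as a literal to stay self-contained.
FUNC_SOURCE = '''
def add(a, b):
    return a + b
'''


def pack_call(source: str, func_name: str, args: tuple, kwargs: dict) -> str:
    """Encode the function source and its arguments as one base64 string."""
    payload = {"source": source, "name": func_name, "args": list(args), "kwargs": kwargs}
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def run_packed(encoded: str):
    """What a remote runner might do: decode, define the function, call it."""
    payload = json.loads(base64.b64decode(encoded).decode("utf-8"))
    namespace: dict = {}
    exec(payload["source"], namespace)  # defines the function in its own namespace
    return namespace[payload["name"]](*payload["args"], **payload["kwargs"])


encoded = pack_call(FUNC_SOURCE, "add", (2, 3), {})
result = run_packed(encoded)
```

Base64 keeps the payload safe to embed in a shell command or a JSON message, since it contains no quotes or newlines.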
Computer.python_exec_background
async def python_exec_background(self, python_func, args = (), requirements: Optional[List[str]] = None, kwargs = {}) -> int
Run a Python function with the system interpreter in the background and return the PID.
Uses a short Python launcher that spawns a detached child and exits immediately.
Computer.python_command
def python_command(self, requirements: Optional[List[str]] = None, venv_name: str = 'default', use_system_python: bool = False, background: bool = False) -> Callable[[Callable[P, R]], Callable[P, Awaitable[R]]]
Decorator to execute a Python function remotely in this Computer's venv.
This mirrors computer.helpers.sandboxed() but binds to this instance and optionally ensures required packages are installed before execution.
Parameters:
| Name | Type | Description |
|---|---|---|
requirements | Any | Packages to install in the virtual environment. |
venv_name | Any | Name of the virtual environment to use. |
use_system_python | Any | If True, use the system Python/pip instead of a venv. |
background | Any | If True, run the function detached and return the child PID immediately. |
Returns: A decorator that turns a local function into an async callable which runs remotely and returns the function's result.
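The return type above describes a decorator factory: it takes options and returns a decorator that converts a plain function into an awaitable one. The sketch below shows only that shape. `LocalComputer` is a hypothetical stand-in whose "remote" execution simply calls the function in the current process, and it omits the requirements, venv, and background options.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, TypeVar

R = TypeVar("R")


class LocalComputer:
    """Hypothetical stand-in: 'remote' execution just calls the function locally."""

    async def python_exec(self, func, args=(), kwargs=None):
        return func(*args, **(kwargs or {}))

    def python_command(self) -> Callable[[Callable[..., R]], Callable[..., Awaitable[R]]]:
        def decorator(func):
            @functools.wraps(func)
            async def wrapper(*args: Any, **kwargs: Any):
                # Forward the call through the instance this decorator is bound to.
                return await self.python_exec(func, args, kwargs)

            return wrapper

        return decorator


computer = LocalComputer()


@computer.python_command()
def word_count(text: str) -> int:
    return len(text.split())
```

After decoration, `word_count("a b c")` returns a coroutine, so callers use `await word_count(...)` inside async code or `asyncio.run(...)` at the top level.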
VMProviderType
Inherits from: StrEnum
Enum of supported VM provider types.
Attributes
| Name | Type | Description |
|---|---|---|
LUME | Any | |
LUMIER | Any | |
CLOUD | Any | |
CLOUDV2 | Any | |
WINSANDBOX | Any | |
DOCKER | Any | |
UNKNOWN | Any |
tracing
Computer tracing functionality for recording sessions.
This module provides a Computer.tracing API inspired by Playwright's tracing functionality, allowing users to record computer interactions for debugging, training, and analysis.
ComputerTracing
Computer tracing class that records computer interactions and saves them to disk.
This class provides a flexible API for recording computer sessions with configurable options for what to record (screenshots, API calls, video, etc.).
Constructor
ComputerTracing(self, computer_instance)
Attributes
| Name | Type | Description |
|---|---|---|
is_tracing | bool | Check if tracing is currently active. |
Methods
ComputerTracing.start
async def start(self, config: Optional[Dict[str, Any]] = None) -> None
Start tracing with the specified configuration.
Parameters:
| Name | Type | Description |
|---|---|---|
config | Any | Tracing configuration dict. Options: video (bool): record video frames (default: False); screenshots (bool): record screenshots (default: True); api_calls (bool): record API calls and results (default: True); accessibility_tree (bool): record accessibility tree snapshots (default: False); metadata (bool): record custom metadata (default: True); name (str): custom trace name (default: auto-generated); path (str): custom trace directory path (default: auto-generated). |
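
The options listed for config combine naturally with their defaults. The following sketch shows one way a caller-supplied dict could be merged over the documented defaults. `resolve_trace_config` is a hypothetical helper, and rejecting unknown keys is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, Optional

# Defaults as documented for the config parameter of ComputerTracing.start.
DEFAULT_TRACE_CONFIG: Dict[str, Any] = {
    "video": False,
    "screenshots": True,
    "api_calls": True,
    "accessibility_tree": False,
    "metadata": True,
}


def resolve_trace_config(config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge caller options over the documented defaults, rejecting unknown keys."""
    allowed = set(DEFAULT_TRACE_CONFIG) | {"name", "path"}
    config = config or {}
    unknown = set(config) - allowed
    if unknown:
        raise ValueError(f"unknown tracing options: {sorted(unknown)}")
    return {**DEFAULT_TRACE_CONFIG, **config}


resolved = resolve_trace_config({"video": True, "name": "login-flow"})
```

With the real SDK, a dict such as `{"video": True, "name": "login-flow"}` would be passed directly to `await computer.tracing.start(...)`.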
ComputerTracing.stop
async def stop(self, options: Optional[Dict[str, Any]] = None) -> str
Stop tracing and save the trace data.
Parameters:
| Name | Type | Description |
|---|---|---|
options | Any | Stop options dict. Options: path (str): custom output path for the trace archive; format (str): output format, 'zip' or 'dir' (default: 'zip'). |
Returns: str: Path to the saved trace file or directory
ComputerTracing.record_api_call
async def record_api_call(self, method: str, args: Dict[str, Any], result: Any = None, error: Optional[Exception] = None) -> None
Record an API call event.
Parameters:
| Name | Type | Description |
|---|---|---|
method | Any | The method name that was called |
args | Any | Arguments passed to the method |
result | Any | Result returned by the method |
error | Any | Exception raised by the method, if any |
ComputerTracing.record_accessibility_tree
async def record_accessibility_tree(self) -> None
Record the current accessibility tree if enabled.
ComputerTracing.add_metadata
async def add_metadata(self, key: str, value: Any) -> None
Add custom metadata to the trace.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | Metadata key |
value | Any | Metadata value |
models
Models for computer configuration.
BaseVMProvider
Inherits from: AsyncContextManager
Base interface for VM providers.
All VM provider implementations must implement this interface.
Attributes
| Name | Type | Description |
|---|---|---|
provider_type | VMProviderType | Get the provider type. |
Methods
BaseVMProvider.get_vm
async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]
Get VM information by name.
Parameters:
| Name | Type | Description |
|---|---|---|
name | Any | Name of the VM to get information for |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
Returns: Dictionary with VM information including status, IP address, etc.
BaseVMProvider.list_vms
async def list_vms(self) -> ListVMsResponse
List all available VMs.
Returns: ListVMsResponse: A list of minimal VM objects as defined in computer.providers.types.MinimalVM.
BaseVMProvider.run_vm
async def run_vm(self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]
Run a VM by name with the given options.
Parameters:
| Name | Type | Description |
|---|---|---|
image | Any | Name/tag of the image to use |
name | Any | Name of the VM to run |
run_opts | Any | Dictionary of run options (memory, cpu, etc.) |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
Returns: Dictionary with VM run status and information
BaseVMProvider.stop_vm
async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]
Stop a VM by name.
Parameters:
| Name | Type | Description |
|---|---|---|
name | Any | Name of the VM to stop |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
Returns: Dictionary with VM stop status and information
BaseVMProvider.restart_vm
async def restart_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]
Restart a VM by name.
Parameters:
| Name | Type | Description |
|---|---|---|
name | Any | Name of the VM to restart |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
Returns: Dictionary with VM restart status and information
BaseVMProvider.update_vm
async def update_vm(self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None) -> Dict[str, Any]
Update VM configuration.
Parameters:
| Name | Type | Description |
|---|---|---|
name | Any | Name of the VM to update |
update_opts | Any | Dictionary of update options (memory, cpu, etc.) |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
Returns: Dictionary with VM update status and information
BaseVMProvider.get_ip
async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str
Get the IP address of a VM, waiting indefinitely until it's available.
Parameters:
| Name | Type | Description |
|---|---|---|
name | Any | Name of the VM to get the IP for |
storage | Any | Optional storage path override. If provided, this will be used instead of the provider's default storage path. |
retry_delay | Any | Delay between retries in seconds (default: 2) |
Returns: IP address of the VM when it becomes available
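To make the provider contract more concrete, here is a toy provider that follows the method names and argument order listed above while keeping all state in a dict. It is a sketch under stated assumptions: `InMemoryProvider` does not inherit from the real BaseVMProvider, it implements only a subset of the interface, and the dictionary keys (`status`, `ip_address`) and their values are hypothetical.

```python
import asyncio
from typing import Any, Dict, Optional


class InMemoryProvider:
    """Hypothetical provider that keeps VM state in a dict instead of a hypervisor."""

    def __init__(self) -> None:
        self._vms: Dict[str, Dict[str, Any]] = {}

    async def __aenter__(self) -> "InMemoryProvider":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._vms.clear()  # release everything when the context closes

    async def run_vm(self, image: str, name: str, run_opts: Dict[str, Any],
                     storage: Optional[str] = None) -> Dict[str, Any]:
        self._vms[name] = {"name": name, "image": image, "status": "running",
                           "ip_address": "127.0.0.1", **run_opts}
        return self._vms[name]

    async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
        return self._vms.get(name, {"name": name, "status": "not_found"})

    async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
        self._vms[name]["status"] = "stopped"
        return self._vms[name]

    async def get_ip(self, name: str, storage: Optional[str] = None,
                     retry_delay: int = 2) -> str:
        while True:  # wait until the VM reports an address
            ip = self._vms.get(name, {}).get("ip_address")
            if ip:
                return ip
            await asyncio.sleep(retry_delay)


async def demo():
    async with InMemoryProvider() as provider:
        vm = await provider.run_vm("example-image:latest", "demo-vm", {"memory": "8GB"})
        running = vm["status"]
        ip = await provider.get_ip("demo-vm", retry_delay=0)
        stopped = await provider.stop_vm("demo-vm")
        return running, ip, stopped["status"]
```

Running `asyncio.run(demo())` walks through the run, get_ip, and stop steps inside a single provider context.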
Display
Display configuration.
Constructor
Display(self, width: int, height: int) -> None
Attributes
| Name | Type | Description |
|---|---|---|
width | int | |
height | int |
Image
VM image configuration.
Constructor
Image(self, image: str, tag: str, name: str) -> None
Attributes
| Name | Type | Description |
|---|---|---|
image | str | |
tag | str | |
name | str |
Computer
Computer configuration.
Constructor
Computer(self, image: str, tag: str, name: str, display: Display, memory: str, cpu: str, vm_provider: Optional[BaseVMProvider] = None) -> None
Attributes
| Name | Type | Description |
|---|---|---|
image | str | |
tag | str | |
name | str | |
display | Display | |
memory | str | |
cpu | str | |
vm_provider | Optional[BaseVMProvider] |
Methods
Computer.get_ip
async def get_ip(self) -> Optional[str]
Get the IP address of the VM.
diorama_computer
Key
Inherits from: Enum
Keyboard keys that can be used with press_key.
These key names follow a consistent cross-platform keyboard key naming convention.
Attributes
| Name | Type | Description |
|---|---|---|
PAGE_DOWN | Any | |
PAGE_UP | Any | |
HOME | Any | |
END | Any | |
LEFT | Any | |
RIGHT | Any | |
UP | Any | |
DOWN | Any | |
RETURN | Any | |
ENTER | Any | |
ESCAPE | Any | |
ESC | Any | |
TAB | Any | |
SPACE | Any | |
BACKSPACE | Any | |
DELETE | Any | |
ALT | Any | |
CTRL | Any | |
SHIFT | Any | |
WIN | Any | |
COMMAND | Any | |
OPTION | Any | |
F1 | Any | |
F2 | Any | |
F3 | Any | |
F4 | Any | |
F5 | Any | |
F6 | Any | |
F7 | Any | |
F8 | Any | |
F9 | Any | |
F10 | Any | |
F11 | Any | |
F12 | Any |
Methods
Key.from_string
def from_string(cls, key: str) -> Key | str
Convert a string key name to a Key enum value.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | String key name to convert |
Returns: Key enum value if the string matches a known key, otherwise returns the original string for single character keys
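The lookup-with-passthrough behaviour described above can be sketched with a small enum. This is an illustration, not the SDK's Key class: it covers only three of the listed keys, the string values are assumptions, and the alias table (for example mapping "esc" to ESCAPE) is a hypothetical simplification of how synonyms might be handled.

```python
from enum import Enum
from typing import Union


class Key(Enum):
    """Small subset of the keys listed above; the string values are assumptions."""

    ENTER = "enter"
    ESCAPE = "escape"
    PAGE_DOWN = "pagedown"

    @classmethod
    def from_string(cls, key: str) -> Union["Key", str]:
        aliases = {"return": cls.ENTER, "esc": cls.ESCAPE}  # hypothetical synonyms
        normalized = key.lower().strip()
        if normalized in aliases:
            return aliases[normalized]
        for member in cls:
            if member.value == normalized:
                return member
        return key  # unknown names (e.g. single characters) pass through unchanged
```

Returning the original string for unmatched input lets callers pass ordinary characters such as "a" through the same code path as named keys.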
DioramaComputer
A Computer-compatible proxy for Diorama that sends commands over the ComputerInterface.
Constructor
DioramaComputer(self, computer, apps)
Attributes
| Name | Type | Description |
|---|---|---|
computer | Any | |
apps | Any | |
interface | Any |
Methods
DioramaComputer.run
async def run(self)
Initialize and run the DioramaComputer if not already initialized.
Returns: self: The DioramaComputer instance
DioramaComputerInterface
Diorama Interface proxy that sends diorama_cmds via the Computer's interface.
Constructor
DioramaComputerInterface(self, computer, apps)
Attributes
| Name | Type | Description |
|---|---|---|
computer | Any | |
apps | Any |
Methods
DioramaComputerInterface.screenshot
async def screenshot(self, as_bytes = True)
Take a screenshot of the diorama scene.
Parameters:
| Name | Type | Description |
|---|---|---|
as_bytes | bool | If True, return image as bytes; if False, return PIL Image object |
Returns: bytes or PIL.Image: Screenshot data in the requested format
DioramaComputerInterface.get_screen_size
async def get_screen_size(self)
Get the dimensions of the diorama scene.
Returns: dict: Dictionary containing 'width' and 'height' keys with pixel dimensions
DioramaComputerInterface.move_cursor
async def move_cursor(self, x, y)
Move the cursor to the specified coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int | X coordinate to move cursor to |
y | int | Y coordinate to move cursor to |
DioramaComputerInterface.left_click
async def left_click(self, x = None, y = None)
Perform a left mouse click at the specified coordinates or current cursor position.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int, optional | X coordinate to click at. If None, clicks at current cursor position |
y | int, optional | Y coordinate to click at. If None, clicks at current cursor position |
DioramaComputerInterface.right_click
async def right_click(self, x = None, y = None)
Perform a right mouse click at the specified coordinates or current cursor position.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int, optional | X coordinate to click at. If None, clicks at current cursor position |
y | int, optional | Y coordinate to click at. If None, clicks at current cursor position |
DioramaComputerInterface.double_click
async def double_click(self, x = None, y = None)
Perform a double mouse click at the specified coordinates or current cursor position.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int, optional | X coordinate to double-click at. If None, clicks at current cursor position |
y | int, optional | Y coordinate to double-click at. If None, clicks at current cursor position |
DioramaComputerInterface.scroll_up
async def scroll_up(self, clicks = 1)
Scroll up by the specified number of clicks.
Parameters:
| Name | Type | Description |
|---|---|---|
clicks | int | Number of scroll clicks to perform upward. Defaults to 1 |
DioramaComputerInterface.scroll_down
async def scroll_down(self, clicks = 1)
Scroll down by the specified number of clicks.
Parameters:
| Name | Type | Description |
|---|---|---|
clicks | int | Number of scroll clicks to perform downward. Defaults to 1 |
DioramaComputerInterface.drag_to
async def drag_to(self, x, y, duration = 0.5)
Drag from the current cursor position to the specified coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int | X coordinate to drag to |
y | int | Y coordinate to drag to |
duration | float | Duration of the drag operation in seconds. Defaults to 0.5 |
DioramaComputerInterface.get_cursor_position
async def get_cursor_position(self)
Get the current cursor position.
Returns: dict: Dictionary containing the current cursor coordinates
DioramaComputerInterface.type_text
async def type_text(self, text)
Type the specified text at the current cursor position.
Parameters:
| Name | Type | Description |
|---|---|---|
text | str | The text to type |
DioramaComputerInterface.press_key
async def press_key(self, key)
Press a single key.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | The key to press |
DioramaComputerInterface.hotkey
async def hotkey(self, keys = ())
Press multiple keys simultaneously as a hotkey combination.
Raises:
ValueError: If any key is not a Key enum or string type
DioramaComputerInterface.to_screen_coordinates
async def to_screen_coordinates(self, x, y)
Convert coordinates to screen coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | int | X coordinate to convert |
y | int | Y coordinate to convert |
Returns: dict: Dictionary containing the converted screen coordinates
helpers
Helper functions and decorators for the Computer module.
DependencyInfo
Inherits from: TypedDict
Attributes
| Name | Type | Description |
|---|---|---|
import_statements | List[str] | |
definitions | List[tuple[str, Any]] |
set_default_computer
def set_default_computer(computer: Any) -> None
Set the default computer instance to be used by the remote decorator.
Parameters:
| Name | Type | Description |
|---|---|---|
computer | Any | The computer instance to use as default |
sandboxed
def sandboxed(venv_name: str = 'default', computer: str = 'default', max_retries: int = 3) -> Callable[[Callable[P, R]], Callable[P, Awaitable[R]]]
Decorator that wraps a function to be executed remotely via computer.venv_exec.
The function is automatically analyzed for dependencies (imports, helper functions, constants, etc.) and reconstructed with all necessary code in the remote sandbox.
Parameters:
| Name | Type | Description |
|---|---|---|
venv_name | Any | Name of the virtual environment to execute in |
computer | Any | The computer instance to use, or "default" to use the globally set default |
max_retries | Any | Maximum number of retries for the remote execution |
generate_source_code
def generate_source_code(func: FunctionType) -> str
Generate complete source code for a function with all dependencies.
Parameters:
| Name | Type | Description |
|---|---|---|
func | Any | The function to generate source code for |
Returns: Complete Python source code as a string
interface
Interface package for Computer SDK.
BaseComputerInterface
Inherits from: ABC
Base class for computer control interfaces.
Constructor
BaseComputerInterface(self, ip_address: str, username: str = 'lume', password: str = 'lume', api_key: Optional[str] = None, vm_name: Optional[str] = None)
Attributes
| Name | Type | Description |
|---|---|---|
ip_address | Any | |
username | Any | |
password | Any | |
api_key | Any | |
vm_name | Any | |
logger | Any | |
delay | float |
Methods
BaseComputerInterface.wait_for_ready
async def wait_for_ready(self, timeout: int = 60) -> None
Wait for interface to be ready.
Parameters:
| Name | Type | Description |
|---|---|---|
timeout | Any | Maximum time to wait in seconds |
Raises:
TimeoutError: If interface is not ready within timeout
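A bounded wait of this kind is usually a polling loop with a deadline. The sketch below is a generic illustration, not the interface's implementation: `wait_until_ready`, the `poll_interval` parameter, and the two probe coroutines are hypothetical names introduced for the example.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    is_ready: Callable[[], Awaitable[bool]],
    timeout: float = 60,
    poll_interval: float = 0.5,
) -> None:
    """Poll is_ready until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while not await is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"interface not ready within {timeout} seconds")
        await asyncio.sleep(poll_interval)


async def always_ready() -> bool:
    """Hypothetical probe for an interface that is already up."""
    return True


async def never_ready() -> bool:
    """Hypothetical probe for an interface that never comes up."""
    return False
```

Using `time.monotonic()` for the deadline keeps the wait unaffected by system clock adjustments.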
BaseComputerInterface.close
def close(self) -> None
Close the interface connection.
BaseComputerInterface.force_close
def force_close(self) -> None
Force close the interface connection.
By default, this just calls close(), but subclasses can override to provide more forceful cleanup.
BaseComputerInterface.mouse_down
async def mouse_down(self, x: Optional[int] = None, y: Optional[int] = None, button: MouseButton = 'left', delay: Optional[float] = None) -> None
Press and hold a mouse button.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to press at. If None, uses current cursor position. |
y | Any | Y coordinate to press at. If None, uses current cursor position. |
button | Any | Mouse button to press ('left', 'middle', 'right'). |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.mouse_up
async def mouse_up(self, x: Optional[int] = None, y: Optional[int] = None, button: MouseButton = 'left', delay: Optional[float] = None) -> None
Release a mouse button.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to release at. If None, uses current cursor position. |
y | Any | Y coordinate to release at. If None, uses current cursor position. |
button | Any | Mouse button to release ('left', 'middle', 'right'). |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.left_click
async def left_click(self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None) -> None
Perform a left mouse button click.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to click at. If None, uses current cursor position. |
y | Any | Y coordinate to click at. If None, uses current cursor position. |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.right_click
async def right_click(self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None) -> None
Perform a right mouse button click.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to click at. If None, uses current cursor position. |
y | Any | Y coordinate to click at. If None, uses current cursor position. |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.double_click
async def double_click(self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None) -> None
Perform a double left mouse button click.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to double-click at. If None, uses current cursor position. |
y | Any | Y coordinate to double-click at. If None, uses current cursor position. |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.move_cursor
async def move_cursor(self, x: int, y: int, delay: Optional[float] = None) -> None
Move the cursor to the specified screen coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate to move cursor to. |
y | Any | Y coordinate to move cursor to. |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.drag_to
async def drag_to(self, x: int, y: int, button: str = 'left', duration: float = 0.5, delay: Optional[float] = None) -> None
Drag from current position to specified coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | The x coordinate to drag to |
y | Any | The y coordinate to drag to |
button | Any | The mouse button to use ('left', 'middle', 'right') |
duration | Any | How long the drag should take in seconds |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.drag
async def drag(self, path: List[Tuple[int, int]], button: str = 'left', duration: float = 0.5, delay: Optional[float] = None) -> None
Drag the cursor along a path of coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | List of (x, y) coordinate tuples defining the drag path |
button | Any | The mouse button to use ('left', 'middle', 'right') |
duration | Any | Total time in seconds that the drag operation should take |
delay | Any | Optional delay in seconds after the action |
BaseComputerInterface.key_down
async def key_down(self, key: str, delay: Optional[float] = None) -> None
Press and hold a key.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | The key to press and hold (e.g., 'a', 'shift', 'ctrl'). |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.key_up
async def key_up(self, key: str, delay: Optional[float] = None) -> None
Release a previously pressed key.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | The key to release (e.g., 'a', 'shift', 'ctrl'). |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.type_text
async def type_text(self, text: str, delay: Optional[float] = None) -> None
Type the specified text string.
Parameters:
| Name | Type | Description |
|---|---|---|
text | Any | The text string to type. |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.press_key
async def press_key(self, key: str, delay: Optional[float] = None) -> None
Press and release a single key.
Parameters:
| Name | Type | Description |
|---|---|---|
key | Any | The key to press (e.g., 'a', 'enter', 'escape'). |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.hotkey
async def hotkey(self, keys: str = (), delay: Optional[float] = None) -> None
Press multiple keys simultaneously (keyboard shortcut).
Parameters:
| Name | Type | Description |
|---|---|---|
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.scroll
async def scroll(self, x: int, y: int, delay: Optional[float] = None) -> None
Scroll the mouse wheel by specified amounts.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | Horizontal scroll amount (positive = right, negative = left). |
y | Any | Vertical scroll amount (positive = up, negative = down). |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.scroll_down
async def scroll_down(self, clicks: int = 1, delay: Optional[float] = None) -> None
Scroll down by the specified number of clicks.
Parameters:
| Name | Type | Description |
|---|---|---|
clicks | Any | Number of scroll clicks to perform downward. |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.scroll_up
async def scroll_up(self, clicks: int = 1, delay: Optional[float] = None) -> None
Scroll up by the specified number of clicks.
Parameters:
| Name | Type | Description |
|---|---|---|
clicks | Any | Number of scroll clicks to perform upward. |
delay | Any | Optional delay in seconds after the action. |
BaseComputerInterface.screenshot
async def screenshot(self) -> bytes
Take a screenshot.
Returns: Raw bytes of the screenshot image
BaseComputerInterface.get_screen_size
async def get_screen_size(self) -> Dict[str, int]
Get the screen dimensions.
Returns: Dict with 'width' and 'height' keys
BaseComputerInterface.get_cursor_position
async def get_cursor_position(self) -> Dict[str, int]
Get the current cursor position on screen.
Returns: Dict with 'x' and 'y' keys containing cursor coordinates.
BaseComputerInterface.copy_to_clipboard
async def copy_to_clipboard(self) -> str
Get the current clipboard content.
Returns: The text content currently stored in the clipboard.
BaseComputerInterface.set_clipboard
async def set_clipboard(self, text: str) -> None
Set the clipboard content to the specified text.
Parameters:
| Name | Type | Description |
|---|---|---|
text | Any | The text to store in the clipboard. |
BaseComputerInterface.file_exists
async def file_exists(self, path: str) -> bool
Check if a file exists at the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to check. |
Returns: True if the file exists, False otherwise.
BaseComputerInterface.directory_exists
async def directory_exists(self, path: str) -> bool
Check if a directory exists at the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The directory path to check. |
Returns: True if the directory exists, False otherwise.
BaseComputerInterface.list_dir
async def list_dir(self, path: str) -> List[str]
List the contents of a directory.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The directory path to list. |
Returns: List of file and directory names in the specified directory.
BaseComputerInterface.read_text
async def read_text(self, path: str) -> str
Read the text contents of a file.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to read from. |
Returns: The text content of the file.
BaseComputerInterface.write_text
async def write_text(self, path: str, content: str) -> None
Write text content to a file.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to write to. |
content | Any | The text content to write. |
BaseComputerInterface.read_bytes
async def read_bytes(self, path: str, offset: int = 0, length: Optional[int] = None) -> bytes
Read file binary contents with optional seeking support.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | Path to the file |
offset | Any | Byte offset to start reading from (default: 0) |
length | Any | Number of bytes to read (default: None for entire file) |
BaseComputerInterface.write_bytes
async def write_bytes(self, path: str, content: bytes) -> None
Write binary content to a file.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to write to. |
content | Any | The binary content to write. |
BaseComputerInterface.delete_file
async def delete_file(self, path: str) -> None
Delete a file at the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to delete. |
BaseComputerInterface.create_dir
async def create_dir(self, path: str) -> None
Create a directory at the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The directory path to create. |
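The directory and text-file methods can be combined into a "make sure the folder exists, then write" helper. The sketch below assumes POSIX-style paths on the guest and uses a hypothetical in-memory `FakeFsInterface` whose methods mirror the documented signatures; whether the real `create_dir` also creates missing parent directories is not stated in this reference.

```python
import asyncio
import posixpath
from typing import Dict, Set


class FakeFsInterface:
    """Hypothetical in-memory stand-in for the file system methods."""

    def __init__(self) -> None:
        self.dirs: Set[str] = {"/"}
        self.files: Dict[str, str] = {}

    async def directory_exists(self, path: str) -> bool:
        return path in self.dirs

    async def create_dir(self, path: str) -> None:
        self.dirs.add(path)

    async def write_text(self, path: str, content: str) -> None:
        self.files[path] = content

    async def read_text(self, path: str) -> str:
        return self.files[path]


async def write_note(interface, path: str, content: str) -> str:
    """Create the parent directory if needed, write the file, and read it back."""
    parent = posixpath.dirname(path)
    if not await interface.directory_exists(parent):
        await interface.create_dir(parent)
    await interface.write_text(path, content)
    return await interface.read_text(path)


fs = FakeFsInterface()
note = asyncio.run(write_note(fs, "/home/user/notes/todo.txt", "buy milk"))
print(note)
```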
BaseComputerInterface.delete_dir
async def delete_dir(self, path: str) -> None
Delete a directory at the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The directory path to delete. |
BaseComputerInterface.get_file_size
async def get_file_size(self, path: str) -> int
Get the size of a file in bytes.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to get the size of. |
Returns: The size of the file in bytes.
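The `offset` and `length` parameters of `read_bytes`, together with `get_file_size`, allow a large file to be fetched in pieces. The following is a minimal sketch of that pattern; `FakeFileInterface` and `read_in_chunks` are hypothetical names, and the stand-in assumes that `length=None` means "read to the end of the file", as the parameter table describes.

```python
import asyncio
from typing import Dict, List, Optional


class FakeFileInterface:
    """Hypothetical in-memory stand-in for get_file_size and read_bytes."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    async def get_file_size(self, path: str) -> int:
        return len(self._files[path])

    async def read_bytes(
        self, path: str, offset: int = 0, length: Optional[int] = None
    ) -> bytes:
        data = self._files[path]
        end = len(data) if length is None else offset + length
        return data[offset:end]


async def read_in_chunks(interface, path: str, chunk_size: int = 4) -> bytes:
    """Read a file piece by piece, advancing the offset after each chunk."""
    size = await interface.get_file_size(path)
    parts: List[bytes] = []
    offset = 0
    while offset < size:
        chunk = await interface.read_bytes(path, offset=offset, length=chunk_size)
        if not chunk:  # Defensive stop if the file shrank mid-read.
            break
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts)


fake = FakeFileInterface({"/tmp/data.bin": b"0123456789"})
chunked = asyncio.run(read_in_chunks(fake, "/tmp/data.bin"))
print(chunked)
```

The tiny `chunk_size` is only for demonstration; a real transfer would likely use a much larger value.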
BaseComputerInterface.get_desktop_environment
async def get_desktop_environment(self) -> str
Get the current desktop environment.
Returns: The name of the current desktop environment.
BaseComputerInterface.set_wallpaper
async def set_wallpaper(self, path: str) -> None
Set the desktop wallpaper to the specified path.
Parameters:
| Name | Type | Description |
|---|---|---|
path | Any | The file path to set as wallpaper |
BaseComputerInterface.open
async def open(self, target: str) -> None
Open a target using the system's default handler.
Typically opens files, folders, or URLs with the associated application.
Parameters:
| Name | Type | Description |
|---|---|---|
target | Any | The file path, folder path, or URL to open. |
BaseComputerInterface.launch
async def launch(self, app: str, args: List[str] | None = None) -> Optional[int]
Launch an application with optional arguments.
Parameters:
| Name | Type | Description |
|---|---|---|
app | Any | The application executable or bundle identifier. |
args | Any | Optional list of arguments to pass to the application. |
Returns: Optional process ID (PID) of the launched application if available, otherwise None.
BaseComputerInterface.get_current_window_id
async def get_current_window_id(self) -> int | str
Get the identifier of the currently active/focused window.
Returns: A window identifier that can be used with other window management methods.
BaseComputerInterface.get_application_windows
async def get_application_windows(self, app: str) -> List[int | str]
Get all window identifiers for a specific application.
Parameters:
| Name | Type | Description |
|---|---|---|
app | Any | The application name, executable, or identifier to query. |
Returns: A list of window identifiers belonging to the specified application.
BaseComputerInterface.get_window_name
async def get_window_name(self, window_id: int | str) -> str
Get the title/name of a window.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
Returns: The window's title or name string.
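A sketch of how `launch`, `get_application_windows`, and `get_window_name` might be chained to list the window titles of an application. `FakeAppInterface`, the app name `"editor"`, and the window data are all hypothetical; the stand-in returns `None` from `launch` to reflect that a PID is only returned when available.

```python
import asyncio
from typing import Dict, List, Optional, Union

WindowId = Union[int, str]


class FakeAppInterface:
    """Hypothetical stand-in with a fixed set of windows per application."""

    def __init__(self) -> None:
        self._windows: Dict[str, Dict[WindowId, str]] = {
            "editor": {101: "untitled.txt", 102: "notes.md"},
        }

    async def launch(self, app: str, args: Optional[List[str]] = None) -> Optional[int]:
        self._windows.setdefault(app, {})
        return None  # The PID is optional, so callers should not rely on it.

    async def get_application_windows(self, app: str) -> List[WindowId]:
        return list(self._windows.get(app, {}))

    async def get_window_name(self, window_id: WindowId) -> str:
        for windows in self._windows.values():
            if window_id in windows:
                return windows[window_id]
        raise KeyError(window_id)


async def window_titles(interface, app: str) -> Dict[WindowId, str]:
    """Launch an app and map each of its window identifiers to its title."""
    await interface.launch(app)
    ids = await interface.get_application_windows(app)
    return {wid: await interface.get_window_name(wid) for wid in ids}


titles = asyncio.run(window_titles(FakeAppInterface(), "editor"))
print(titles)
```

Against a real VM, windows may take a moment to appear after `launch`, so a short wait or retry before querying is probably needed.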
BaseComputerInterface.get_window_size
async def get_window_size(self, window_id: int | str) -> tuple[int, int]
Get the size of a window in pixels.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
Returns: A tuple of (width, height) representing the window size in pixels.
BaseComputerInterface.get_window_position
async def get_window_position(self, window_id: int | str) -> tuple[int, int]
Get the screen position of a window.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
Returns: A tuple of (x, y) representing the window's top-left corner in screen coordinates.
BaseComputerInterface.set_window_size
async def set_window_size(self, window_id: int | str, width: int, height: int) -> None
Set the size of a window in pixels.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
width | Any | Desired width in pixels. |
height | Any | Desired height in pixels. |
BaseComputerInterface.set_window_position
async def set_window_position(self, window_id: int | str, x: int, y: int) -> None
Move a window to a specific position on the screen.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
x | Any | X coordinate for the window's top-left corner. |
y | Any | Y coordinate for the window's top-left corner. |
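Because `set_window_position` takes the top-left corner, centering a window means subtracting the window size from the screen size and halving the remainder. The sketch below shows that arithmetic; `FakeWindowInterface` is a hypothetical stand-in with a fixed 1024x768 screen and a 400x300 window.

```python
import asyncio
from typing import Dict, Tuple, Union

WindowId = Union[int, str]


class FakeWindowInterface:
    """Hypothetical stand-in: one 400x300 window on a 1024x768 screen."""

    def __init__(self) -> None:
        self.position: Tuple[int, int] = (0, 0)

    async def get_screen_size(self) -> Dict[str, int]:
        return {"width": 1024, "height": 768}

    async def get_current_window_id(self) -> WindowId:
        return 1

    async def get_window_size(self, window_id: WindowId) -> Tuple[int, int]:
        return (400, 300)

    async def set_window_position(self, window_id: WindowId, x: int, y: int) -> None:
        self.position = (x, y)


async def center_active_window(interface) -> Tuple[int, int]:
    """Move the focused window so that it is centered on the screen."""
    window_id = await interface.get_current_window_id()
    screen = await interface.get_screen_size()
    width, height = await interface.get_window_size(window_id)
    # Top-left corner that leaves equal margins on both sides.
    x = (screen["width"] - width) // 2
    y = (screen["height"] - height) // 2
    await interface.set_window_position(window_id, x, y)
    return x, y


wm = FakeWindowInterface()
centered = asyncio.run(center_active_window(wm))
print(centered)
```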
BaseComputerInterface.maximize_window
async def maximize_window(self, window_id: int | str) -> None
Maximize a window.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
BaseComputerInterface.minimize_window
async def minimize_window(self, window_id: int | str) -> None
Minimize a window.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
BaseComputerInterface.activate_window
async def activate_window(self, window_id: int | str) -> None
Bring a window to the foreground and focus it.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
BaseComputerInterface.close_window
async def close_window(self, window_id: int | str) -> None
Close a window.
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
BaseComputerInterface.get_window_title
async def get_window_title(self, window_id: int | str) -> str
Convenience alias for get_window_name().
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
Returns: The window's title or name string.
BaseComputerInterface.window_size
async def window_size(self, window_id: int | str) -> tuple[int, int]
Convenience alias for get_window_size().
Parameters:
| Name | Type | Description |
|---|---|---|
window_id | Any | The window identifier. |
Returns: A tuple of (width, height) representing the window size in pixels.
BaseComputerInterface.run_command
async def run_command(self, command: str) -> CommandResult
Run a shell command and return a structured result.
Executes a shell command using subprocess.run with shell=True and check=False. The command runs in the target environment, and both stdout and stderr are captured.
Parameters:
| Name | Type | Description |
|---|---|---|
command | str | The shell command to execute |
Returns: CommandResult: A structured result containing:
- stdout (str): Standard output from the command
- stderr (str): Standard error from the command
- returncode (int): Exit code from the command (0 indicates success)
Raises:
RuntimeError - If the command execution fails at the system level
Example:
result = await interface.run_command("ls -la")
if result.returncode == 0:
    print(f"Output: {result.stdout}")
else:
    print(f"Error: {result.stderr}, Exit code: {result.returncode}")
BaseComputerInterface.get_accessibility_tree
async def get_accessibility_tree(self) -> Dict
Get the accessibility tree of the current screen.
Returns: Dict containing the hierarchical accessibility information of screen elements.
BaseComputerInterface.to_screen_coordinates
async def to_screen_coordinates(self, x: float, y: float) -> tuple[float, float]
Convert screenshot coordinates to screen coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate in screenshot space |
y | Any | Y coordinate in screenshot space |
Returns: tuple[float, float]: (x, y) coordinates in screen space
BaseComputerInterface.to_screenshot_coordinates
async def to_screenshot_coordinates(self, x: float, y: float) -> tuple[float, float]
Convert screen coordinates to screenshot coordinates.
Parameters:
| Name | Type | Description |
|---|---|---|
x | Any | X coordinate in screen space |
y | Any | Y coordinate in screen space |
Returns: tuple[float, float]: (x, y) coordinates in screenshot space
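The two conversions matter when a screenshot has a different pixel resolution than the logical screen, for example on a high-density display. This reference does not describe how the SDK performs the mapping; the sketch below only illustrates the idea under the assumption of simple proportional scaling, using hypothetical helper functions rather than the interface methods themselves.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height)


def screenshot_to_screen(
    x: float, y: float, screenshot_size: Size, screen_size: Size
) -> Tuple[float, float]:
    """Scale a point from screenshot space into screen space (assumed linear)."""
    scale_x = screen_size[0] / screenshot_size[0]
    scale_y = screen_size[1] / screenshot_size[1]
    return (x * scale_x, y * scale_y)


def screen_to_screenshot(
    x: float, y: float, screenshot_size: Size, screen_size: Size
) -> Tuple[float, float]:
    """Inverse mapping: scale a point from screen space into screenshot space."""
    scale_x = screenshot_size[0] / screen_size[0]
    scale_y = screenshot_size[1] / screen_size[1]
    return (x * scale_x, y * scale_y)


# Example sizes are assumptions: a 2048x1536 screenshot of a 1024x768 screen.
shot, screen = (2048, 1536), (1024, 768)
on_screen = screenshot_to_screen(512, 384, shot, screen)
back = screen_to_screenshot(*on_screen, shot, screen)
print(on_screen, back)
```

In practice, coordinates located in a screenshot (for instance by a vision model) would be passed through `to_screen_coordinates` before being used with click or move actions.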
InterfaceFactory
Factory for creating OS-specific computer interfaces.
Methods
InterfaceFactory.create_interface_for_os
def create_interface_for_os(os: OSType, ip_address: str, api_port: Optional[int] = None, api_key: Optional[str] = None, vm_name: Optional[str] = None) -> BaseComputerInterface
Create an interface for the specified OS.
Parameters:
| Name | Type | Description |
|---|---|---|
os | Any | Operating system type ('macos', 'linux', or 'windows') |
ip_address | Any | IP address of the computer to control |
api_port | Any | Optional API port of the computer to control |
api_key | Any | Optional API key for cloud authentication |
vm_name | Any | Optional VM name for cloud authentication |
Returns: BaseComputerInterface: The appropriate interface for the OS
Raises:
ValueError - If the OS type is not supported
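The factory's contract (pick an implementation by OS name, raise `ValueError` for anything else) can be pictured with a small stand-alone sketch. Everything here is hypothetical: `SketchInterface`, `SUPPORTED_OS`, and `create_interface_sketch` are illustration-only names and do not reflect the SDK's internal implementation.

```python
from dataclasses import dataclass
from typing import Optional

# The three OS names listed in the parameter table above.
SUPPORTED_OS = ("macos", "linux", "windows")


@dataclass
class SketchInterface:
    """Hypothetical placeholder for an OS-specific interface object."""

    os_name: str
    ip_address: str
    api_port: Optional[int] = None


def create_interface_sketch(
    os: str, ip_address: str, api_port: Optional[int] = None
) -> SketchInterface:
    """Return an interface for a supported OS, or raise ValueError otherwise."""
    if os not in SUPPORTED_OS:
        raise ValueError(f"Unsupported OS type: {os}")
    return SketchInterface(os_name=os, ip_address=ip_address, api_port=api_port)


linux = create_interface_sketch("linux", "127.0.0.1", api_port=8000)
print(linux)
```

Callers that accept an OS name from configuration may want to catch the `ValueError` and report the supported values.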
MacOSComputerInterface
Inherits from: GenericComputerInterface
Interface for macOS.
Constructor
MacOSComputerInterface(self, ip_address: str, username: str = 'lume', password: str = 'lume', api_key: Optional[str] = None, vm_name: Optional[str] = None, api_port: Optional[int] = None)
Methods
MacOSComputerInterface.diorama_cmd
async def diorama_cmd(self, action: str, arguments: Optional[dict] = None) -> dict
Send a diorama command to the server (macOS only).