# CUGA Knowledge Base

*A compact reference for agents answering questions about the CUGA framework.*

## Overview

**CUGA (Configurable Generalist Agent)** is an open-source agent framework designed for enterprise workflows.
It combines hybrid reasoning (API + web), tool orchestration, policy guardrails, memory, and configurable behavior patterns.

**Why it exists:**
Building robust domain-specific agents from scratch is expensive. CUGA provides a generalist core you configure with your own tools, APIs, policies, and workflows.

---

## Core Concepts

### What CUGA Is

* A **planner → executor** agent engine with code-generation capabilities.
* A **configurable generalist**, not a domain-specific chatbot.
* Designed for **enterprise reliability**, human-in-the-loop (HITL) support, and safe execution.
* Modular: tools, policies, memory, and reasoning modes are all replaceable.

### What CUGA Is Not

* Not a single-task bot.
* Not tied to one model or one tool framework.
* Not opinionated about UI: it can run headless or in Langflow, HF Spaces, notebooks, or scripts.

---

## Architecture

### Planner

Breaks user intent into sub-tasks; chooses strategies; checks policies.

### Executor

Performs steps, including dynamic code generation via the **code-act** mechanism.
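The planner/executor split can be pictured with a minimal sketch. It assumes hypothetical `Step`, `plan`, and `execute` helpers and two toy tools; none of these names are CUGA's API, and a real planner would derive the steps from the user intent with an LLM rather than receive them ready-made. Here the planner only performs the policy check, and the executor runs the steps in order, feeding each result forward.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    """One sub-task in a plan: which tool to call and with which arguments."""
    tool: str
    args: dict[str, Any]

def plan(proposed: list[Step], forbidden: set[str]) -> list[Step]:
    """Planner stand-in (hypothetical): an LLM would propose the steps;
    here we only apply the policy check and reject forbidden tools."""
    for step in proposed:
        if step.tool in forbidden:
            raise PermissionError(f"policy forbids tool {step.tool!r}")
    return proposed

def execute(steps: list[Step], tools: dict[str, Callable[..., Any]]) -> list[Any]:
    """Executor: run steps in order; the marker "$prev" is replaced
    by the previous step's result before the tool is called."""
    results: list[Any] = []
    for step in steps:
        args = {k: (results[-1] if v == "$prev" else v) for k, v in step.args.items()}
        results.append(tools[step.tool](**args))
    return results

# Toy tools standing in for real APIs (hypothetical).
tools = {
    "list_orders": lambda customer: [{"id": 1, "total": 30}, {"id": 2, "total": 12}],
    "sum_totals": lambda orders: sum(o["total"] for o in orders),
}

steps = plan(
    [Step("list_orders", {"customer": "acme"}), Step("sum_totals", {"orders": "$prev"})],
    forbidden={"delete_orders"},
)
print(execute(steps, tools)[-1])  # 42
```

Keeping planning and execution separate means a plan can be inspected, policy-checked, or shown to a human before any tool runs.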

### Code-Act Agent

Generates Python “glue code” to handle:

* API calls
* pagination
* schema-heavy responses
* loops & conditionals
* data aggregation
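To make "glue code" concrete, here is a sketch of the kind of Python an agent might generate for "total open amount per vendor" against a paginated API. The `list_invoices` endpoint, its cursor scheme, and the sample data are all hypothetical; the point is that pagination, filtering, and aggregation happen in code rather than in the LLM's context.

```python
from collections import defaultdict

# Hypothetical in-memory data behind a paginated API.
_INVOICES = [
    {"vendor": "acme", "amount": 100, "status": "paid"},
    {"vendor": "globex", "amount": 250, "status": "open"},
    {"vendor": "acme", "amount": 50, "status": "open"},
    {"vendor": "initech", "amount": 75, "status": "paid"},
    {"vendor": "globex", "amount": 25, "status": "open"},
]

def list_invoices(cursor: int = 0, page_size: int = 2) -> dict:
    """Hypothetical endpoint: one page of items plus the cursor of the next page (or None)."""
    page = _INVOICES[cursor:cursor + page_size]
    nxt = cursor + page_size
    return {"items": page, "next": nxt if nxt < len(_INVOICES) else None}

def open_totals_by_vendor() -> dict[str, int]:
    """Glue code: follow pagination, filter, and aggregate."""
    totals: dict[str, int] = defaultdict(int)
    cursor = 0
    while cursor is not None:                    # loop until the last page
        resp = list_invoices(cursor)
        for inv in resp["items"]:
            if inv["status"] == "open":          # conditional filter in code
                totals[inv["vendor"]] += inv["amount"]  # aggregation
        cursor = resp["next"]
    return dict(totals)

print(open_totals_by_vendor())  # {'globex': 275, 'acme': 50}
```

Only the small final dictionary needs to go back to the model; the individual pages never do.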

### Variable Store

Holds intermediate results **outside** the LLM context, so large datasets can be processed without flooding the context window.
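A minimal sketch of the idea, assuming a hypothetical `VariableStore` class (not CUGA's implementation): the full value stays in Python memory, and only a short summary string is what would be placed in the prompt. Generated code can later fetch the full value by name.

```python
from typing import Any

class VariableStore:
    """Hypothetical store: full values live in memory, the prompt sees only summaries."""

    def __init__(self) -> None:
        self._vars: dict[str, Any] = {}

    def put(self, name: str, value: Any) -> str:
        self._vars[name] = value
        return self.summary(name)  # this string, not the value, goes to the LLM

    def get(self, name: str) -> Any:
        return self._vars[name]

    def summary(self, name: str, preview: int = 2) -> str:
        value = self._vars[name]
        if isinstance(value, list):
            return f"{name}: list of {len(value)} items, first {preview}: {value[:preview]!r}"
        # Truncate other values so the summary stays small.
        return f"{name}: {type(value).__name__} = {value!r}"[:200]

store = VariableStore()
rows = [{"id": i} for i in range(10_000)]
print(store.put("customers", rows))
# customers: list of 10000 items, first 2: [{'id': 0}, {'id': 1}]
print(len(store.get("customers")))  # 10000
```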

### Task Modes

* `api` – API tools only
* `web` – browser automation via a browser extension
* `hybrid` – both
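One way to read the three modes is as a filter over the tool registry. The sketch below assumes hypothetical `Tool` records tagged `"api"` or `"web"`; it is an illustration of the mode semantics, not how CUGA selects tools internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    kind: str  # "api" or "web" (hypothetical labels)

# Which tool kinds each task mode may use.
MODE_KINDS = {"api": {"api"}, "web": {"web"}, "hybrid": {"api", "web"}}

def tools_for_mode(mode: str, tools: list[Tool]) -> list[str]:
    """Return the names of tools usable in the given task mode."""
    if mode not in MODE_KINDS:
        raise ValueError(f"unknown mode {mode!r}")
    allowed = MODE_KINDS[mode]
    return [t.name for t in tools if t.kind in allowed]

registry = [Tool("get_account", "api"), Tool("click", "web"), Tool("fill_form", "web")]
print(tools_for_mode("api", registry))     # ['get_account']
print(tools_for_mode("hybrid", registry))  # ['get_account', 'click', 'fill_form']
```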

---

## Capabilities

### Core Abilities

* Hybrid API + web automation
* Multi-step planning & execution
* Tool orchestration through Python, OpenAPI, LangChain, or MCP
* Human-in-the-loop approvals
* Configurable reasoning strategies (fast, balanced, accurate)

### Advanced / Experimental

* Policy-aware planning
* Saving successful plans or code snippets
* Early memory layer for reuse
* Exposure of CUGA itself as a tool to other agents

---

## Configuration

### What You Can Configure

* Tools: Python functions, APIs, MCP servers, browser actions
* Reasoning mode: fast/balanced/accurate/custom
* Domain instructions and agent persona
* Safety policies
* Memory backends (optional)
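The configurable surface above can be summarized as a single settings object. This is a sketch only: the `AgentConfig` class and its field names are hypothetical and do not reflect CUGA's actual configuration format; the reasoning-mode names are the ones listed in this document.

```python
from dataclasses import dataclass, field
from typing import Optional

# Reasoning modes named in this document; the config shape itself is hypothetical.
REASONING_MODES = {"fast", "balanced", "accurate", "custom"}

@dataclass
class AgentConfig:
    tools: list[str]                       # Python functions, APIs, MCP servers, browser actions
    reasoning_mode: str = "balanced"
    persona: str = "You are a helpful enterprise assistant."
    policies: list[str] = field(default_factory=list)
    memory_backend: Optional[str] = None   # memory is optional

    def __post_init__(self) -> None:
        # Fail fast on invalid settings instead of at run time.
        if self.reasoning_mode not in REASONING_MODES:
            raise ValueError(f"reasoning_mode must be one of {sorted(REASONING_MODES)}")
        if not self.tools:
            raise ValueError("at least one tool is required")

cfg = AgentConfig(tools=["crm_openapi", "browser"], reasoning_mode="accurate",
                  policies=["no_external_email"])
print(cfg.reasoning_mode)  # accurate

try:
    AgentConfig(tools=["crm_openapi"], reasoning_mode="turbo")
except ValueError as err:
    print(err)  # reasoning_mode must be one of ['accurate', 'balanced', 'custom', 'fast']
```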

### Domain Adaptation

Customize:

* task prompts
* policy objects
* domain-specific tips for APIs
* workflows (step templates or plan hints)
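Domain adaptation mostly changes what the agent is told, not its code. The sketch below assembles a prompt from a persona, per-API tips, and plan hints; the `build_prompt` function and its layout are hypothetical and stand in for whatever templating a real deployment uses.

```python
def build_prompt(task: str, persona: str,
                 api_tips: dict[str, str], plan_hints: list[str]) -> str:
    """Assemble a planner prompt from domain-adaptation pieces (hypothetical layout)."""
    lines = [persona, "", f"Task: {task}"]
    if api_tips:
        lines.append("API tips:")
        # Sort by API name so the prompt is stable across runs.
        lines += [f"- {api}: {tip}" for api, tip in sorted(api_tips.items())]
    if plan_hints:
        lines.append("Suggested steps:")
        lines += [f"{i}. {hint}" for i, hint in enumerate(plan_hints, start=1)]
    return "\n".join(lines)

prompt = build_prompt(
    task="Find overdue invoices for ACME",
    persona="You are a finance operations assistant.",
    api_tips={"invoices_api": "dates are ISO 8601; results are paginated"},
    plan_hints=["Look up the customer ID", "Query invoices by customer and status"],
)
print(prompt)
```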

---

## Tools & Integrations

### Supported Tool Types

* **OpenAPI** schemas (auto-parsed)
* **Python functions / classes**
* **LangChain tools**
* **MCP servers**
* **Browser Automation** (web task mode)
* Custom tools via simple Python wrappers
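What a "simple Python wrapper" needs to capture can be shown with the standard `inspect` module: a name, a description, typed parameters, and the callable itself. The `as_tool` helper and the descriptor dictionary are hypothetical and are not CUGA's tool-registration API.

```python
import inspect
from typing import Any, Callable

def as_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Describe a plain Python function as a tool (hypothetical descriptor format)."""
    sig = inspect.signature(fn)
    params = {
        name: ("any" if p.annotation is inspect.Parameter.empty
               else getattr(p.annotation, "__name__", str(p.annotation)))
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
        "call": fn,
    }

def get_weather(city: str, days: int = 1) -> str:
    """Return a short forecast for a city."""
    return f"{city}: sunny for {days} day(s)"

tool = as_tool(get_weather)
print(tool["name"], tool["parameters"])    # get_weather {'city': 'str', 'days': 'int'}
print(tool["call"](city="Haifa", days=2))  # Haifa: sunny for 2 day(s)
```

The docstring and type hints double as the tool's documentation for the model, which is why well-annotated functions make better tools.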

### Ecosystem Integrations

* **Langflow**: low-code visual builder with a CUGA block
* **Hugging Face Spaces**: interactive demo
* **Other agents**: CUGA can be exposed as a tool

---

## Benchmarks

### Performance

* **🥇 #1 on AppWorld** (750 tasks, 457 APIs)
* **Top-tier on WebArena**, #1 from Feb–Sep 2025

### Why It Matters

These benchmarks validate CUGA’s:

* generalization across real enterprise tasks
* hybrid reasoning reliability
* stability across large, diverse suites of multi-step tasks

---

## Policy & Safety

### Policy Layer

CUGA enforces:

* Allowed/forbidden actions
* Scope-of-intent classification
* Data boundaries
* When HITL approval is needed
* Organizational vs. user-level policy hierarchy
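A sketch of how these rules could combine into one decision per proposed action. The `Policy` and `Decision` types are hypothetical, not CUGA's policy objects. The design choice shown is that organizational and user policies are unioned, so a user-level policy can add restrictions but never relax an organizational one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_HUMAN = "ask_human"  # pause for human-in-the-loop approval

@dataclass
class Policy:
    allowed: Optional[set[str]] = None    # None means no allow-list at this level
    forbidden: set[str] = field(default_factory=set)
    needs_approval: set[str] = field(default_factory=set)

def decide(action: str, org: Policy, user: Policy) -> Decision:
    """Combine organization- and user-level policies for one proposed action."""
    for level in (org, user):
        if level.allowed is not None and action not in level.allowed:
            return Decision.DENY  # outside the permitted scope
        if action in level.forbidden:
            return Decision.DENY
    if action in org.needs_approval or action in user.needs_approval:
        return Decision.ASK_HUMAN
    return Decision.ALLOW

org = Policy(allowed={"read_crm", "send_email", "issue_refund"},
             forbidden={"delete_records"},
             needs_approval={"issue_refund"})
user = Policy(needs_approval={"send_email"})

for action in ["read_crm", "send_email", "issue_refund", "delete_records", "export_data"]:
    print(action, decide(action, org, user).value)
```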

### Safety Behaviors

* Can refuse unsafe or out-of-scope tasks
* Can ask for clarification or approval
* Supports auditability via logs and structured steps

---

## Memory

### What CUGA Can Remember (Experimental)

* Successful code snippets
* Plans & execution traces
* API schemas and patterns
* User preferences
* Domain documents (optional)
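Reuse of successful snippets can be sketched as a lookup keyed by a normalized intent. The `SnippetMemory` class is hypothetical, and the exact-match key is a deliberate simplification; a practical memory layer would more likely retrieve by semantic similarity.

```python
import re
from typing import Optional

class SnippetMemory:
    """Hypothetical memory: stores code that succeeded, keyed by a normalized intent."""

    def __init__(self) -> None:
        self._snippets: dict[str, str] = {}

    @staticmethod
    def _key(intent: str) -> str:
        # Lowercase and drop punctuation/extra spaces so trivially different phrasings match.
        return " ".join(re.findall(r"[a-z0-9]+", intent.lower()))

    def save(self, intent: str, code: str, succeeded: bool) -> None:
        if succeeded:  # only successful runs are worth reusing
            self._snippets[self._key(intent)] = code

    def recall(self, intent: str) -> Optional[str]:
        return self._snippets.get(self._key(intent))

mem = SnippetMemory()
mem.save("List open invoices!", "list_invoices(status='open')", succeeded=True)
mem.save("Delete everything", "delete_all()", succeeded=False)
print(mem.recall("list   open invoices"))  # list_invoices(status='open')
print(mem.recall("Delete everything"))     # None
```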

### Why Memory Matters

* Faster task repetition
* Higher accuracy
* Lower hallucination risk
* Trustworthiness through predictable reuse

---


## Roadmap

Planned improvements include:

* Stronger policy governance 
* Long-term memory persistence and retrieval
* Learning from demonstrations and prior trajectories
* Multi-agent orchestration patterns