# CUGA Knowledge Base

*A compact reference for agents answering questions about the CUGA framework.*

## Overview

**CUGA (Configurable Generalist Agent)** is an open-source agent framework designed for enterprise workflows.
It combines hybrid reasoning (API + web), tool orchestration, policy guardrails, memory, and configurable behavior patterns.

**Why it exists:**
Building robust domain-specific agents from scratch is expensive. CUGA provides a generalist core you configure with your own tools, APIs, policies, and workflows.

---

## Core Concepts

### What CUGA Is

* A **planner → executor** agent engine with code-generation capabilities.
* A **configurable generalist**, not a domain-specific chatbot.
* Designed for **enterprise reliability**, human-in-the-loop (HITL) support, and safe execution.
* Modular: tools, policies, memory, and reasoning modes are all replaceable.

### What CUGA Is Not

* Not a single-task bot.
* Not tied to one model or one tool framework.
* Not opinionated on UI: it can run headless, in Langflow, on HF Spaces, or in notebooks and scripts.

---

## Architecture

### Planner

Breaks user intent into sub-tasks; chooses strategies; checks policies.

### Executor

Performs steps, including dynamic code generation via the **code-act** mechanism.
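To make the planner → executor split concrete, here is a minimal sketch in plain Python. Everything in it (`TOOLS`, `FORBIDDEN`, `SubTask`, `plan`, `execute`, and the `$prev` convention) is hypothetical and is not CUGA's API; a real planner would use an LLM to decompose the intent rather than splitting on the word "then".

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical tools standing in for real API calls.
TOOLS: dict[str, Callable[..., Any]] = {
    "search": lambda query: [f"result for {query}"],
    "count": lambda items: len(items),
}
# Hypothetical policy: tools the planner must never schedule.
FORBIDDEN = {"delete"}


@dataclass
class SubTask:
    tool: str
    arg: str  # a literal argument, or "$prev" to use the previous step's result


def plan(intent: str) -> list[SubTask]:
    """Toy planner: splits 'a then b' into steps and rejects forbidden tools."""
    steps = []
    for part in intent.split(" then "):
        tool, _, arg = part.partition(" ")
        if tool in FORBIDDEN:
            raise PermissionError(f"policy forbids {tool!r}")
        steps.append(SubTask(tool, arg))
    return steps


def execute(steps: list[SubTask]) -> Any:
    """Runs steps in order, feeding each one's result forward via '$prev'."""
    prev: Any = None
    for step in steps:
        arg = prev if step.arg == "$prev" else step.arg
        prev = TOOLS[step.tool](arg)
    return prev
```

For example, `execute(plan("search invoices then count $prev"))` returns `1`, while `plan("delete everything")` raises `PermissionError` because the hypothetical policy forbids that tool.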

### Code-Act Agent

Generates Python “glue code” to handle:

* API calls
* pagination
* schema-heavy responses
* loops & conditionals
* data aggregation
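The following sketch shows the kind of glue code a code-act step might produce for a request like "total open invoice amount per vendor". The paginated `list_invoices` endpoint, its nested response schema, and the sample data are invented for illustration; CUGA's generated code will differ depending on the real tools it is given.

```python
from collections import defaultdict

# Hypothetical sample data behind a fake paginated API.
_FAKE_DATA = [
    {"id": 1, "vendor": "acme", "amount": 120.0, "status": "paid"},
    {"id": 2, "vendor": "globex", "amount": 80.0, "status": "open"},
    {"id": 3, "vendor": "acme", "amount": 30.0, "status": "open"},
    {"id": 4, "vendor": "globex", "amount": 50.0, "status": "open"},
    {"id": 5, "vendor": "initech", "amount": 10.0, "status": "paid"},
]


def list_invoices(page: int, page_size: int = 2) -> dict:
    """Hypothetical endpoint returning a nested, schema-heavy response."""
    start = page * page_size
    items = _FAKE_DATA[start:start + page_size]
    has_more = start + page_size < len(_FAKE_DATA)
    return {"data": {"items": items}, "meta": {"next_page": page + 1 if has_more else None}}


def open_totals_by_vendor() -> dict[str, float]:
    """Glue code a code-act step might emit: paginate, filter, aggregate."""
    totals: dict[str, float] = defaultdict(float)
    page = 0
    while page is not None:                        # pagination loop
        resp = list_invoices(page)
        for inv in resp["data"]["items"]:          # walk the nested schema
            if inv["status"] == "open":            # conditional filter
                totals[inv["vendor"]] += inv["amount"]  # aggregation
        page = resp["meta"]["next_page"]
    return dict(totals)
```

With the sample data, `open_totals_by_vendor()` returns `{'globex': 130.0, 'acme': 30.0}`; only this small result, not every page, would need to reach the LLM.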

### Variable Store

Holds intermediate results **outside** the LLM context, so large data can be processed without flooding the context window.
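A variable store can be sketched as a dictionary that keeps full values in Python memory and hands the LLM only a short summary. The `VariableStore` class and its summary format are hypothetical, chosen only to illustrate keeping large data out of the prompt.

```python
from typing import Any


class VariableStore:
    """Hypothetical sketch: full values live here; the LLM sees only summaries."""

    def __init__(self) -> None:
        self._vars: dict[str, Any] = {}

    def put(self, name: str, value: Any) -> str:
        """Store a value and return the short summary to place in the prompt."""
        self._vars[name] = value
        return self.summary(name)

    def get(self, name: str) -> Any:
        """Retrieve the full value, e.g. for the next generated code step."""
        return self._vars[name]

    def summary(self, name: str, preview: int = 3) -> str:
        value = self._vars[name]
        if isinstance(value, (list, tuple)):
            head = ", ".join(repr(v) for v in value[:preview])
            more = ", ..." if len(value) > preview else ""
            return f"{name}: {type(value).__name__} of {len(value)} items [{head}{more}]"
        text = f"{name}: {type(value).__name__} = {value!r}"
        return text if len(text) <= 200 else text[:197] + "..."
```

`VariableStore().put("rows", list(range(10_000)))` returns `"rows: list of 10000 items [0, 1, 2, ...]"`, so only that line needs to enter the context while the full list stays retrievable through `get`.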

### Task Modes

* `api` – API tools only
* `web` – browser automation via the browser extension
* `hybrid` – API tools and browser automation combined
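One way to picture the three task modes is as a filter over the available tools. The mode strings match the list above, but the `TaskMode` enum, the tool names in `TOOL_KINDS`, and `tools_for_mode` are hypothetical; this does not describe how CUGA actually routes between API and browser tools.

```python
from enum import Enum


class TaskMode(Enum):
    API = "api"
    WEB = "web"
    HYBRID = "hybrid"


# Hypothetical tool catalogue: each tool is tagged with the channel it uses.
TOOL_KINDS = {
    "crm.get_account": "api",
    "crm.list_contacts": "api",
    "browser.click": "web",
    "browser.fill_form": "web",
}


def tools_for_mode(mode: TaskMode) -> list[str]:
    """Return the tool names the agent may use in the given mode."""
    allowed = {"api", "web"} if mode is TaskMode.HYBRID else {mode.value}
    return sorted(name for name, kind in TOOL_KINDS.items() if kind in allowed)
```

`tools_for_mode(TaskMode.API)` returns only the two `crm.*` tools, while `TaskMode.HYBRID` exposes all four.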

---

## Capabilities

### Core Abilities

* Hybrid API + web automation
* Multi-step planning & execution
* Tool orchestration through Python, OpenAPI, LangChain, or MCP
* Human-in-the-loop approvals
* Configurable reasoning strategies (fast, balanced, accurate)

### Advanced / Experimental

* Policy-aware planning
* Saving successful plans or code snippets
* Early memory layer for reuse
* Exposure of CUGA itself as a tool to other agents

---

## Configuration

### What You Can Configure

* Tools: Python functions, APIs, MCP servers, browser actions
* Reasoning mode: fast/balanced/accurate/custom
* Domain instructions and agent persona
* Safety policies
* Memory backends (optional)
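The configurable surface above could be captured in a single settings object, as in this sketch. `AgentConfig`, its field names, and the `"balanced"` default are assumptions made for illustration, not CUGA's actual configuration schema; consult the project's documentation for the real option names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Reasoning modes named in this document; "custom" is just a placeholder here.
REASONING_MODES = {"fast", "balanced", "accurate", "custom"}


@dataclass
class AgentConfig:
    """Hypothetical settings object mirroring the list above."""

    tools: dict[str, Callable] = field(default_factory=dict)
    reasoning_mode: str = "balanced"
    instructions: str = ""  # domain instructions
    persona: str = ""       # agent persona
    forbidden_actions: set[str] = field(default_factory=set)  # stand-in for safety policies
    memory_backend: Optional[str] = None  # None means memory is disabled

    def __post_init__(self) -> None:
        # Reject unknown modes early instead of failing mid-task.
        if self.reasoning_mode not in REASONING_MODES:
            raise ValueError(f"unknown reasoning mode: {self.reasoning_mode!r}")
```

`AgentConfig(reasoning_mode="turbo")` raises `ValueError`, catching a typo before the agent runs.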

### Domain Adaptation

Customize:

* task prompts
* policy objects
* domain-specific tips for APIs
* workflows (step templates or plan hints)
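Domain adaptation often comes down to assembling the right context for the planner. This sketch folds per-API tips and step templates into a task prompt; `build_task_prompt` and its output layout are hypothetical and only illustrate the idea, not CUGA's prompt format.

```python
def build_task_prompt(
    task: str,
    domain_tips: dict[str, str],
    plan_hints: list[str],
    tools_in_scope: list[str],
) -> str:
    """Hypothetical: combine the task, relevant API tips, and plan hints into one prompt."""
    lines = [f"Task: {task}"]
    # Keep only tips for tools the task can actually use.
    tips = [f"- {name}: {domain_tips[name]}" for name in tools_in_scope if name in domain_tips]
    if tips:
        lines.append("API tips:")
        lines.extend(tips)
    if plan_hints:
        lines.append("Suggested steps:")
        lines.extend(f"{i}. {hint}" for i, hint in enumerate(plan_hints, start=1))
    return "\n".join(lines)
```

Tips for tools outside the current scope are dropped, which keeps the prompt focused on the APIs the task can actually use.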

---

## Tools & Integrations

### Supported Tool Types

* **OpenAPI** schemas (auto-parsed)
* **Python functions / classes**
* **LangChain tools**
* **MCP servers**
* **Browser Automation** (web task mode)
* Custom tools via simple Python wrappers
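A "simple Python wrapper" might look like the decorator below, which derives a tool description from a function's docstring and signature using the standard `inspect` module. The `tool` decorator and `TOOL_REGISTRY` are hypothetical; CUGA's own registration mechanism and schema may differ.

```python
import inspect
from typing import Any, Callable

# Hypothetical registry mapping tool names to a description, parameters, and the callable.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {}


def tool(fn: Callable) -> Callable:
    """Register fn as a tool, deriving its spec from the docstring and signature."""
    params = {
        name: (
            getattr(p.annotation, "__name__", str(p.annotation))
            if p.annotation is not inspect.Parameter.empty
            else "any"
        )
        for name, p in inspect.signature(fn).parameters.items()
    }
    TOOL_REGISTRY[fn.__name__] = {
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
        "callable": fn,
    }
    return fn


@tool
def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount using a fixed exchange rate."""
    return round(amount * rate, 2)
```

The decorator returns the original function unchanged, so it stays directly callable while its spec is available for the planner.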

### Ecosystem Integrations

* **Langflow**: low-code visual builder, CUGA block
* **Hugging Face Spaces**: interactive demo
* **Other agents**: CUGA can be exposed as a tool

---

## Benchmarks

### Performance

* **🥇 #1 on AppWorld** (750 tasks, 457 APIs)
* **Top-tier on WebArena**, #1 from Feb–Sep 2025

### Why It Matters

These benchmarks validate CUGA’s:

* generalization across real enterprise tasks
* hybrid reasoning reliability
* stability across thousands of workflows

---

## Policy & Safety

### Policy Layer

CUGA enforces:

* Allowed/forbidden actions
* Scope-of-intent classification
* Data boundaries
* When HITL approval is needed
* Organizational vs. user-level policy hierarchy
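The allowed/forbidden, HITL, and hierarchy rules can be sketched as a small decision function. `Policy`, `Decision`, and `check_action` are hypothetical names, and the union semantics (a user policy can add restrictions but never lift an organizational one) is an assumption about how such a hierarchy might combine; scope-of-intent classification and data boundaries are not modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"  # pause for human-in-the-loop approval


@dataclass
class Policy:
    forbidden: set[str] = field(default_factory=set)
    require_approval: set[str] = field(default_factory=set)


def check_action(action: str, org: Policy, user: Policy) -> Decision:
    """Union semantics: a user policy can add restrictions but cannot lift org-level ones."""
    if action in org.forbidden | user.forbidden:
        return Decision.DENY  # a deny at either level wins over everything else
    if action in org.require_approval | user.require_approval:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

With `org = Policy(forbidden={"delete_account"}, require_approval={"send_payment"})` and `user = Policy(require_approval={"send_email"})`, `check_action("send_email", org, user)` returns `Decision.NEEDS_APPROVAL` and `check_action("read_invoice", org, user)` returns `Decision.ALLOW`.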

### Safety Behaviors

* Can refuse unsafe or out-of-scope tasks
* Can ask for clarification or approval
* Supports auditability via logs and structured steps
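Auditability via structured steps can be illustrated with an append-only log that records each action and its outcome (executed, refused, awaiting approval, and so on) as JSON Lines. `AuditLog`, `StepRecord`, and the outcome strings are hypothetical, not CUGA's logging format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class StepRecord:
    step: int
    action: str
    outcome: str  # e.g. "executed", "refused", "awaiting_approval", "asked_clarification"
    detail: str
    timestamp: str  # UTC, ISO 8601


class AuditLog:
    """Hypothetical append-only trace of agent steps."""

    def __init__(self) -> None:
        self.records: list[StepRecord] = []

    def record(self, action: str, outcome: str, detail: str = "") -> StepRecord:
        rec = StepRecord(
            step=len(self.records) + 1,
            action=action,
            outcome=outcome,
            detail=detail,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.records.append(rec)
        return rec

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line for later review."""
        return "\n".join(json.dumps(asdict(r)) for r in self.records)
```

Because every refusal or approval request is logged with the same structure as an executed step, a reviewer can reconstruct why the agent did or did not act.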

---

## Memory

### What CUGA Can Remember (Experimental)

* Successful code snippets
* Plans & execution traces
* API schemas and patterns
* User preferences
* Domain documents (optional)
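Reusing successful code snippets can be sketched as a store keyed by a normalized task description, with fuzzy lookup through the standard library's `difflib.get_close_matches`. `SnippetMemory`, the normalization rule, and the 0.8 similarity cutoff are assumptions; a production memory layer might instead use embeddings and a persistent backend.

```python
import difflib
from typing import Optional


class SnippetMemory:
    """Hypothetical sketch: keep code that solved a task and offer it for similar tasks."""

    def __init__(self) -> None:
        self._snippets: dict[str, str] = {}

    @staticmethod
    def _key(task: str) -> str:
        # Normalize case and whitespace so such differences map to the same key.
        return " ".join(task.lower().split())

    def save(self, task: str, code: str, succeeded: bool) -> None:
        if succeeded:  # only successful runs are worth reusing
            self._snippets[self._key(task)] = code

    def recall(self, task: str, cutoff: float = 0.8) -> Optional[str]:
        # Fuzzy match against stored task keys; None if nothing is similar enough.
        matches = difflib.get_close_matches(
            self._key(task), list(self._snippets), n=1, cutoff=cutoff
        )
        return self._snippets[matches[0]] if matches else None
```

Failed runs are never stored, so `recall` can only surface code that has worked at least once.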

### Why Memory Matters

* Faster task repetition
* Higher accuracy
* Lower hallucination risk
* Trustworthiness through predictable reuse

---


## Roadmap

Planned improvements include:

* Stronger policy governance
* Long-term memory persistence and retrieval
* Learning from demonstrations and prior trajectories
* Multi-agent orchestration patterns