# CUGA Knowledge Base
*A compact reference for agents answering questions about the CUGA framework.*
## Overview
**CUGA (Configurable Generalist Agent)** is an open-source agent framework designed for enterprise workflows.
It combines hybrid reasoning (API + web), tool orchestration, policy guardrails, memory, and configurable behavior patterns.
**Why it exists:**
Building robust domain-specific agents from scratch is expensive. CUGA provides a generalist core you configure with your own tools, APIs, policies, and workflows.
---
## Core Concepts
### What CUGA Is
* A **planner → executor** agent engine with code-generation capabilities.
* A **configurable generalist**, not a domain-specific chatbot.
* Designed for **enterprise reliability**, HITL support, and safe execution.
* Modular: tools, policies, memory, and reasoning modes are all replaceable.
### What CUGA Is Not
* Not a single-task bot.
* Not tied to one model or one tool framework.
* Not opinionated on UI: it can run headless, in Langflow, HF Spaces, notebooks, or scripts.
---
## Architecture
### Planner
Breaks user intent into sub-tasks; chooses strategies; checks policies.
### Executor
Performs steps, including dynamic code generation via the **code-act** mechanism.
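The planner/executor split can be pictured as a small loop. The following is a minimal sketch, not CUGA's actual implementation: `Step`, `plan`, `execute`, and the `TOOLS` table are hypothetical names, and the hard-coded plan stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str      # name of the tool the executor should call
    argument: str  # single string argument, kept simple for illustration


def plan(intent: str) -> list[Step]:
    """Hypothetical planner: a fixed decomposition standing in for an LLM call."""
    return [Step("search", intent), Step("summarize", "search_result")]


def execute(steps: list[Step], tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Hypothetical executor: run each step in order and collect the outputs."""
    outputs = []
    for step in steps:
        outputs.append(tools[step.tool](step.argument))
    return outputs


# Toy tools (assumed for this example only).
TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "summarize": lambda ref: f"summary of {ref}",
}

outputs = execute(plan("top accounts"), TOOLS)
```

In a real system the planner would also consult policies before emitting steps; that part is omitted here.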
### Code-Act Agent
Generates Python “glue code” to handle:
* API calls
* pagination
* schema-heavy responses
* loops & conditionals
* data aggregation
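To make "glue code" concrete, here is a sketch of the kind of Python a code-generating agent might emit for a paginated API with an aggregation step. The endpoint is simulated in memory; `fetch_page`, the `next_page` field, and the sample data are assumptions made for this example, not part of CUGA.

```python
from typing import Iterator, Optional

# Simulated paginated API responses (hypothetical data, no network access).
_PAGES = {
    1: {"items": [{"account": "acme", "revenue": 120},
                  {"account": "globex", "revenue": 80}],
        "next_page": 2},
    2: {"items": [{"account": "acme", "revenue": 30}],
        "next_page": None},
}


def fetch_page(page: int) -> dict:
    """Stand-in for an HTTP call that returns one page of results."""
    return _PAGES[page]


def iter_items() -> Iterator[dict]:
    """Follow `next_page` links until the API reports no further page."""
    page: Optional[int] = 1
    while page is not None:
        response = fetch_page(page)
        yield from response["items"]
        page = response["next_page"]


def revenue_by_account() -> dict[str, int]:
    """Aggregate revenue per account across all pages."""
    totals: dict[str, int] = {}
    for item in iter_items():
        totals[item["account"]] = totals.get(item["account"], 0) + item["revenue"]
    return totals


totals = revenue_by_account()
```

The loop, the conditional stop, and the aggregation all live in code rather than in repeated LLM turns, which is the point of the code-act approach.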
### Variable Store
Holds intermediate results **outside** the LLM context, so large payloads can be processed without flooding the context window.
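One way to picture this idea: the full value stays in a store, and only a short description is handed to the model. This is an illustrative sketch; the `VariableStore` class and its `put`, `get`, and `describe` methods are hypothetical and do not reflect CUGA's real interface.

```python
from typing import Any


class VariableStore:
    """Keeps full values out of the prompt; exposes only short previews."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}

    def put(self, name: str, value: Any) -> str:
        """Store a value and return the name the model can refer to."""
        self._values[name] = value
        return name

    def get(self, name: str) -> Any:
        """Return the full value, for use by generated code."""
        return self._values[name]

    def describe(self, name: str, max_chars: int = 60) -> str:
        """Return a bounded-length summary suitable for the LLM context."""
        value = self._values[name]
        preview = repr(value)
        if len(preview) > max_chars:
            preview = preview[:max_chars] + "..."
        return f"{name}: {type(value).__name__}, preview={preview}"


store = VariableStore()
store.put("orders", list(range(10_000)))
summary = store.describe("orders")  # short text, regardless of list size
```

Generated code can then call `store.get("orders")` to work on the full data while the prompt only ever carries `summary`.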
### Task Modes
* `api` – API tools only
* `web` – browser automation only (via the browser extension)
* `hybrid` – both API tools and browser automation
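The three modes can be modeled as a small enumeration that gates which kinds of tools are available. This is a sketch under assumptions: `TaskMode` and `allowed_tool_kinds` are hypothetical names, and only the mode strings `api`, `web`, and `hybrid` come from the list above.

```python
from enum import Enum


class TaskMode(Enum):
    API = "api"
    WEB = "web"
    HYBRID = "hybrid"


def allowed_tool_kinds(mode: TaskMode) -> set[str]:
    """Map a task mode to the kinds of tools it permits (illustrative labels)."""
    if mode is TaskMode.API:
        return {"api"}
    if mode is TaskMode.WEB:
        return {"browser"}
    return {"api", "browser"}


# Parse a mode from its string value, as it might appear in a config file.
mode = TaskMode("hybrid")
kinds = allowed_tool_kinds(mode)
```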
---
## Capabilities
### Core Abilities
* Hybrid API + web automation
* Multi-step planning & execution
* Tool orchestration through Python, OpenAPI, LangChain, or MCP
* Human-in-the-loop approvals
* Configurable reasoning strategies (fast, balanced, accurate)
### Advanced / Experimental
* Policy-aware planning
* Saving successful plans or code snippets
* An early-stage memory layer for reusing prior results
* Exposure of CUGA itself as a tool to other agents
---
## Configuration
### What You Can Configure
* Tools: Python functions, APIs, MCP servers, browser actions
* Reasoning mode: fast/balanced/accurate/custom
* Domain instructions and agent persona
* Safety policies
* Memory backends (optional)
### Domain Adaptation
Customize:
* task prompts
* policy objects
* domain-specific tips for APIs
* workflows (step templates or plan hints)
---
## Tools & Integrations
### Supported Tool Types
* **OpenAPI** schemas (auto-parsed)
* **Python functions / classes**
* **LangChain tools**
* **MCP servers**
* **Browser Automation** (web task mode)
* Custom tools via simple Python wrappers
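As an illustration of a "simple Python wrapper", the sketch below turns a typed, documented function into a tool description using only the standard library. The `describe_tool` helper and the shape of the resulting dictionary are assumptions for this example; CUGA's actual registration mechanism may differ.

```python
import inspect
from typing import Callable


def get_exchange_rate(base: str, quote: str) -> float:
    """Return a fixed demo exchange rate between two currency codes."""
    demo_rates = {("USD", "EUR"): 0.9, ("EUR", "USD"): 1.1}
    return demo_rates[(base, quote)]


def describe_tool(func: Callable) -> dict:
    """Build a tool description from the function's signature and docstring."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


tool_spec = describe_tool(get_exchange_rate)
```

Type hints and a docstring give an agent enough information to decide when and how to call the function, which is why they are worth writing carefully.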
### Ecosystem Integrations
* **Langflow**: low-code visual builder; CUGA is available as a block
* **Hugging Face Spaces**: interactive demo
* **Other agents**: CUGA can be exposed as a tool
---
## Benchmarks
### Performance
* **🥇 #1 on AppWorld** (750 tasks, 457 APIs)
* **Top-tier on WebArena**, #1 from Feb–Sep 2025
### Why It Matters
These benchmarks validate CUGA’s:
* generalization across real enterprise tasks
* hybrid reasoning reliability
* stability across thousands of workflows
---
## Policy & Safety
### Policy Layer
CUGA enforces:
* Allowed/forbidden actions
* Scope-of-intent classification
* Data boundaries
* When HITL approval is needed
* Organizational vs. user-level policy hierarchy
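The checks above can be sketched as a small decision function. Everything here is hypothetical (`Policy`, `Decision`, `evaluate`), and the precedence rule is an assumption: in this sketch either level can forbid an action, and a forbidden action is denied before approval is considered. The source does not specify how CUGA resolves the hierarchy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class Policy:
    forbidden: set[str] = field(default_factory=set)
    needs_approval: set[str] = field(default_factory=set)


def evaluate(action: str, org: Policy, user: Policy) -> Decision:
    """Check organization policy first, then user policy (assumed ordering)."""
    for policy in (org, user):
        if action in policy.forbidden:
            return Decision.DENY
    for policy in (org, user):
        if action in policy.needs_approval:
            return Decision.NEEDS_APPROVAL  # route to a human before running
    return Decision.ALLOW


org_policy = Policy(forbidden={"delete_account"}, needs_approval={"send_email"})
user_policy = Policy(forbidden={"send_email"}, needs_approval={"export_data"})

decision = evaluate("export_data", org_policy, user_policy)
```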
### Safety Behaviors
* Can refuse unsafe or out-of-scope tasks
* Can ask for clarification or approval
* Supports auditability via logs and structured steps
---
## Memory
### What CUGA Can Remember (Experimental)
* Successful code snippets
* Plans & execution traces
* API schemas and patterns
* User preferences
* Domain documents (optional)
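Reuse of successful code snippets can be pictured as a lookup keyed by a normalized task description. This is a deliberately simple sketch: `SnippetMemory` and its methods are hypothetical, and exact-match lookup is an assumption chosen for brevity rather than a description of how CUGA's experimental memory retrieves entries.

```python
from typing import Optional


class SnippetMemory:
    """Stores code that worked before, keyed by a normalized task string."""

    def __init__(self) -> None:
        self._snippets: dict[str, str] = {}

    @staticmethod
    def _key(task: str) -> str:
        # Lowercase and collapse whitespace so trivial variations still match.
        return " ".join(task.lower().split())

    def remember(self, task: str, code: str) -> None:
        """Record a snippet after it has executed successfully."""
        self._snippets[self._key(task)] = code

    def recall(self, task: str) -> Optional[str]:
        """Return a previously stored snippet, or None if the task is new."""
        return self._snippets.get(self._key(task))


memory = SnippetMemory()
memory.remember("Total revenue by account", "totals = revenue_by_account()")
hit = memory.recall("  total revenue   BY account ")
miss = memory.recall("list open tickets")
```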
### Why Memory Matters
* Faster task repetition
* Higher accuracy
* Lower hallucination risk
* Trustworthiness through predictable reuse
---
## Roadmap
Planned improvements include:
* Stronger policy governance
* Long-term memory persistence and retrieval
* Learning from demonstrations and prior trajectories
* Multi-agent orchestration patterns