Spaces:
Running
Running
| # CUGA Knowledge Base | |
| *A compact reference for agents answering questions about the CUGA framework.* | |
| ## Overview | |
| **CUGA (Configurable Generalist Agent)** is an open-source agent framework designed for enterprise workflows. | |
| It combines hybrid reasoning (API + web), tool orchestration, policy guardrails, memory, and configurable behavior patterns. | |
| **Why it exists:** | |
| Building robust domain-specific agents from scratch is expensive. CUGA provides a generalist core you configure with your own tools, APIs, policies, and workflows. | |
| --- | |
| ## Core Concepts | |
| ### What CUGA Is | |
| * A **planner → executor** agent engine with code-generation capabilities. | |
| * A **configurable generalist**, not a domain-specific chatbot. | |
| * Designed for **enterprise reliability**, HITL support, and safe execution. | |
| * Modular: tools, policies, memory, and reasoning modes are all replaceable. | |
| ### What CUGA Is Not | |
| * Not a single-task bot. | |
| * Not tied to one model or one tool framework. | |
| * Not opinionated on UI—can run headless, in Langflow, HF Spaces, notebooks, or scripts. | |
| --- | |
| ## Architecture | |
| ### Planner | |
| Breaks user intent into sub-tasks; chooses strategies; checks policies. | |
| ### Executor | |
| Performs steps, including dynamic code generation via the **code-act** mechanism. | |
| ### Code-Act Agent | |
| Generates Python “glue code” to handle: | |
| * API calls | |
| * pagination | |
| * schema-heavy responses | |
| * loops & conditionals | |
| * data aggregation | |
| ### Variable Store | |
| Holds intermediate results **outside** LLM context → allows large data without context flooding. | |
| ### Task Modes | |
| * `api` – API tools only | |
| * `web` – browser extension | |
| * `hybrid` – both | |
| --- | |
| ## Capabilities | |
| ### Core Abilities | |
| * Hybrid API + web automation | |
| * Multi-step planning & execution | |
| * Tool orchestration through Python, OpenAPI, LangChain, or MCP | |
| * Human-in-the-loop approvals | |
| * Configurable reasoning strategies (fast, balanced, accurate) | |
| ### Advanced / Experimental | |
| * Policy-aware planning | |
| * Saving successful plans or code snippets | |
| * Early memory layer for reuse | |
| * Exposure of CUGA itself as a tool to other agents | |
| --- | |
| ## Configuration | |
| ### What You Can Configure | |
| * Tools: Python functions, APIs, MCP servers, browser actions | |
| * Reasoning mode: fast/balanced/accurate/custom | |
| * Domain instructions and agent persona | |
| * Safety policies | |
| * Memory backends (optional) | |
| ### Domain Adaptation | |
| Customize: | |
| * task prompts | |
| * policy objects | |
| * domain-specific tips for APIs | |
| * workflows (step templates or plan hints) | |
| --- | |
| ## Tools & Integrations | |
| ### Supported Tool Types | |
| * **OpenAPI** schemas (auto-parsed) | |
| * **Python functions / classes** | |
| * **LangChain tools** | |
| * **MCP servers** | |
| * **Browser Automation** (web task mode) | |
| * Custom tools via simple Python wrappers | |
| ### Ecosystem Integrations | |
| * **Langflow**: low-code visual builder, CUGA block | |
| * **Hugging Face Spaces**: interactive demo | |
| * **Other agents**: CUGA can be exposed as a tool | |
| --- | |
| ## Benchmarks | |
| ### Performance | |
| * **🥇 #1 on AppWorld** (750 tasks, 457 APIs) | |
| * **Top-tier on WebArena**, #1 from Feb–Sep 2025 | |
| ### Why It Matters | |
| These benchmarks validate CUGA’s: | |
| * generalization across real enterprise tasks | |
| * hybrid reasoning reliability | |
| * stability across thousands of workflows | |
| --- | |
| ## Policy & Safety | |
| ### Policy Layer | |
| CUGA enforces: | |
| * Allowed/forbidden actions | |
| * Scope-of-intent classification | |
| * Data boundaries | |
| * When HITL approval is needed | |
| * Organizational vs. user-level policy hierarchy | |
| ### Safety Behaviors | |
| * Can refuse unsafe or out-of-scope tasks | |
| * Can ask for clarification or approval | |
| * Supports auditability via logs and structured steps | |
| --- | |
| ## Memory | |
| ### What CUGA Can Remember (Experimental) | |
| * Successful code snippets | |
| * Plans & execution traces | |
| * API schemas and patterns | |
| * User preferences | |
| * Domain documents (optional) | |
| ### Why Memory Matters | |
| * Faster task repetition | |
| * Higher accuracy | |
| * Lower hallucination risk | |
| * Trustworthiness through predictable reuse | |
| --- | |
| ## Roadmap | |
| Planned improvements include: | |
| * Stronger policy governance | |
| * Long-term memory persistence and retrieval | |
| * Learning from demonstrations and prior trajectories | |
| * Multi-agent orchestration patterns | |