Spaces:

ibm-research
/

cuga-agent

Running

App Files Files Community

cuga-agent / docs /examples /huggingface /cuga_knowledge.md

Sami Marreed

feat: docker-v1 with optimized frontend

0646b18 1 day ago

preview code

raw

history blame contribute delete

4.13 kB

	# CUGA Knowledge Base

	A compact reference for agents answering questions about the CUGA framework.

	## Overview

	CUGA (Configurable Generalist Agent) is an open-source agent framework designed for enterprise workflows.
	It combines hybrid reasoning (API + web), tool orchestration, policy guardrails, memory, and configurable behavior patterns.

	Why it exists:
	Building robust domain-specific agents from scratch is expensive. CUGA provides a generalist core you configure with your own tools, APIs, policies, and workflows.

	---

	## Core Concepts

	### What CUGA Is

	* A planner → executor agent engine with code-generation capabilities.
	* A configurable generalist, not a domain-specific chatbot.
	* Designed for enterprise reliability, HITL support, and safe execution.
	* Modular: tools, policies, memory, and reasoning modes are all replaceable.

	### What CUGA Is Not

	* Not a single-task bot.
	* Not tied to one model or one tool framework.
	* Not opinionated on UI—can run headless, in Langflow, HF Spaces, notebooks, or scripts.

	---

	## Architecture

	### Planner

	Breaks user intent into sub-tasks; chooses strategies; checks policies.

	### Executor

	Performs steps, including dynamic code generation via the code-act mechanism.

	### Code-Act Agent

	Generates Python “glue code” to handle:

	* API calls
	* pagination
	* schema-heavy responses
	* loops & conditionals
	* data aggregation

	### Variable Store

	Holds intermediate results outside LLM context → allows large data without context flooding.

	### Task Modes

	* `api` – API tools only
	* `web` – browser extension
	* `hybrid` – both

	---

	## Capabilities

	### Core Abilities

	* Hybrid API + web automation
	* Multi-step planning & execution
	* Tool orchestration through Python, OpenAPI, LangChain, or MCP
	* Human-in-the-loop approvals
	* Configurable reasoning strategies (fast, balanced, accurate)

	### Advanced / Experimental

	* Policy-aware planning
	* Saving successful plans or code snippets
	* Early memory layer for reuse
	* Exposure of CUGA itself as a tool to other agents

	---

	## Configuration

	### What You Can Configure

	* Tools: Python functions, APIs, MCP servers, browser actions
	* Reasoning mode: fast/balanced/accurate/custom
	* Domain instructions and agent persona
	* Safety policies
	* Memory backends (optional)

	### Domain Adaptation

	Customize:

	* task prompts
	* policy objects
	* domain-specific tips for APIs
	* workflows (step templates or plan hints)

	---

	## Tools & Integrations

	### Supported Tool Types

	* OpenAPI schemas (auto-parsed)
	* Python functions / classes
	* LangChain tools
	* MCP servers
	* Browser Automation (web task mode)
	* Custom tools via simple Python wrappers

	### Ecosystem Integrations

	* Langflow: low-code visual builder, CUGA block
	* Hugging Face Spaces: interactive demo
	* Other agents: CUGA can be exposed as a tool

	---

	## Benchmarks

	### Performance

	* 🥇 #1 on AppWorld (750 tasks, 457 APIs)
	* Top-tier on WebArena, #1 from Feb–Sep 2025

	### Why It Matters

	These benchmarks validate CUGA’s:

	* generalization across real enterprise tasks
	* hybrid reasoning reliability
	* stability across thousands of workflows

	---

	## Policy & Safety

	### Policy Layer

	CUGA enforces:

	* Allowed/forbidden actions
	* Scope-of-intent classification
	* Data boundaries
	* When HITL approval is needed
	* Organizational vs. user-level policy hierarchy

	### Safety Behaviors

	* Can refuse unsafe or out-of-scope tasks
	* Can ask for clarification or approval
	* Supports auditability via logs and structured steps

	---

	## Memory

	### What CUGA Can Remember (Experimental)

	* Successful code snippets
	* Plans & execution traces
	* API schemas and patterns
	* User preferences
	* Domain documents (optional)

	### Why Memory Matters

	* Faster task repetition
	* Higher accuracy
	* Lower hallucination risk
	* Trustworthiness through predictable reuse

	---


	## Roadmap

	Planned improvements include:

	* Stronger policy governance
	* Long-term memory persistence and retrieval
	* Learning from demonstrations and prior trajectories
	* Multi-agent orchestration patterns