Profiling Quick Start

1. Set Environment Variables

export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."

Or add them to a .env file in the project root.
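
Equivalently, a .env file in the project root carries the same two variables in standard dotenv format:

LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...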

2. Run an Experiment

# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh

# Compare different providers
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml

# Compare all modes for one provider
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml

# Full matrix: providers × modes
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml

3. View Results

# Start HTTP server and open browser
./system_tests/profiling/serve.sh --open

# Or just start server (visit http://localhost:8080/comparison.html)
./system_tests/profiling/serve.sh
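
The viewer is plain static HTML, so any static file server can stand in for serve.sh. A minimal sketch, assuming comparison.html and the JSON results live in system_tests/profiling/experiments/ (per the directory layout below):

# Serve the experiments directory with Python's built-in HTTP server
cd system_tests/profiling/experiments
python3 -m http.server 8080
# then open http://localhost:8080/comparison.html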

Comparison Types

Mode Comparison (Same Provider)

Compare fast vs balanced vs accurate modes using the same LLM provider.

Example output files: fast_20250930.json, balanced_20250930.json, accurate_20250930.json
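
A mode-comparison config might look like the following sketch, which mirrors the provider-comparison example later in this document (the mode is the middle segment of test_id; the shipped fast_vs_accurate.yaml may differ):

experiment:
  name: "fast_vs_accurate"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"

    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/accurate_{{timestamp}}.json"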

Provider Comparison (Same Mode)

Compare OpenAI vs Azure vs WatsonX using the same mode (e.g., balanced).

Example output files: openai_balanced_20250930.json, azure_balanced_20250930.json, watsonx_balanced_20250930.json

Full Matrix Comparison

Compare all combinations of providers and modes (2 providers × 2 modes = 4 experiments).

Example output files: openai_fast_20250930.json, openai_balanced_20250930.json, azure_fast_20250930.json, azure_balanced_20250930.json

Available Scripts

Script                               Purpose
run_experiment.sh                    Run profiling experiments with YAML config
serve.sh                             Start HTTP server to view results
bin/run_profiling.sh                 Lower-level profiling script with CLI args
bin/profile_digital_sales_tasks.py   Core Python profiling tool

Configuration Files

Located in config/:

  • default_experiment.yaml - Fast vs Balanced comparison
  • fast_vs_accurate.yaml - Fast vs Accurate comparison
  • providers_comparison.yaml - OpenAI vs Azure vs WatsonX (same mode)
  • full_matrix_comparison.yaml - Full provider × mode matrix
  • .secrets.yaml - Your Langfuse credentials (git-ignored; see the sketch below)
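
A minimal sketch of .secrets.yaml (the key layout is an assumption; check README.md for the exact schema the scripts expect):

# config/.secrets.yaml (hypothetical layout; git-ignored, never commit real keys)
langfuse:
  public_key: "pk-..."
  secret_key: "sk-..."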

Example: Provider Comparison

Create or use config/providers_comparison.yaml:

experiment:
  name: "providers_comparison"
  runs:
    - name: "openai_balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_balanced_{{timestamp}}.json"
    
    - name: "azure_balanced"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_balanced_{{timestamp}}.json"

Then run:

./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
./system_tests/profiling/serve.sh --open

Color Coding in Charts

The comparison HTML automatically color-codes experiments:

Modes:

  • Fast = Green 🟢
  • Balanced = Blue 🔵
  • Accurate = Orange 🟠

Providers:

  • OpenAI = Teal 🟦
  • Azure = Azure Blue 💙
  • WatsonX = IBM Blue 🔵

Combined labels (e.g., openai_balanced) are colored by provider first, then mode.

Directory Structure

system_tests/profiling/
├── run_experiment.sh          # Main entry point
├── serve.sh                   # View results
├── bin/                       # Internal scripts
├── config/                    # YAML configurations
├── experiments/               # Results + HTML viewer
└── reports/                   # Individual reports

Tips

  • 💡 The HTML viewer auto-loads all JSON files in experiments/
  • 💡 Naming format: {provider}_{mode}_{timestamp}.json or {mode}_{timestamp}.json
  • 💡 CLI args override YAML config settings
  • 💡 Use {{timestamp}} in output paths for unique files (see the example below)
  • 💡 A retry mechanism handles Langfuse propagation delays
  • 💡 Stop the server with Ctrl+C
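
For example, assuming {{timestamp}} expands to a date stamp like the 20250930 seen in the file names above (the exact format may differ):

output: "experiments/openai_fast_{{timestamp}}.json"
# resolves to something like experiments/openai_fast_20250930.json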

For full documentation, see README.md.