Profiling Quick Start

1. Set Environment Variables

export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."

Or add them to a .env file in the project root.
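
Equivalently, a .env file in the project root carries the same two variables in standard dotenv format:

LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...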

2. Run an Experiment

# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh

# Compare different providers
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml

# Compare all modes for one provider
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml

# Full matrix: providers × modes
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml

3. View Results

# Start HTTP server and open browser
./system_tests/profiling/serve.sh --open

# Or just start server (visit http://localhost:8080/comparison.html)
./system_tests/profiling/serve.sh
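
The viewer is plain static HTML, so any static file server can stand in for serve.sh. A minimal sketch, assuming comparison.html and the JSON results live in system_tests/profiling/experiments/ (per the directory layout below):

# Serve the experiments directory with Python's built-in HTTP server
cd system_tests/profiling/experiments
python3 -m http.server 8080
# then open http://localhost:8080/comparison.html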

Comparison Types

Mode Comparison (Same Provider)

Compare fast vs balanced vs accurate modes using the same LLM provider.

Example output files: fast_20250930.json, balanced_20250930.json, accurate_20250930.json
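
A mode-comparison config might look like the following sketch, which mirrors the provider-comparison example later in this document (the mode is the middle segment of test_id; the shipped fast_vs_accurate.yaml may differ):

experiment:
  name: "fast_vs_accurate"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"

    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/accurate_{{timestamp}}.json"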

Provider Comparison (Same Mode)

Compare OpenAI vs Azure vs WatsonX using the same mode (e.g., balanced).

Example output files: openai_balanced_20250930.json, azure_balanced_20250930.json, watsonx_balanced_20250930.json

Full Matrix Comparison

Compare all combinations of providers and modes (2 providers × 2 modes = 4 experiments).

Example output files: openai_fast_20250930.json, openai_balanced_20250930.json, azure_fast_20250930.json, azure_balanced_20250930.json

Available Scripts

Script                               Purpose
run_experiment.sh                    Run profiling experiments with YAML config
serve.sh                             Start HTTP server to view results
bin/run_profiling.sh                 Lower-level profiling script with CLI args
bin/profile_digital_sales_tasks.py   Core Python profiling tool

Configuration Files

Located in config/:

  • default_experiment.yaml - Fast vs Balanced comparison
  • fast_vs_accurate.yaml - Fast vs Accurate comparison
  • providers_comparison.yaml - OpenAI vs Azure vs WatsonX (same mode)
  • full_matrix_comparison.yaml - Full provider × mode matrix
  • .secrets.yaml - Your Langfuse credentials (git-ignored; see the sketch below)
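
A minimal sketch of .secrets.yaml (the key layout is an assumption; check README.md for the exact schema the scripts expect):

# config/.secrets.yaml (hypothetical layout; git-ignored, never commit real keys)
langfuse:
  public_key: "pk-..."
  secret_key: "sk-..."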

Example: Provider Comparison

Create or use config/providers_comparison.yaml:

experiment:
  name: "providers_comparison"
  runs:
    - name: "openai_balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_balanced_{{timestamp}}.json"
    
    - name: "azure_balanced"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_balanced_{{timestamp}}.json"

Then run:

./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
./system_tests/profiling/serve.sh --open

Color Coding in Charts

The comparison HTML automatically color-codes experiments:

Modes:

  • Fast = Green 🟢
  • Balanced = Blue 🔵
  • Accurate = Orange 🟠

Providers:

  • OpenAI = Teal 🟦
  • Azure = Azure Blue 💙
  • WatsonX = IBM Blue 🔵

Combined labels (e.g., openai_balanced) are colored by provider first, then mode.

Directory Structure

system_tests/profiling/
├── run_experiment.sh          # Main entry point
├── serve.sh                   # View results
├── bin/                       # Internal scripts
├── config/                    # YAML configurations
├── experiments/               # Results + HTML viewer
└── reports/                   # Individual reports

Tips

  • 💡 The HTML viewer auto-loads all JSON files in experiments/
  • 💡 Naming format: {provider}_{mode}_{timestamp}.json or {mode}_{timestamp}.json
  • 💡 CLI args override YAML config settings
  • 💡 Use {{timestamp}} in output paths for unique files (see the example below)
  • 💡 A retry mechanism handles Langfuse propagation delays
  • 💡 Stop the server with Ctrl+C
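
For example, assuming {{timestamp}} expands to a date stamp like the 20250930 seen in the file names above (the exact format may differ):

output: "experiments/openai_fast_{{timestamp}}.json"
# resolves to something like experiments/openai_fast_20250930.json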

For full documentation, see README.md.