Skip to content

Observability: Langfuse#329

Merged
sallyom merged 12 commits intoambient-code:mainfrom
sallyom:langfuse
Nov 26, 2025
Merged

Observability: Langfuse#329
sallyom merged 12 commits intoambient-code:mainfrom
sallyom:langfuse

Conversation

@sallyom
Copy link
Collaborator

@sallyom sallyom commented Nov 14, 2025

This PR

  • adds Langfuse instrumentation for usage and cost tracking
  • adds script and manifests to deploy Langfuse on K8s & OpenShift

@github-actions

This comment has been minimized.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@sallyom sallyom force-pushed the langfuse branch 2 times, most recently from 3cb842d to 87a9d9c Compare November 26, 2025 04:00
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@ambient-code ambient-code deleted a comment from github-actions bot Nov 26, 2025
@github-actions

This comment was marked as outdated.

sallyom and others added 12 commits November 26, 2025 00:35
- Add deploy-langfuse-openshift.sh script with ClickHouse resource fixes
- Add Langfuse configuration manifests for ambient-code integration
- README.md with complete setup instructions
- Add Makefile target: deploy-langfuse-openshift
- Update .gitignore to exclude Langfuse secrets

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Remove npm installation of @anthropic-ai/claude-code and Node.js dependency from Dockerfile.
The claude-agent-sdk Python package bundles its own CLI (v2.0.49) which is automatically used by default.

pyproject.toml unpinned dependency versions remain:
- anthropic[vertex]>=0.68.0 (from ==0.73.0)
- claude-agent-sdk>=0.1.4 (from ==0.1.6)

Benefits:
- Smaller container image (no Node.js/npm)
- Automatic version sync between SDK and CLI
- Simpler build process

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
… children

Problem: All interactions were being appended to a single claude_turn_1 trace
instead of creating separate claude_turn_1, claude_turn_2, etc. traces.

Root Cause:
- Using start_as_current_observation() sets the new observation as the current
  OpenTelemetry context, causing subsequent observations to become children
- This created a nested hierarchy instead of sibling observations

Solution:
1. Split turn tracking into two phases:
   - start_turn() - Called when AssistantMessage arrives (BEFORE tools)
   - end_turn() - Called when ResultMessage arrives (with usage data)

2. Use session_trace.start_observation() instead of start_as_current_observation()
   - Creates observations as SIBLINGS under the session trace
   - Does NOT change the current context
   - All turns are children of session, not children of each other

3. Tool span parenting:
   - Tools now properly attach to their parent turn via
     current_turn_generation.start_observation()
   - Added fallback to session trace if no active turn

4. Proper observation lifecycle:
   - Call .end() on both generations and spans to close them

Expected Result:
session_{session_id} (span)
├── claude_turn_1 (generation) - Sibling
│   ├── tool_Read (span)
│   └── tool_Write (span)
├── claude_turn_2 (generation) - Sibling
│   └── tool_Bash (span)
└── claude_turn_3 (generation) - Sibling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Addresses GitHub Actions security review findings:

1. Enhanced Secret Exposure Prevention (observability.py:164)
2. Model Parameter Input Validation (observability.py:143-155)
3. Complete Error Recovery Logging (observability.py:341-342)

Co-authored-by: GitHub Actions Security Review Bot
Signed-off-by: sallyom <somalley@redhat.com>
Addresses GitHub Actions security review:

Risk: Logging secret_name reveals credential context to attackers

Fix:
- Changed log message from revealing secret type to generic warning
- Before: "SECURITY: Secret '{secret_name}' found in sanitized message"
- After:  "SECURITY: Credential sanitization validation failed"
- Maintains security alerting without revealing attack surface

Signed-off-by: sallyom <somalley@redhat.com>
…ers only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
…secret

Signed-off-by: sallyom <somalley@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Copy link
Contributor

Claude Code Review

Summary

This PR adds comprehensive Langfuse observability infrastructure to track LLM usage, costs, and session traces. The implementation is well-architected with strong security practices, extensive test coverage, and production-ready deployment tooling.

Overall Assessment: ✅ Strong implementation with excellent security practices and comprehensive testing. A few critical security and architectural recommendations should be addressed before merge.


Issues by Severity

🚫 Blocker Issues

None - No blocking issues found.


🔴 Critical Issues

1. Kubernetes Name Validation May Be Too Restrictive (middleware.go:37-48)

Issue: The isValidKubernetesName regex enforces DNS-1123 *label* validation (^[a-z0-9]([-a-z0-9]*[a-z0-9])?$) which rejects uppercase letters and underscores. However, some OpenShift namespaces use uppercase (e.g., MyProject) or underscores.

Location: components/backend/handlers/middleware.go:37-48

var kubernetesNameRegex = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

Impact: Legitimate users with uppercase namespace names will be rejected with 400 Bad Request.

Recommendation:

  • For *namespace* validation (not labels), use DNS-1123 *subdomain* rules: ^[a-z0-9]([-.a-z0-9]*[a-z0-9])?$ (allows dots)
  • OR relax to allow uppercase/underscores if OpenShift compatibility is required
  • Add unit tests for edge cases (My-Project, test_ns, etc.)

References:


2. Sanitization Validation Logic Has False Positive Risk (security_utils.py:60-67)

Issue: Post-sanitization validation uses if secret_value in error_msg which can produce false positives if the secret is a common substring (e.g., pk appears in "package").

Location: components/runners/claude-code-runner/security_utils.py:60-67

for secret_name, secret_value in secrets_to_redact.items():
    if secret_value and secret_value.strip() and secret_value in error_msg:
        logging.error("SECURITY: Credential sanitization validation failed")
        return "Operation failed - check configuration and credentials"

Impact:

  • Short secrets like pk or sk could trigger false positives
  • Generic error messages reduce debugging ability

Recommendation:

# Add minimum length check to avoid false positives
MIN_SECRET_LENGTH = 8  # Langfuse keys are typically >20 chars
if secret_value and len(secret_value.strip()) >= MIN_SECRET_LENGTH and secret_value in error_msg:
    logging.error("SECURITY: Credential sanitization validation failed")
    return "Operation failed - check configuration and credentials"

Test Coverage: Add test for short secrets (e.g., {"key": "pk"}) to verify no false positives.


3. Missing Secret Validation in Operator (sessions.go:298-328)

Issue: The operator copies ambient-admin-langfuse-secret to session namespaces without validating the secret contains required keys (LANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY).

Location: components/operator/internal/handlers/sessions.go:298-328

Impact:

  • Sessions start with incomplete Langfuse config → silent observability failures
  • No validation of LANGFUSE_HOST format (could be malformed URL)

Recommendation:

// Validate secret contains required keys before copying
requiredKeys := []string{"LANGFUSE_HOST", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_ENABLED"}
for _, key := range requiredKeys {
    if _, ok := langfuseSecret.Data[key]; \!ok {
        log.Printf("Warning: ambient-admin-langfuse-secret missing required key %s", key)
        // Skip copy or log error
    }
}

Consistency: Matches URL validation pattern in observability.py:120-128.


🟡 Major Issues

4. Context Manager Cleanup on Partial Initialization Failure (observability.py:163-179)

Issue: If propagate_attributes().__enter__() succeeds but a later step fails (e.g., model sanitization), the context is cleaned up in the except block. However, if __enter__() itself fails, cleanup runs on an uninitialized context.

Location: components/runners/claude-code-runner/observability.py:163-179

try:
    self._propagate_ctx = propagate_attributes(...)
    self._propagate_ctx.__enter__()
except Exception:
    if self._propagate_ctx:
        try:
            self._propagate_ctx.__exit__(None, None, None)
        except Exception:
            pass
    self._propagate_ctx = None
    raise

Issue: If __enter__ fails before assignment completes, self._propagate_ctx is still set but partially initialized.

Recommendation:

ctx = None
try:
    ctx = propagate_attributes(...)
    ctx.__enter__()
    self._propagate_ctx = ctx  # Only assign after successful __enter__
except Exception:
    if ctx:
        try:
            ctx.__exit__(None, None, None)
        except Exception:
            pass
    raise

5. Hardcoded Model Default Should Be Configurable (wrapper.py:269)

Issue: Default model claude-sonnet-4-5@20250929 is hardcoded. Future model updates require code changes.

Location: components/runners/claude-code-runner/wrapper.py:269

configured_model = model or 'claude-sonnet-4-5@20250929'

Recommendation:

DEFAULT_MODEL = os.getenv('DEFAULT_LLM_MODEL', 'claude-sonnet-4-5@20250929')
configured_model = model or DEFAULT_MODEL

Benefit: Allows platform admins to update default via ConfigMap without code changes.


6. Flush Timeout Configuration Lacks Documentation (observability.py:482)

Issue: LANGFUSE_FLUSH_TIMEOUT is configurable but not documented in the deployment example secret or docs.

Location: components/runners/claude-code-runner/observability.py:482

Recommendation: Add to ambient-admin-langfuse-secret.yaml.example:

# Optional: Langfuse flush timeout (default: 30s)
# Increase for large traces or constrained networks
# LANGFUSE_FLUSH_TIMEOUT: "60.0"

7. Test Coverage Gaps

Missing Tests:

  1. Backend: No tests for isValidKubernetesName edge cases (empty string, >63 chars, uppercase, special chars)
  2. Operator: No tests for Langfuse secret validation or cleanup logic (deleteAmbientLangfuseSecret)
  3. Runner: No integration test for Langfuse disabled mode (env vars unset)

Recommendation: Add unit tests for:

  • isValidKubernetesName("") -> false
  • isValidKubernetesName("A-Project") -> behavior?
  • deleteAmbientLangfuseSecret with/without CopiedFromAnnotation

Note: Excellent test coverage for observability.py (549 lines) and security_utils.py (396 lines) ✅


🔵 Minor Issues

8. Inconsistent Error Logging Patterns

Issue: Some functions log errors before returning, others don't. Example:

# observability.py:186 - logs before returning
except Exception as e:
    error_msg = sanitize_exception_message(e, secrets)
    logging.warning(f"Langfuse init failed: {error_msg}")  # ✅

# observability.py:407 - catches but doesn't log
except Exception as e:
    logging.debug(f"Langfuse: Failed to track tool use: {e}")  # Debug level

Recommendation: Standardize error levels:

  • logging.error for initialization failures (user-visible impact)
  • logging.warning for non-fatal failures (observability only)
  • logging.debug for expected silent failures

9. Magic String "ambient-admin-langfuse-secret" Repeated

Issue: Secret name appears 12+ times across operator, docs, and examples. No constant definition.

Recommendation: Define in components/operator/internal/types/resources.go:

const AmbientLangfuseSecretName = "ambient-admin-langfuse-secret"

Benefit: Single source of truth, easier refactoring.


10. Deployment Script Has Weak Credential Defaults

Issue: Test mode uses simple passwords (postgres123, clickhouse123) which may be accidentally deployed to production.

Location: e2e/scripts/deploy-langfuse.sh:118-120

Recommendation:

# Add warning banner
if [[ "$USE_TEST_CREDS" =~ ^[Yy]$ ]]; then
  echo "⚠️  WARNING: Using test credentials (INSECURE - dev/test only\!)"
  echo "   DO NOT use in production environments"
  read -p "Continue? (y/n): " CONFIRM
  [[ "$CONFIRM" =~ ^[Yy]$ ]] || exit 1
  # ... existing code
fi

Positive Highlights

🌟 Excellent Security Practices

  1. Secret Sanitization: Comprehensive sanitize_exception_message with post-validation defense-in-depth (security_utils.py:17-68)
  2. Input Validation: Model name sanitization prevents injection attacks (security_utils.py:175-205)
  3. RBAC Documentation: Clear comments explaining why backend service account is used for secret copies (operator/sessions.go:116-122)
  4. Timeout Protection: Both async and sync timeout wrappers prevent hanging operations (security_utils.py:71-143)

🧪 Strong Test Coverage

  • 549 lines of observability tests covering edge cases (duplicate turns, context cleanup, flush timeouts)
  • 396 lines of security utils tests (sanitization validation, timeout scenarios)
  • Mock-based architecture allows testing without Langfuse dependencies
  • Async test patterns properly handle event loop and context managers

📚 Excellent Documentation

  • ADR-style comments in code explaining architectural decisions (observability.py:2-43)
  • Step-by-step deployment guide with prerequisite checks (deploy-langfuse.sh)
  • Comprehensive example secret with inline explanations (ambient-admin-langfuse-secret.yaml.example)
  • Model pricing reference added to docs (docs/model-pricing.md)

🏗️ Clean Architecture

  • Separation of concerns: Observability logic isolated in dedicated module
  • Graceful degradation: System works without Langfuse (ImportError handling)
  • Idempotent operations: Secret cleanup checks CopiedFromAnnotation before deleting
  • Resource lifecycle: OwnerReferences ensure proper cleanup (not shown in diff but follows patterns)

🔧 Production-Ready Features

  • Configurable flush timeout: Handles slow networks without data loss
  • Session grouping: Uses propagate_attributes for automatic trace aggregation
  • Cache token tracking: Separate fields for accurate cost calculation (cache_read vs cache_creation)
  • Platform detection: Deploy script auto-detects OpenShift vs Kubernetes

Recommendations

Priority 1 (Before Merge)

  1. Fix namespace validation to support OpenShift naming (uppercase/underscores)
  2. Add secret validation in operator before copying Langfuse secret
  3. Add minimum length check to sanitization validation (prevent false positives)
  4. Add unit tests for isValidKubernetesName edge cases

Priority 2 (Near-term)

  1. 📝 Document LANGFUSE_FLUSH_TIMEOUT in example secret
  2. 🏗️ Extract magic strings to constants (secret name, default model)
  3. 🧪 Add operator tests for Langfuse secret lifecycle

Priority 3 (Nice-to-have)

  1. 🔍 Standardize error logging levels across modules
  2. ⚠️ Add production warning to deployment script test mode
  3. 📊 Consider metrics endpoint for observability health checks

Final Notes

This is a high-quality PR with exceptional attention to security, testing, and documentation. The code follows repository standards from CLAUDE.md (secret sanitization, timeout wrappers, RBAC documentation).

The critical issues are primarily about edge case handling (namespace validation, secret validation) rather than fundamental design flaws. Once addressed, this will be a production-ready observability system.

Estimated review effort: 2-3 hours for fixes + testing
Merge readiness: 85% (pending critical fixes)


Reviewed by: Claude Code (Sonnet 4.5)
Review Date: 2025-11-26
PR: #329 - Observability: Langfuse

@sallyom
Copy link
Collaborator Author

sallyom commented Nov 26, 2025

examples from langfuse UI
Screenshot 2025-11-26 at 12 59 55 AM
Screenshot 2025-11-26 at 1 02 06 AM
Screenshot 2025-11-26 at 1 04 19 AM

@sallyom sallyom merged commit cc6a82c into ambient-code:main Nov 26, 2025
22 checks passed
@bobbravo2 bobbravo2 added this to the v0.0.11 milestone Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants