Observability: Langfuse by sallyom · Pull Request #329 · ambient-code/platform

sallyom · 2025-11-14T22:51:42Z

This PR

adds Langfuse instrumentation for usage and cost tracking
adds script and manifests to deploy Langfuse on K8s & OpenShift

- Add deploy-langfuse-openshift.sh script with ClickHouse resource fixes - Add Langfuse configuration manifests for ambient-code integration - README.md with complete setup instructions - Add Makefile target: deploy-langfuse-openshift - Update .gitignore to exclude Langfuse secrets 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

Remove npm installation of @anthropic-ai/claude-code and Node.js dependency from Dockerfile. The claude-agent-sdk Python package bundles its own CLI (v2.0.49) which is automatically used by default. pyproject.toml unpinned dependency versions remain: - anthropic[vertex]>=0.68.0 (from ==0.73.0) - claude-agent-sdk>=0.1.4 (from ==0.1.6) Benefits: - Smaller container image (no Node.js/npm) - Automatic version sync between SDK and CLI - Simpler build process 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

Signed-off-by: sallyom <somalley@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com>

… children Problem: All interactions were being appended to a single claude_turn_1 trace instead of creating separate claude_turn_1, claude_turn_2, etc. traces. Root Cause: - Using start_as_current_observation() sets the new observation as the current OpenTelemetry context, causing subsequent observations to become children - This created a nested hierarchy instead of sibling observations Solution: 1. Split turn tracking into two phases: - start_turn() - Called when AssistantMessage arrives (BEFORE tools) - end_turn() - Called when ResultMessage arrives (with usage data) 2. Use session_trace.start_observation() instead of start_as_current_observation() - Creates observations as SIBLINGS under the session trace - Does NOT change the current context - All turns are children of session, not children of each other 3. Tool span parenting: - Tools now properly attach to their parent turn via current_turn_generation.start_observation() - Added fallback to session trace if no active turn 4. Proper observation lifecycle: - Call .end() on both generations and spans to close them Expected Result: session_{session_id} (span) ├── claude_turn_1 (generation) - Sibling │ ├── tool_Read (span) │ └── tool_Write (span) ├── claude_turn_2 (generation) - Sibling │ └── tool_Bash (span) └── claude_turn_3 (generation) - Sibling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

Signed-off-by: sallyom <somalley@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Claude Code <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

Signed-off-by: sallyom <somalley@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com>

Addresses GitHub Actions security review findings: 1. Enhanced Secret Exposure Prevention (observability.py:164) 2. Model Parameter Input Validation (observability.py:143-155) 3. Complete Error Recovery Logging (observability.py:341-342) Co-authored-by: GitHub Actions Security Review Bot Signed-off-by: sallyom <somalley@redhat.com>

Addresses GitHub Actions security review: Risk: Logging secret_name reveals credential context to attackers Fix: - Changed log message from revealing secret type to generic warning - Before: "SECURITY: Secret '{secret_name}' found in sanitized message" - After: "SECURITY: Credential sanitization validation failed" - Maintains security alerting without revealing attack surface Signed-off-by: sallyom <somalley@redhat.com>

…ers only 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: sallyom <somalley@redhat.com>

…secret Signed-off-by: sallyom <somalley@redhat.com> Co-Authored-By: Claude <noreply@anthropic.com>

github-actions · 2025-11-26T05:39:51Z

Claude Code Review

Summary

This PR adds comprehensive Langfuse observability infrastructure to track LLM usage, costs, and session traces. The implementation is well-architected with strong security practices, extensive test coverage, and production-ready deployment tooling.

Overall Assessment: ✅ Strong implementation with excellent security practices and comprehensive testing. A few critical security and architectural recommendations should be addressed before merge.

Issues by Severity

🚫 Blocker Issues

None - No blocking issues found.

🔴 Critical Issues

1. Kubernetes Name Validation May Be Too Restrictive (middleware.go:37-48)

Issue: The isValidKubernetesName regex enforces DNS-1123 *label* validation (^[a-z0-9]([-a-z0-9]*[a-z0-9])?$) which rejects uppercase letters and underscores. However, some OpenShift namespaces use uppercase (e.g., MyProject) or underscores.

Location: components/backend/handlers/middleware.go:37-48

var kubernetesNameRegex = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

Impact: Legitimate users with uppercase namespace names will be rejected with 400 Bad Request.

Recommendation:

For *namespace* validation (not labels), use DNS-1123 *subdomain* rules: ^[a-z0-9]([-.a-z0-9]*[a-z0-9])?$ (allows dots)
OR relax to allow uppercase/underscores if OpenShift compatibility is required
Add unit tests for edge cases (My-Project, test_ns, etc.)

References:

Kubernetes DNS subdomain: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names
Your CLAUDE.md backend standards: "Never skip RBAC checks"

2. Sanitization Validation Logic Has False Positive Risk (security_utils.py:60-67)

Issue: Post-sanitization validation uses if secret_value in error_msg which can produce false positives if the secret is a common substring (e.g., pk appears in "package").

Location: components/runners/claude-code-runner/security_utils.py:60-67

for secret_name, secret_value in secrets_to_redact.items():
    if secret_value and secret_value.strip() and secret_value in error_msg:
        logging.error("SECURITY: Credential sanitization validation failed")
        return "Operation failed - check configuration and credentials"

Impact:

Short secrets like pk or sk could trigger false positives
Generic error messages reduce debugging ability

Recommendation:

# Add minimum length check to avoid false positives
MIN_SECRET_LENGTH = 8  # Langfuse keys are typically >20 chars
if secret_value and len(secret_value.strip()) >= MIN_SECRET_LENGTH and secret_value in error_msg:
    logging.error("SECURITY: Credential sanitization validation failed")
    return "Operation failed - check configuration and credentials"

Test Coverage: Add test for short secrets (e.g., {"key": "pk"}) to verify no false positives.

3. Missing Secret Validation in Operator (sessions.go:298-328)

Issue: The operator copies ambient-admin-langfuse-secret to session namespaces without validating the secret contains required keys (LANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY).

Location: components/operator/internal/handlers/sessions.go:298-328

Impact:

Sessions start with incomplete Langfuse config → silent observability failures
No validation of LANGFUSE_HOST format (could be malformed URL)

Recommendation:

// Validate secret contains required keys before copying
requiredKeys := []string{"LANGFUSE_HOST", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_ENABLED"}
for _, key := range requiredKeys {
    if _, ok := langfuseSecret.Data[key]; \!ok {
        log.Printf("Warning: ambient-admin-langfuse-secret missing required key %s", key)
        // Skip copy or log error
    }
}

Consistency: Matches URL validation pattern in observability.py:120-128.

🟡 Major Issues

4. Context Manager Cleanup on Partial Initialization Failure (observability.py:163-179)

Issue: If propagate_attributes().__enter__() succeeds but a later step fails (e.g., model sanitization), the context is cleaned up in the except block. However, if __enter__() itself fails, cleanup runs on an uninitialized context.

Location: components/runners/claude-code-runner/observability.py:163-179

try:
    self._propagate_ctx = propagate_attributes(...)
    self._propagate_ctx.__enter__()
except Exception:
    if self._propagate_ctx:
        try:
            self._propagate_ctx.__exit__(None, None, None)
        except Exception:
            pass
    self._propagate_ctx = None
    raise

Issue: If __enter__ fails before assignment completes, self._propagate_ctx is still set but partially initialized.

Recommendation:

ctx = None
try:
    ctx = propagate_attributes(...)
    ctx.__enter__()
    self._propagate_ctx = ctx  # Only assign after successful __enter__
except Exception:
    if ctx:
        try:
            ctx.__exit__(None, None, None)
        except Exception:
            pass
    raise

5. Hardcoded Model Default Should Be Configurable (wrapper.py:269)

Issue: Default model claude-sonnet-4-5@20250929 is hardcoded. Future model updates require code changes.

Location: components/runners/claude-code-runner/wrapper.py:269

configured_model = model or 'claude-sonnet-4-5@20250929'

Recommendation:

DEFAULT_MODEL = os.getenv('DEFAULT_LLM_MODEL', 'claude-sonnet-4-5@20250929')
configured_model = model or DEFAULT_MODEL

Benefit: Allows platform admins to update default via ConfigMap without code changes.

6. Flush Timeout Configuration Lacks Documentation (observability.py:482)

Issue: LANGFUSE_FLUSH_TIMEOUT is configurable but not documented in the deployment example secret or docs.

Location: components/runners/claude-code-runner/observability.py:482

Recommendation: Add to ambient-admin-langfuse-secret.yaml.example:

# Optional: Langfuse flush timeout (default: 30s)
# Increase for large traces or constrained networks
# LANGFUSE_FLUSH_TIMEOUT: "60.0"

7. Test Coverage Gaps

Missing Tests:

Backend: No tests for isValidKubernetesName edge cases (empty string, >63 chars, uppercase, special chars)
Operator: No tests for Langfuse secret validation or cleanup logic (deleteAmbientLangfuseSecret)
Runner: No integration test for Langfuse disabled mode (env vars unset)

Recommendation: Add unit tests for:

isValidKubernetesName("") -> false
isValidKubernetesName("A-Project") -> behavior?
deleteAmbientLangfuseSecret with/without CopiedFromAnnotation

Note: Excellent test coverage for observability.py (549 lines) and security_utils.py (396 lines) ✅

🔵 Minor Issues

8. Inconsistent Error Logging Patterns

Issue: Some functions log errors before returning, others don't. Example:

# observability.py:186 - logs before returning
except Exception as e:
    error_msg = sanitize_exception_message(e, secrets)
    logging.warning(f"Langfuse init failed: {error_msg}")  # ✅

# observability.py:407 - catches but doesn't log
except Exception as e:
    logging.debug(f"Langfuse: Failed to track tool use: {e}")  # Debug level

Recommendation: Standardize error levels:

logging.error for initialization failures (user-visible impact)
logging.warning for non-fatal failures (observability only)
logging.debug for expected silent failures

9. Magic String "ambient-admin-langfuse-secret" Repeated

Issue: Secret name appears 12+ times across operator, docs, and examples. No constant definition.

Recommendation: Define in components/operator/internal/types/resources.go:

const AmbientLangfuseSecretName = "ambient-admin-langfuse-secret"

Benefit: Single source of truth, easier refactoring.

10. Deployment Script Has Weak Credential Defaults

Issue: Test mode uses simple passwords (postgres123, clickhouse123) which may be accidentally deployed to production.

Location: e2e/scripts/deploy-langfuse.sh:118-120

Recommendation:

# Add warning banner
if [[ "$USE_TEST_CREDS" =~ ^[Yy]$ ]]; then
  echo "⚠️  WARNING: Using test credentials (INSECURE - dev/test only\!)"
  echo "   DO NOT use in production environments"
  read -p "Continue? (y/n): " CONFIRM
  [[ "$CONFIRM" =~ ^[Yy]$ ]] || exit 1
  # ... existing code
fi

Positive Highlights

🌟 Excellent Security Practices

Secret Sanitization: Comprehensive sanitize_exception_message with post-validation defense-in-depth (security_utils.py:17-68)
Input Validation: Model name sanitization prevents injection attacks (security_utils.py:175-205)
RBAC Documentation: Clear comments explaining why backend service account is used for secret copies (operator/sessions.go:116-122)
Timeout Protection: Both async and sync timeout wrappers prevent hanging operations (security_utils.py:71-143)

🧪 Strong Test Coverage

549 lines of observability tests covering edge cases (duplicate turns, context cleanup, flush timeouts)
396 lines of security utils tests (sanitization validation, timeout scenarios)
Mock-based architecture allows testing without Langfuse dependencies
Async test patterns properly handle event loop and context managers

📚 Excellent Documentation

ADR-style comments in code explaining architectural decisions (observability.py:2-43)
Step-by-step deployment guide with prerequisite checks (deploy-langfuse.sh)
Comprehensive example secret with inline explanations (ambient-admin-langfuse-secret.yaml.example)
Model pricing reference added to docs (docs/model-pricing.md)

🏗️ Clean Architecture

Separation of concerns: Observability logic isolated in dedicated module
Graceful degradation: System works without Langfuse (ImportError handling)
Idempotent operations: Secret cleanup checks CopiedFromAnnotation before deleting
Resource lifecycle: OwnerReferences ensure proper cleanup (not shown in diff but follows patterns)

🔧 Production-Ready Features

Configurable flush timeout: Handles slow networks without data loss
Session grouping: Uses propagate_attributes for automatic trace aggregation
Cache token tracking: Separate fields for accurate cost calculation (cache_read vs cache_creation)
Platform detection: Deploy script auto-detects OpenShift vs Kubernetes

Recommendations

Priority 1 (Before Merge)

✅ Fix namespace validation to support OpenShift naming (uppercase/underscores)
✅ Add secret validation in operator before copying Langfuse secret
✅ Add minimum length check to sanitization validation (prevent false positives)
✅ Add unit tests for isValidKubernetesName edge cases

Priority 2 (Near-term)

📝 Document LANGFUSE_FLUSH_TIMEOUT in example secret
🏗️ Extract magic strings to constants (secret name, default model)
🧪 Add operator tests for Langfuse secret lifecycle

Priority 3 (Nice-to-have)

🔍 Standardize error logging levels across modules
⚠️ Add production warning to deployment script test mode
📊 Consider metrics endpoint for observability health checks

Final Notes

This is a high-quality PR with exceptional attention to security, testing, and documentation. The code follows repository standards from CLAUDE.md (secret sanitization, timeout wrappers, RBAC documentation).

The critical issues are primarily about edge case handling (namespace validation, secret validation) rather than fundamental design flaws. Once addressed, this will be a production-ready observability system.

Estimated review effort: 2-3 hours for fixes + testing
Merge readiness: 85% (pending critical fixes)

Reviewed by: Claude Code (Sonnet 4.5)
Review Date: 2025-11-26
PR: #329 - Observability: Langfuse

sallyom · 2025-11-26T06:05:08Z

examples from langfuse UI

This comment has been minimized.

Sign in to view