Skip to content

fix deployment env vars#370

Merged
cooktheryan merged 1 commit intoambient-code:mainfrom
sallyom:fix-deploy
Nov 25, 2025
Merged

fix deployment env vars#370
cooktheryan merged 1 commit intoambient-code:mainfrom
sallyom:fix-deploy

Conversation

@sallyom
Copy link
Collaborator

@sallyom sallyom commented Nov 25, 2025

No description provided.

Signed-off-by: sallyom <somalley@redhat.com>
@cooktheryan cooktheryan merged commit e7b302b into ambient-code:main Nov 25, 2025
21 checks passed
@github-actions
Copy link
Contributor

Claude Code Review

Summary

This PR fixes a critical omission in the GitHub Actions deployment workflows by adding the four required Vertex AI environment variables to the operator deployment patches. When Vertex AI support was added in PR #243, the base operator deployment manifest was correctly updated, but the workflow's oc patch commands were not. This meant deployments via CI/CD would overwrite the Vertex configuration with incomplete environment variable sets, breaking Vertex AI functionality.

Issues by Severity

🟡 Major Issues

1. Missing optional: true for ConfigMap References

The configMapKeyRef entries don't include optional: true, which means:

  • If the operator-config ConfigMap doesn't exist in the target namespace, the deployment will fail
  • This creates a deployment dependency that may cause issues in non-production environments

Affected lines: Both patch commands (lines 227 and 281)

Context: The base operator deployment (components/manifests/base/operator-deployment.yaml:40-59) also lacks optional: true, so this PR is consistent with the existing pattern. However, this creates a hard dependency on the ConfigMap.

Recommendation: Consider adding optional: true to all Vertex AI ConfigMap references in both:

  1. The base deployment manifest
  2. These workflow patch commands

This would allow graceful degradation when Vertex is not configured, though the preflight check in operator/main.go:25-29 would still catch the issue at runtime.

Example fix:

{"name":"CLAUDE_CODE_USE_VERTEX","valueFrom":{"configMapKeyRef":{"name":"operator-config","key":"CLAUDE_CODE_USE_VERTEX","optional":true}}}

2. Extremely Long JSON Patch Commands (>1000 characters)

Both oc patch commands are single-line JSON strings exceeding 1000 characters, making them:

  • Difficult to read and review
  • Error-prone to modify (easy to miss a comma or brace)
  • Hard to validate JSON structure
  • Challenging to debug when they fail

Affected lines: 227 and 281

Recommendation: Refactor to use multi-line YAML or JSON files:

# Option 1: Use a patch file
cat > /tmp/operator-patch.json <<'EOF'
[{
  "op": "replace",
  "path": "/spec/template/spec/containers/0/env",
  "value": [
    {"name":"NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},
    {"name":"BACKEND_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},
    # ... (formatted for readability)
  ]
}]
EOF
oc patch deployment agentic-operator -n ambient-code --type=json --patch-file=/tmp/operator-patch.json

# Option 2: Use strategic merge patch with YAML
cat > /tmp/operator-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: agentic-operator
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # ... (much more readable)
EOF
oc patch deployment agentic-operator -n ambient-code --patch-file=/tmp/operator-patch.yaml

🔵 Minor Issues

3. Workflow Maintainability

The pattern of manually patching deployment environment variables in GitHub Actions creates a maintenance burden:

  • Changes to environment variables require updates in 3 places: base manifest, this workflow (2 locations), and prod workflow
  • Risk of drift between kustomize manifests and workflow patches
  • PR Allow for the usage of Vertex #243 demonstrates this risk - base manifests were updated but workflows were missed

Recommendation: Consider one of these architectural improvements:

Option A: Use kustomize for all environment variable management

# Instead of oc patch, use kustomize patches
cd components/manifests/overlays/production
kustomize edit add patch --kind Deployment --name agentic-operator \
  --patch 'env vars from file or inline'
oc apply -k . -n ambient-code

Option B: Add a helper script for environment variable updates

# scripts/patch-operator-env.sh
#!/bin/bash
# Centralized script that reads from manifests/base to ensure consistency

Option C: Use ConfigMap-based environment variables more extensively

  • Move more configuration into ConfigMaps
  • Reduce the number of direct env var definitions in deployment specs

4. Documentation Gap

The PR description is empty, and there's no comment explaining why these variables were missing or what issue this fixes.

Recommendation:


Positive Highlights

Correct fix for a real bug - The missing environment variables would cause Vertex AI deployments to fail or fall back to non-Vertex mode

Consistent with base manifests - The environment variables match exactly what's in components/manifests/base/operator-deployment.yaml

Applied to both workflows - Fixed in both components-build-deploy.yml (line 227) and the dispatch workflow (line 281)

Complete set of Vertex variables - All four required variables are included:

  • CLAUDE_CODE_USE_VERTEX
  • CLOUD_ML_REGION
  • ANTHROPIC_VERTEX_PROJECT_ID
  • GOOGLE_APPLICATION_CREDENTIALS

Follows operator preflight checks - Aligns with validation in operator/internal/preflight/vertex.go:20-31


Recommendations

Priority 1 (Before Merge):

  1. ✅ Verify operator-config ConfigMap exists in ambient-code namespace (otherwise deployment will fail)
  2. Add a meaningful PR description explaining the fix and context

Priority 2 (Follow-up PR):
3. Add optional: true to all Vertex ConfigMap references (base manifest + workflows)
4. Refactor patch commands to use multi-line format or external files for readability
5. Add workflow validation step to compare patch content against base manifests

Priority 3 (Future Improvement):
6. Consolidate environment variable management to reduce duplication across workflows and manifests
7. Add integration test to verify operator starts successfully with Vertex configuration after CI/CD deployment
8. Document the relationship between workflow patches and kustomize overlays in docs/ or CLAUDE.md


Verdict

Approve with minor concerns. This PR correctly addresses a production bug where Vertex AI configuration was being lost during automated deployments. The fix is technically sound and necessary. The maintainability concerns raised are about the broader pattern in the codebase, not specific to this PR.

The main blocker would be ensuring the operator-config ConfigMap exists in the target namespace before merging, as the lack of optional: true will cause deployment failures otherwise.

@bobbravo2 bobbravo2 added this to the v0.0.10 milestone Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants