Gatewayapi Namespaced Mode #4690
electricjesus
left a comment
Good work on the runtime Helm rendering migration — dropping 53K lines of pre-rendered YAML is a huge win. The GatewayNamespace mode looks solid with good test coverage. A few observations below.
    CurrentGatewayClasses: set.New[string](),
}

if gatewayAPI.Spec.GatewayDeploymentMode == nil {
Nit: The CRD already has +kubebuilder:default=ControllerNamespace, so any persisted GatewayAPI resource will have this field populated by the API server. This runtime defaulting only matters for in-memory objects that were never persisted (tests?). Not a problem, just noting the redundancy — if the CRD default is the source of truth, a comment here explaining why you also default in code would help future readers.
It is used by the tests
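A minimal sketch of the in-code defaulting discussed in this thread, assuming a trimmed-down spec type. `GatewayAPISpec`, the local mode constants, and `defaultDeploymentMode` are hypothetical stand-ins for illustration, not the operator's actual types. It shows why the nil check still matters for objects built in memory (for example in tests) that never pass through the API server, where the CRD default would otherwise be applied.

```go
package main

import "fmt"

// GatewayDeploymentMode is a local stand-in for the enum discussed in the
// review; it is not imported from the operator's API package.
type GatewayDeploymentMode string

const (
	ControllerNamespace GatewayDeploymentMode = "ControllerNamespace"
	GatewayNamespace    GatewayDeploymentMode = "GatewayNamespace"
)

// GatewayAPISpec is a hypothetical spec containing only the field under
// discussion.
type GatewayAPISpec struct {
	GatewayDeploymentMode *GatewayDeploymentMode
}

// defaultDeploymentMode sets the mode only when it is nil. Persisted objects
// already carry the CRD default, so this only affects in-memory objects that
// were never written to the API server.
func defaultDeploymentMode(spec *GatewayAPISpec) {
	if spec.GatewayDeploymentMode == nil {
		mode := ControllerNamespace
		spec.GatewayDeploymentMode = &mode
	}
}

func main() {
	// An object constructed directly in a test has no mode set.
	spec := GatewayAPISpec{}
	defaultDeploymentMode(&spec)
	fmt.Println(*spec.GatewayDeploymentMode)
}
```

Keeping the default in one small helper also gives the explanatory comment a natural home.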
All good catches man, all sorted
// Gateway resources using operator-managed GatewayClasses. These namespaces need
// per-namespace Enterprise resources (SA, CRB, pull secrets).
if *gatewayAPI.Spec.GatewayDeploymentMode == operatorv1.GatewayDeploymentModeGatewayNamespace &&
    variant.IsEnterprise() {
Why does the Variant matter here at all?
- Swap the checked-in gateway_api_resources.yaml for the embedded gateway-helm.tgz rendered via the helm SDK at startup; K8SGatewayAPICRDs/GatewayAPICRDs now take a runtime.Scheme and return an error (istio_controller updated for the new signature)
- Deploy two envoy-gateway controllers: legacy in tigera-gateway (user-declared classes via Spec.GatewayClasses) and a new one in calico-system with deploy.type=GatewayNamespace; auto-provision the tigera-gateway-class-ns GatewayClass bound to the new controller
- Group the tigera-gateway install behind legacyObjects/legacyTeardownObjects so the eventual deprecation is a single delete
- HasLegacyGateways classifier in the controller: build a className -> controllerName map seeded from Spec.GatewayClasses + existing GatewayClass resources, classify every live Gateway; when no Gateway targets the tigera-gateway controller, the install is torn down; during the teardown-then-redeploy race the legacy render is deferred to avoid a "Namespace is terminating, skipping creation" log flood
- Legacy teardown queues only the Namespace + cluster-scoped objects + the Deployment (for status.RemoveDeployments); in-namespace RBAC/Secrets ride the cascade to avoid the tigera-operator-secrets RoleBinding race
- Move the shared waf-http-filter ClusterRoles out of the legacy bundle so the calico-system-side proxies keep their cluster-scoped perms after tigera-gateway is retired
- Per-namespace Enterprise resources (SA, RoleBindings, pull secret, shared CRB subject) for namespaces hosting a namespaced-class Gateway; reserved namespaces skip shared resource create/delete; Secret goes before RoleBinding on cleanup to avoid 403
- Gate v3 NetworkPolicies on the calico-system Tier; render calico-system.envoy-gateway allow for the controller and certgen
- Update unit tests and Makefile/docs accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
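The HasLegacyGateways classification described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the controller's code: `GatewayClass` and `Gateway` are simplified local structs, the controller name strings are made-up placeholders, and letting existing GatewayClass resources override declared entries is a guess about precedence.

```go
package main

import "fmt"

// Placeholder controller names, invented for this sketch.
const (
	legacyController     = "example.io/legacy-controller"
	namespacedController = "example.io/namespaced-controller"
)

// GatewayClass and Gateway are simplified stand-ins for the Gateway API types.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Namespace string
	Name      string
	ClassName string
}

// hasLegacyGateways builds a className -> controllerName map seeded from the
// declared classes, overlays the GatewayClass resources that already exist,
// and reports whether any live Gateway resolves to the legacy controller.
// Gateways whose class is unknown are treated as non-legacy here.
func hasLegacyGateways(declared map[string]string, classes []GatewayClass, gateways []Gateway, legacy string) bool {
	classToController := make(map[string]string, len(declared)+len(classes))
	for name, controller := range declared {
		classToController[name] = controller
	}
	for _, gc := range classes {
		classToController[gc.Name] = gc.ControllerName
	}
	for _, gw := range gateways {
		if classToController[gw.ClassName] == legacy {
			return true
		}
	}
	return false
}

func main() {
	declared := map[string]string{"user-class": legacyController}
	classes := []GatewayClass{{Name: "tigera-gateway-class-ns", ControllerName: namespacedController}}
	gateways := []Gateway{{Namespace: "team-a", Name: "gw", ClassName: "tigera-gateway-class-ns"}}
	// No Gateway targets the legacy controller, so the legacy install could be torn down.
	fmt.Println(hasLegacyGateways(declared, classes, gateways, legacyController))
}
```

A pure function over plain slices and maps like this is straightforward to cover with table-driven tests, independent of any cluster client.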
- Cover the calico-system envoy-gateway controller lifecycle, per-namespace resource provisioning and cleanup, custom EnvoyProxy and EnvoyGateway ConfigMap watches, owning-gateway env vars in l7-log-collector, and the legacy-class teardown path
- Teardown sequencing for tigera-gateway cascading

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
04d49c6 to 8c6f64c
Description
Add a gatewayDeploymentMode field to the GatewayAPI CR that controls where Envoy Gateway deploys managed proxy workloads, plus the operator-side plumbing to support both modes.
- ControllerNamespace (default, existing behaviour): Envoy Gateway controller and all Gateway proxies live in tigera-gateway.
- GatewayNamespace (new): controller lives in calico-system (to concentrate Calico components in one namespace); proxies are deployed into each Gateway's own namespace so namespace admins can own their Gateways' NetworkPolicies.
- self == oldSelf).
- gateway-helm.tgz) and render at runtime via the Helm SDK; this matches the Istio chart pattern and drops a 3.3MB pre-rendered YAML from git. Render results are cached per deployment mode; the release namespace is derived from the mode so every downstream resource (Deployment, Service, Role/RoleBinding, MutatingWebhookConfiguration, certgen Job, EnvoyProxy) lands in the right namespace automatically.
- GatewayNamespace + Enterprise mode, per-namespace resources are provisioned for each namespace containing an operator-managed Gateway: waf-http-filter ServiceAccount, waf-http-filter-gateway-resources RoleBinding (scoped to that namespace's Gateway-API resources), tigera-operator-secrets RoleBinding, and a tigera-pull-secret copy. Cluster-scoped permissions (licensekeys, tokenreviews) go through a single shared waf-http-filter-gateway-namespaces ClusterRoleBinding with one subject per Gateway namespace, avoiding N cluster-scoped objects.
- waf-http-filter ClusterRole into waf-http-filter-cluster-scoped and waf-http-filter-gateway-resources. Deprecated combined ClusterRole/ClusterRoleBinding are cleaned up on every reconcile for upgrade safety.
- calico-system, tigera-operator) are excluded from tigera-operator-secrets RoleBinding + pull-secret creation/deletion; the core Installation controller owns those there.
- tigera-operator-secrets RoleBinding (that RoleBinding is what grants us delete on Secrets, so reversing it yields a 403 and aborts the reconcile); shared ClusterRoleBinding subjects are recomputed every reconcile and the CRB is removed entirely when no Gateway namespaces remain.
- calico-system tier to keep gateway components functional under a default-deny posture: calico-system.default-deny in tigera-gateway (ControllerNamespace mode only; calico-system already has one via core Installation), calico-system.envoy-gateway allow policy for the controller + certgen Job, and calico-system.envoy-proxy allow policy for Envoy Proxy workloads (ControllerNamespace mode only, since in GatewayNamespace mode proxies live in user namespaces and are out of scope for our default-deny). Policy rendering is gated on the calico-system Tier existing (via WaitToAddTierWatch + Get(Tier)), matching apiserver, manager, logstorage/*, and monitor controllers.

Security
GatewayNamespace mode copies tigera-pull-secret into every user namespace that owns a Gateway; permissive RBAC on those namespaces can expose the pull secret. Operator-owned shared resources in calico-system/tigera-operator are not touched by the gateway feature, protecting against accidental deletion.

Upgrade / compatibility
gatewayDeploymentMode is immutable; switching modes requires deleting and recreating the GatewayAPI resource. GatewayNamespace mode has not shipped in any release yet, so there is no in-place migration path from old state. Deprecated combined waf-http-filter ClusterRole/ClusterRoleBinding are cleaned up on every reconcile.

Release Note
For PR author
make gen-files
make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
- kind/bug if this is a bugfix.
- kind/enhancement if this is a new feature.
- enterprise if this PR applies to Calico Enterprise only.
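For reviewers following the Description, the shared ClusterRoleBinding behaviour (one subject per Gateway namespace, recomputed every reconcile, removed when no Gateway namespaces remain) and the reserved-namespace exclusion can be sketched as below. This is a hedged illustration: `Subject` is a local struct instead of the Kubernetes RBAC type, and the helper names `sharedCRBSubjects` and `managesPullSecret` are hypothetical. Only the resource and namespace names are taken from the Description.

```go
package main

import (
	"fmt"
	"sort"
)

// Subject is a simplified stand-in for an RBAC subject.
type Subject struct {
	Kind      string
	Name      string
	Namespace string
}

// reservedNamespaces lists the namespaces where the core Installation
// controller owns the pull secret and the tigera-operator-secrets RoleBinding.
var reservedNamespaces = map[string]bool{
	"calico-system":   true,
	"tigera-operator": true,
}

// sharedCRBSubjects recomputes the subject list from scratch: one
// waf-http-filter ServiceAccount per distinct Gateway namespace, sorted so
// repeated reconciles produce the same order. A nil result signals that the
// shared ClusterRoleBinding should be removed.
func sharedCRBSubjects(gatewayNamespaces []string) []Subject {
	seen := make(map[string]bool, len(gatewayNamespaces))
	var names []string
	for _, ns := range gatewayNamespaces {
		if !seen[ns] {
			seen[ns] = true
			names = append(names, ns)
		}
	}
	sort.Strings(names)

	var subjects []Subject
	for _, ns := range names {
		subjects = append(subjects, Subject{Kind: "ServiceAccount", Name: "waf-http-filter", Namespace: ns})
	}
	return subjects
}

// managesPullSecret reports whether the gateway feature should create or
// delete the pull secret copy and secrets RoleBinding in the namespace.
func managesPullSecret(namespace string) bool {
	return !reservedNamespaces[namespace]
}

func main() {
	for _, s := range sharedCRBSubjects([]string{"team-b", "team-a", "team-b"}) {
		fmt.Println(s.Namespace, s.Name)
	}
	fmt.Println(managesPullSecret("calico-system"))
}
```

Recomputing the full list each time, instead of patching subjects incrementally, keeps the binding consistent when Gateways are deleted between reconciles.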