# model-validation-operator **Repository Path**: Sigstore/model-validation-operator ## Basic Information - **Project Name**: model-validation-operator - **Description**: Kubernetes controller to validate AI models - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-05-09 - **Last Updated**: 2026-04-16 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Model Validation Controller This project is a proof of concept based on the [sigstore/model-transperency-cli](https://github.com/sigstore/model-transparency). It offers a Kubernetes/OpenShift operator designed to validate AI models before they are picked up by actual workload. This project provides a webhook that adds an initcontainer to perform model validation. The operator uses a custom resource to define how the models should be validated, such as utilizing [Sigstore](https://www.sigstore.dev/) or public keys. ### Features - Model Validation: Ensures AI models are validated before they are used by workloads. - Webhook Integration: A webhook automatically injects an initcontainer into pods to perform the validation step. - Custom Resource: Configurable `ModelValidation` custom resource to specify how models should be validated. - Supports methods like [Sigstore](https://www.sigstore.dev/), pki or public key validation. - Continuous Validation: Optional periodic re-validation of models using Kubernetes native sidecars (requires Kubernetes 1.28+). ### Prerequisites - Kubernetes 1.29+ or OpenShift 4.16+ (Kubernetes 1.28+ for continuous validation) - Proper configuration for model validation (e.g., Sigstore, public keys) - A signed model (e.g. check the `testdata` or `examples` folder) ### Installation The operator can be installed via [kustomize](https://kustomize.io/) using different deployment overlays. #### Production Deployment For production environments with cert-manager integration: **Prerequisites:** Install [cert-manager](https://cert-manager.io/) first: ```bash kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.2/cert-manager.yaml ``` Then deploy the operator: ```bash kubectl apply -k https://github.com/sigstore/model-validation-operator/config/overlays/production # or local kubectl apply -k config/overlays/production ``` #### Testing Deployment For testing environments with manual certificate management: ```bash kubectl apply -k https://github.com/sigstore/model-validation-operator/config/overlays/testing # or local kubectl apply -k config/overlays/testing ``` #### Development Deployment For development environments, deploying the operator without the webhook integration: ```bash kubectl apply -k https://github.com/sigstore/model-validation-operator/config/overlays/development # or local kubectl apply -k config/overlays/development ``` #### OLM Deployment For OpenShift/OLM environments: ```bash kubectl apply -k https://github.com/sigstore/model-validation-operator/config/overlays/olm # or local kubectl apply -k config/overlays/olm ``` #### Uninstall To uninstall the operator, use the same overlay you used for installation: ```bash kubectl delete -k config/overlays/production ``` ### Configuration Structure The operator uses a kustomize based, overlay configuration structure, aiming to separate generated content from environment specific content: ``` config/ ├── crd/ # Custom Resource Definitions ├── rbac/ # RBAC permissions ├── webhook/ # Webhook configuration ├── manager/ # Controller manager deployment ├── manifests/ # OLM manifests ├── components/ # Reusable components │ ├── webhook/ # Webhook service component │ ├── certmanager/ # Certificate manager component │ ├── manual-tls/ # Manual TLS configuration │ ├── metrics-port/ # Metrics configuration │ └── webhook-replacements/ # Webhook configuration replacements └── overlays/ # Environment-specific overlays ├── production/ # Production (cert-manager) ├── development/ # Development (operator only, no webhooks) ├── testing/ # Testing (manual, self-signed certs) └── olm/ # OpenShift/OLM ``` #### Certificate Management The operator supports different certificate management approaches: 1. **Production**: Uses cert-manager for automatic certificate management - **⚠️ Important**: The default cert-manager configuration uses self-signed certificates - For production environments, you should configure cert-manager with a proper CA issuer 2. **Development**: Does not use certificates, there are no webhook configurations in this overlay 3. **Testing**: Uses manual, self-signed certificate management for testing scenarios 4. **OLM**: Uses OLM's built-in certificate management for OpenShift deployments #### Running the Webhook Server Locally The webhook server requires TLS certificates. When you run the operator locally, certificates will be generated automatically: ```bash make run ``` This command will start the webhook server on https://localhost:9443, using the generated certs. ### Known limitations The project is at an early stage and therefore has some limitations. - There is no validation or defaulting for the custom resource. - The validation is namespace scoped and cannot be used across multiple namespaces. - There are no status fields for the custom resource. - The model and signature path must be specified, there is no auto discovery. - TLS certificates used by the webhook are self generated. ### Usage First, a ModelValidation CR must be created as follows: ```yaml apiVersion: ml.sigstore.dev/v1alpha1 kind: ModelValidation metadata: name: demo spec: config: sigstoreConfig: certificateIdentity: "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2" certificateOidcIssuer: "https://token.actions.githubusercontent.com" model: path: /data/tensorflow_saved_model signaturePath: /data/tensorflow_saved_model/model.sig ``` Pods in the namespace that have the label `validation.ml.sigstore.dev/ml: ""` will be validated using the specified ModelValidation CR. It should be noted that this does not apply to subsequently labeled pods. ```diff apiVersion: v1 kind: Pod metadata: name: whatever-workload + labels: + validation.ml.sigstore.dev/ml: "demo" spec: restartPolicy: Never containers: - name: whatever-workload image: nginx ports: - containerPort: 80 volumeMounts: - name: model-storage mountPath: /data volumes: - name: model-storage persistentVolumeClaim: claimName: models ``` ### Continuous Model Validation The operator supports continuous validation, which periodically re-validates models after the initial validation. This feature uses Kubernetes 1.28+ native sidecars with `restartPolicy: Always`. #### How It Works When continuous validation is enabled: 1. The validation container runs as a native sidecar (not just an init container) 2. After the initial validation succeeds, the container becomes ready 3. The validation repeats at the specified interval 4. On validation failure, the error is logged but the container continues running 5. The readiness probe reflects the validation state #### Configuration Add the `continuousValidation` field to your ModelValidation CR: ```yaml apiVersion: ml.sigstore.dev/v1alpha1 kind: ModelValidation metadata: name: demo-continuous spec: config: sigstoreConfig: certificateIdentity: "user@example.com" certificateOidcIssuer: "https://token.actions.githubusercontent.com" model: path: /data/tensorflow_saved_model signaturePath: /data/tensorflow_saved_model/model.sig continuousValidation: enabled: true interval: "10m" # Supports s, m, h units (e.g., "30s", "5m", "1h") ``` #### Requirements - Kubernetes 1.28 or later (for native sidecar support with `restartPolicy: Always`) - The validation container will consume resources continuously (CPU/memory) - Consider longer intervals (e.g., 10m, 1h) for production workloads ### Examples The example folder contains example files for testing the operator. #### Example Continuous Validation See `examples/continuous-validation.yaml` for a complete example. #### Prerequisites for Examples Before running the examples, create a namespace for testing (separate from the operator namespace): ```bash kubectl create namespace testing ``` **Important**: Do not deploy examples in the operator namespace (e.g., `model-validation-operator-system`). The operator namespace has the label `validation.ml.sigstore.dev/ignore: "true"` which prevents the webhook from processing pods in that namespace. #### Example Files - **prepare.yaml**: Contains a persistent volume claim and a job that downloads a signed test model. ```bash kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/prepare.yaml -n testing # or local kubectl apply -f examples/prepare.yaml -n testing ``` - **verify.yaml**: Contains a model validation manifest for the validation of this model and a demo pod, which is provided with the appropriate label for validation. ```bash kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/verify.yaml -n testing # or local kubectl apply -f examples/verify.yaml -n testing ``` - **unsigned.yaml**: Contains an example of a pod that would fail validation (for testing purposes). ```bash kubectl apply -f https://raw.githubusercontent.com/sigstore/model-validation-operator/main/examples/unsigned.yaml -n testing # or local kubectl apply -f examples/unsigned.yaml -n testing ``` After the example installation, the logs of the generated job should show a successful download: ```bash $ kubectl logs -n testing job/download-extract-model Connecting to github.com (140.82.121.3:443) Connecting to objects.githubusercontent.com (185.199.108.133:443) saving to '/data/tensorflow_saved_model.tar.gz' tensorflow_saved_mod 44% |************** | 3983k 0:00:01 ETA tensorflow_saved_mod 100% |********************************| 8952k 0:00:00 ETA '/data/tensorflow_saved_model.tar.gz' saved ./ ./model.sig ./variables/ ./variables/variables.data-00000-of-00001 ./variables/variables.index ./saved_model.pb ./fingerprint.pb ``` The operator logs should show that a pod has been modified: ```bash $ kubectl logs -n model-validation-operator-system deploy/model-validation-controller-manager time=2025-01-20T22:13:05.051Z level=INFO msg="Starting webhook server on :9443" time=2025-01-20T22:13:47.556Z level=INFO msg="new request, path: /mutate-v1-pod" time=2025-01-20T22:13:47.557Z level=INFO msg="Execute webhook" time=2025-01-20T22:13:47.560Z level=INFO msg="Search associated Model Validation CR" pod=whatever-workload namespace=testing time=2025-01-20T22:13:47.591Z level=INFO msg="construct args" time=2025-01-20T22:13:47.591Z level=INFO msg="found sigstore config" ``` Finally, the test pod should be running and the injected initcontainer should have been successfully validated. ```bash $ kubectl logs -n testing whatever-workload model-validation INFO:__main__:Creating verifier for sigstore INFO:tuf.api._payload:No signature for keyid f5312f542c21273d9485a49394386c4575804770667f2ddb59b3bf0669fddd2f INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c INFO:tuf.api._payload:No signature for keyid ff51e17fcf253119b7033f6f57512631da4a0969442afcf9fc8b141c7f2be99c INFO:__main__:Verifying model signature from /data/model.sig INFO:__main__:all checks passed ``` In case the workload is modified, is not executed: ```bash ERROR:__main__:verification failed: the manifests do not match ``` #### Ignore Options The `model` section of the ModelValidation CR supports additional options to control which files are included during verification: | Field | Type | Description | |-------|------|-------------| | `ignorePaths` | `[]string` | List of file paths to exclude from verification | | `ignoreGitPaths` | `bool` | When `true`, excludes git-related files (e.g., `.git/`, `.gitignore`) | | `ignoreUnsignedFiles` | `bool` | When `true`, unsigned files will not cause verification to fail | | `allowSymlinks` | `bool` | When `true`, symbolic links will be followed and their targets verified | Example with ignore options: ```yaml apiVersion: ml.sigstore.dev/v1alpha1 kind: ModelValidation metadata: name: demo spec: config: sigstoreConfig: certificateIdentity: "https://github.com/sigstore/model-validation-operator/.github/workflows/sign-model.yaml@refs/tags/v0.0.2" certificateOidcIssuer: "https://token.actions.githubusercontent.com" model: path: /data/tensorflow_saved_model signaturePath: /data/tensorflow_saved_model/model.sig ignorePaths: - /data/tensorflow_saved_model/cache - /data/tensorflow_saved_model/tmp ignoreGitPaths: true allowSymlinks: true ``` #### Pod Annotations Ignore options can also be specified or overridden on individual pods using annotations. Pod annotations take precedence over the ModelValidation CR settings. | Annotation | Value | Description | |------------|-------|-------------| | `validation.ml.sigstore.dev/ignore-paths` | Comma-separated paths | Paths to exclude from verification | | `validation.ml.sigstore.dev/ignore-git-paths` | `"true"` or `"false"` | Exclude git-related files | | `validation.ml.sigstore.dev/ignore-unsigned-files` | `"true"` or `"false"` | Allow unsigned files | | `validation.ml.sigstore.dev/allow-symlinks` | `"true"` or `"false"` | Follow symbolic links | Example pod with annotation overrides: ```yaml apiVersion: v1 kind: Pod metadata: name: whatever-workload labels: validation.ml.sigstore.dev/ml: "demo" annotations: validation.ml.sigstore.dev/ignore-paths: "/data/tensorflow_saved_model/logs,/data/tensorflow_saved_model/tmp" validation.ml.sigstore.dev/ignore-git-paths: "true" spec: # ... rest of pod spec ```