Kubernetes Custom Resource Definitions (CRDs): An Introductory Guide
Introduction: Extending the Kubernetes Universe
Kubernetes has fundamentally changed how we deploy, manage, and scale containerized applications. At its core, Kubernetes provides a powerful API built around a set of well-defined resource types – Pods, Services, Deployments, StatefulSets, ConfigMaps, Secrets, and many more. These built-in resources cover a vast range of common infrastructure and application management needs. We interact with these resources declaratively: we define the desired state in a manifest file (typically YAML), apply it to the cluster, and Kubernetes controllers work tirelessly behind the scenes to reconcile the actual state with our desired state.
However, the world of software is diverse and complex. While the built-in Kubernetes resources are versatile, they can’t possibly cater to every specific need or represent every type of entity you might want to manage within your cluster. What if you want to manage a custom database cluster, a specific type of message queue, a CI/CD pipeline definition, or even a configuration specific to your organization’s internal platform using the same declarative Kubernetes patterns?
This is where Custom Resource Definitions (CRDs) come into play. CRDs are a powerful extensibility mechanism built into Kubernetes that allows you to define your own custom API objects, effectively teaching Kubernetes about new types of resources. Once a CRD is created, you can create instances of that custom resource (CRs) just like you would create a Pod or a Service, managing them using `kubectl`, integrating them with Kubernetes RBAC, and leveraging the entire Kubernetes ecosystem.
This guide provides a comprehensive introduction to Kubernetes CRDs. We will explore:
- What CRDs and Custom Resources (CRs) are.
- Why you would want to use them.
- How to define and create a CRD.
- How to create and manage instances (CRs) of your custom type.
- The crucial role of Controllers and the Operator pattern.
- Advanced CRD features and best practices.
- How CRDs compare to alternative approaches.
By the end of this guide, you will understand the fundamental concepts behind CRDs and appreciate their significance in extending Kubernetes to manage virtually any kind of resource declaratively.
What are Custom Resource Definitions (CRDs) and Custom Resources (CRs)?
Let’s break down the two key concepts:
- Custom Resource Definition (CRD):
  - A CRD is itself a Kubernetes resource (`kind: CustomResourceDefinition`, `apiVersion: apiextensions.k8s.io/v1`).
  - It defines a new, custom resource type within your Kubernetes cluster. Think of it as a schema or blueprint for a new kind of object that the Kubernetes API server should understand.
  - The CRD specifies the name of the new resource (`kind`), its API group, its version(s), its scope (namespaced or cluster-wide), and optionally a validation schema for its structure.
  - Creating a CRD object in your cluster dynamically registers the new resource type with the Kubernetes API server. No recompilation or restart of the API server is needed.
- Custom Resource (CR):
  - A CR is an instance of a resource type defined by a CRD.
  - If a CRD defines a `kind: MyDatabase`, then a specific YAML manifest describing a particular database instance (e.g., `my-production-db`) would be a Custom Resource (CR).
  - CRs contain the actual data and configuration for an instance of your custom type, typically within a `spec` field (representing the desired state) and potentially a `status` field (representing the observed state, usually managed by a controller).
  - You interact with CRs using standard Kubernetes tools like `kubectl` (`kubectl get mydatabases`, `kubectl apply -f my-prod-db.yaml`).
Analogy: Think of Object-Oriented Programming.
* A CRD is like defining a Class. It specifies the structure, properties, and potentially validation rules for objects of that class.
* A CR is like creating an Object or an Instance of that Class. It holds the specific data for one particular entity conforming to the class definition.
Once a CRD is registered, the Kubernetes API server provides RESTful endpoints for your custom resource type (e.g., `/apis/<group>/<version>/namespaces/<namespace>/<plural-name>`). This allows `kubectl` and other clients to interact with CRs just like built-in resources. They become first-class citizens within the Kubernetes API.
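The endpoint layout is mechanical enough to sketch. A minimal helper (plain Python, no Kubernetes client library; the group and plural names are the ones from the `WebApp` example later in this guide) shows how such a path is assembled:

```python
from typing import Optional

def cr_url(group: str, version: str, plural: str,
           namespace: Optional[str] = None, name: Optional[str] = None) -> str:
    """Build the REST path the API server serves for a custom resource.

    Cluster-scoped resources simply omit the namespace segment.
    """
    parts = ["/apis", group, version]
    if namespace is not None:
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name is not None:
        parts.append(name)
    return "/".join(parts)

# A namespaced CR (names taken from the WebApp example used later in this guide):
print(cr_url("stable.example.com", "v1alpha1", "webapps",
             namespace="default", name="my-sample-app"))
# -> /apis/stable.example.com/v1alpha1/namespaces/default/webapps/my-sample-app
```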
Why Use CRDs? The Benefits of Extending Kubernetes Natively
Before CRDs became stable, users often resorted to less ideal methods for managing custom configurations or application-specific state within Kubernetes:
- ConfigMaps/Secrets: Storing configuration as raw data or key-value pairs. This lacks structure, validation, versioning, and fine-grained access control specific to the configuration’s purpose.
- Annotations: Attaching arbitrary metadata to existing objects like Deployments or Services. This is unstructured, hard to query, lacks validation, and clutters the primary object’s metadata.
- External Databases/Configuration Systems: Managing the configuration outside Kubernetes. This breaks the unified declarative model, requires separate tooling and access control, and makes it harder to tie application configuration lifecycle to the application’s Kubernetes resources.
CRDs offer significant advantages over these approaches:
- Declarative Management: Define your custom resources using the same familiar YAML/JSON manifest format used for built-in Kubernetes resources. Store these manifests in version control (GitOps) and apply them using `kubectl apply`. The Kubernetes control plane handles storing the object.
- Native Kubernetes Tooling: Manage your CRs using `kubectl` (`get`, `describe`, `apply`, `delete`, `edit`, `label`, etc.). No need for separate CLIs or tools just for your custom types.
- RBAC Integration: Apply standard Kubernetes Role-Based Access Control (RBAC) rules to your custom resources. You can grant specific users or service accounts permissions like `get`, `list`, `watch`, `create`, `update`, `patch`, and `delete` on your custom resource types, either cluster-wide or within specific namespaces.
- API Discovery: Your custom resources appear in API discovery mechanisms (e.g., `kubectl api-resources`).
- Structured Data and Validation: CRDs allow you to define an OpenAPI v3 schema for your custom resources. The Kubernetes API server uses this schema to validate any CRs created or updated, ensuring data integrity and providing immediate feedback to users if their manifest is invalid. This is a massive improvement over unstructured ConfigMaps or annotations.
- Watch Mechanism: Clients (like custom controllers) can “watch” for changes (creation, updates, deletion) to your custom resources via the Kubernetes API, just like they watch built-in resources. This is fundamental for building automation.
- Audit Logging: Operations on your custom resources are recorded in the Kubernetes audit logs, providing visibility and traceability.
- UI Integration: Kubernetes dashboards and UIs can often discover and display custom resources, providing a visual way to interact with them.
- Enabling the Operator Pattern: CRDs are the cornerstone of the Operator pattern. An Operator is a custom controller that watches a specific CRD and takes domain-specific actions to manage the application or infrastructure represented by the CRs. For example, a `Database` CRD might be managed by an Operator that actually provisions the database instances, sets up replication, handles backups, and manages upgrades based on the `spec` defined in the CR.
By using CRDs, you leverage the robust, distributed, and extensible nature of the Kubernetes API machinery for your own specific needs, creating a unified and consistent management experience across your entire stack.
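As a concrete illustration of the validation benefit, here is a toy sketch in plain Python. The field rules are hypothetical (modeled on the `WebApp` example in the next section); in a real cluster this checking is performed server-side by the API server against the CRD’s OpenAPI schema:

```python
# Minimal, illustrative subset of what the API server's schema validation does.
# The rules below are hypothetical and mirror the WebApp CRD defined later.
RULES = {
    "image": {"type": str, "required": True},
    "replicas": {"type": int, "required": True, "minimum": 1},
    "port": {"type": int, "required": False},
}

def validate_spec(spec: dict) -> list:
    """Return a list of validation errors (an empty list means valid)."""
    errors = []
    for field, rule in RULES.items():
        if field not in spec:
            if rule["required"]:
                errors.append(f"spec.{field}: Required value")
            continue
        value = spec[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"spec.{field}: must be of type {rule['type'].__name__}")
        elif "minimum" in rule and value < rule["minimum"]:
            errors.append(f"spec.{field}: must be >= {rule['minimum']}")
    return errors

print(validate_spec({"image": "nginx:1.21", "replicas": 3}))       # -> [] (valid)
print(validate_spec({"image": "nginx:latest", "port": "eighty"}))  # two errors
```

Just like the API server, the checker rejects a manifest that is missing a required field or uses the wrong type, and reports every violation at once.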
Defining a Custom Resource Definition (CRD)
Let’s dive into the structure of a CRD manifest. We’ll use an example of defining a custom resource for managing simple web applications; let’s call it `WebApp`.
```yaml
# webapp-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Name must be '<plural>.<group>'
  name: webapps.stable.example.com
spec:
  # Group name used in the API: /apis/<group>/<version>
  group: stable.example.com
  # List of versions supported by this CRD
  versions:
    - name: v1alpha1 # Version name, e.g., v1, v1beta1, v1alpha1
      served: true   # This version is enabled and served by the API server
      storage: true  # This version is the storage version (only one version can be storage: true)
      schema:
        # Define the structure and validation using OpenAPI v3 schema
        openAPIV3Schema:
          type: object
          properties:
            spec: # The desired state, defined by the user
              type: object
              properties:
                image:
                  type: string
                  description: "The container image to deploy."
                replicas:
                  type: integer
                  minimum: 1
                  description: "Number of desired replicas."
                port:
                  type: integer
                  default: 80
                  description: "The port the container listens on."
                configMapRef:
                  type: string
                  description: "Optional: Name of a ConfigMap to mount as a volume."
                serviceType:
                  type: string
                  enum: ["ClusterIP", "NodePort", "LoadBalancer"]
                  default: "ClusterIP"
                  description: "The type of Kubernetes Service to create."
              required: # Fields required within the spec
                - image
                - replicas
            status: # The observed state, managed by a controller
              type: object
              properties:
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                        enum: ["True", "False", "Unknown"]
                      lastTransitionTime:
                        type: string
                        format: date-time
                      reason:
                        type: string
                      message:
                        type: string
                    required:
                      - type
                      - status
                availableReplicas:
                  type: integer
                  description: "Total number of available replicas."
                serviceUrl:
                  type: string
                  description: "URL of the exposed service, if applicable."
              # Mark status fields as managed by the controller, not by users
              x-kubernetes-preserve-unknown-fields: true # Required for status subresource
          # Required fields at the top level (spec is usually required)
          required:
            - spec
      # Subresources allow standard controllers (like HPA) or kubectl commands to interact
      subresources:
        # /status enables the status subresource
        # Updates to the CR status field must go through the /status endpoint
        status: {}
        # /scale enables the scale subresource (/apis/<group>/<version>/namespaces/<ns>/<plural>/<name>/scale)
        # Allows interaction with the HorizontalPodAutoscaler (HPA)
        scale:
          specReplicasPath: .spec.replicas # Path to the desired replicas field in the CR spec
          statusReplicasPath: .status.availableReplicas # Path to the current replicas field in the CR status
          # Optional: label selector path for pods managed by this CR
          # labelSelectorPath: .status.labelSelector
      # Additional columns to print for 'kubectl get <plural>'
      additionalPrinterColumns:
        - name: Replicas
          type: integer
          description: The desired number of pods
          jsonPath: .spec.replicas
        - name: Available
          type: integer
          description: The number of available pods
          jsonPath: .status.availableReplicas
        - name: Image
          type: string
          description: The container image
          jsonPath: .spec.image
        - name: ServiceType
          type: string
          jsonPath: .spec.serviceType
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
  # Define the scope: Namespaced or Cluster
  scope: Namespaced # Or "Cluster" for cluster-wide resources (like Node, Namespace)
  # Names specify how the resource is referred to
  names:
    plural: webapps # Used in the URL: /apis/<group>/<version>/<plural>
    kind: WebApp # Used in object manifests (kind: WebApp)
    shortNames: # Optional short aliases for kubectl (e.g., kubectl get wa)
      - wa
    # categories: # Optional grouping for kubectl get all or kubectl get <category>
    #   - all
    #   - exampleapps
```
Let’s break down the key fields in the `spec`:
- `group`: A unique name for your API group, typically using a DNS-style name (e.g., `stable.example.com`, `batch.mycompany.io`). This prevents collisions with built-in Kubernetes groups (like `apps`, `batch`, `core`) or other custom groups.
- `versions`: A list of API versions for your custom resource. This allows you to evolve your API schema over time.
  - `name`: The version identifier (e.g., `v1`, `v1beta1`, `v1alpha1`). Follow Kubernetes API versioning conventions (alpha -> beta -> stable).
  - `served`: Boolean indicating if this version should be exposed via the Kubernetes API. You might have older versions defined but no longer served.
  - `storage`: Boolean indicating which version should be used to store the CR object in etcd. Exactly one version must be marked as `storage: true`. When you retrieve an object, it can be served in any `served: true` version (with automatic conversion if necessary, potentially requiring conversion webhooks for complex changes), but it’s always stored in the format of the `storage: true` version.
- `scope`: Determines if the custom resource is `Namespaced` (like Pods, Deployments) or `Cluster`-scoped (like Nodes, PersistentVolumes). Namespaced resources exist within a specific namespace, providing isolation, while Cluster resources are global. Choose based on whether your resource naturally belongs within a namespace or applies cluster-wide.
- `names`: Defines how the resource type is identified.
  - `plural`: The name used in API URLs and often in `kubectl get` (e.g., `webapps`). The CRD `metadata.name` must be `<plural>.<group>`.
  - `singular`: The singular form (e.g., `webapp`). Used in some `kubectl` output and potentially as an alias.
  - `kind`: The `Kind` value used in the CR manifest (e.g., `WebApp`). Must be CamelCase.
  - `shortNames`: Optional short aliases for use with `kubectl` (e.g., `kubectl get wa`).
  - `categories`: Optional list of category names. Allows grouping resources, e.g., `kubectl get databases` could list all CRs belonging to the `databases` category. `all` is a common category.
- `schema.openAPIV3Schema`: This is crucial for defining the structure and validation rules for your CRs. It uses (a subset of) the OpenAPI v3 specification.
  - `type: object`: The root of a CR is always an object.
  - `properties`: Defines the fields within the object (like `spec` and `status`).
    - `spec`: This object typically contains the fields that users set to define the desired state.
    - `status`: This object typically contains fields that controllers update to reflect the observed state. It’s best practice to define a status structure.
  - Inside `properties`, you define individual fields with their `type` (`string`, `integer`, `boolean`, `object`, `array`), `description`, validation rules (`minimum`, `maximum`, `pattern`, `enum`, `format`), and whether they are `required`.
  - Using a schema enables server-side validation. When a user runs `kubectl apply`, the API server checks the CR against this schema before storing it. Invalid CRs are rejected with helpful error messages.
- `subresources`: Optional feature to enable standard Kubernetes subresources for your CR.
  - `status`: Enables the `/status` subresource. This is highly recommended. It ensures that updates to the `.status` field of a CR can only happen via the `/status` endpoint, preventing controllers and users from accidentally overwriting each other’s changes to `spec` and `status` in a single `update` operation. Controllers should always update status via this subresource. Requires `x-kubernetes-preserve-unknown-fields: true` in the `status` schema block (as of Kubernetes 1.20+, this is generally implicitly true for status, but explicitly setting it can avoid issues in some versions/configurations).
  - `scale`: Enables the `/scale` subresource. This allows standard tooling like the `HorizontalPodAutoscaler` (HPA) or `kubectl scale` to interact with your CR, provided you map the replica counts correctly using `specReplicasPath` and `statusReplicasPath`.
- `additionalPrinterColumns`: Customize the output of `kubectl get <plural-name>`. You specify the column name, data type, and a JSONPath expression to extract the value from the CR. This makes it much easier for users to see relevant information at a glance without needing `-o yaml` or `-o json`.
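The `jsonPath` expressions used by printer columns are simple dotted paths into the object. A toy resolver (plain Python; not kubectl’s actual implementation, which supports a richer JSONPath dialect) shows how a client might extract the cell values:

```python
def resolve_json_path(obj: dict, path: str):
    """Resolve a simple dotted JSONPath like '.spec.replicas' against a dict.

    Handles only the plain field-access subset used by the printer columns here.
    """
    value = obj
    for part in path.lstrip(".").split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # kubectl prints an empty cell for missing values
        value = value[part]
    return value

webapp = {
    "metadata": {"name": "my-sample-app"},
    "spec": {"image": "nginx:1.21", "replicas": 3},
    "status": {},  # no controller has populated status yet
}

print(resolve_json_path(webapp, ".spec.replicas"))             # -> 3
print(resolve_json_path(webapp, ".status.availableReplicas"))  # -> None (empty cell)
```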
Creating the CRD:
To create this CRD in your cluster, save the YAML above to a file (e.g., `webapp-crd.yaml`) and apply it:

```bash
kubectl apply -f webapp-crd.yaml
# Output: customresourcedefinition.apiextensions.k8s.io/webapps.stable.example.com created
```
After a short delay, the API server will register the new `WebApp` resource type. You can verify this:
```bash
kubectl api-resources | grep webapp
# Output might look like:
# webapps   wa   stable.example.com   true   WebApp

kubectl explain webapp
# Output: Describes the WebApp kind and its fields based on the schema

kubectl explain webapp.spec
# Output: Describes the fields within the spec

kubectl explain webapp.spec.replicas
# Output: Describes the replicas field, including type and minimum value
```
Creating and Managing Custom Resources (CRs)
Now that the `WebApp` CRD is registered, we can create instances (CRs) of the `WebApp` type.

Here’s an example manifest for a `WebApp` instance:
```yaml
# my-sample-app.yaml
apiVersion: stable.example.com/v1alpha1 # Matches the group and version from the CRD
kind: WebApp # Matches the kind from the CRD names
metadata:
  name: my-sample-app # Name of this specific WebApp instance
  namespace: default # Namespace where this CR will live (since the scope is Namespaced)
spec: # The desired state defined by us
  image: nginx:1.21
  replicas: 3
  port: 8080
  serviceType: LoadBalancer
  # configMapRef: my-app-config # Optional field
```
Let’s examine this CR manifest:
- `apiVersion`: Must match the `group` and a `served` `version` defined in the CRD (e.g., `stable.example.com/v1alpha1`).
- `kind`: Must match the `kind` defined in the CRD’s `names` (e.g., `WebApp`).
- `metadata`: Standard Kubernetes object metadata, including `name` and `namespace` (if namespaced). Labels and annotations can also be added here.
- `spec`: This is where you define the desired state according to the `spec` schema defined in the CRD. We’re asking for 3 replicas of `nginx:1.21` listening on port 8080, exposed via a LoadBalancer Service.
Creating the CR:
Save the YAML above to `my-sample-app.yaml` and apply it:

```bash
kubectl apply -f my-sample-app.yaml
# Output: webapp.stable.example.com/my-sample-app created
```
Interacting with the CR using `kubectl`:

Now you can use standard `kubectl` commands:
- Get:

  ```bash
  kubectl get webapp my-sample-app -n default
  # Output (using additionalPrinterColumns defined in the CRD):
  # NAME            REPLICAS   AVAILABLE   IMAGE        SERVICETYPE    AGE
  # my-sample-app   3                      nginx:1.21   LoadBalancer   10s

  kubectl get wa -n default # Using the shortName
  # Output: (same as above)

  kubectl get webapps --all-namespaces # List all webapps across namespaces
  ```

  *(Note: The AVAILABLE column is empty initially because no controller has updated the status yet.)*

- Describe:

  ```bash
  kubectl describe webapp my-sample-app -n default
  # Output: Shows metadata, spec fields, empty status (initially), and events
  ```

- Get YAML/JSON:

  ```bash
  kubectl get webapp my-sample-app -n default -o yaml
  # Output: Shows the full CR object as stored in etcd, including API version,
  # kind, metadata, spec, and an empty or minimal status field.
  ```

- Edit:

  ```bash
  kubectl edit webapp my-sample-app -n default
  # Opens the CR manifest in your default editor. You can change spec fields
  # (e.g., change replicas to 5). On saving, kubectl applies the changes.
  # The API server validates against the schema.
  ```

- Delete:

  ```bash
  kubectl delete webapp my-sample-app -n default
  # Output: webapp.stable.example.com "my-sample-app" deleted
  ```

- Validation in Action: Try creating a CR that violates the schema defined in the CRD:

  ```yaml
  # invalid-app.yaml
  apiVersion: stable.example.com/v1alpha1
  kind: WebApp
  metadata:
    name: invalid-app
    namespace: default
  spec:
    image: nginx:latest # Missing required 'replicas' field
    port: "eighty" # Invalid type for 'port' (should be integer)
  ```

  ```bash
  kubectl apply -f invalid-app.yaml
  # Output: Error message indicating validation failure
  # The WebApp "invalid-app" is invalid:
  # * spec.replicas: Required value
  # * spec.port: Invalid value: "eighty": spec.port in body must be of type integer: "string"
  ```
This immediate feedback is invaluable for users.
The Missing Piece: Automation
At this point, we have defined a new API type (`WebApp`) and created an instance (`my-sample-app`). The Kubernetes API server knows about it, stores it, validates it, and lets us manage it with `kubectl`.

However, nothing actually happens in the cluster based on this CR yet. Creating the `WebApp` CR doesn’t automatically create Deployments, Services, or Pods. The CR simply represents data: a desired state.

To make the `WebApp` CR do something, we need a controller.
Controllers and the Operator Pattern
A controller in Kubernetes is a background process that watches the state of resources via the API server and works to reconcile the current state towards the desired state. Kubernetes itself runs many built-in controllers (e.g., the Deployment controller watches Deployments and manages ReplicaSets; the ReplicaSet controller watches ReplicaSets and manages Pods).
When you introduce CRDs, you typically also need to implement a custom controller that understands your specific CRD. This controller will:
- Watch: Use the Kubernetes API to watch for Create, Update, and Delete events for CRs of the type it manages (e.g., `WebApp` resources).
- Analyze: For each event, compare the desired state defined in the CR’s `spec` with the actual state of the world (e.g., are the required Deployment and Service present? Do they match the `spec`?).
- Act: Take actions to converge the actual state towards the desired state. This might involve creating, updating, or deleting other Kubernetes resources (like Deployments, Services, ConfigMaps, Secrets) or interacting with external systems (like provisioning a cloud database).
- Update Status: Update the CR’s `.status` field to reflect the observed state (e.g., number of available replicas, service URL, conditions indicating readiness or errors). This update must go through the `/status` subresource if enabled.
The Reconciliation Loop: This Watch -> Analyze -> Act -> Update Status cycle is known as the reconciliation loop or control loop, and it’s the heart of how Kubernetes automation works. The controller continuously strives to make reality match the desired state defined in the CR.
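The loop can be illustrated with a minimal in-memory sketch (plain Python; real controllers receive watch events from the API server and act through API calls, so every name here is illustrative):

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """One pass of a toy control loop: converge `actual` toward `desired`.

    Keys are resource names; values are their configurations.
    """
    for name, config in desired.items():
        if name not in actual:
            actual[name] = dict(config)   # Act: create a missing resource
        elif actual[name] != config:
            actual[name] = dict(config)   # Act: update a drifted resource
    for name in list(actual):
        if name not in desired:
            del actual[name]              # Act: delete an orphaned resource
    return actual

desired = {"deployment/my-sample-app": {"image": "nginx:1.21", "replicas": 3}}
actual = {"deployment/my-sample-app": {"image": "nginx:1.20", "replicas": 3},
          "service/stale": {"port": 80}}

reconcile(desired, actual)
print(actual)  # drifted image updated, stale resource removed
```

Note that the pass is idempotent: running it again with the same desired state changes nothing, which is exactly the property real controllers rely on.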
Example Controller Logic for `WebApp`:

A controller for our `WebApp` CRD might perform the following reconciliation logic when it sees a `WebApp` CR:
- Check the `WebApp` exists: If the `WebApp` CR has been deleted, ensure any associated resources (Deployment, Service) are also deleted (garbage collection, often handled via Owner References).
- Read `WebApp.spec`: Get the desired `image`, `replicas`, `port`, `serviceType`, `configMapRef`.
- Manage Deployment:
  - Check if a Deployment corresponding to the `WebApp` exists.
  - If not, create a new Deployment configured with `spec.image`, `spec.replicas`, and `spec.port`. Mount `spec.configMapRef` if provided. Set an Owner Reference on the Deployment pointing to the `WebApp` CR (this ensures the Deployment is garbage collected when the `WebApp` is deleted).
  - If it exists, check if its configuration matches the `WebApp.spec`. If not (e.g., image or replicas changed), update the Deployment.
- Manage Service:
  - Check if a Service corresponding to the `WebApp` exists.
  - If not, create a new Service targeting the pods managed by the Deployment, using the correct `spec.port` and `spec.serviceType`. Set an Owner Reference.
  - If it exists, check if its configuration matches the `WebApp.spec`. If not (e.g., `serviceType` changed), update the Service.
- Update `WebApp.status`:
  - Query the state of the managed Deployment and Service.
  - Update `WebApp.status.availableReplicas` based on the Deployment’s status.
  - Update `WebApp.status.serviceUrl` based on the Service’s status (e.g., LoadBalancer ingress IP).
  - Set appropriate `conditions` (e.g., `Available: True/False`, `Progressing: True/False`).
  - Perform the update using the `/status` subresource API call.
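The "Manage Deployment" step, including the Owner Reference wiring, amounts to constructing a desired child object from the CR. A sketch in plain Python dicts (illustrative, not client-go; the field layout follows the standard Deployment schema, and the helper name is ours):

```python
def build_deployment(webapp: dict) -> dict:
    """Construct the desired Deployment for a WebApp CR, owned by that CR."""
    name = webapp["metadata"]["name"]
    spec = webapp["spec"]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": name,
            "namespace": webapp["metadata"]["namespace"],
            # Owner Reference: Kubernetes garbage-collects this Deployment
            # when the parent WebApp CR is deleted.
            "ownerReferences": [{
                "apiVersion": webapp["apiVersion"],
                "kind": webapp["kind"],
                "name": name,
                "uid": webapp["metadata"]["uid"],
                "controller": True,
            }],
        },
        "spec": {
            "replicas": spec["replicas"],
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{
                    "name": "web",
                    "image": spec["image"],
                    "ports": [{"containerPort": spec.get("port", 80)}],
                }]},
            },
        },
    }

webapp = {
    "apiVersion": "stable.example.com/v1alpha1",
    "kind": "WebApp",
    "metadata": {"name": "my-sample-app", "namespace": "default", "uid": "1234-abcd"},
    "spec": {"image": "nginx:1.21", "replicas": 3, "port": 8080},
}
deployment = build_deployment(webapp)
print(deployment["spec"]["replicas"],
      deployment["metadata"]["ownerReferences"][0]["kind"])  # -> 3 WebApp
```

On each reconcile, the controller would compare this desired object against the Deployment actually in the cluster and create or patch it as needed.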
The Operator Pattern:
When a custom controller is designed specifically to manage a complex, stateful application or piece of infrastructure using CRDs, it’s often referred to as an Operator. The Operator encapsulates the operational knowledge required to run that application – how to deploy it, scale it, upgrade it, back it up, handle failures, etc. – directly into Kubernetes.
Examples of popular Operators:
- Prometheus Operator: Manages Prometheus monitoring instances, Alertmanagers, and related configurations using CRDs like `Prometheus`, `ServiceMonitor`, `Alertmanager`.
- etcd Operator: Manages etcd clusters, handling provisioning, backups, and recovery using an `EtcdCluster` CRD.
- Database Operators (e.g., for PostgreSQL, MySQL, Cassandra): Manage the database lifecycle, including setup, replication, failover, and backups, often triggered by CRs like `PostgresqlCluster`.
- Certificate Management (cert-manager): Uses CRDs like `Certificate` and `Issuer` to automate the provisioning and renewal of TLS certificates.
Building Controllers/Operators:
Writing a controller from scratch involves interacting directly with the Kubernetes API (using client libraries like `client-go` for Go, or similar libraries for other languages), handling watch events, managing local caches (informers), implementing the reconciliation logic, and dealing with concurrency and error handling.
To simplify this, several frameworks exist:
- Kubebuilder (Go): A popular framework from the Kubernetes SIGs (Special Interest Groups) that scaffolds projects with boilerplate code for CRDs, controllers, and webhooks.
- Operator SDK (Go, Ansible, Helm): Another popular framework, originally from CoreOS/Red Hat, offering multiple ways to build Operators, including using Go, Ansible playbooks, or Helm charts to implement the reconciliation logic.
- KUDO (Kubernetes Universal Declarative Operator): Allows building operators primarily using YAML definitions for operational phases (deploy, upgrade, backup, etc.).
- Metacontroller: An add-on that allows implementing controllers using simple webhook scripts, abstracting away much of the complex Kubernetes API interaction.
CRDs provide the declarative API, but it’s the custom controller or Operator that brings that API to life by encoding the operational intelligence.
Advanced CRD Features
Beyond the basics, CRDs offer several advanced features:
- Versioning and Schema Migration:
  - As your custom resource evolves, you’ll need to introduce new API versions (e.g., move from `v1alpha1` to `v1beta1`, then `v1`). The `versions` list in the CRD definition manages this.
  - When you have multiple `served: true` versions, Kubernetes can automatically convert CRs between these versions if the schemas are structurally compatible.
  - For incompatible schema changes between versions (e.g., renaming a field, changing its structure significantly), you need to implement a Conversion Webhook. This is an HTTPS endpoint (typically running as a Pod in the cluster) that Kubernetes calls to convert CRs between different API versions on demand. The CRD definition is configured with details about this webhook. This ensures users can interact with the API using different versions, while the object is consistently stored in the single `storage: true` version format.
- Validation Webhooks (Admission Controllers):
  - While the OpenAPI schema provides static validation, sometimes you need more complex, dynamic validation logic (e.g., checking against other resources in the cluster, enforcing cross-field dependencies, validating against external systems).
  - Validating Admission Webhooks allow you to implement such logic. When a CR is created or updated, the API server calls your webhook. The webhook inspects the CR and returns whether the operation should be allowed or denied (with an error message).
  - These are configured via `ValidatingWebhookConfiguration` resources.
- Mutation Webhooks (Admission Controllers):
  - Sometimes you want to automatically modify a CR during creation or update, for example, to set default values based on complex logic or inject sidecar containers based on annotations.
  - Mutating Admission Webhooks allow this. They are called before validation webhooks and schema validation. Your webhook receives the CR, can modify it (e.g., add default fields), and returns the modified object.
  - These are configured via `MutatingWebhookConfiguration` resources.
- Finalizers:
  - When a user requests deletion of a Kubernetes object (`kubectl delete ...`), the object isn’t immediately removed from the API server. Instead, it’s marked with a `deletionTimestamp`, and the controllers responsible for it get a chance to perform cleanup actions.
  - Finalizers are identifiers added to an object’s `metadata.finalizers` list. As long as this list is not empty, Kubernetes will not physically delete the object.
  - A custom controller can add its own finalizer to a CR upon creation. When a deletion request comes in, the controller detects the `deletionTimestamp`. It then performs the necessary cleanup actions (e.g., deprovisioning external resources managed by the CR, ensuring child objects are gone). Once cleanup is complete, the controller removes its finalizer from the list. When the `metadata.finalizers` list becomes empty, Kubernetes finally removes the CR object.
  - This ensures that resources managed by the CR are properly cleaned up before the CR itself disappears.
These advanced features provide fine-grained control over the lifecycle, validation, and evolution of your custom resources.
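The finalizer flow in particular can be modeled with plain dicts (an illustrative Python sketch with a hypothetical finalizer name; a real controller performs these steps through API update calls):

```python
FINALIZER = "webapps.stable.example.com/cleanup"  # hypothetical finalizer name

def handle_deletion(cr: dict, cleanup_external) -> bool:
    """Process one reconcile of a CR's finalizer. Returns True once deletable.

    Kubernetes only physically deletes the object when metadata.finalizers
    is empty; until then the controller gets a chance to clean up.
    """
    finalizers = cr["metadata"].setdefault("finalizers", [])
    if cr["metadata"].get("deletionTimestamp") is None:
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)  # register interest on first reconcile
        return False
    if FINALIZER in finalizers:
        cleanup_external(cr)              # e.g., deprovision external resources
        finalizers.remove(FINALIZER)      # allow Kubernetes to delete the object
    return len(finalizers) == 0

cleaned = []
cr = {"metadata": {"name": "my-sample-app"}}
handle_deletion(cr, cleaned.append)           # normal reconcile: adds finalizer
cr["metadata"]["deletionTimestamp"] = "2024-01-01T00:00:00Z"
print(handle_deletion(cr, cleaned.append))    # -> True (cleanup ran, finalizer removed)
print(len(cleaned))                           # -> 1
```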
Best Practices for Designing CRDs
Designing a good CRD is like designing any API – usability, clarity, and maintainability are key.
- Follow Kubernetes API Conventions: Align your CRD structure and field names with built-in Kubernetes resources where applicable (e.g., use `replicas`, `image`, `selector`, `status.conditions`). This makes your CRD feel familiar to Kubernetes users.
- Clear Separation of `spec` and `status`: Use `spec` exclusively for the desired state defined by the user. Use `status` exclusively for the observed state reported by the controller. Enable the `/status` subresource.
- Robust Validation: Leverage the `openAPIV3Schema` extensively. Define types, formats, patterns, enums, minimums, maximums, and required fields. Good validation prevents bad data and provides excellent user feedback. Consider validation webhooks for more complex rules.
- Use `status.conditions`: Adopt the standard `conditions` pattern (with `type`, `status`, `reason`, `message`, `lastTransitionTime`) in your `status` field to provide detailed, standardized status reporting. This helps users and other tools understand the state of your resource.
- Plan for Versioning Early: Think about how your API might evolve. Start with an alpha (`v1alpha1`) or beta (`v1beta1`) version. Be mindful that changing schemas later might require conversion webhooks.
- Choose Scope Wisely: Decide carefully between `Namespaced` and `Cluster` scope based on the nature of the resource. Most application-level resources are `Namespaced`.
- Keep `spec` User-Focused: The `spec` should represent the user’s intent clearly and concisely. Avoid including implementation details or runtime state in the `spec`.
- Idempotent Controller Logic: Ensure your controller’s reconciliation logic is idempotent: running it multiple times with the same input CR should produce the same end state without unintended side effects.
- Owner References: Have your controller set `OwnerReferences` on any objects it creates (Deployments, Services, etc.) pointing back to the parent CR. This enables automatic garbage collection by Kubernetes when the CR is deleted.
- Use `additionalPrinterColumns`: Make `kubectl get` output informative for your users by defining relevant columns.
- Provide Documentation: Document your CRD’s fields, purpose, and usage, just like any other API.
CRDs vs. Alternatives Revisited
Now that we understand CRDs and controllers, let’s briefly revisit why they are often superior to alternatives for managing Kubernetes-native configurations:
- vs. ConfigMaps: CRDs offer structure, schema validation, versioning, fine-grained RBAC, status reporting, and the ability to trigger complex automation via controllers. ConfigMaps are just unstructured key-value pairs or files.
- vs. Annotations: Annotations are unstructured metadata, not suitable for defining desired state. They lack validation, versioning, RBAC (beyond the parent object), and are hard to query or watch reliably for automation.
- vs. External Systems: While external systems might be necessary for some things, managing application or infrastructure configuration natively within Kubernetes using CRDs provides a unified API, consistent tooling (`kubectl`), integrated RBAC and auditing, and enables powerful automation through the Operator pattern, tightly coupling the configuration lifecycle with the application lifecycle within the cluster.
CRDs provide the most Kubernetes-native way to extend the API for declarative configuration and management.
Conclusion
Custom Resource Definitions are a cornerstone of Kubernetes’ extensibility. They transform Kubernetes from a container orchestrator with a fixed set of resource types into a general-purpose declarative API platform.
By defining CRDs, you can model virtually any concept – applications, infrastructure components, configuration policies, business workflows – as native Kubernetes objects. This allows you to leverage the power of the Kubernetes control plane, `kubectl`, RBAC, and the declarative model for managing these custom concepts.
While CRDs define the what (the API), custom controllers and Operators provide the how (the automation and operational logic). Together, they enable the powerful Operator pattern, allowing you to encapsulate domain-specific knowledge for managing complex applications directly within Kubernetes.
From simple application configurations using `WebApp` CRs to sophisticated database cluster management via Operators, CRDs provide the foundation for building robust, declarative, and automated systems on Kubernetes. Understanding CRDs is essential for anyone looking to tailor Kubernetes to their specific needs and unlock its full potential as an extensible control plane. As you encounter more complex scenarios in Kubernetes, chances are CRDs and Operators will be a key part of the solution.