The Kubernetes architecture includes several mechanisms for data storage, each offering varying degrees of persistence and complexity. Each method requires some security expertise to deploy and manage properly.
The etcd component is the cluster’s key-value data store. It holds cluster metadata along with both the desired and the current state. This data is security-sensitive: a user with direct access to etcd can read or change any object in the cluster, which may effectively grant that user cluster-admin privileges.
Persistent Volumes
Persistent Volumes (PVs) expose physical storage implementations to Kubernetes clusters so that data generated by ephemeral containers can be retained beyond a container’s lifecycle. A PV can be backed by locally attached storage on a host node or by a networked storage system. Depending on its access mode, a PV can be mounted read-only by many nodes (ReadOnlyMany), read-write by a single node (ReadWriteOnce), read-write by many nodes (ReadWriteMany), or read-write by a single Pod (ReadWriteOncePod). A PV is created declaratively from a manifest, usually a YAML file, and workloads request it through a PersistentVolumeClaim (PVC).
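As a minimal sketch of the declarative definition described above, the Python snippet below builds a PV manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The volume name, NFS server address, export path, and size are hypothetical placeholders, and the NFS backend is only one of several possible choices.

```python
import json

def make_persistent_volume(name, capacity, access_modes, nfs_server, nfs_path):
    """Build a PersistentVolume manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": capacity},
            # Access modes describe how the volume may be mounted.
            "accessModes": list(access_modes),
            # Retain keeps the data after the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            # Hypothetical networked backend; a local or cloud disk would use a different key.
            "nfs": {"server": nfs_server, "path": nfs_path},
        },
    }

pv = make_persistent_volume(
    name="example-pv",                 # hypothetical name
    capacity="10Gi",
    access_modes=["ReadOnlyMany"],     # read-only, mountable by many nodes
    nfs_server="nfs.example.internal", # hypothetical server
    nfs_path="/exports/data",          # hypothetical export
)
print(json.dumps(pv, indent=2))
```

The printed JSON could be saved to a file and passed to `kubectl apply -f`.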
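The PVC specifies the requested storage size and access mode and, optionally, a storage class. The storage behind a PVC can be provisioned statically or dynamically. In static provisioning, a cluster administrator creates PVs in advance that map to an external or internal storage system. In dynamic provisioning, the administrator defines classes of storage (StorageClass objects), and a matching volume is created on demand. In either case, a developer claims a slice of this storage with a PVC, and a Pod then mounts the PVC as a volume.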
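The claim-then-mount flow can be sketched in Python as two manifests built as dictionaries: a PVC that requests storage, and a Pod that mounts it. All names, the container image, and the mount path are hypothetical; this is an illustration under those assumptions, not a recommended layout.

```python
import json

def make_pvc(name, size, access_mode="ReadWriteOnce", storage_class=None):
    """Build a PersistentVolumeClaim manifest requesting `size` of storage."""
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Optional: ask for a specific class of storage.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }

def make_pod_using_pvc(pod_name, image, claim_name, mount_path):
    """Build a Pod manifest that mounts the named claim at `mount_path`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            # The volume entry links the Pod to the claim by name.
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }

pvc = make_pvc("example-claim", "5Gi")  # hypothetical claim name
pod = make_pod_using_pvc("example-pod", "registry.example/app:1.0",
                         "example-claim", "/var/lib/app")
print(json.dumps([pvc, pod], indent=2))
```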
If a stateful application uses PersistentVolumeClaim templates to define storage for its replicas, the StatefulSet controller automatically creates a PVC for each replicated Pod, and each PVC is then bound to a suitable PV. An administrator can also pre-bind a PV to a specific claim by setting the PV’s claimRef field; in that case, binding happens regardless of some of the normal volume-matching criteria, including node affinity. The claimRef field references the associated PersistentVolumeClaim so that other claims cannot bind to the volume. In addition, where the storage backend supports it, the data behind a PVC can be encrypted at rest and in transit to protect confidential information.
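To make the per-replica claims concrete: a StatefulSet names each generated PVC after the claim template, the StatefulSet, and the Pod ordinal. The short Python sketch below reproduces that naming pattern; the StatefulSet name "db" and template name "data" are hypothetical.

```python
def statefulset_pvc_names(statefulset_name, template_name, replicas):
    """Return the PVC names a StatefulSet generates, one per replica.

    Pods are named <statefulset>-<ordinal>, and each claim is named
    <template>-<pod name>.
    """
    return [f"{template_name}-{statefulset_name}-{ordinal}"
            for ordinal in range(replicas)]

names = statefulset_pvc_names("db", "data", 3)
print(names)  # ['data-db-0', 'data-db-1', 'data-db-2']
```

Because the names are stable, a replica that is rescheduled reattaches to the same claim rather than receiving a fresh, empty volume.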
Shared Volumes
A shared volume is a file system mounted into multiple containers, allowing them to share access to data. This can be useful for workloads such as databases. However, shared volumes have security implications. Any container that mounts the volume can read what the others write, so sensitive information placed there is exposed to all of them. If the shared volume maps security-sensitive directories of the host into the containers, a compromised container may also be able to run malicious code on the host machine.
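One way to act on this concern is to review Pod manifests for volumes that map host directories into containers. The Python sketch below flags hostPath volumes under a few sensitive-looking prefixes. The prefix list is an illustrative assumption, not an authoritative policy, and the sample Pod is hypothetical.

```python
# Assumption: an illustrative, non-exhaustive list of host paths to treat as sensitive.
SENSITIVE_PREFIXES = ("/etc", "/var/run", "/var/lib/kubelet", "/proc", "/sys")

def risky_host_paths(pod):
    """Return (volume name, host path) pairs for hostPath volumes under sensitive prefixes."""
    findings = []
    for volume in pod.get("spec", {}).get("volumes", []):
        host_path = volume.get("hostPath")
        if not host_path:
            continue  # emptyDir, PVC, and other volume types are not host mappings
        path = host_path.get("path", "")
        sensitive = path == "/" or any(
            path == prefix or path.startswith(prefix + "/")
            for prefix in SENSITIVE_PREFIXES
        )
        if sensitive:
            findings.append((volume["name"], path))
    return findings

# Hypothetical Pod: one scratch volume shared by containers, one host mapping.
sample_pod = {
    "spec": {
        "volumes": [
            {"name": "scratch", "emptyDir": {}},
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}},
        ]
    }
}
print(risky_host_paths(sample_pod))  # [('docker-sock', '/var/run/docker.sock')]
```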
Persistent Volumes provide durable storage for databases running in your Kubernetes cluster. This storage is backed by durable disks and exists independently of Pods, which lets administrators manage backups, performance, and capacity allocation without tying them to a Pod’s lifecycle. Persistent Volumes are accessed through PersistentVolumeClaims, which developers or administrators create to request durable storage for applications.
PersistentVolumeClaims are matched to available storage through a StorageClass object, which names a volume provisioner (plug-in) and any parameters for the external provider, if applicable. With a StorageClass in place, new volumes are created on demand when a claim is submitted. This dynamic provisioning is a major advantage over static storage setups because administrators do not have to pre-create each volume by hand.
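The relationship between a StorageClass and a claim can be sketched as follows. The provisioner string and its parameters are hypothetical stand-ins for whatever the external provider documents; only the overall manifest shape is meant to be representative.

```python
import json

def make_storage_class(name, provisioner, parameters=None):
    """Build a StorageClass manifest naming a provisioner and its parameters."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": dict(parameters or {}),
        "reclaimPolicy": "Delete",
        # Delay volume creation until a Pod that uses the claim is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
    }

storage_class = make_storage_class(
    name="fast",                                        # hypothetical class name
    provisioner="example.com/hypothetical-provisioner", # hypothetical provisioner
    parameters={"type": "ssd"},                         # hypothetical provider parameter
)

# A claim selects the class by name; the provisioner then creates the volume on demand.
dynamic_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "example-dynamic-claim"},      # hypothetical name
    "spec": {
        "storageClassName": storage_class["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "20Gi"}},
    },
}
print(json.dumps([storage_class, dynamic_claim], indent=2))
```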
Etcd Snapshots
When an etcd cluster fails, for example by permanently losing quorum, the entire cluster can be restored from a single snapshot “db” file. The snapshot is restored into a new data directory for each etcd member, and the members then start from those directories. The restore operation overwrites some snapshot metadata (specifically, the member ID and cluster ID), so the restored member loses its former identity. This prevents it from inadvertently joining an existing cluster and makes clear to other etcd members that it is a fresh startup of a logical cluster.
This is a common backup and recovery strategy. A second reason for backing up etcd is migration: transferring application workloads and their associated data between Kubernetes clusters. Migration serves several purposes, including disaster recovery, freeing up capacity on high-priority clusters, and reassigning applications to different infrastructure.
To take an etcd snapshot, the command-line utility etcdctl is used to create a binary image of the etcd state and write it to disk. To restore, a utility Pod that overwrites the existing etcd data with the snapshot can then be deployed. The Pod must be scheduled on the node where etcd runs and must have access to that node’s data directory. The Pod can also include pre- and post-rules that quiesce the applications using the PVC before the snapshot and resume them afterward.
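The snapshot and restore steps can be sketched as command lines assembled in Python. The snippet only builds and prints the commands; it does not run them. The certificate directory follows a common kubeadm layout and, like the snapshot and data-directory paths, is an assumption to adjust for your cluster. Recent etcd releases provide the restore subcommand through etcdutl rather than etcdctl, so the tool name is left as a parameter.

```python
import shlex

# Assumption: certificate paths follow a common kubeadm layout.
PKI_DIR = "/etc/kubernetes/pki/etcd"

def snapshot_save_command(snapshot_path, endpoint="https://127.0.0.1:2379", pki_dir=PKI_DIR):
    """Return the argv for `etcdctl snapshot save` against a TLS-protected endpoint."""
    return [
        "etcdctl", "snapshot", "save", snapshot_path,
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
    ]

def snapshot_restore_command(snapshot_path, data_dir, tool="etcdctl"):
    """Return the argv that restores a snapshot into a fresh data directory."""
    return [tool, "snapshot", "restore", snapshot_path, f"--data-dir={data_dir}"]

# Hypothetical paths for illustration only.
save_cmd = snapshot_save_command("/var/backups/etcd-snapshot.db")
restore_cmd = snapshot_restore_command("/var/backups/etcd-snapshot.db",
                                       "/var/lib/etcd-restored")
print(shlex.join(save_cmd))
print(shlex.join(restore_cmd))
```

Restoring into a new directory, rather than over the live one, leaves the old data in place until the restored member has been checked.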
Cloud Storage
Kubernetes is a leading cloud-native container orchestration platform running many applications, including stateful workloads like databases. However, running a database in a Kubernetes cluster presents some unique challenges that must be addressed to ensure application performance, scalability, and security.
Many threats to a database cluster come from external actors, whether DDoS attackers trying to cripple a service or intruders trying to penetrate a cluster for long-term eavesdropping. Internal vulnerabilities can also allow data to be exposed or destroyed. Securing both the application and its storage in a Kubernetes cluster is therefore essential.
To secure a Kubernetes cluster, you must enforce Role-Based Access Control (RBAC) and use a dedicated service account for each application. This allows administrators to manage authorization per application and protects the integrity of each application’s data. It is also critical to use multi-factor authentication and Transport Layer Security (TLS) for access to the Kubernetes API and the cluster components behind it.
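A per-application RBAC setup can be sketched as three manifests: a dedicated ServiceAccount, a namespaced Role with narrowly scoped rules, and a RoleBinding that ties them together. The namespace, account name, and Secret name below are hypothetical, and the single rule is only an example of least privilege.

```python
import json

NAMESPACE = "shop"            # hypothetical namespace
ACCOUNT = "orders-db"         # hypothetical per-application service account
ROLE = "orders-db-reader"     # hypothetical role name

service_account = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": ACCOUNT, "namespace": NAMESPACE},
}

role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": ROLE, "namespace": NAMESPACE},
    "rules": [{
        "apiGroups": [""],                         # "" is the core API group
        "resources": ["secrets"],
        "resourceNames": ["orders-db-credentials"],  # hypothetical Secret; limits access to one object
        "verbs": ["get"],
    }],
}

role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": f"{ROLE}-binding", "namespace": NAMESPACE},
    "subjects": [{"kind": "ServiceAccount", "name": ACCOUNT, "namespace": NAMESPACE}],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": ROLE,
    },
}

print(json.dumps([service_account, role, role_binding], indent=2))
```

A Pod would opt into this identity by setting `serviceAccountName` in its spec to the account name.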
Finally, it’s important to ensure that all communication between a database and the rest of the system is encrypted end to end. This includes connections between the database and Kubernetes and between the database and frontend applications. It also helps to have tools that provide a live map of all communications between databases and other systems in your Kubernetes environment.