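# Avoiding Pitfalls When Building a K8S Cluster on Alibaba Cloud Ubuntu 24.04: A Full Walkthrough from Docker to CRI-Dockerd

Kubernetes (K8S) has become the de facto standard for container orchestration. For developers who need to deploy a K8S cluster on Alibaba Cloud Ubuntu 24.04, moving from Docker to CRI-Dockerd is a common but error-prone step. This article walks through the key steps and the usual traps so that you can bring the cluster up efficiently.

## 1. Environment Preparation and System Tuning

A sound system configuration is the basis for everything that follows. The Alibaba Cloud Ubuntu 24.04 image is already partly tuned, but it still needs a few targeted adjustments.

### 1.1 Server Sizing and Network Planning

Prepare at least three servers with identical specifications (for example 8 cores and 16 GB of RAM) to act as master and worker nodes. On the Alibaba Cloud internal network, pay attention to the following:

- All instances must be in the same VPC.
- The security group must open ports 6443 (API Server), 2379-2380 (etcd), 10250 (kubelet), and any others your setup needs.
- Using a HAVIP (high-availability virtual IP) as the control-plane endpoint is recommended.

```shell
# Example connectivity check
ping -c 3 172.31.0.61
telnet 172.31.0.61 6443
```

### 1.2 Basic System Configuration

Alibaba Cloud Ubuntu 24.04 disables swap by default, but confirm the following settings anyway:

```shell
# Disable swap now and keep it disabled after a reboot
sudo swapoff -a
sudo sed -i '/swap/ s/^/#/' /etc/fstab

# Set the hostname (different on each node)
sudo hostnamectl set-hostname k8s-master01   # on the master node
sudo hostnamectl set-hostname k8s-node01     # on the worker node

# Update /etc/hosts
echo "172.31.0.61 k8s-master01" | sudo tee -a /etc/hosts
echo "172.31.1.61 k8s-node01" | sudo tee -a /etc/hosts
```

### 1.3 Kernel Parameter Tuning

K8S has specific requirements for Linux kernel parameters, especially the network-related ones. The `net.bridge.*` keys only exist once the `br_netfilter` module is loaded, so load it first:

```shell
# Load br_netfilter now and on every boot
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf
sudo modprobe br_netfilter

# Configure kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply the configuration
sudo sysctl --system
```

## 2. Choosing and Configuring the Container Runtime

K8S removed its built-in Docker support (dockershim) in v1.24, so Docker can only be used through CRI-Dockerd as an adapter layer.

### 2.1 Installing and Configuring Docker Engine

K8S no longer talks to Docker directly, but Docker still has to be installed as the underlying engine:

```shell
# Install dependencies
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Set up the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

Configure Docker for K8S in `/etc/docker/daemon.json`, then restart Docker (`sudo systemctl restart docker`) so that the systemd cgroup driver takes effect:

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2"
}
```

### 2.2 Installing and Configuring CRI-Dockerd

CRI-Dockerd is the bridge between K8S and Docker. Install it as follows (v0.3.18 is the release used here; check the releases page for a newer one):

```shell
# Download CRI-Dockerd
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.18/cri-dockerd-0.3.18.amd64.tgz

# Unpack and install
tar xvf cri-dockerd-0.3.18.amd64.tgz
sudo cp cri-dockerd/cri-dockerd /usr/local/bin/
```

Create the systemd service file `/etc/systemd/system/cri-docker.service`. The original unit declared `Requires=cri-docker.socket`, but this guide never installs a socket unit, so the service would fail to start; the line is dropped here and cri-dockerd listens on its default socket `/var/run/cri-dockerd.sock`, which is the path passed to kubeadm later.

```ini
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/cri-dockerd \
  --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.10 \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin
Restart=always
RestartSec=2
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```shell
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl start cri-docker.service
```

## 3. Installing K8S Components and Initializing the Cluster

### 3.1 Installing Kubeadm, Kubelet, and Kubectl

Use the Alibaba Cloud mirror to speed up installation:

```shell
# Add the K8S apt repository
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install the components and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
```

One pitfall: `kubernetes-xenial` follows the legacy upstream package layout, which upstream froze in September 2023, so it does not carry recent releases such as the v1.33.2 used in section 6.2. Newer releases are published in per-minor-version repositories. Run `apt-cache madison kubeadm` to see which versions your configured repository actually offers.

### 3.2 Initializing the Cluster

Run the initialization command on the master node. The name `cluster-endpoint` must resolve on every node, for example through an `/etc/hosts` entry that points to the master node or the HAVIP address.

```shell
sudo kubeadm init \
  --apiserver-advertise-address=172.31.0.61 \
  --control-plane-endpoint=cluster-endpoint \
  --image-repository=registry.aliyuncs.com/google_containers \
  --service-cidr=10.96.0.0/16 \
  --pod-network-cidr=192.168.0.0/16 \
  --cri-socket=unix:///var/run/cri-dockerd.sock
```

After a successful initialization, configure kubectl as prompted:

```shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

Join the worker nodes to the cluster, replacing `<token>` and `<hash>` with the values printed by `kubeadm init`:

```shell
sudo kubeadm join cluster-endpoint:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --cri-socket=unix:///var/run/cri-dockerd.sock
```

## 4. Deploying the Network Plugin and Add-ons

### 4.1 Installing the Calico Network Plugin

Calico is a widely used K8S network plugin and is suitable for production:

```shell
# Download the Calico manifest
# (legacy URL from the original guide; if it is unavailable, the manifests are
#  published per release in the projectcalico/calico GitHub repository)
curl -L https://docs.projectcalico.org/manifests/calico.yaml -O

# The manifest defaults to 192.168.0.0/16, which matches --pod-network-cidr above,
# so no edit is needed. For a different CIDR, uncomment CALICO_IPV4POOL_CIDR
# in calico.yaml and set it to the value used at init time.

# Deploy Calico
kubectl apply -f calico.yaml
```

Verify the plugin status:

```shell
kubectl get pods -n kube-system -w
```

### 4.2 Deploying Metrics Server

Metrics Server provides resource usage data for the cluster:

```shell
# Download the deployment manifest
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Add the --kubelet-insecure-tls argument (skips kubelet certificate verification;
# acceptable for test clusters, not recommended for production)
sed -i '/- --metric-resolution/a\        - --kubelet-insecure-tls' components.yaml

# Deploy Metrics Server
kubectl apply -f components.yaml
```

Verify the installation:

```shell
kubectl top nodes
kubectl top pods -A
```

## 5. Integrating and Maintaining a Private Image Registry

### 5.1 Configuring K8S Access to a Private Registry

Create a secret in each namespace that needs to pull private images:

```shell
kubectl create secret docker-registry aliyun-registry \
  --docker-server=registry.cn-shanghai.aliyuncs.com \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --namespace=<target-namespace>
```

Reference the secret in the Deployment:

```yaml
spec:
  template:
    spec:
      imagePullSecrets:
      - name: aliyun-registry
      containers:
      - name: my-app
        image: registry.cn-shanghai.aliyuncs.com/your-repo/your-image:tag
```

### 5.2 Image and Storage Maintenance

Clean up unused images and storage regularly. Note that `docker system prune -af` removes every image not used by a container, so those images will be pulled again the next time they are needed.

```shell
# Clean up Docker storage
docker system prune -af

# Check storage usage
docker system df
kubectl get pv,pvc -A
```

## 6. Cluster Upgrades and Certificate Management

### 6.1 Renewing Certificates

The component certificates issued by kubeadm are valid for one year by default (the CA certificates for ten years). Renew them as follows:

```shell
# Check certificate expiration
sudo kubeadm certs check-expiration

# Renew all certificates
sudo kubeadm certs renew all

# Restart kubelet
sudo systemctl restart kubelet
```

Restarting kubelet alone does not restart the control-plane static Pods, and they keep using the old certificates until they are restarted. The kubeadm documentation suggests temporarily moving their manifests out of `/etc/kubernetes/manifests/`, waiting about 20 seconds, and moving them back. Copy the renewed `/etc/kubernetes/admin.conf` to `$HOME/.kube/config` again afterwards.

### 6.2 Upgrading the Cluster Version

kubeadm upgrades one minor version at a time. The packages were put on hold in section 3.1, so they have to be released before installing a new version. The `-00` suffix below is the legacy repository's version format; confirm the exact string with `apt-cache madison kubeadm`. The standard flow on the control plane is:

```shell
# 1. Upgrade kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm=1.33.2-00
sudo apt-mark hold kubeadm

# 2. Upgrade the control plane
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.33.2

# 3. Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet=1.33.2-00 kubectl=1.33.2-00
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

Upgrade flow for a worker node:

```shell
# 1. Drain the node (run where kubectl is configured)
kubectl drain <node-name> --ignore-daemonsets

# 2. Upgrade kubeadm and kubelet (run on the worker node)
sudo apt-mark unhold kubeadm kubelet
sudo apt-get update
sudo apt-get install -y kubeadm=1.33.2-00 kubelet=1.33.2-00
sudo apt-mark hold kubeadm kubelet

# 3. Upgrade the node configuration and restart kubelet
sudo kubeadm upgrade node
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# 4. Make the node schedulable again
kubectl uncordon <node-name>
```

## 7. Troubleshooting and Optimization

### 7.1 Diagnosing a NotReady Node

When a node is in the NotReady state, work through these steps:

```shell
# Show detailed node information
kubectl describe node <node-name>

# Follow the kubelet logs
journalctl -u kubelet -f

# Verify network connectivity
ping -c 3 <node-ip>
telnet <node-ip> 10250
```

### 7.2 Diagnosing Pod Creation Failures

Common causes of Pod creation failures and how to address them:

| Symptom | Likely cause | Resolution |
| --- | --- | --- |
| ImagePullBackOff | Image pull failed | Check the image address and pull secret |
| CrashLoopBackOff | Application fails to start | Inspect the Pod logs (`kubectl logs`) |
| Pending | Insufficient resources | Check resource quotas and node status |
| ContainerCreating | Container runtime problem | Check the Docker/CRI-Dockerd logs |

### 7.3 Performance Recommendations

- Adjust the kubelet `--max-pods` parameter (default 110).
- Configure resource requests and limits for critical components such as kube-apiserver.
- Use NodeSelector or Affinity to spread control-plane Pods.
- Consider Local SSD to improve etcd performance.

```yaml
# Example: kube-apiserver resource limits (excerpt showing only the relevant fields;
# on a kubeadm cluster they live in /etc/kubernetes/manifests/kube-apiserver.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
```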
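For day-to-day troubleshooting, the Pod failure table in section 7.2 can be turned into a small shell helper. The following is only a minimal sketch: `triage_hint` and `triage_pods` are hypothetical names introduced for illustration (they are not part of kubectl), the hints simply restate the table, and `ErrImagePull` is added as an assumption because it usually accompanies `ImagePullBackOff`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kubectl: map a Pod status to the first
# check suggested in the section 7.2 table.
triage_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)   # ErrImagePull is an assumption, not in the table
      echo "check the image address and imagePullSecrets" ;;
    CrashLoopBackOff)
      echo "inspect the Pod logs with kubectl logs" ;;
    Pending)
      echo "check resource quotas and node status" ;;
    ContainerCreating)
      echo "check the Docker and cri-dockerd logs" ;;
    *)
      echo "run kubectl describe pod for details" ;;
  esac
}

# Read lines shaped like `kubectl get pods -A --no-headers`
# (NAMESPACE NAME READY STATUS ...) and print one hint per unhealthy Pod.
triage_pods() {
  while read -r ns name _ready status _rest; do
    [ -z "$name" ] && continue
    case "$status" in
      Running|Completed) continue ;;   # healthy or finished Pods need no hint
    esac
    printf '%s/%s: %s -> %s\n' "$ns" "$name" "$status" "$(triage_hint "$status")"
  done
}
```

Assuming the functions above are sourced into the current shell, a typical invocation would be `kubectl get pods -A --no-headers | triage_pods`. Keeping the mapping in a single `case` statement makes it easy to extend as you meet other failure states.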