20260527 Adding Nodes to Ceph
Published: 2026/5/28 23:49:49
Adding nodes

Ceph (cephadm) authenticates to cluster hosts with a shared SSH key. Run `ceph cephadm get-pub-key` to obtain the SSH public key that a host needs before it can join the cluster. Once that key is installed on a node, the cluster can manage the node over passwordless SSH. Then add the host with `ceph orch host add`.

    # For convenience, install the Ceph client tools (ceph-common) on ceph1
    [root@ceph1 ~]# dnf install -y ceph-common

    # Fetch the cluster public key
    [root@ceph1 ~]# ceph cephadm get-pub-key > ~/ceph.pub

    # Push the public key to the other nodes
    [root@ceph1 ~]# ssh-copy-id -f -i ~/ceph.pub root@ceph2.laogao.cloud
    [root@ceph1 ~]# ssh-copy-id -f -i ~/ceph.pub root@ceph3.laogao.cloud

    # Add the nodes
    [root@ceph1 ~]# ceph orch host add ceph2.laogao.cloud
    Added host 'ceph2.laogao.cloud' with addr '192.168.108.12'
    [root@ceph1 ~]# ceph orch host add ceph3.laogao.cloud
    Added host 'ceph3.laogao.cloud' with addr '192.168.108.13'

    [root@ceph1 ~]# ceph orch host ls
    HOST                ADDR            LABELS  STATUS
    ceph1.laogao.cloud  192.168.108.11  _admin
    ceph2.laogao.cloud  192.168.108.12
    ceph3.laogao.cloud  192.168.108.13
    3 hosts in cluster

    # Wait for services to be deployed automatically to the other nodes.
    # Once deployment finishes, the result looks like this:
    [root@ceph1 ~]# ceph orch ls
    NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager   ?:9093,9094  1/1      8m ago     9m   count:1
    crash                       3/3      8m ago     9m   *
    grafana        ?:3000       1/1      8m ago     9m   count:1
    mgr                         2/2      8m ago     9m   count:2
    mon                         3/5      8m ago     9m   count:5
    node-exporter  ?:9100       3/3      8m ago     9m   *
    prometheus     ?:9095       1/1      8m ago     9m   count:1
    # crash: 3/3
    # mgr: 2/2
    # mon: 3/5
    # node-exporter: 3/3

Deploying mon and mgr

    # Disable automatic scaling of the mon and mgr services
    [root@ceph1 ~]# ceph orch apply mon --unmanaged=true
    [root@ceph1 ~]# ceph orch apply mgr --unmanaged=true

    [root@ceph1 ~]# ceph orch ls
    NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager   ?:9093,9094  1/1      56s ago    12m  count:1
    crash                       3/3      57s ago    12m  *
    grafana        ?:3000       1/1      56s ago    12m  count:1
    mgr                         2/2      57s ago    3s   <unmanaged>
    mon                         3/5      57s ago    8s   <unmanaged>
    node-exporter  ?:9100       3/3      57s ago    12m  *
    prometheus     ?:9095       1/1      56s ago    12m  count:1
    # PLACEMENT for mon and mgr is now <unmanaged>

    # Configure host labels: add the _admin label to ceph2 and ceph3
    [root@ceph1 ~]# ceph orch host label add ceph2.laogao.cloud _admin
    Added label _admin to host ceph2.laogao.cloud
    [root@ceph1 ~]# ceph orch host label add ceph3.laogao.cloud _admin
    Added label _admin to host ceph3.laogao.cloud

    [root@ceph1 ~]# ceph orch host ls
    HOST                ADDR            LABELS  STATUS
    ceph1.laogao.cloud  192.168.108.11  _admin
    ceph2.laogao.cloud  192.168.108.12  _admin
    ceph3.laogao.cloud  192.168.108.13  _admin
    3 hosts in cluster

    # Deploy the mon and mgr components to the nodes that carry the _admin label
    [root@ceph1 ~]# ceph orch apply mon --placement="label:_admin"
    Scheduled mon update...
    [root@ceph1 ~]# ceph orch apply mgr --placement="label:_admin"
    Scheduled mgr update...

    # Observe the result
    [root@ceph1 ~]# ceph orch ls | egrep 'mon|mgr'
    mgr  3/3  2m ago  14s  label:_admin
    mon  3/3  2m ago  28s  label:_admin
    [root@ceph1 ~]# ceph orch ps | egrep 'mon|mgr'

Deploying OSDs

    # Add every unused disk on every host as an OSD
    [root@ceph1 ~]# ceph orch apply osd --all-available-devices
    Scheduled osd.all-available-devices update...

Verification

Viewing the services deployed in the cluster

    [root@ceph1 ~]# ceph orch ls
    NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager               ?:9093,9094  1/1      3s ago     15m  count:1
    crash                                   3/3      4s ago     15m  *
    grafana                    ?:3000       1/1      3s ago     15m  count:1
    mgr                                     3/3      4s ago     2m   label:_admin
    mon                                     3/3      4s ago     2m   label:_admin
    node-exporter              ?:9100       3/3      4s ago     15m  *
    osd.all-available-devices               9        4s ago     25s  *
    prometheus                 ?:9095       1/1      3s ago     15m  count:1

Notes on selected columns:

**RUNNING**: the running state of the service. The first number is how many daemons are currently running; the second is how many the orchestrator aims to deploy according to the service's policy or configuration.

**PLACEMENT**: the placement parameter given to the orchestrator when the service was deployed. The orchestrator uses it to decide which nodes the service runs on. Common placements include:

- a specific node name, for example `--placement=ceph2`
- a label, for example `--placement="label:mylabel"`
- a count, for example `--placement="3 host1 host2 host3"`
- unmanaged, which means the orchestrator does not deploy the service automatically. Set `--unmanaged=true` to turn this on and `--unmanaged=false` to turn it off.

Viewing cluster status

    [root@ceph1 ~]# ceph -s
      cluster:
        id:     2faf683a-7cbf-11f0-b5ba-000c29e0ad0e
        health: HEALTH_OK

      services:
        mon: 3 daemons, quorum ceph1.laogao.cloud,ceph2,ceph3 (age 6m)
        mgr: ceph1.laogao.cloud.zoqmbt(active, since 15m), standbys: ceph2.oetbal, ceph3.npaxvt
        osd: 9 osds: 9 up (since 30s), 9 in (since 45s)

      data:
        pools:   1 pools, 1 pgs
        objects: 0 objects, 0 B
        usage:   2.6 GiB used, 177 GiB / 180 GiB avail
        pgs:     1 active+clean

The long form of `ceph -s` is `ceph --status`. The output covers the state of the MONs, MGRs, and OSDs, including their number, location, and uptime. The cluster health is one of:

- HEALTH_OK: the cluster is healthy.
- HEALTH_WARN: the cluster has warnings that need investigation; once they are resolved, the state can return to HEALTH_OK.
- HEALTH_ERR: the cluster has a serious error that needs immediate attention.

Viewing the cluster OSD layout

    [root@ceph1 ~]# ceph osd tree
    ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
    -1         0.17537  root default
    -3         0.05846      host ceph1
     0  hdd    0.01949          osd.0   up      1.00000   1.00000
     3  hdd    0.01949          osd.3   up      1.00000   1.00000
     6  hdd    0.01949          osd.6   up      1.00000   1.00000
    -5         0.05846      host ceph2
     2  hdd    0.01949          osd.2   up      1.00000   1.00000
     4  hdd    0.01949          osd.4   up      1.00000   1.00000
     7  hdd    0.01949          osd.7   up      1.00000   1.00000
    -7         0.05846      host ceph3
     1  hdd    0.01949          osd.1   up      1.00000   1.00000
     5  hdd    0.01949          osd.5   up      1.00000   1.00000
     8  hdd    0.01949          osd.8   up      1.00000   1.00000

Viewing cluster components

Main components running in the cluster:

- mgr: the Ceph manager
- monitor: the Ceph monitor
- osd: the Ceph object storage daemon
- rgw: the Ceph object storage gateway

Other components:

- crash: crash data collection module
- prometheus: monitoring component
- grafana: dashboard for displaying monitoring data
- alertmanager: Prometheus alerting component
- node_exporter: Prometheus node data collection component

Once you have looked up the details of a service, you can act on it directly. Use `ceph orch daemon start|stop|restart|redeploy|reconfig <daemon_name>` to start, stop, restart, redeploy, or reconfigure a specific daemon, where the name is the one listed by `ceph orch ps`. Use `ceph orch daemon rm <daemon_name> [--force]` to remove a specific daemon.

At this point, shut down all Ceph storage nodes and take snapshots so that later experiments can start from this state.
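
With more than a couple of nodes, the push-key and add-host steps from the start of this post could be wrapped in a loop. The following is only a sketch, assuming root SSH access as used above and a public key already saved with `ceph cephadm get-pub-key`. The names `add_hosts`, `run`, and the `DRY_RUN` variable are hypothetical helpers introduced here, not part of Ceph.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: install the cluster public key on each host, then add it.
# Usage: add_hosts <pubkey-file> <host>...
add_hosts() {
  local key="$1"
  shift
  local host
  for host in "$@"; do
    # Same two commands as in the manual steps, once per host.
    run ssh-copy-id -f -i "$key" "root@${host}" || return 1
    run ceph orch host add "$host" || return 1
  done
}
```

For example, `DRY_RUN=1 add_hosts ~/ceph.pub ceph2.laogao.cloud ceph3.laogao.cloud` prints the four commands without touching the cluster; dropping `DRY_RUN=1` executes them. Stopping at the first failure is a deliberate choice here, so that a host whose key push failed is not added to the cluster.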
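
The RUNNING column described in the verification section can also be checked mechanically. The sketch below assumes the `ceph orch ls` text layout shown in this post (a `running/target` field such as `3/5` on each service line). `pending_services` is a hypothetical helper name, not a Ceph command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `ceph orch ls` output on stdin and print each
# service whose running count is below its target, as "<name> <running>/<target>".
pending_services() {
  awk '{
    # PORTS may be empty, so locate the first field shaped like N/M
    # instead of relying on a fixed column number.
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/) {
        split($i, n, "/")
        if (n[1] + 0 < n[2] + 0) print $1, $i
        break
      }
    }
  }'
}
```

Used as `ceph orch ls | pending_services`, it prints nothing once every service has reached its target. Lines without a `running/target` field, such as the header or the `osd.all-available-devices` row that shows only `9`, are skipped.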