Background

Docker commands such as docker ps hung, even though the load on the machine was not high.

So I tried restarting the Docker service, but after the restart it failed to start:

[root@vlxdg1bdbi1 multi-user.target.wants]# systemctl status docker
 docker.service - Docker Daemon
   Loaded: loaded (/etc/systemd/system/multi-user.target.wants/docker.service; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2024-10-29 09:52:17 CST; 6s ago
  Process: 370804 ExecStart=/usr/bin/dockerd --default-shm-size 1000000000 --default-ulimit nofile=50000:99999 (code=exited, status=1/FAILURE)
 Main PID: 370804 (code=exited, status=1/FAILURE)

Oct 29 09:52:16 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:16+08:00" level=info msg=serving... address="/var/run/docker/cont...d/grpc"
Oct 29 09:52:16 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:16+08:00" level=info msg="containerd successfully booted in 0.012...tainerd
Oct 29 09:52:16 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:16.791661842+08:00" level=info msg="[graphdriver] using prior sto...erlay2"
Oct 29 09:52:16 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:16.917209150+08:00" level=info msg="Graph migration to content-ad...econds"
Oct 29 09:52:16 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:16.919943534+08:00" level=info msg="Loading containers: start."
Oct 29 09:52:17 vlxdg1bdbi1.ramaxel.local dockerd[370804]: time="2024-10-29T09:52:17.037589231+08:00" level=error msg="Failed to load container f62...ectory"
Oct 29 09:52:17 vlxdg1bdbi1.ramaxel.local dockerd[370804]: Error starting daemon: Error initializing network controller: error obtaining controller...atabase
Oct 29 09:52:17 vlxdg1bdbi1.ramaxel.local systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 29 09:52:17 vlxdg1bdbi1.ramaxel.local systemd[1]: Unit docker.service entered failed state.
Oct 29 09:52:17 vlxdg1bdbi1.ramaxel.local systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

One line in the output stands out: Error starting daemon: Error initializing network controller: error obtaining controller...atabase. The daemon failed while initializing its network controller.
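The status output above is ellipsized, so it can help to pull only the error lines out of the full log. A minimal sketch, assuming the daemon logs to the systemd journal under the unit name docker.service; docker_errors is a hypothetical helper name, not part of Docker:

```shell
# Hypothetical helper: read dockerd log text on stdin and keep only
# the lines that report an error or the fatal daemon start failure.
docker_errors() {
    grep -E 'level=error|Error starting daemon'
}
```

Possible usage: journalctl -u docker.service --no-pager | docker_errors. When its output is piped, journalctl does not truncate long lines, so the complete message is visible.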

Solution:

rm -rf /var/lib/docker/network/files/* # back up first if you like, e.g. cp the files to /tmp

systemctl restart docker

After that, the Docker service status was active.
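The backup mentioned in the comment above can be folded into the same step. A minimal sketch, assuming the daemon is not running while the files are removed (as was the case here, since it had failed to start); backup_and_clear and the backup path are hypothetical names chosen for illustration:

```shell
# Hypothetical helper: copy a directory's contents to a backup location,
# then empty the directory. Nothing is deleted unless the copy succeeded.
backup_and_clear() {
    local src="$1"      # directory to empty, e.g. /var/lib/docker/network/files
    local backup="$2"   # backup destination, e.g. /tmp/docker-network-files.bak

    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }

    mkdir -p "$backup" || return 1
    cp -a "$src"/. "$backup"/ || return 1

    # ${src:?} aborts if src is empty, so this can never expand to "rm -rf /*".
    rm -rf "${src:?}"/*
}
```

Possible usage: backup_and_clear /var/lib/docker/network/files /tmp/docker-network-files.bak, followed by systemctl restart docker.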

As for why docker commands hung in the first place: the system log (messages) was full of too many open files errors, as shown below

Docker fails to start after a restart - Figure 1

To fix this, raise the open file descriptor limit in /etc/security/limits.conf, or raise it temporarily for the current shell session:

ulimit -n 1000000
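Before raising the limit, it can be useful to see what limit a given process is actually running with. A minimal sketch that parses the "Max open files" row of /proc/<pid>/limits; nofile_limits is a hypothetical helper name, and the limits.conf lines in the comments are an illustrative assumption using the same value as above, not the author's actual configuration:

```shell
# Hypothetical helper: read /proc/<pid>/limits text on stdin and print
# the soft and hard "Max open files" values.
nofile_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }'
}

# Illustrative /etc/security/limits.conf entries (assumed values):
#   * soft nofile 1000000
#   * hard nofile 1000000
```

Possible usage: nofile_limits < /proc/$(pidof dockerd)/limits. Note that limits.conf is applied to login sessions through PAM and ulimit -n affects only the current shell and its children; a daemon started by systemd takes its limit from the LimitNOFILE= setting of its unit instead. The source does not say which process ran out of descriptors, so check the affected process before choosing where to raise the limit.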