-
Kubernetes中的服務發現與故障排除(本篇)
-
Docker與Kubernetes在WayBlazer的使用案例(敬請期待)
-
它們是孤立的,你和你要監視的行程之間存在一定的障礙,在主機上執行的傳統故障排除工具不瞭解容器、名稱空間和編排平臺。
-
它們帶來最小的執行時間,只需要服務及其依賴項,而不需要所有的故障排除工具,想象下只用busybox來進行故障排除!
-
它們排程在你叢集、進行容器移動,擴容或者縮容。隨著流程的結束,它們變得非常不穩定,出現並消失。
-
並透過新的虛擬網路層互相通訊。
-
第一部分:Kubernetes故障排除之服務發現
-
第二部分:Kubernetes故障排除之DNS解析
-
第三部分:Kubernetes故障排除之使用Docker執行容器
$ kubectl create namespace critical-app
namespace "critical-app" created
$ kubectl create -f backend.yaml
service "backend" created
deployment "backend" created
$ kubectl run -it --image=tutum/curl client --namespace critical-app --restart=Never
$ sudo sysdig -k http://127.0.0.1:8080 -s8192 -zw capture.scap
-
-k http://localhost:8080 連線Kubernetes API
-
-s8192 擴大IO緩衝區,因為我們需要顯示全部內容,否則預設情況下會被切斷,
-
-zw capture.scap 將系統呼叫與元資料資訊壓縮並dump到檔案中
$ sysdig -r capture.scap -pk -NA "fd.type in (ipv4, ipv6) and (k8s.ns.name=critical-app or proc.name=skydns)" | less
-
-r capture.scap 讀取捕獲檔案
-
-pk 列印Kubernetes部分到stdout中
-
-NA 顯示ASCII輸出
[...]
10030 16:41:39.536689965 0 client (b3a718d8b339) curl (22370:13) < socket fd=3(<4>)
10031 16:41:39.536694724 0 client (b3a718d8b339) curl (22370:13) > connect fd=3(<4>)
10032 16:41:39.536703160 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:46162->10.0.2.15:53
10048 16:41:39.536831645 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10049 16:41:39.536834352 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=87 size=87 data=
backendcritical-appsvcclusterlocalcritical-appsvcclusterlocal tuple=::ffff:172.17.0.7:46162->:::53
10050 16:41:39.536837173 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
[...]
[...]
10096 16:41:39.538247116 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=221
10097 16:41:39.538275108 1 (36ae6d09d26e) skydns (4639:8) < write res=221 data=
GET /v2/keys/skydns/local/cluster/svc/critical-app/local/cluster/svc/critical-app/backend?quorum=false&recursive;=true&sorted;=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10166 16:41:39.538636659 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10167 16:41:39.538638040 1 (36ae6d09d26e) skydns (4617:1) < read res=285 data=
HTTP/1.1 404 Not Found
Content-Type: application/json
X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 1259
Date: Thu, 08 Dec 2016 15:41:39 GMT
Content-Length: 112
{"errorCode":100,"message":"Key not found","cause":"/skydns/local/cluster/svc/critical-app/local","index":1259}
[...]
[...]
10218 16:41:39.538914765 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:35547->10.0.2.15:53
10242 16:41:39.539005618 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=
backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10247 16:41:39.539018226 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10248 16:41:39.539019925 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=
0]backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10249 16:41:39.539022522 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10273 16:41:39.539210393 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=208
10274 16:41:39.539239613 1 (36ae6d09d26e) skydns (4639:8) < write res=208 data=
GET /v2/keys/skydns/local/cluster/svc/local/cluster/svc/critical-app/backend?quorum=false&recursive;=true&sorted;=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10343 16:41:39.539465153 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10345 16:41:39.539467440 1 (36ae6d09d26e) skydns (4617:1) < read res=271 data=
HTTP/1.1 404 Not Found
[...]
[...]
10396 16:41:39.539686075 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:40788->10.0.2.15:53
10418 16:41:39.539755453 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=
backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10433 16:41:39.539800679 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10434 16:41:39.539802549 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=
backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10437 16:41:39.539805177 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10478 16:41:39.540166087 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=204
10479 16:41:39.540183401 1 (36ae6d09d26e) skydns (4639:8) < write res=204 data=
GET /v2/keys/skydns/local/cluster/local/cluster/svc/critical-app/backend?quorum=false&recursive;=true&sorted;=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10523 16:41:39.540421040 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10524 16:41:39.540422241 1 (36ae6d09d26e) skydns (4617:1) < read res=267 data=
HTTP/1.1 404 Not Found
[...]
[...]
10690 16:41:39.541376928 1 (36ae6d09d26e) skydns (4639:8) > connect fd=8(<4>)
10691 16:41:39.541381577 1 (36ae6d09d26e) skydns (4639:8) < connect res=0 tuple=10.0.2.15:44249->8.8.8.8:53
10702 16:41:39.541415384 1 (36ae6d09d26e) skydns (4639:8) > write fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=68
10703 16:41:39.541531434 1 (36ae6d09d26e) skydns (4639:8) < write res=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain
10717 16:41:39.541629507 1 (36ae6d09d26e) skydns (4639:8) > read fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=512
10718 16:41:39.541632726 1 (36ae6d09d26e) skydns (4639:8) < read res=-11(EAGAIN) data=
58215 16:41:43.541261462 1 (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:54272->8.8.8.8:53)
58216 16:41:43.541263355 1 (36ae6d09d26e) skydns (4640:9) < close res=0
[...]
[...]
58292 16:41:43.542822050 1 (36ae6d09d26e) skydns (4640:9) < write res=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain
58293 16:41:43.542829001 1 (36ae6d09d26e) skydns (4640:9) > read fd=8(<4u>10.0.2.15:56371->8.8.8.8:53) size=512
58294 16:41:43.542831896 1 (36ae6d09d26e) skydns (4640:9) < read res=-11(EAGAIN) data=
75904 16:41:44.543459524 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=
[...]
75923 16:41:44.543560717 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain tuple=::ffff:172.17.0.7:47441->:::53
75927 16:41:44.543569823 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
104208 16:41:47.551459027 1 (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:42126->8.8.8.8:53)
104209 16:41:47.551460674 1 (36ae6d09d26e) skydns (4640:9) < close res=0
[...]
[...]
104406 16:41:47.552626262 0 (36ae6d09d26e) skydns (4639:8) < write res=190 data=
GET /v2/keys/skydns/local/cluster/svc/critical-app/backend?quorum=false&recursive;=true&sorted;=false HTTP/1.1
[...]
104457 16:41:47.552919333 1 (36ae6d09d26e) skydns (4617:1) < read res=543 data=
HTTP/1.1 200 OK
[...]
{"action":"get","node":{"key":"/skydns/local/cluster/svc/critical-app/backend","dir":true,"nodes":[{"key":"/skydns/local/cluster/svc/critical-app/back
end/6ead029a","value":"{\"host\":\"10.3.0.214\",\"priority\":10,\"weight\":10,\"ttl\":30,\"targetstrip\":0}","modifiedIndex":270,"createdIndex":270}],
"modifiedIndex":270,"createdIndex":270}}
[...]
107992 16:41:48.087346702 1 client (b3a718d8b339) curl (22369:12) < connect res=-115(EINPROGRESS) tuple=172.17.0.7:36404->10.3.0.214:80
108002 16:41:48.087377769 1 client (b3a718d8b339) curl (22369:12) > sendto fd=3(<4t>172.17.0.7:36404->10.3.0.214:80) size=102 tuple=NULL
108005 16:41:48.087401339 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < accept fd=3(<4t>172.17.0.7:36404->172.17.0.5:80) tuple=172.17.0.7:36404->172.17.0.5:80 queuepct=0 queuelen=0 queuemax=128
108006 16:41:48.087406626 1 client (b3a718d8b339) curl (22369:12) < sendto res=102 data=
GET / HTTP/1.1
[...]
108024 16:41:48.087541774 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < writev res=238 data=
HTTP/1.1 200 OK
Server: nginx/1.10.2
[...]
$ sysdig -pk -NA -r capture.scap -c echo_fds "fd.type=file and fd.name=/etc/resolv.conf"
------ Read 119B from [k8s_client.eee910bc_client_critical-app_da587b4d-bd5a-11e6-8bdb-028ce2cfb533_bacd01b6] [b3a718d8b339] /etc/resolv.conf (curl)
search critical-app.svc.cluster.local svc.cluster.local cluster.local localdomain
nameserver 10.0.2.15
options ndots:5
[...]
$ sudo sysdig -pk -s8192 -c echo_fds -NA "fd.type in (unix) and evt.buffer contains localdomain"
$ kubectl run -it --image=tutum/curl client-foobar --namespace critical-app --restart=Never
[...]
------ Write 2.61KB to [k8s-kubelet] [de7157ba23c4] (hyperkube)
POST /containers/create?name=k8s_POD.d8dbe16c_client-foobar_critical-app_085ac98f-bd64-11e6-8bdb-028ce2cfb533_9430448e HTTP/1.1
Host: docker
[...]
"DnsSearch":["critical-app.svc.cluster.local","svc.cluster.local","cluster.local","localdomain"],
[...]
-
https://github.com/gianlucaborello/healthchecker-kubernetes/blob/master/manifests/backend.yaml
-
https://sysdig.com/blog/understanding-how-kubernetes-services-dns-work/
-
http://kubernetes.io/docs/user-guide/services/#dns
-
http://blog.kubernetes.io/2015/11/monitoring-Kubernetes-with-Sysdig.html
-
https://github.com/bencer/troubleshooting-docker-kubernetes-sysdig/raw/master/capture.scap
-
http://man7.org/linux/man-pages/man5/resolv.conf.5.html
-
http://www.sysdig.org/install/
-
https://www.sysdig.com/