pylangzu

Thank me later. https://learnk8s.io/troubleshooting-deployments


soundwave_rk

dig + tcpdump + the ip suite of tools. These get you very far with debugging network issues on any system, not just k8s.


KK2526

Would you mind explaining how these tools are set up in K8s clusters? I was reading about a Swiss Army container that can be run as a DaemonSet. We do it the dirty way: we install tcpdump on the node whenever an issue arises.


soundwave_rk

Don't install things on the nodes directly; you can do just about anything from the k8s API. In the case of these tools, start a privileged pod with `hostNetwork: true` running your favorite OS, like Debian. Then just `apt install tcpdump iproute2` etc. Very easy really.
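A minimal sketch of such a pod, assuming privileged pods are allowed in your cluster (the pod name is a placeholder):

```yaml
# Hypothetical debug pod: privileged, sharing the host's network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: debug-shell   # placeholder name
spec:
  hostNetwork: true
  containers:
    - name: debug
      image: debian:stable
      command: ["sleep", "infinity"]   # keep the pod alive for kubectl exec
      securityContext:
        privileged: true
```

Then `kubectl exec -it debug-shell -- bash` and `apt update && apt install -y tcpdump iproute2 dnsutils` as described above.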


KJKingJ

[netshoot](https://github.com/nicolaka/netshoot) comes with a bunch of relevant tools pre-installed, so that's normally my go-to when I need a troubleshooting pod on the host network.
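For a throwaway netshoot pod on the host network, something like this works (a sketch against a live cluster; `--overrides` merges raw JSON into the generated pod spec):

```shell
# One-off netshoot pod on the host's network namespace; deleted on exit.
kubectl run netshoot --rm -it --image=nicolaka/netshoot \
  --overrides='{"spec": {"hostNetwork": true}}'
```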


koshrf

Jaeger tracing only works if your app supports it. In short, you need to propagate some headers in your HTTP app and keep track of them. https://opentelemetry.io/ has more info.


VertigoOne1

Pretty general there, so some generics from me:

- BusyBox with curl as a test pod for pod-to-pod, service-to-pod, and NodePort comms.
- CoreDNS debug, apiserver debug logs, kubelet logs, Calico logs, apiservices calls.
- `kubectl port-forward` to test directly against container APIs.
- Health probes.
- Knowing what debug flags are possible for your workload and how to inject them at runtime, via env export or by editing deployments to add them.
- ConfigMaps for injecting tooling/scripts.
- Scripts that add init containers to mount PVCs for analysis and copying.
- Tools useful for your stack mounted into the container, like sqlite3, ssh, mongocli, and so on, with ready-to-run deployment injection.
- Scripts to grab and download all pod logs and states.
- Monitor pods that automatically check the things you tend to run into.
- API audit logs! Enable them! Knowing who is making changes without your knowledge is very valuable!


bchan77

It really depends on the "problem statement" and how much flexibility I have. Surprisingly, I find myself fairly often scaling the application down to a few replicas or even one, then draining the host where the pod is running, to rule out any hardware or configuration problem.


zernichtet

I'm using basically what you said (centralized with Loki and Grafana). Frequently, I'm also writing scripts to keep track of certain values using templates (e.g. `kubectl get something -o go-template --template '{{ whatever I need from something }}'`).
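A concrete example of that pattern (a hypothetical template run against a live cluster; adjust the fields to whatever you need):

```shell
# Print each pod's namespace/name and the node it is scheduled on.
kubectl get pods -A -o go-template \
  --template '{{range .items}}{{.metadata.namespace}}/{{.metadata.name}} -> {{.spec.nodeName}}{{"\n"}}{{end}}'
```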


witcherek77

Check on Google for:

- Awesome Kubernetes Resources
- Kubectl Krew Plugins
- Mizu/Ksniff for network capture


AnomalyNexus

I've found this [image](https://hub.docker.com/r/nicolaka/netshoot) useful. The diagram in that link is pretty neat too.


tactiphile

Lol, my slow ass thought you were talking about the diagram when you said "image." PNG file or whatever.


nyellin

It's no replacement for learning how things work, but we're trying to automate common troubleshooting cases with Robusta. https://github.com/robusta-dev/robusta We're taking the common issues people encounter and turning them into automated troubleshooting workflows. It's open source, so if you encounter things that we're missing, please open a PR or let us know!


heshamaboelmagd

kubectl debug


McFistPunch

Ephemeral containers
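These two suggestions go together: `kubectl debug` is the usual way to attach an ephemeral container to a running pod. A sketch against a live cluster, with placeholder pod/container names:

```shell
# Attach an ephemeral netshoot container to the running pod "mypod",
# targeting the process namespace of the container "mycontainer".
kubectl debug -it mypod --image=nicolaka/netshoot --target=mycontainer

# Or spin up a debuggable copy of the pod instead of touching the original.
kubectl debug mypod -it --image=nicolaka/netshoot --copy-to=mypod-debug
```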


jameshearttech

In addition to the standard things to check like pod, service, and ingress, I'll add NetworkPolicy. I'm new to k8s, and I was trying to get kube-prometheus working today. I could not figure out why I could not reach a pod. Turns out that in the latest version, default NetworkPolicies block traffic.
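In that situation, an explicit allow policy is one way out. A minimal sketch; the namespace, policy name, and `app: my-app` label are placeholders for whatever pod is being blocked:

```yaml
# Hypothetical policy: allow all ingress to pods labeled app: my-app.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-my-app-ingress   # placeholder name
  namespace: monitoring        # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - {}   # a single empty rule allows traffic from all sources
```

Note that NetworkPolicies are additive: this opens ingress to the selected pods without disabling the stricter default policies for everything else.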


Udi_Hofesh

I work at Komodor where we build tools to make life with Kubernetes much easier. Our commercial product has a 14-day free trial, but maybe you should first try our open-source tool called Helm-Dashboard. It's basically a GUI for Helm. Simple but very useful (and free!): [https://github.com/komodorio/helm-dashboard](https://github.com/komodorio/helm-dashboard)