Kubernetes Cluster: Troubleshooting – STP

Setting up a Kubernetes cluster is complex and involves many steps. Each step can fail for different reasons, so it is worth analyzing failures to avoid them in the future. Here, we'll look at the main troubleshooting steps for the different stages, along with the relevant log files:

K8s Installation

Installing a Kubernetes cluster is complex but automated. The platform handles common errors and shows their causes in the dashboard. For more complex issues, you can send a report to the support team using the widget. The report includes logs and error messages, which makes it easier to troubleshoot a failed Kubernetes cluster deployment.

After installation, the package checks all cluster components. You can see the details in the /var/log/k8s-health-check.log file on the master node. A script checks the health of components like Weave CNI Plugin, Ingress Controller, Metrics Server, Kubernetes Dashboard, Node Problem Detector, Monitoring Tools, Remote API, NFS Storage, and Sample App.

If the health check fails, you will see a notification. This warning doesn't always mean something is wrong (e.g., deployments might still be in progress). You can run the kubectl get pods --all-namespaces command to check the pod states. If all pods are Running, your cluster is fine. Otherwise, contact support and attach the logs from the /var/log directory.
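The pod check described above can be narrowed to the pods that need attention. The following is a minimal sketch, assuming kubectl is already configured for the cluster; list_unhealthy_pods is a hypothetical helper name, not part of kubectl.

```shell
# Hypothetical helper: list pods whose phase is neither Running nor Succeeded.
# Pods created by finished Jobs report Succeeded, so they are excluded as well.
list_unhealthy_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Usage: list_unhealthy_pods
```

Note that the pod phase is coarser than the STATUS column: a pod whose containers keep restarting can still report the Running phase, so it is worth checking the RESTARTS column in the full kubectl get pods --all-namespaces output too.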

Events Tracking

You can use kubectl or Kubernetes Dashboard to track and analyze events for a specific namespace or all namespaces (you need the right permissions):

Figure: Events in Kubernetes Dashboard

Figure: Example output from the kubectl get events -n $namespace command
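On the command line, event tracking might look like the sketch below. The helper names events_in_namespace and warning_events are hypothetical; the flags are standard kubectl options, and listing events for all namespaces assumes your account has cluster-wide read permissions.

```shell
# Hypothetical helper: events for one namespace, sorted by creation time.
events_in_namespace() {
  kubectl get events -n "$1" --sort-by=.metadata.creationTimestamp
}

# Hypothetical helper: only Warning events, across all namespaces.
warning_events() {
  kubectl get events --all-namespaces --field-selector type=Warning
}

# Usage:
#   events_in_namespace "$namespace"
#   warning_events
```

Filtering by type=Warning is usually a quick way to skip routine Normal events such as image pulls and scheduling notices.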

Pod Logs

After scheduling pods to run on a node, you can view their logs using:

  • Kubernetes Dashboard: Go to the pod page and click the Logs button in the top right corner.
  • kubectl: Use the kubectl logs command to print the logs of a pod.

For example, these logs can help identify the cause of the "Back-off restarting failed container" event for your pods.
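For a pod that keeps restarting, the logs of the previous container instance are often the most useful ones, because the current instance may not have failed yet. The following is a minimal sketch; inspect_restarting_pod is a hypothetical helper, and the pod name and namespace are whatever your cluster reports.

```shell
# Hypothetical helper: takes a pod name and a namespace.
# Prints the pod description (including its recent events), then the last
# 100 log lines of the previous, terminated container instance.
inspect_restarting_pod() {
  pod="$1"
  ns="$2"
  kubectl describe pod "$pod" -n "$ns"
  kubectl logs "$pod" -n "$ns" --previous --tail=100
}

# Usage: inspect_restarting_pod <pod-name> <namespace>
```

If the pod runs more than one container, kubectl logs also needs the -c option with the container name.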
