Hitting the Prometheus API with curl and jq
Find the offending pods that use more RAM than they requested, causing OOM
The situation: Prometheus is alerting on out-of-memory conditions on our Kubernetes clusters. Red host-warning emails are landing in everyone’s inboxes. The special ops (err, dev ops) team is dispatched to investigate. You log in to a Kubernetes node and find many kernel-level OOM warnings like
$ journalctl --since today | grep "invoked oom-killer"
Oct 20 17:02:07 myhost kernel: gunicorn invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=936
Oct 20 17:07:51 myhost kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=994
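If you also want to see which processes the kernel actually reaped, not just which ones invoked the OOM killer, a slightly wider grep can help; the exact wording of the kill messages varies between kernel versions, so treat this pattern as an assumption:
$ journalctl -k --since today | grep -E "invoked oom-killer|Killed process"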
Alright, it seems the Kubernetes nodes are out of memory and the kernel had to call the OOM killer to get rid of some processes. One reason this can happen is when the cluster is overcommitted: when many processes say “I only need 100Mi of RAM” but in reality consume 1Gi, that can cause exactly what we’re seeing.
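You can spot-check a single suspect by comparing what it asked for with what it is actually using. A minimal sketch, assuming kubectl access and a working metrics-server for kubectl top; the namespace and pod name are placeholders:
# What the pod declared in resources.requests.memory, per container
$ kubectl -n my-namespace get pod my-app-6c9f7d \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\n"}{end}'
# What it is actually consuming right now (needs metrics-server)
$ kubectl -n my-namespace top pod my-app-6c9f7d --containers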
Armed with that knowledge, we jump into Grafana, hit the Explore button, and carve out the perfect query to get the offending containers. Offending here means containers whose used/requested RAM ratio is greater than one. The higher the number, the worse the situation! Let’s take a look at this query
sort_desc(
  sum(container_memory_working_set_bytes) by (container_name, namespace)
    /
  sum(label_join(kube_pod_container_resource_requests_memory_bytes, "container_name", "", "container")) by (container_name, namespace)
  > 1
)
Yeah! That’s what it takes to interrogate Prometheus. We’re having it easy, right? So, this gets you what you’re looking for. Except I want the data in a file I can easily email to people! I tried all sorts of voodoo to export, or even copy and paste, from the Grafana and Prometheus web UIs, albeit unsuccessfully!
So, I brought out the big guns. I’m gonna hit the API directly and massage the data into a nice little CSV file that Excel can open, so that my devs can easily check their usage. Sixteen hours later, this is the curl + jq magic that was needed ;)
curl -fs --data-urlencode 'query=sort_desc( sum(container_memory_working_set_bytes) by (container_name, namespace) / sum(label_join(kube_pod_container_resource_requests_memory_bytes, "container_name", "", "container")) by (container_name, namespace) > 1)' \
  https://prometheus/api/v1/query \
  | jq -r '.data.result[] | [.metric.container_name, .metric.namespace, .value[1]] | @csv'
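If you’d rather keep that one-liner around as a small script that writes the CSV for you (with a header row so Excel labels the columns), here’s a minimal sketch; PROMETHEUS_URL and the output file name are placeholders to adapt:
#!/usr/bin/env bash
# Sketch: dump over-committed containers (used/requested RAM > 1) into a CSV file.
# PROMETHEUS_URL and OUT are placeholders; point them at your own setup.
set -euo pipefail
PROMETHEUS_URL="${PROMETHEUS_URL:-https://prometheus}"
OUT="memory-overcommit-$(date +%F).csv"
QUERY='sort_desc( sum(container_memory_working_set_bytes) by (container_name, namespace) / sum(label_join(kube_pod_container_resource_requests_memory_bytes, "container_name", "", "container")) by (container_name, namespace) > 1)'
# Header row first, then one line per container: name, namespace, used/requested ratio
echo '"container","namespace","used_over_requested"' > "$OUT"
curl -fs --data-urlencode "query=${QUERY}" "${PROMETHEUS_URL}/api/v1/query" \
  | jq -r '.data.result[] | [.metric.container_name, .metric.namespace, .value[1]] | @csv' >> "$OUT"
echo "Wrote ${OUT}"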
That’s it. We’re hitting the Prometheus API with the query from earlier. The output is piped to jq (the JSON swiss army knife!), where we unwrap the `.data.result` array, extract the interesting labels and the value (without the timestamp), and finally ask jq to politely format it all as CSV. Redirect that output into a file, and Excel can easily open it, yielding something like
Yes, the 19 up there means that app is actually using 19 times the memory it requested in its Kubernetes YAML file. That means its resources.requests.memory value is wildly inaccurate and needs to be adjusted to avoid future trouble!
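Fixing it usually means raising the request to something closer to reality, either by editing the YAML and re-applying it, or with a one-off command like this sketch; the namespace, deployment, container name and new value are placeholders:
# Bump the memory request on one container to a more honest value
$ kubectl -n my-namespace set resources deployment my-app -c my-app --requests=memory=512Mi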