On 2024-06-21, TUX02 experienced an outage because the server ran out of RAM. Here we outline how to monitor processes that consume RAM and, in particular, what to watch out for.
The output of "free -m -h" looks like:
              total        used        free      shared  buff/cache   available
Mem:           251G         88G         57G        6.2G        105G        155G
Swap:           29G         20G        9.8G
When running "free", you can refresh the output at a regular interval with the -s option. As an example, to get human-readable output every 2 seconds:
free -m -h -s 2
It's tempting to check the "free" column to see how much RAM is left. However, this column does not count memory that is currently used for disk caching (the "buff/cache" column), so it makes the server look shorter on memory than it is. Disk caching doesn't prevent applications from getting the memory they want[1]. What we need to be aware of instead are:
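One figure that does account for reclaimable cache is the "available" column, which the free manual describes as an estimate of the memory available for starting new applications without swapping. The following is a minimal sketch of checking it from a script, not a definitive monitoring setup. The function names and the threshold are hypothetical, and the parsing assumes the procps-ng layout in the sample output, where "available" is the 7th field of the "Mem:" row.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original notes): read `free -m`
# output on stdin and print the "available" value of the Mem: row in MiB.
# Assumes "available" is the 7th whitespace-separated field of that row.
mem_available_mib() {
    awk '/^Mem:/ { print $7 }'
}

# Hypothetical helper: succeed only if the available memory read from
# stdin is at or above the given threshold in MiB.
enough_available() {
    threshold_mib=$1
    avail=$(mem_available_mib)
    [ -n "$avail" ] && [ "$avail" -ge "$threshold_mib" ]
}

# Example usage (the 16384 MiB threshold is an arbitrary assumption;
# LC_ALL=C keeps the row labels in English):
#   LC_ALL=C free -m | enough_available 16384 || echo "low on available memory"
```

Reading from stdin keeps the helpers easy to try against saved "free" output before pointing them at a live server.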
Also, use htop or top, filter for the process you are monitoring (preferably sorting by RAM usage), and check how much RAM the process and its children (if any) consume.
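For a non-interactive version of the same check, the resident memory of a process and its descendants can be summed from "ps" output. This is a rough sketch under stated assumptions: the function name tree_rss_kib is hypothetical, the input is expected as "PID PPID RSS" rows, and RSS counts shared pages once per process, so the total can overstate real usage.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original notes): sum the RSS (KiB)
# of a process and all of its descendants.
# Reads "PID PPID RSS" rows on stdin; the root PID is the first argument.
tree_rss_kib() {
    awk -v root="$1" '
        { ppid[$1] = $2; rss[$1] = $3 }
        END {
            total = 0
            for (p in ppid) {
                # Walk up the parent chain; count p if the chain reaches root.
                q = p
                hops = 0
                while (q != root && (q in ppid) && hops < 1000) {
                    q = ppid[q]
                    hops++
                }
                if (q == root) total += rss[p]
            }
            print total
        }'
}

# Example usage (1234 is a placeholder PID):
#   ps -e -o pid= -o ppid= -o rss= | tree_rss_kib 1234
# To list the largest processes first with procps ps:
#   ps -e -o pid,rss,comm --sort=-rss | head
```

The hop limit is only a guard against malformed input; on a normal process table every parent chain ends at a PID that is not listed.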