The symptoms

You’re running an Apache + PHP stack, and every now and then Apache just stops. If you try curl’ing your host, you get something like this:

$ curl -Ivv api.something.com
* Rebuilt URL to: api.something.com/
*   Trying xxx.xxx.xxx.xxx...
* Connected to api.something.com (xxx.xxx.xxx.xxx) port 80 (#0)
> HEAD / HTTP/1.1
> Host: api.something.com
> User-Agent: curl/7.43.0
> Accept: */*
> 

…and nothing more.

The diagnosis

First, let’s look at which Apache processes you have. Get a console on the server, and run ps -ax | grep apache. You’ll get a response like this:

# ps -ax | grep apache
3015 ?        Ss     0:00 /usr/sbin/apache2 -k start
3019 ?        S      0:00 /usr/sbin/apache2 -k start
3020 ?        S      0:00 /usr/sbin/apache2 -k start
3026 ?        S      0:00 /usr/sbin/apache2 -k start
3028 ?        S      0:00 /usr/sbin/apache2 -k start
3033 pts/0    S+     0:00 grep --color=auto apache

The first five lines are what we’re interested in; the last one is just the grep itself. The left column is the process ID (PID) of each Apache process.
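If you want just the PIDs, for example to loop over them in the next step, the ps output can be filtered. This is a minimal sketch, assuming the five-column layout shown above (PID, TTY, STAT, TIME, COMMAND) and a binary named apache2; apache_pids is a hypothetical helper name.

```shell
# apache_pids: read `ps -ax` output on stdin and print the PID of every
# process whose command (5th column) ends in /apache2.
# Assumes the column layout shown in the transcript above.
apache_pids() {
    awk '$5 ~ /\/apache2$/ { print $1 }'
}
```

You would call it as ps -ax | apache_pids. Because it matches on the command column, the grep line never shows up in the result. On systems that have it, pgrep apache2 does much the same job.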

Next, let’s see what each of those processes is doing. In my case, I started with process 3015. Run the command strace -p 3015, substituting in your first process identifier for 3015.

When I did this, I saw the following:

# strace -p 3015
Process 3015 attached
select(0, NULL, NULL, NULL, {0, 515464}) = 0 (Timeout)
wait4(-1, 0x7fffd736a4e4, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
wait4(-1, 0x7fffd736a4e4, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)

Those two calls were repeated over and over again. That looks fine: the process is definitely not hung. (To be honest, this is expected, because 3015 is Apache’s parent process. But it’s a good illustration of what a healthy process looks like.) Hit Ctrl-C to exit.

Next up, 3019.

# strace -p 3019
Process 3019 attached
flock(29, LOCK_EX

Whoa. The output doesn’t change, so this process looks hung. The flock is also a big clue: the process is trying to get an exclusive lock (LOCK_EX) on a file, and the call never returns. The 29 is the file descriptor, which tells us which file it is. I hit Ctrl-C to exit.

Let’s have a look at the files that process 3019 has open. You can use the command lsof -p 3019, and narrow down the results using grep 29, because we know from the strace that we’re looking for file descriptor 29. The grep also matches any other line that happens to contain 29, so expect some noise. I saw the following:

# lsof -p 3019 | grep 29
apache2 3019 www-data  mem    REG              252,0   295816  3414376 /usr/lib/x86_64-linux-gnu/libhx509.so.5.0.0
apache2 3019 www-data  mem    REG              252,0   109296  3414384 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25
apache2 3019 www-data  mem    REG              252,0   290520  3409846 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2
apache2 3019 www-data  mem    REG              252,0    97296 22545808 /lib/x86_64-linux-gnu/libnsl-2.19.so
apache2 3019 www-data  mem    REG              252,0   108736  4982329 /usr/lib/apache2/modules/mod_proxy.so
apache2 3019 www-data  mem    REG              252,0    18440  4982296 /usr/lib/apache2/modules/mod_mime.so
apache2 3019 www-data  mem    REG              252,0    22544  4982295 /usr/lib/apache2/modules/mod_authz_core.so
apache2 3019 www-data    0r   CHR                1,3      0t0     1029 /dev/null
apache2 3019 www-data    1w   CHR                1,3      0t0     1029 /dev/null
apache2 3019 www-data   15w   REG              252,0 72996896 14161104 /var/log/apache2/api.release.error.log
apache2 3019 www-data   29uW  REG              252,0       50 14158656 /var/lib/php5/sess_g6evsj4nbuainfqi1bvlo08323

That last line is the one we want: its FD column starts with 29, and the file, /var/lib/php5/sess_g6evsj4nbuainfqi1bvlo08323, is a PHP session file.
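If the grep noise gets in the way, a descriptor number can also be resolved directly. This is a minimal sketch, assuming a Linux /proc filesystem and enough privileges (root, or the same user as the process) to read the target’s entries; fd_target is a hypothetical helper name.

```shell
# fd_target PID FD: print the path that file descriptor FD of process PID
# points to. Assumes Linux, where /proc/PID/fd/FD is a symlink to the file.
fd_target() {
    readlink "/proc/$1/fd/$2"
}
```

In the situation above, fd_target 3019 29 would be expected to print the same session file path that lsof reported.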

So, let’s see which processes are using that file. You can do this with the fuser command.

# fuser /var/lib/php5/sess_g6evsj4nbuainfqi1bvlo08323
/var/lib/php5/sess_g6evsj4nbuainfqi1bvlo08323:  3019  3020  3026  3028  3037  3041  3042  3043  3055  3056  3057  3059  3060  3061  3062  3063  3065  3066  3067  3103  3104  3105  3106  3107  3108  3109  3110  3111  3113  3114  3119  3120  3121  3126  3127  3128  3129  3131  3132  3133  3134  3135  3136  3138  3139  3140  3143  3144  3145  3146  3147  3148  3149  3150  3151  3152  3153  3154  3155  3157  3158  3159  3162  3163  3164  3166  3167  3168  3169  3170  3171  3172  3173  3174  3175  3178  3181  3183  3185  3186

Dozens of Apache processes have the same session file open. In this case, we have a PHP session problem: PHP isn’t releasing the lock on the session file, so all the Apache requests are hanging.
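To see the same kind of contention in isolation, you can reproduce it with two shell commands. This is a sketch, not PHP itself: it assumes the flock utility from util-linux is installed, and it uses a temporary file as a stand-in for the session file. The variable names are made up for the illustration.

```shell
# Stand-in for a PHP session file (an assumption for this demo).
lockfile=$(mktemp)

# "Request 1": take an exclusive lock and hold it for 2 seconds.
flock -x "$lockfile" sleep 2 &
holder=$!
sleep 1   # give the background command time to acquire the lock

# "Request 2": try to take the same exclusive lock without waiting (-n).
if flock -n -x "$lockfile" true; then
    second_request="acquired"
else
    second_request="blocked"
fi
echo "second request while lock is held: $second_request"

# Once the holder exits, the lock is released and can be taken again.
wait "$holder"
if flock -n -x "$lockfile" true; then
    after_release="acquired"
else
    after_release="blocked"
fi
echo "second request after release: $after_release"

rm -f "$lockfile"
```

Without -n, the second flock would simply sit and wait, which is what the hung Apache processes in the strace output are doing.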