Bug #12663

symbiosis-httpd-logger memory usage

Added by dedwards dedwards almost 4 years ago. Updated over 3 years ago.

Status:NewStart date:2016-03-11
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:jessie

Description

Now that Lets Encrypt is gaining traction, and included in Jessie Symbiosis we're going to be seeing it used a lot more. This means symbiosis-ssl generating more custom configs in sites-enabled directory.

With each entry having potentially 2 symbiosis-httpd-logger processes, the memory overhead can be substantial. The client should be rewritten in a non GC language (for Linux COW) , or re-thought.

History

#1 Updated by iainharrison iainharrison almost 4 years ago

dedwards dedwards wrote:

The client should be rewritten in a non GC language (for Linux COW) , or re-thought.

ISPConfig (which I used before switching to Symbiosis) logs all web requests to a single log file, then a program runs as part of an overnight batch to strip the log entries out and put them into logfiles host by host. That seemed to work very well indeed and was a lot less load on the system during the working day. There may be snags, but it worked for them.

#2 Updated by Steve Kemp over 3 years ago

The symbiosis http logger package is responsible for ensuring that each site hosted upon a symbiosis system has logs written in the correct location. This is done in real-time.

There is some magic to ensure that security issues are not introduced, which largely revolves around finding the correct UID/GID to write the logs as (which will typically be admin.admin, but they might be something different).

Apache’s logging is pretty good, but not ideal. It is flexible enough to allow writing to a pipe, and that is the approach we’ve taken. Every distinct site uses the apache template (in symbiosis) to start a logger process. These logger process are ruby-based and memory-heavy.

As to a fix there is really two things we can do:

  • Stop using this approach. Make Apache log beneath /var/log/apache2/access.log + error.log, then split the logs up on a daily cycle (easy).
  • Attempt to stop leaking/growing in the ruby process (hard).

The downside of the former approach is that error-logs are hard to split, because they don’t contain the virtual host which has been requested. The splitting of the access-log is easy enough if we template with the vhost which is being used.

For reference we used to make the first approach. I wrote the utility to do the daily log-splitting, but over time Patrick replaced that with this newer solution. I imagine the rationale was to make logs “live”, or at least nearly live.

As a quick fix I expect we could revert to the global logfile + splitter, but that has documentation implications.

That's the current summary, and David has probably hit upon the solution above which is to use the global file.

NOTE If we go down this route we ALSO have to update the statistic generation (awstats).

#3 Updated by Steve Kemp over 3 years ago

Quick suggestion:

  • Try updating the logger to add the `--sync` flag.

This will involve updating the Apache config files, obviously. As per the documentation:

@symbiosis-apache-logger [ --max-files | -f <n> ] [ -s | --sync ]

-s, --sync Open the file in sync mode, i.e. all data are immediately flushed to the OS and not buffered by the script.

If the issue is relating to the script that might help. If not it'll need a leak-check running against EventMachine/the script. Or other profiling.

#4 Updated by Steve Kemp over 3 years ago

Proof of concept logger written in golang here:

https://gist.github.com/skx/32a21b71b60012e09da7aff160bfcc8c

TODO:

  • Ensure the file/directory have the correct UID/GID
  • Avoid directory-traversal.
  • Work out how to build/compile for symbiosis.
  • Add documentation.

#5 Updated by Steve Kemp over 3 years ago

Trying to reproduce this I created a fresh BigV guest, and created a bunch of sites upon it:

cd /srv/
for i in com org net foo bar fi org.uk co.uk info biz sucks rocks website site hosts; do mkdir -p /srv/example.${i}/public/htdocs; cp /usr/share/symbiosis/static/index.html /srv/example.${i}/public/htdocs/ ; done
chown -R admin.admin /srv/

This gave me 15 vhosts.

Once that was done, from offsite, I ran the following script:

root@symbiosis:~# cat curl 
#!/bin/sh
for i in com org net foo bar fi org.uk co.uk info biz sucks rocks website site hosts; do
curl -H "Host:example.${i}" symbiosis.default.skx.uk0.bigv.io/
done

Because I was using only a signle logger I could track RAM via the PID:

root@symbiosis:~# top -p 16608
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16608 root 20 0 64716 17268 7008 S 0.0 0.4 0:00.13 symbiosis-httpd

5 minutes later:

VIRT: 73260
RES: 17788

Not a huge change, but still visible.

5.5 minutes later:
VIRT: 73392

After that, despite running the curl command in a tight loop I saw no further rise in RAM. I'll leave it running for an hour or so.

#6 Updated by Steve Kemp over 3 years ago

14:16:
VIRT:73392
RES:18008

14:17:
VIRT:73392
RES:18024

14:19:
VIRT:73524
RES:18048

14:20:
VIRT:73524
RES:18052

(Slowish) Leak/RAM usage confirmed.

So there are two solutions:

  • Look at replacing the ruby-script with the go-prototype I used.
  • Look at dropping the logger and running a splitter once a day.

First solution preferred.

#7 Updated by Steve Kemp over 3 years ago

Replacing the logger with the naive implementation meant compiling the code on my desktop, and copying the the remote host.

Once done I updated the Apache vhost file with to read thusly:

CustomLog "|| /root/http-logger" zz_mass_hosting_combined

Memory usage was immediately significantly reduced:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26942 root 20 0 9836 6148 1820 S 0.7 0.2 0:00.51 http-logger

Leaving it running for 10 minutes showed only a minor change:

26942 root      20   0    9836   6984   1884 S   0.3  0.2   0:07.46 http-logger

So it seems that the code, naive as it is, should be a good replacement.

The outstanding task will be to test for real on a volunteer site, and add options-parsing, such that the compiled binary can be a drop-in replacement. Once that is done we can package and deploy to select users.

#8 Updated by Steve Kemp over 3 years ago

I've reimplemented the Ruby logging process in golang, since the previous updates do indeed prove that RAM is being lost over time.

The Debian package I've produced generates a single (static) binary, and makes use of the Debian diversions to replace the existing/current/broken logger. This means upgrading should be a simple matter of :

dpkg --install symbiosis-httpd-logger_0.1_all.deb

Removing that package will remove the diversion, hopefully leaving things no more broken than previously. The source has been committed to the symbiosis repository but interested parties will need to build them for the moment.

#9 Updated by Patrick Cherry over 3 years ago

  • Target version set to jessie

Also available in: Atom PDF