Open Source Analytics with Piwik
Google Analytics is a great tool to get more insight into user behaviour on a website. Despite being such a great tool, it requires the usage of the Google platform which results in handing over all this data to Google too. Furthermore, you are required to use cookies with Google Analytics. The usage of cookies is not necessarily a bad thing, but if the first thing people see is a cookie banner, they might quit the page immediately. So getting rid of cookies while still getting some more insight into user behaviour than what is typically available with a log analyser would be nice.
Piwik is a self-hosted open source solution that is perfect for these needs! Not only are we hosting everything ourselves, but we can also deactivate cookies and tracking in its entirety if the user’s browser requests this. For me, this is the perfect compromise between respecting user privacy and still being able to monitor popular pages and tailor content to more popular subjects.
Everything discussed here was tested on a Debian 8.5 (jessie) VPS setup. We are going to serve the Piwik installation from a subdomain, stats (e.g. stats.domain.com), so be sure to configure your DNS accordingly (if needed). We are going to use nginx as our web server and PHP5 as the backend for Piwik. This is based on the piwik-nginx README and a tutorial by Muhammad Arul.
Prerequisites
Install and Configure PHP5
The first thing we are going to do is install PHP-FPM:
sudo apt-get install php5-fpm php5-mysql php5-curl php5-gd php5-cli php5-geoip
Open the standard PHP configuration file:
sudo nano /etc/php5/fpm/php.ini
And set cgi.fix_pathinfo to 0 and always_populate_raw_post_data to -1. Normally these are commented out, so first uncomment them by removing the leading ; character. If you are using the nano editor you can quickly search for these with CTRL-W (^W) and then type the variable name.
Setting cgi.fix_pathinfo to 0 fixes an important security concern explained here. The other parameter (always_populate_raw_post_data) forces PHP not to define $HTTP_RAW_POST_DATA. That variable has been deprecated since PHP 5.6 and is no longer available in PHP 7. The preferred way to read raw request data is through php://input. Deactivating the old method makes your system more secure. See also the official PHP documentation.
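If you prefer not to edit the file by hand, the same two changes can be sketched with sed. The helper name set_ini_value is made up for this example, and the snippet works on a scratch file rather than the real /etc/php5/fpm/php.ini, so treat it as a starting point under those assumptions and not as a tested recipe for your server.

```shell
#!/bin/sh
# Hypothetical helper (not part of PHP or Piwik): set KEY to VALUE in an ini
# file, removing a leading ';' comment marker if there is one.
set_ini_value() {
    file=$1
    key=$2
    value=$3
    sed -i "s|^;\{0,1\}[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
}

# Work on a scratch file here; on the server the target would be
# /etc/php5/fpm/php.ini (edited with sudo, after making a backup).
PHP_INI=$(mktemp)
printf '%s\n' ';cgi.fix_pathinfo=1' ';always_populate_raw_post_data = -1' > "$PHP_INI"

set_ini_value "$PHP_INI" cgi.fix_pathinfo 0
set_ini_value "$PHP_INI" always_populate_raw_post_data -1

# Show the resulting active (uncommented) settings.
grep -E '^(cgi\.fix_pathinfo|always_populate_raw_post_data) = ' "$PHP_INI"
```

The final grep only matches lines that no longer start with a semicolon, which makes it a quick check that both settings are really active.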
Start PHP with this command:
sudo service php5-fpm start
Backing up Nginx
We will be using a preconfigured nginx setup, so back up the current one in case you want to revert or something goes wrong. Just move all files to a new directory:
sudo mv /etc/nginx/ /etc/nginx-old/
New Nginx Configuration
Serving Files over a Secure Connection
Now install the preconfigured nginx files from this git repo:
sudo git clone https://github.com/perusio/piwik-nginx.git /etc/nginx
Change the default configuration file’s name to something more appropriate and open it to add our domain:
cd /etc/nginx/sites-available/ # move to nginx conf directory
sudo mv stats.example.com.conf stats # change filename to `stats`
sudo nano stats # open Piwik configuration file for nginx
Update the configuration to something similar to the file below. We are not going to use IPv6 (because there is limited value in it) and removing it simplifies the setup. You can always add IPv6 later on, but for now we’ll continue with an IPv4-only configuration.
# -*- mode: nginx; mode: flyspell-prog; mode: autopair; ispell-local-dictionary: "american" -*-
### Nginx configuration for Piwik.
# HTTP traffic
server {
    listen 80;
    ## Uncomment to activate IPv6 and update the address below
    ## (stolen from wikipedia).
    #listen [fe80::202:b3ff:fe1e:8329]:80 ipv6only=on;
    limit_conn arbeit 64;
    server_name stats.domain.com;
    ## Access and error log files.
    access_log /var/log/nginx/stats.domain.com_access.log;
    error_log /var/log/nginx/stats.domain.com.log;
    ## See the blacklist.conf file at the parent dir: /etc/nginx.
    ## Deny access based on the User-Agent header.
    ## -> Uncomment the lines below to enable bad bot blocking based
    ## on UA string.
    # if ($bad_bot) {
    #     return 444;
    # }
    ## -> Uncomment the lines below to enable bad bot blocking based
    ## on referer header.
    ## Deny access based on the Referer header.
    # if ($bad_referer) {
    #     return 444;
    # }
    # redirect all traffic to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
    # Let's Encrypt challenges can go over HTTP
    location /.well-known/acme-challenge/ {
        alias /home/acme/challenges/;
        try_files $uri =404;
    }
}
# HTTPS traffic
server {
    listen 443 ssl;
    server_name stats.domain.com;
    # limit number of connections to arbeit zone
    limit_conn arbeit 64;
    # SSL/TLS configuration
    ssl on;
    ssl_certificate /etc/letsencrypt/live/chained.pem;
    ssl_certificate_key /etc/letsencrypt/live/domain.key;
    ssl_session_timeout 5m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;
    ssl_prefer_server_ciphers on;
    ssl_stapling on;
    ssl_stapling_verify on;
    access_log /var/log/nginx/stats.domain.com_access.log;
    error_log /var/log/nginx/stats.domain.com.log;
    ## See the blacklist.conf file at the parent dir: /etc/nginx.
    ## Deny access based on the User-Agent header.
    root /var/www/piwik; # Piwik root directory
    index index.php piwik.php;
    ## Include the piwik configuration.
    include apps/piwik/piwik.conf;
}
The comments should provide sufficient documentation. Be sure to have a valid certificate before continuing, and update the certificate and key locations to reflect your setup. You can read my Let’s Encrypt post on how to set up HTTPS certificates.
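The HTTPS block refers to three files on disk (ssl_certificate, ssl_certificate_key and ssl_dhparam), and nginx will refuse to start if any of them is missing. The following is a minimal sketch of a check for that. The function name missing_ssl_files is invented for this example, and the openssl command in the closing comment is my assumption about how the dhparam.pem file could be produced, since the post does not show that step.

```shell
#!/bin/sh
# Hypothetical helper: print every file named by an ssl_certificate,
# ssl_certificate_key or ssl_dhparam directive that does not exist on disk.
missing_ssl_files() {
    awk '$1 ~ /^ssl_(certificate|certificate_key|dhparam)$/ {
             sub(/;$/, "", $2)   # strip the trailing semicolon
             print $2
         }' "$1" |
    while read -r path; do
        [ -e "$path" ] || printf 'missing: %s\n' "$path"
    done
}

# Scratch demonstration: one referenced file exists, the other does not.
demo=$(mktemp -d)
touch "$demo/chained.pem"
cat > "$demo/stats" <<EOF
    ssl_certificate $demo/chained.pem;
    ssl_dhparam $demo/dhparam.pem;
EOF
missing_ssl_files "$demo/stats"

# On the server (assumed commands, adjust paths to your setup; a root shell
# may be needed if the certificate directory is not world-readable):
#   missing_ssl_files /etc/nginx/sites-available/stats
#   sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048
```

Commented-out directives are skipped because their first field is the # character and not the directive name.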
Go to one of the many intermediate nginx configuration files for Piwik:
sudo nano /etc/nginx/apps/piwik/piwik.conf
Change the valid_referers directive to include your domain:
valid_referers none blocked *.domain.com domain.com;
For my configuration, that results in:
valid_referers none blocked *.olivierpieters.be olivierpieters.be;
This lists the “Referer” request header values (yes, that misspelling is intentional: it is how the HTTP header is spelled) for which the embedded $invalid_referer variable is set to an empty string. Basically, it defines which referer values are allowed. none means that requests without a Referer header are accepted, while blocked also accepts requests in which the header is present but its value has been deleted (e.g. by a proxy or firewall). Finally, we also allow requests referred from our own website (the final two entries). More information can be found in the official nginx docs.
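To make the matching rules concrete, here is a small shell model of that directive. It is a deliberate simplification written for this post, not nginx’s actual implementation: the function name referer_is_valid is hypothetical, domain.com is the placeholder domain used throughout, and the sketch cannot distinguish a missing header from an empty one.

```shell
#!/bin/sh
# Simplified, hypothetical model of
#   valid_referers none blocked *.domain.com domain.com;
# Returns 0 when nginx would leave $invalid_referer empty, 1 otherwise.
referer_is_valid() {
    referer=$1
    # "none": the request carries no Referer header at all.
    if [ -z "$referer" ]; then
        return 0
    fi
    case $referer in
        http://*|https://*) ;;   # a real URL: check its host below
        *) return 0 ;;           # "blocked": value does not start with a scheme
    esac
    host=${referer#*://}   # drop the scheme
    host=${host%%/*}       # drop the path
    host=${host%%:*}       # drop an optional port
    case $host in
        domain.com|*.domain.com) return 0 ;;
    esac
    return 1
}

for r in "" "stripped-by-proxy" "https://www.domain.com/blog" "https://evil.example/"; do
    if referer_is_valid "$r"; then verdict=allowed; else verdict=rejected; fi
    printf '%-32s %s\n' "\"$r\"" "$verdict"
done
```

Only the last value in the loop is rejected: it is a full URL whose host is neither domain.com nor one of its subdomains.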
Also disable proxy caching by commenting out include apps/piwik/proxy_piwik_cache.conf; since we are not using Apache. After applying these final changes, check the nginx configuration with sudo nginx -t.
Enabling PHP
Time to modify the PHP setup. Go to the php-fpm upstream configuration and change the server to a php5 Unix socket. Unix sockets should be faster than TCP connections to localhost:
sudo nano /etc/nginx/upstream_phpcgi.conf
Update the server entry as follows:
server unix:/var/run/php5-fpm.sock;
If you suspect your socket is set up in a different location, you can check by listing all listening Unix sockets with netstat --unix -l.
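Another way to find the socket is to read the listen setting from the PHP-FPM pool configuration. The sketch below assumes the stock Debian pool file at /etc/php5/fpm/pool.d/www.conf, which may differ on your system. The helper name fpm_listen_address is made up, and a scratch file stands in for the real one.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "listen" setting in a PHP-FPM pool file.
fpm_listen_address() {
    sed -n 's/^listen[[:space:]]*=[[:space:]]*//p' "$1"
}

# Scratch pool file standing in for /etc/php5/fpm/pool.d/www.conf (assumed path).
POOL=$(mktemp)
printf '%s\n' '[www]' 'listen = /var/run/php5-fpm.sock' 'listen.owner = www-data' > "$POOL"

SOCKET=$(fpm_listen_address "$POOL")
echo "PHP-FPM listens on: $SOCKET"

# On the server, the -S test additionally confirms the path is a live socket.
if [ -S "$SOCKET" ]; then
    echo "socket exists"
else
    echo "socket not found (is php5-fpm running?)"
fi
```

Whatever path this reports is the one that belongs after unix: in upstream_phpcgi.conf.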
FastCGI will need a cache directory for Piwik, so let’s create it:
sudo mkdir -p /var/cache/nginx/fcgicache
sudo chown -R www-data:www-data /var/cache/nginx/
We changed ownership to the www-data user and group. This is the user that nginx runs as by default.
Create a symbolic link that will tell nginx you want to activate the Piwik configuration:
sudo mkdir -p /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/stats /etc/nginx/sites-enabled/stats
Finally, we need to test and activate this configuration. Test the configuration for possible errors with sudo nginx -t. If errors are present in the configuration, this will tell you where the first error occurred. These need to be fixed before we can continue. Then enable the configuration:
sudo service php5-fpm restart # restart PHP
sudo service nginx restart # or start if nginx is not running
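Since a restart with a broken configuration takes the site down, it can help to chain the test and the restart so that the second only runs when the first succeeds. A minimal sketch follows; restart_if_valid is a hypothetical helper name, and the demonstration uses stand-in commands so it runs without nginx installed.

```shell
#!/bin/sh
# Hypothetical helper: run RESTART only when CHECK exits successfully.
restart_if_valid() {
    check=$1
    restart=$2
    if sh -c "$check"; then
        sh -c "$restart"
    else
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
}

# Stand-in commands so the sketch runs anywhere.
restart_if_valid 'true' 'echo restarted'
restart_if_valid 'false' 'echo restarted' || echo "kept the old configuration running"

# On the server the call would be:
#   restart_if_valid 'sudo nginx -t' 'sudo service nginx restart'
```

For a one-off, sudo nginx -t && sudo service nginx restart achieves the same thing on a single line.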
Creating a Database for Piwik
As database backend, we are going to use the MySQL compatible MariaDB. Install the server and the client, then set up a root user (similar to a Linux root user):
sudo apt-get install mariadb-server mariadb-client
mysql_secure_installation
You should remove the test database and anonymous users for security reasons. Now log in to the database server and create the Piwik database:
mysql -u root -p
Enter password:
MariaDB [(none)]> CREATE DATABASE piwikdb;
MariaDB [(none)]> CREATE USER piwikuser@localhost IDENTIFIED BY 'PASSWORD';
MariaDB [(none)]> GRANT ALL PRIVILEGES ON piwikdb.* TO piwikuser@localhost IDENTIFIED BY 'PASSWORD';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> \q
Now we have created a database (piwikdb) and a user piwikuser with password PASSWORD (change this to a strong password!) for Piwik. This wraps up all the work we had to do prior to installing Piwik. We will now install Piwik (finally!).
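The same statements can also be generated from the shell, which makes it easy to plug in a random password instead of typing one. This is only a sketch: piwik_db_sql is a made-up function name, the password recipe (18 random bytes, base64 encoded) is one arbitrary choice, and the generated password is printed to the terminal, so note it down for the Piwik setup screen and clear your scrollback if that worries you.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL for a database NAME owned by USER with PASSWORD.
piwik_db_sql() {
    cat <<SQL
CREATE DATABASE $1;
CREATE USER '$2'@'localhost' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost';
FLUSH PRIVILEGES;
SQL
}

# 18 random bytes encode to 24 base64 characters, none of which is a quote.
DB_PASSWORD=$(head -c 18 /dev/urandom | base64)

piwik_db_sql piwikdb piwikuser "$DB_PASSWORD"

# On the server, pipe the output into the client instead of printing it:
#   piwik_db_sql piwikdb piwikuser "$DB_PASSWORD" | mysql -u root -p
```

Because the password is drawn from the base64 alphabet it cannot contain a single quote, so it is safe to place inside the quoted SQL string.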
Installing Piwik
We already set the install location for Piwik to the /var/www/piwik folder (this must match the root directive in your nginx configuration for Piwik). So let’s move to its parent folder and download Piwik:
cd /var/www
# download Piwik installation
sudo wget https://github.com/piwik/piwik/archive/master.zip
# extract Piwik installation
sudo unzip master.zip
# move to desired directory
sudo mv piwik-master/ piwik/
# remove zipped Piwik installation
sudo rm master.zip
Now move into the Piwik installation and install Composer and the Piwik PHP dependencies:
cd piwik/
curl -sS https://getcomposer.org/installer | sudo php
sudo php composer.phar install --no-dev
cd ..
# change ownership to www-data user:group
sudo chown -R www-data:www-data piwik/
Apply all changes to PHP-FPM and nginx by restarting both:
sudo service nginx restart
sudo service php5-fpm restart
Now you should be able to use the Piwik browser setup to finish the installation.
Browser-Based Piwik Configuration
Go to stats.domain.com and you should now see the Piwik Welcome screen. Follow the setup (it’s very straightforward) and you should end up with a working Piwik configuration. Some notes about the setup:
See to it that everything works during the system check (I got an HTTP 500 warning on the piwik.php page, but this is not a real issue since the index.php page is still accessible).
During database configuration, enter the details of the MariaDB database we created earlier, set the table prefix to stats_ and select the MYSQLI adapter. Piwik uses these settings to create its tables, after which you can continue to the Super user login.
Pick a nice login name and strong password to protect your data.
Finally enter some details about your website and copy the snippet for usage on your websites.
As discussed in the introduction, privacy is important. Thus, it’s good practice to enable IP anonymisation and “Do Not Track” support. By default, cookies will be used to get more user information, but we are going to deactivate this. To this end, add _paq.push(['disableCookies']); before _paq.push(['trackPageView']); in the snippet Piwik produced.
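If the snippet lives in a template file on your server, the extra line can also be inserted with a small script. The sketch below rests on a few assumptions: add_disable_cookies is a hypothetical helper, the three-line scratch file is an abbreviated stand-in for the real snippet Piwik generates, and the helper does nothing when the call is already present, so it can be run repeatedly.

```shell
#!/bin/sh
# Hypothetical helper: insert the disableCookies call before the first
# trackPageView call in FILE, unless it is already there.
add_disable_cookies() {
    snippet=$1
    if grep -q "disableCookies" "$snippet"; then
        return 0
    fi
    tmp=$(mktemp)
    awk -v call="  _paq.push(['disableCookies']);" '
        /trackPageView/ && !inserted { print call; inserted = 1 }
        { print }
    ' "$snippet" > "$tmp" && mv "$tmp" "$snippet"
}

# Abbreviated stand-in for the tracking snippet produced by Piwik.
SNIPPET=$(mktemp)
printf '%s\n' \
    "var _paq = _paq || [];" \
    "  _paq.push(['trackPageView']);" \
    "  _paq.push(['enableLinkTracking']);" > "$SNIPPET"

add_disable_cookies "$SNIPPET"
add_disable_cookies "$SNIPPET"   # second run is a no-op
cat "$SNIPPET"
```

The order matters because the calls pushed onto _paq are processed in sequence, so cookies have to be disabled before the first page view is tracked.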
Conclusion
Now you have a working Piwik configuration running and you can start observing which pages of your website get the most attention without requiring a nasty cookie popup or neglecting your visitors’ privacy.
Some Additional Notes
Piwik might display an ‘Oeps…’ error. It can easily be fixed by changing the PHP configuration as suggested here.
Open the configuration file from before (/etc/php5/fpm/php.ini). Then append the following at the end:
apc.include_once_override = 0
apc.canonicalize = 0
apc.stat = 0
Finally, restart the PHP service for the changes to take effect with sudo service php5-fpm restart. This should fix this specific issue.