Data Masking for User Privacy with nginScript
In October 2016, the Court of Justice of the European Union ruled that IP addresses are “personal information” and as such fall under the Data Protection Directive and General Data Protection Regulation (GDPR). For many website owners, this presents challenges for archiving and analyzing log files if the data leaves the EU. For data moving into the US, the EU‑US Privacy Shield provides some protection but faces legal challenges by privacy groups and governments who do not believe the level of protection is adequate.
However, protecting personal data in log files is not just an EU problem. For organizations with security certifications like ISO/IEC 27001, moving log files outside of the security realm where they were generated, say from network ops to marketing, can compromise the scope and compliance of the certification.
In this blog post, we describe some simple solutions for sanitizing NGINX Plus and NGINX log files so that they can be safely exported without exposing what is often called personally identifiable information (PII).
Simple Approaches Do Not Work
The simplest approach to personal data protection is to strip IP addresses from logs before they are exported. This is easy to achieve with standard Linux command line tools, but log analysis systems may expect log files in a standard format and fail to import logs that omit the IP address field. Even if logs are successfully imported, the value of log processing may be significantly reduced if the analysis system relies on IP addresses to track a user across a site.
Another potential approach, substituting fake or random values for real IP addresses, results in log files that look complete, but the quality of log analysis is compromised because each log entry appears to originate from a different randomly generated IP address.
Masking the Client IP Address
The most effective solution is to use a technique called data masking to transform the real IP address into one that does not identify the end user but still allows correlation of website activity for a particular user. Data masking algorithms always produce the same pseudorandom value for a given input value in a way that ensures it cannot be converted back to the original input value. Every occurrence of an IP address is always transformed to the same pseudorandom value.
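To make the contrast with random substitution concrete, here is a minimal sketch in plain JavaScript. The sample addresses and the helper names (`toDottedQuad`, `randomAddress`, `maskedAddress`, `countDistinct`) are hypothetical, and the FNV‑1a hash stands in for whichever one‑way function you choose. It counts how many distinct visitors a log analysis tool would see under each approach.

```javascript
// Hypothetical sample: three requests from two real clients.
var requests = ["203.0.113.7", "198.51.100.23", "203.0.113.7"];

// 32-bit FNV-1a hash, used here as an example one-way function.
function fnv32a(str) {
    var hval = 2166136261;
    for (var i = 0; i < str.length; ++i) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;
}

// Format a 32-bit integer as a dotted quad.
function toDottedQuad(n) {
    return [(n >>> 24) & 255, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
}

// Random substitution: a fresh value on every call, so correlation is lost.
function randomAddress() {
    return toDottedQuad(Math.floor(Math.random() * 4294967296));
}

// Data masking: the same input always yields the same output.
function maskedAddress(ip) {
    return toDottedQuad(fnv32a(ip));
}

// Count distinct values in an array of strings.
function countDistinct(values) {
    var seen = {};
    values.forEach(function (v) { seen[v] = true; });
    return Object.keys(seen).length;
}

var masked = requests.map(maskedAddress);
var randomized = requests.map(randomAddress);

console.log(countDistinct(requests));   // 2 distinct real clients
console.log(countDistinct(masked));     // same count, barring hash collisions
console.log(countDistinct(randomized)); // almost certainly one per request
```

With masking, the first and third requests still share an address, so a session can be followed across the site without the real IP address appearing in the exported log.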
You can implement IP address masking in NGINX and NGINX Plus with the nginScript module. nginScript is a unique JavaScript implementation for NGINX and NGINX Plus, designed specifically for server‑side use cases and per‑request processing. In this case we execute a small amount of JavaScript code to mask the client IP address as each request is logged.
Instructions for enabling nginScript with NGINX and NGINX Plus appear at the end of this article.
NGINX and NGINX Plus Configuration for IP Address Masking
The log_format directive controls which information appears in the access logs. NGINX and NGINX Plus ship with a default log format called combined which produces log files that can be processed by most log‑processing tools.
For this configuration we create a new log format, masked, which is identical to the combined format except for the first field, where we replace the $remote_addr variable with $remote_addr_masked. The new variable is evaluated by executing our JavaScript code, which generates a masked version of the client IP address.
To have NGINX and NGINX Plus write access logs in this format we configure a server block with an access_log directive that specifies the masked format.
        return 200 "$remote_addr -> $remote_addr_masked\n"; # For testing only
    }
}
As we are going to use nginScript to mask the client IP address, we use the js_include directive to specify the location of our JavaScript code. The js_set directive specifies the JavaScript function that will be executed when evaluating the $remote_addr_masked variable.
Notice that we use two access_log directives. The first uses the default log format to produce access logs that can be used by administrators for operational purposes. The second specifies the masked log format. With this configuration we write two access logs for each request: one for sysadmins and DevOps, and one for export.
Finally, the location block defines a very simple response using the return directive to show that data masking is working. In production, this would most likely contain a proxy_pass directive to direct requests to a backend server.
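Putting the directives described in this section together, the configuration might look like the following sketch. The log file paths are the ones that appear in the test output in this article, and logmask.js is the file name this article uses for the JavaScript code. The masked format copies the standard combined format with only the first field changed. The listen port, the location of logmask.js relative to the configuration directory, and the exact layout are assumptions.

```nginx
# Placed in the http context.
js_include logmask.js;                                  # assumed path to the JavaScript file
js_set     $remote_addr_masked maskRemoteAddress;       # variable <- JavaScript function

# Same fields as the combined format, except for the first one.
log_format masked '$remote_addr_masked - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';

server {
    listen 80;                                          # assumed port
    access_log /var/log/nginx/access.log;               # default format, for operations
    access_log /var/log/nginx/access_masked.log masked; # masked format, for export

    location / {
        return 200 "$remote_addr -> $remote_addr_masked\n"; # For testing only
    }
}
```

Keeping both access_log directives in the same server block means each request is written twice, so the unmasked log never has to leave the operations team.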
nginScript Code for IP Address Masking
We construct masked IP addresses using three simple JavaScript functions. Functions that other functions depend on must appear first in the file, so we discuss them in that order.
The essence of the data masking solution is to use a one‑way hashing algorithm to transform the client IP address. In this example we are using the FNV‑1a hash algorithm, which is compact, fast, and has reasonably good distribution characteristics. Its other advantage is that it returns a positive 32‑bit integer (the same size as an IPv4 address), which makes it trivial to present as an IP address. The fnv32a function is a JavaScript implementation of the FNV‑1a algorithm.
The i2ipv4 function converts a 32‑bit integer to an IPv4 address in quad‑dotted notation. It takes the hashed values from fnv32a() and provides a representation that “looks right” in our access log. Both IPv6 addresses and IPv4 addresses are represented in IPv4 format.
Finally, we have the maskRemoteAddress function, which is referenced by the js_set directive in the NGINX and NGINX Plus configuration above. It has a single parameter, req, which is the JavaScript object representing the HTTP request. The remoteAddress property contains the value of the client IP address (equivalent to the $remote_addr variable).
function fnv32a(str) {
    var hval = 2166136261;                  // FNV-1a 32-bit offset basis
    for (var i = 0; i < str.length; ++i) {
        hval ^= str.charCodeAt(i);
        // Multiply by the 32-bit FNV prime (16777619) using shifts and adds
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;                      // Convert to an unsigned 32-bit integer
}

function i2ipv4(i) {
    var octet1 = (i >> 24) & 255;
    var octet2 = (i >> 16) & 255;
    var octet3 = (i >> 8) & 255;
    var octet4 = i & 255;
    return octet1 + "." + octet2 + "." + octet3 + "." + octet4;
}

function maskRemoteAddress(req) {
    return i2ipv4(fnv32a(req.remoteAddress));
}
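As a quick check outside NGINX, the functions can be exercised with a stand‑in for the request object. This is only a sketch: the mockRequest helper and the sample addresses are hypothetical, and it assumes that a plain object with a remoteAddress property is all that maskRemoteAddress() needs. The three functions are repeated so that the snippet runs on its own in any JavaScript engine.

```javascript
// The three masking functions, repeated so this snippet is self-contained.
function fnv32a(str) {
    var hval = 2166136261;
    for (var i = 0; i < str.length; ++i) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;
}

function i2ipv4(i) {
    var octet1 = (i >> 24) & 255;
    var octet2 = (i >> 16) & 255;
    var octet3 = (i >> 8) & 255;
    var octet4 = i & 255;
    return octet1 + "." + octet2 + "." + octet3 + "." + octet4;
}

function maskRemoteAddress(req) {
    return i2ipv4(fnv32a(req.remoteAddress));
}

// Hypothetical stand-in for the request object that NGINX passes in.
function mockRequest(address) {
    return { remoteAddress: address };
}

var first  = maskRemoteAddress(mockRequest("192.0.2.10"));
var second = maskRemoteAddress(mockRequest("192.0.2.10"));
var ipv6   = maskRemoteAddress(mockRequest("2001:db8::1"));

console.log(first === second); // true: the same client always maps to the same masked address
console.log(ipv6);             // an IPv6 client is also logged as a dotted quad
```

Because the hash operates on the address as a string, no special handling is needed for IPv6: any input, whatever its length, is reduced to 32 bits and formatted as four octets.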
IP Address Masking in Action
With the above configuration in place, we can make a simple request to our server and check the response and the resultant access log entries.
$ curl http://localhost/
127.0.0.1 -> 8.163.209.30
$ sudo tail --lines=1 /var/log/nginx/access*.log
==> /var/log/nginx/access.log <==
127.0.0.1 - - [16/Mar/2017:19:08:19 +0000] "GET / HTTP/1.1" 200 26 "-" "curl/7.47.0"

==> /var/log/nginx/access_masked.log <==
8.163.209.30 - - [16/Mar/2017:19:08:19 +0000] "GET / HTTP/1.1" 200 26 "-" "curl/7.47.0"
Masking Personal Data in the Query String
Log files can also contain personal data other than IP addresses. Email addresses, postal addresses, and other identifiers can be passed as query‑string parameters and consequently be logged as part of the request URI. If your application passes personal data in this way, you can extend the nginScript IP address‑masking solution to sanitize personal data in the query string.
NGINX and NGINX Plus Configuration for Query String Masking
Like the default combined format, the masked log format defined above for IP address masking logs the $request variable, which captures three components of a request: HTTP method, URI (including query string), and HTTP version. We need to mask only the query string, so in the interests of code efficiency we use a separate variable for each of the three components, transforming only the request URI (second component) with the $request_uri_masked variable and using standard variables ($request_method and $server_protocol) for the first and third components.
The server block requires another js_set directive to define how the $request_uri_masked variable is evaluated.
        return 200 "$remote_addr -> $remote_addr_masked\n$request -> $request_uri_masked"; # For testing only
    }
}
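The changes described in this section might be sketched as follows. Only the directives that differ from the IP address masking configuration are shown. The logmask.js file name comes from this article, while the way the log_format string is split across lines is an assumption based on the standard combined format.

```nginx
# Placed in the http context.
js_include logmask.js;                              # assumed path to the JavaScript file
js_set     $remote_addr_masked maskRemoteAddress;
js_set     $request_uri_masked maskRequestURI;      # new: masked request URI

# "$request" is replaced by its three components so that only the URI is masked.
log_format masked '$remote_addr_masked - $remote_user [$time_local] '
                  '"$request_method $request_uri_masked $server_protocol" '
                  '$status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent"';
```

The access_log directives do not need to change, because they refer to the masked format by name.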
nginScript Code for Query String Masking
We add the maskRequestURI JavaScript function to the logmask.js file shown above. Like maskRemoteAddress(), maskRequestURI() depends on the fnv32a hashing function and so appears below it in the file.
The maskRequestURI function iterates through each key‑value pair in the query string, looking for specific keys that are known to contain personal data. For each of these keys, the value is transformed to a masked value.
Depending on the type of processing to be carried out on NGINX and NGINX Plus log files, the masked query string values may need to resemble genuine data. In this example, the masked zip value is formatted as five digits and the masked email value as an address that complies with RFC 821. Other keys may require more sophisticated formatting or dedicated functions to construct them.
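The maskRequestURI function described here might be sketched as follows. This is an illustrative reconstruction rather than the original listing. It assumes that the request object exposes the URI as req.uri and the query string as req.variables.query_string. The maskers table and its formatting choices are hypothetical, and the masked values it produces will not necessarily match the sample output shown in this article.

```javascript
// 32-bit FNV-1a hash, as used for IP address masking.
function fnv32a(str) {
    var hval = 2166136261;
    for (var i = 0; i < str.length; ++i) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;
}

// Hypothetical table: keys assumed to carry personal data, each mapped
// to a function that builds a realistic-looking masked value.
var maskers = {
    zip: function (value) {
        // Last five decimal digits of the hash, zero-padded to five characters
        return ("00000" + (fnv32a(value) % 100000)).slice(-5);
    },
    email: function (value) {
        // The hash becomes a unique local part at a reserved example domain
        return fnv32a(value) + "@example.com";
    }
};

function maskRequestURI(req) {
    var query = req.variables.query_string;   // assumed shape of the request object
    if (!query) {
        return req.uri;                       // No query string, nothing to mask
    }
    var pairs = query.split("&");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        if (eq === -1) {
            continue;                         // Bare key without a value
        }
        var key = pairs[i].slice(0, eq);
        var value = pairs[i].slice(eq + 1);
        if (Object.prototype.hasOwnProperty.call(maskers, key)) {
            pairs[i] = key + "=" + maskers[key](value);
        }
    }
    return req.uri + "?" + pairs.join("&");
}

// Hypothetical request object for trying the function outside NGINX.
var sample = {
    uri: "/index.php",
    variables: { query_string: "foo=bar&zip=90210&email=liam@nginx.com" }
};
console.log(maskRequestURI(sample)); // foo is untouched; zip and email are masked
```

Keeping the per‑key formatting in a table makes it straightforward to add further keys, each with its own function, without touching the loop.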
Query String Masking in Action
With these additions to our configuration, we can see query string masking in action.
$ curl "http://localhost/index.php?foo=bar&zip=90210&email=liam@nginx.com"
127.0.0.1 -> 8.163.209.30
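$ sudo tail --lines=1 /var/log/nginx/access*.log
==> /var/log/nginx/access.log <==
127.0.0.1 - - [16/Mar/2017:20:05:55 +0000] "GET /index.php?foo=bar&zip=90210&email=liam@nginx.com HTTP/1.1" 200 26 "-" "curl/7.47.0"

==> /var/log/nginx/access_masked.log <==
8.163.209.30 - - [16/Mar/2017:20:05:55 +0000] "GET /index.php?foo=bar&zip=38643&email=2852675791@example.com HTTP/1.1" 200 26 "-" "curl/7.47.0"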
Summary
NGINX and NGINX Plus with nginScript provide a simple and powerful solution for applying custom logic to request processing. In this article we demonstrated how nginScript can be used to mask personal data so that log files can be written in a way that allows offline analysis without violating data protection requirements.
We would love to hear about how you are using nginScript to solve business problems – please tell us about it in the comments section below.
To try NGINX Plus, start your free 30‑day trial today or contact us for a demo.
Enabling nginScript for NGINX and NGINX Plus
- Loading nginScript for NGINX Plus
- Loading nginScript for Open Source NGINX
- Compiling nginScript as a Dynamic Module for Open Source NGINX
Loading nginScript for NGINX Plus
nginScript is available as a free dynamic module for NGINX Plus subscribers (for open source NGINX, see Loading nginScript for Open Source NGINX below).
1. Obtain the module itself by installing it from the NGINX Plus repository.
   - For Ubuntu and Debian systems:
     $ sudo apt-get install nginx-plus-module-njs
   - For RedHat, CentOS, and Oracle Linux systems:
     $ sudo yum install nginx-plus-module-njs
2. Enable the module by including a load_module directive for it in the top‑level ("main") context of the nginx.conf configuration file (not in the http or stream context). This example loads the nginScript module for HTTP traffic.
   load_module modules/ngx_http_js_module.so;
3. Reload NGINX Plus to load the nginScript module into the running instance.
   $ sudo nginx -s reload
Loading nginScript for Open Source NGINX
If your system is configured to use the official pre‑built packages for open source NGINX and your installed version is 1.9.11 or later, then you can install nginScript as a pre‑built package for your platform.
1. Install the pre‑built package.
   - For Ubuntu and Debian systems:
     $ sudo apt-get install nginx-module-njs
   - For RedHat, CentOS, and Oracle Linux systems:
     $ sudo yum install nginx-module-njs
2. Enable the module by including a load_module directive for it in the top‑level ("main") context of the nginx.conf configuration file (not in the http or stream context). This example loads the nginScript module for HTTP traffic.
   load_module modules/ngx_http_js_module.so;
3. Reload NGINX to load the nginScript module into the running instance.
   $ sudo nginx -s reload
Compiling nginScript as a Dynamic Module for Open Source NGINX
If you prefer to compile an NGINX module from source:
- Follow these instructions to build the nginScript module from the open source repository.
- Copy the module binary (ngx_http_js_module.so) to the modules subdirectory of the NGINX root (usually /etc/nginx/modules).
- Perform Steps 2 and 3 in Loading nginScript for Open Source NGINX.
The post Data Masking for User Privacy with nginScript appeared first on NGINX.