Why you need Network Error Logging (NEL)

Introduction

Wouldn't it be great if every visitor to your site took the time to notify you when they were experiencing connectivity issues? Wouldn't it be even better if they told you exactly what caused the issue? Things like DNS lookup failures, connection timeouts, reset connections, or dead links that result in 404 errors?

Most of these issues cannot be detected server-side because, by definition, the client may never have managed to establish a connection with the server. Client-side feedback would therefore be extremely valuable!

Well, stop fantasizing, because this is now possible by adding just two response headers to your website. The Report-To and NEL headers instruct supporting user agents (browsers) to send you reports about connectivity issues. Aside from logging errors, NEL can also log successful requests, allowing you to determine error rates across different client populations.

Report-To response header

The Report-To header instructs the user agent where to send the reports. Its value is a JSON-formatted array of objects. After adding this header, you'll receive deprecation, intervention, and crash reports. I won't dive into these right now, but you can find more information about these reports here. Example:

Report-To: {"group":"endpoint-1","max_age":10886400,"endpoints":[{"url":"https://example.uriports.com/reports"}],"include_subdomains":true}

You'll find more information about this header's elements here.
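If you would rather generate this value than write it by hand, a minimal Python sketch could look like the one below. The function name build_report_to and its parameters are made up for illustration and are not part of any library; the field names and values are the ones from the example above.

```python
import json


def build_report_to(group, endpoint_urls, max_age, include_subdomains=False):
    """Return a Report-To header value for a single reporting group.

    Hypothetical helper for illustration only.
    """
    policy = {
        "group": group,
        "max_age": max_age,  # lifetime of the endpoint group, in seconds
        "endpoints": [{"url": url} for url in endpoint_urls],
    }
    if include_subdomains:
        policy["include_subdomains"] = True
    # Compact separators keep the header value short.
    return json.dumps(policy, separators=(",", ":"))


header_value = build_report_to(
    "endpoint-1",
    ["https://example.uriports.com/reports"],
    max_age=10886400,
    include_subdomains=True,
)
print("Report-To:", header_value)
```

Serializing from a dictionary avoids the quoting and bracket mistakes that are easy to make when editing JSON inside a server configuration file.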

NEL response header

The NEL header is also a JSON-formatted array of objects and refers to a reporting endpoint defined in the previously mentioned Report-To header. This header is an extension to the Report-To header that instructs the user agent to send network error reports. Therefore, it is not possible to configure a NEL header without a Report-To header. Example:

NEL: {"report_to":"endpoint-1","max_age":2592000,"include_subdomains":true,"failure_fraction":0.5}

Find more information about this header's elements here.
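Because the NEL policy only works when its report_to value names a group defined in Report-To, it can be worth checking the pair before deploying them. The following Python sketch does that for the two example values above; the function name nel_references_known_group is an assumption made for this illustration.

```python
import json


def nel_references_known_group(report_to_value, nel_value):
    """Return True if the NEL policy's report_to names a Report-To group.

    Hypothetical pre-deployment check, not part of any library.
    """
    # Header values are written without the outer [ ], so wrapping them
    # in brackets yields a JSON array that json.loads can parse.
    report_to_entries = json.loads(f"[{report_to_value}]")
    groups = {entry["group"] for entry in report_to_entries if "group" in entry}
    nel_policy = json.loads(f"[{nel_value}]")[0]
    return nel_policy.get("report_to") in groups


REPORT_TO = (
    '{"group":"endpoint-1","max_age":10886400,'
    '"endpoints":[{"url":"https://example.uriports.com/reports"}],'
    '"include_subdomains":true}'
)
NEL = (
    '{"report_to":"endpoint-1","max_age":2592000,'
    '"include_subdomains":true,"failure_fraction":0.5}'
)

print(nel_references_known_group(REPORT_TO, NEL))
```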

Sampling report volume

Two optional fraction values (success_fraction and failure_fraction) let you define a sampling rate between 0 and 1, inclusive. On a high-traffic website, it is a good idea to set a value lower than 1 to reduce the number of reports being sent. For instance, a value of 0.25 instructs browsers to send only 1 out of 4 reports (25%). The default value for success_fraction is 0, so you will not receive any success reports by default. Start small when enabling success reports, as they will undoubtedly generate a large volume: begin at 0.001 and work your way up if you need more.
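Sampling means each received report stands for more than one request. Every report body carries the sampling_fraction that was in effect, so a collector can scale its counts back up. The sketch below is one simple way to do that in Python, assuming each report roughly represents 1 / sampling_fraction requests; the function name estimate_request_count is illustrative.

```python
def estimate_request_count(reports):
    """Estimate how many requests a list of sampled reports represents."""
    total = 0.0
    for report in reports:
        fraction = report["body"]["sampling_fraction"]
        if fraction > 0:  # guard against a malformed report
            total += 1.0 / fraction
    return total


# Three reports sampled at 25% and one report sampled at 100%.
sampled_reports = [
    {"body": {"sampling_fraction": 0.25}},
    {"body": {"sampling_fraction": 0.25}},
    {"body": {"sampling_fraction": 0.25}},
    {"body": {"sampling_fraction": 1.0}},
]
print(estimate_request_count(sampled_reports))  # 13.0
```

The result is only an estimate: with a low sampling rate and few reports, the real number of affected requests can differ considerably.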

Example report

Below is an example report from the W3C Editor's Draft. The report is delivered using a POST to the endpoint that was specified in the Report-To response header.

{
  "age": 0,
  "type": "network-error",
  "url": "https://widget.com/thing.js",
  "body": {
    "sampling_fraction": 1.0,
    "referrer": "https://www.example.com/",
    "server_ip": "",
    "protocol": "",
    "method": "GET",
    "request_headers": {},
    "response_headers": {},
    "status_code": 0,
    "elapsed_time": 143,
    "phase": "dns",
    "type": "dns.name_not_resolved"
  }
}

The example report above indicates that the user agent attempted to fetch https://widget.com/thing.js from https://www.example.com/. However, the user agent was unable to resolve the DNS name (widget.com) and the request was aborted by the user agent after 143 milliseconds. Because a previous request to widget.com delivered a valid NEL policy, the user agent generates a network error report for this request. The report was uploaded immediately after the network error was encountered (i.e., the report age is 0).
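To show how a collector might read such a report, here is a small Python sketch that turns the example into a one-line summary. The report is trimmed to the fields the code uses, and the summarize function is a name chosen for this illustration.

```python
import json

# The example report from above, trimmed to the fields used here.
RAW_REPORT = """
{
  "age": 0,
  "type": "network-error",
  "url": "https://widget.com/thing.js",
  "body": {
    "sampling_fraction": 1.0,
    "referrer": "https://www.example.com/",
    "elapsed_time": 143,
    "phase": "dns",
    "type": "dns.name_not_resolved"
  }
}
"""


def summarize(report):
    """Condense one network-error report into a single log line."""
    body = report["body"]
    return (
        f'{body["type"]} in phase {body["phase"]} for {report["url"]} '
        f'after {body["elapsed_time"]} ms (referrer: {body["referrer"]})'
    )


summary = summarize(json.loads(RAW_REPORT))
print(summary)
```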

Network error types

Below is a list of predefined error codes and their descriptions. As you can see, you can gain a lot of detailed information from adding a NEL header to your website.

dns.unreachable
    DNS server is unreachable
dns.name_not_resolved
    DNS server responded but is unable to resolve the address
dns.failed
    Request to the DNS server failed due to reasons not covered by previous
    errors
dns.address_changed
    Indicates that the resolved IP address for a request's origin has
    changed since the corresponding NEL policy was received 

tcp.timed_out
    TCP connection to the server timed out
tcp.closed
    The TCP connection was closed by the server
tcp.reset
    The TCP connection was reset
tcp.refused
    The TCP connection was refused by the server
tcp.aborted
    The TCP connection was aborted
tcp.address_invalid
    The IP address is invalid
tcp.address_unreachable
    The IP address is unreachable
tcp.failed
    The TCP connection failed due to reasons not covered by previous errors

tls.version_or_cipher_mismatch
    The TLS connection was aborted due to version or cipher mismatch
tls.bad_client_auth_cert
    The TLS connection was aborted due to invalid client certificate
tls.cert.name_invalid
    The TLS connection was aborted due to invalid name
tls.cert.date_invalid
    The TLS connection was aborted due to invalid certificate date
tls.cert.authority_invalid
    The TLS connection was aborted due to invalid issuing authority
tls.cert.invalid
    The TLS connection was aborted due to invalid certificate
tls.cert.revoked
    The TLS connection was aborted due to revoked server certificate
tls.cert.pinned_key_not_in_cert_chain
    The TLS connection was aborted due to a key pinning error
tls.protocol.error
    The TLS connection was aborted due to a TLS protocol error
tls.failed
    The TLS connection failed due to reasons not covered by previous errors

http.error
    The user agent successfully received a response, but it had a 4xx or
    5xx status code 
http.protocol.error
    The connection was aborted due to an HTTP protocol error
http.response.invalid
    Response is empty, has a content-length mismatch, has improper encoding,
    and/or other conditions that prevent user agent from processing the
    response
http.response.redirect_loop
    The request was aborted due to a detected redirect loop
http.failed
    The connection failed due to errors in HTTP protocol not covered by
    previous errors

abandoned
    User aborted the resource fetch before it was complete
unknown
    Error type is unknown
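Since the error types share a dotted prefix (dns, tcp, tls, http), reports are easy to group by category. The Python sketch below tallies a list of error types this way. The "other" bucket for abandoned, unknown, and any unrecognized type is a label invented for this example, not something defined by the specification.

```python
from collections import Counter

KNOWN_PREFIXES = ("dns", "tcp", "tls", "http")


def error_category(error_type):
    """Map an error type such as 'tls.cert.revoked' to its category."""
    prefix = error_type.split(".", 1)[0]
    return prefix if prefix in KNOWN_PREFIXES else "other"


def tally_by_category(error_types):
    """Count how many error types fall into each category."""
    return Counter(error_category(t) for t in error_types)


counts = tally_by_category([
    "dns.name_not_resolved",
    "tls.cert.date_invalid",
    "tls.cert.revoked",
    "http.error",
    "abandoned",
])
print(dict(counts))
```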

Notifications

Collecting and analyzing these reports can reveal valuable data that allows you to improve your website. At URIports, we automatically detect issues and send notifications when something is wrong. You will receive an email or push notification when multiple sources (different IP/browser combinations) experience the same issue, such as 404 not-found errors, expired or misconfigured certificates, or connectivity or DNS problems. This keeps you up to date and lets you resolve issues quickly, without a site visitor having to take the time to alert you personally.
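The idea of alerting only when several independent sources hit the same problem can be sketched in a few lines of Python. This is purely an illustration of the principle and not how URIports implements it; the function name issues_to_alert, the event tuple layout, and the min_sources threshold are all assumptions.

```python
from collections import defaultdict


def issues_to_alert(events, min_sources=2):
    """Return error types reported by at least min_sources distinct sources.

    Each event is a (ip, user_agent, error_type) tuple; a "source" is a
    distinct IP/browser combination.
    """
    sources_by_issue = defaultdict(set)
    for ip, user_agent, error_type in events:
        sources_by_issue[error_type].add((ip, user_agent))
    return sorted(
        issue
        for issue, sources in sources_by_issue.items()
        if len(sources) >= min_sources
    )


events = [
    ("192.0.2.1", "Chrome", "tls.cert.date_invalid"),
    ("192.0.2.1", "Chrome", "tls.cert.date_invalid"),  # same source again
    ("198.51.100.7", "Edge", "tls.cert.date_invalid"),
    ("203.0.113.9", "Opera", "dns.name_not_resolved"),
]
print(issues_to_alert(events))
```

Counting distinct sources rather than raw reports keeps a single misbehaving client from triggering an alert on its own.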

Adoption

Based on January 2020 data from Scott Helme's Crawler.Ninja, only 1.2% of the websites in the Alexa top 1 million use Network Error Logging. In my opinion, that makes NEL the most undervalued monitoring technique available today.

Conclusion

Setting up Network Error Logging is easy, and best of all, it is free and already supported by Chromium-based browsers such as Chrome (Mobile), Opera, Yandex, Electron, Vivaldi, and Edge. Furthermore, combining it with URIports makes it even more valuable, allowing you to easily navigate and analyze the data, with notifications as a bonus.

Getting Started with URIports

We have a Getting Started page to help you set everything up by adding a few response headers and DNS records. You'll have your free 30-day URIports trial account set up and working in less than 30 minutes. No commitment or credit card details are required, and there are no strings attached. An additional advantage of using URIports is that the reports are collected outside the website's network, so they can be delivered even when the website's entire network is down.

As always, if you have any questions, please find me on Twitter @freddieleeman.