Give your logs more context — Part 1

Give your logs more context

How to make sense out of your Node.js web app logs

Logging might be one of the most difficult things to do right when building a real world application. Log too little and you will be staring at your screen trying to make sense of them (or the charts generated from them). Log too much and you will end up lost in a swamp of useless information, still having no clue if everything is OK or if you have a problem.

matrix.jpeg Logs without the right amount of context look like…

Speaking specifically of the Node.js/Javascript ecosystem, the top 3 logging libraries — Winston, Bunyan and Pino — can help you to manage context way better than the good ol’ console.log could ever do.

For this article I will use Pino, but the ideas can be easily replicated for both Bunyan and Winston (or any other mainstream logging utility).

Use log levels wisely

Pino has 6 default log levels, with increasing severity: trace, debug, info, warn, error and fatal. Each one of these levels maps to an integer from 10 to 60. This makes it easy to analyze your logs later using tools like jq:

jq 'select(.level > 40)' # gets ERROR and FATAL logs

While Pino allows you to define custom log levels, I have never seen a use case where they would be necessary, so I tend to stick with the default ones.

Usually, for production, it is recommended to ignore trace and debug levels, unless you are explicitly trying to debug some production issue.

Pino has a configuration option that allows you to define the minimum required level for the log entry to be generated. You can use environment variables to avoid having to make a deploy just to change the log level:

const logger = pino({
  level: process.env.LOG_LEVEL || 'info'
});

Rules of thumb

  • Use trace for internal logging that has a potentially high throughput.
  • Use debug for eventual debugging sessions you might need, but remember to remove them after you are finished.
  • Use info for regular application workflow logs.
  • Use warn for expected and frequent error conditions (like user input validation).
  • Use error for expected but infrequent error conditions (like network failures, database timeouts).
  • Use fatal for unexpected error conditions.

Embrace request IDs

While we are still developing the application, running unit/integration tests, manually triggering some request to see if everything is running smoothly, it’s all good. The events being produced happen in a more or less predictable order, so it’s easy to follow.

However, once the production version is launched, things can go really crazy. Your app will most certainly process concurrent requests. If you have a few asynchronous steps — like querying a database or calling some external services — the order of each event will be completely unpredictable. In this case, if you are manually inspecting the logs (we all have done this at some point 😅), you can become very frustrated trying to find a thread of execution.

Some frameworks — like Hapi — already take care of this for you. But if you like me still rely on good ol’ express, you have to do it yourself. Defining a middleware that does that is as simple as:

function setRequestId(generateId) {
  return (req, res, next) => {
    req.id = generateId();
    next();
  };
}

Then use it:

let i = 0;
const generateId = () => i++;
app.use(setRequestId(generateId));</span>

Of course, this naive implementation would not work if you ever restart your server, since the counter would be reset to 0. For a real world application, it’s recommended to use a more robust ID generator, such as uuid or, my personal choice, cuid.

If you use a micro-services architecture (or want to be prepared to), you can leverage distributed tracing simply by allowing your services to forward and receive a given request ID:

function setDistributedRequestId(generateId) {
  return (req, res, next) => {
    const reqId = req.get('X-Request-Id') || generateId();
    req.id = reqId;
    res.set('X-RequestId', reqId);
    next();
  };
}

Now we can create another middleware that logs incoming requests:

function logIncomingRequests(logger) {
  return (req, res, next) => {
    logger.trace({ req, requestId: req.id}, 'Incoming request');
    next();
  }
}

And use it:

app.use(logIncommingRequests(pino()))</span>

The generated log entry would look like:

{"level":30, "time":1533749413556, "pid":15377, "hostname":"henrique-pc", "msg":"Incoming request", "req":{"method":"GET", "url":"/", "headers":{"host":"localhost:4004", "user-agent":"curl/7.61.0", "accept":"*/*"}},
"requestId":1, # <---- notice here!
"v":1}

So far, so good. We could even use express-pino-logger to further integrate the logger with our express app. The major problem here is that the request ID is tightly coupled with our web layer. Unless you define all your business logic within express handlers — something I urge you to please don’t — you won’t be able to access the request ID value in other layers.

Oh, I could store the IDs in-memory or in a Redis cache and then retrieve it when logging something in other layers!!!

Yeah, nice try. I thought that myself too, but it doesn’t work. The reason is that you can’t know which request you are currently processing when you have concurrent accesses. Or can you?

Meet Continuation Local Storage

Imagine that each request is an isolated “thread” of connected execution paths (function calls) that is discarded when the result of the original call is returned.

While Javascript doesn’t spawn real threads for handling user requests, it emulates this by registering callbacks that will be called in the proper sequence when the results of the function calls are available.

Luckily for us, Node.js provides a way to intercept the hops through this execution “thread”. Continuation Local Storage (or CLS for short) leverages this capability to keep data available within a given “thread”.

When you set values in continuation-local storage, those values are accessible until all functions called from the original function — synchronously or asynchronously — have finished executing. This includes callbacks passed to process.nextTick and the timer functions (setImmediate, setTimeout, and setInterval), as well as callbacks passed to asynchronous functions that call native functions (such as those exported from the fs, dns, zlib and crypto modules).

giphy.webp Me when I first discovered CLS…

Redefining our request ID middleware, we would have something like:

import { createNamespace } from 'cls-hooked';
import cuid from 'cuid';
const loggerNamespace = createNamespace('logger');

function clsRequestId(namespace, generateId) {
  return (req, res, next) => {
    const reqId = req.get('X-Request-Id') || generateId();

    res.set('X-RequestId', reqId);

    namespace.run(() => {
      namespace.set('requestId', reqId);
      next();
    });
  };
}

app.use(clsRequestId(loggerNamespace, cuid));

Breaking it down:

  • A namespace is roughly the CLS equivalent of a table from a relational database or a collection/key space from a document store. To create one, we simply need to identify it as a string.
  • Our “high order” middleware clsRequestId now needs two parameters: the namespace and the ID generator function.
  • namespace.run is the function that creates a new context, bounded to the execution “thread”.
  • namespace.set puts the request ID into local storage.
  • next will call the next express handler. IMPORTANT: to make this work as expected, next MUST be called inside the namespace.run callback.

Now, whenever we need to access this value, we can use getNamespace from cls-hooked:

import { getNamespace } from 'cls-hooked';
import pino from 'pino';
const logger = pino();

loggerNamespace = getNamespace('logger');

function doStuff() {
  // ...
  logger.info({ requestId: loggerNamespace.get('requestId') }, "Some message");
}

If function doStuff call was ultimately originated in one of the handlers from the express app which registered that clsRequestId middleware, the value will be available.

Putting everything toghether:

Here’s a sample output generated with autocannon:

{"level":30,"time":1533759930690,"msg":"App is running!","pid":4985,"hostname":"henrique-pc","endpoint":"http://localhost:4000","v":1}
{"level":30,"time":1533759933634,"msg":"Before","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awx0000uhwg9qh20e0b"**,"v":1}
{"level":30,"time":1533759933636,"msg":"Before","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awz0001uhwgoyiptfxv"**,"v":1}
{"level":30,"time":1533759935531,"msg":"Middle","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awz0001uhwgoyiptfxv"**,"v":1}
{"level":30,"time":1533759939590,"msg":"Middle","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awx0000uhwg9qh20e0b"**,"v":1}
{"level":30,"time":1533759941222,"msg":"After","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awz0001uhwgoyiptfxv"**,"v":1}
{"level":30,"time":1533759941228,"msg":"Before","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2grw0002uhwgzz14qyb6"**,"v":1}
{"level":30,"time":1533759943632,"msg":"Before","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2imo0003uhwgf4dutgz3"**,"v":1}
{"level":30,"time":1533759946244,"msg":"Middle","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2grw0002uhwgzz14qyb6"**,"v":1}
{"level":30,"time":1533759949490,"msg":"After","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2awx0000uhwg9qh20e0b"**,"v":1}
{"level":30,"time":1533759951621,"msg":"Middle","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2imo0003uhwgf4dutgz3"**,"v":1}
{"level":30,"time":1533759952464,"msg":"After","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2grw0002uhwgzz14qyb6"**,"v":1}
{"level":30,"time":1533759953632,"msg":"Before","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2qcg0004uhwgnmgztdr7"**,"v":1}
{"level":30,"time":1533759954665,"msg":"Middle","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2qcg0004uhwgnmgztdr7"**,"v":1}
{"level":30,"time":1533759955140,"msg":"After","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2imo0003uhwgf4dutgz3"**,"v":1}
{"level":30,"time":1533759957183,"msg":"After","pid":4985,"hostname":"henrique-pc",**"requestId":"cjkll2qcg0004uhwgnmgztdr7"**,"v":1}

If you look closely you will see that, even though the call order of the logger function is non-linear, the requestId for each different request is mantained.

spongebob-magic.gif

It’s complete maaaagic!

Now, whenever you want to see the logs from a single request in isolation, you can again use jq and run:

jq 'select(.requestId == "cjkll2qcg0004uhwgnmgztdr7")' <log_file>

The output will be:

{
  "level": 30,
  "time": 1533759953632,
  "msg": "Before",
  "pid": 4985,
  "hostname": "henrique-pc",
  "requestId": "cjkll2qcg0004uhwgnmgztdr7",
  "v": 1
}
{
  "level": 30,
  "time": 1533759954665,
  "msg": "Middle",
  "pid": 4985,
  "hostname": "henrique-pc",
  "requestId": "cjkll2qcg0004uhwgnmgztdr7",
  "v": 1
}
{
  "level": 30,
  "time": 1533759957183,
  "msg": "After",
  "pid": 4985,
  "hostname": "henrique-pc",
  "requestId": "cjkll2qcg0004uhwgnmgztdr7",
  "v": 1
}

Further improvements

While the structure presented in this story works, it’s not practical for everyday use. It would be very tedious having to manually get the namespace and retrieve all values you need like in the example code above:

const namespace = getNamespace('logger');                                                 logger.info({ requestId: namespace.get('requestId') }, 'Before')</span>

Next time we will build a wrapper around pino to handle all of this transparently.


see-you-soon.webp

Bye!

Did you like what you just read? Buy me a beer with tippin.me


Part 2 is now available here:

No Comments Yet