My Take on Picking a Log Level

Logging? What’s there to talk about logging? It’s straightforward. Use the one of the log levels and log it.

The ubiquitous log4j‘s website says …

ERROR level designates error events that might still allow the application to continue running
FATAL level designates very severe error events that will presumably lead the application to abort
INFO level designates informational messages that highlight the progress of the application at coarse-grained level
WARN level designates potentially harmful situations

That is the theory of it. Let me tell you my opinion on a more practical matter.

Let us consider an User service. You know what it does, don’t you? Say the service supports a service method – /user/info?id={id}, to fetch user information.

The implementation of the above service method should be logging various events when a request is received and processed. Something like this ….

INFO | 2017-07-16 10:51:ss,sss | UserResource.java | 86 | Received getUserInfo request
DEBUG | 2017-07-16 10:51:ss,sss | UserResource.java | 87 | Dumping request parameters ....
..... {{more logs}}
INFO | 2017-07-16 10:51:ss,sss | UserService.java | 143 | Processing getUserInfo(id=xxxxx)
..... {{more logs}}
INFO | 2017-07-16 10:51:ss,sss | UserResource.java | 110 | getUserInfo request processed successfully

Suppose you call /user/info for an non-existent or invalid id, what do you think the LogLevel should be for the messages that say it is not a valid/existing user. A lot of code that I have come across log it as ERROR. Also, programmers are not against logging as ERROR. Because you requested the service to do something and it wasn’t able to do. Probably it ran through a bunch of code finally consulting the database (layer) to find that an user with the specified id does not exist. And so, it will not be able to return the user information. Maybe the layer that determined that the id to be invalid or non-existent threw an Exception, which bubbled up the layers to the entry point (UserResource) where it was caught, and status quo, logged as an ERROR.

Here is what I think. I think it should be logged as INFO.

INFO? 😝 He doesn’t know what he is talking about.

The description of the LogLevels by log4j‘s, or any logging library for that matter, is a guideline and explains what the level stands for. It cannot possibly suggest log level for specific situations or when a level may or may not be used. In our case, we can argue that ERROR fits the bill per its description – error events that might still allow the application to continue running. You requested information for an invalid user id which the service was not able to provide but the application is in a state to continue running. Also, Continue running can be interpreted in more than one way.

Why do I think that it should be logged as INFO? To me, ERROR is something the application (or the caller of the API) did not expect or foresee when satisfying a request. In our case, the service is behaving as expected. If you request information for an invalid/non-existing user id, it should not provide user information. Or do you expect to provide arbitrary information? So, it is behaving correctly and as expected. Hence, log it as INFO.

The layer that determines the validity of the specified user id may either throw an exception or return null. If it throws, it should throw an application/scenario specific exception (UserNotFoundException or MyAppException(1024, "User not found")) rather than raw Exception. I prefer1 to have trycatch guards only at the entry point where the app-specific exception should be caught and logged as INFO. I leave all other exceptions – raw and unrecognized, to be caught in the kitchen sink – catch(Exception ex), where it should be logged as an ERROR.

Let us say that the service ran through a bunch of code and is going to consult the database now. For some reason, if the database is not reachable, whatever exception comes out of trying to reach the database should logged as an ERROR. A time when you don’t have to feel bad to wake ops and engineers in the middle of the night. The database is expected to be available and reachable at all times. The service encountering such a scenario is not expected2, and in all likelihood, it wouldn’t be able to do proceed any further. So, ERROR it is.

Let us get back to the original case (/user/info?id={invalid_id}) for a minute. I am going logging it as INFO. That means somebody could bombard our service with requests with invalid ids, and the service will be logging as an INFO. Although the service is behaving as expected, such requests is not good for general health of the service and application. In such case, shouldn’t it be logged as ERROR so that concerned parties are alerted. True. However, it is not the service implementation that should bother. DDOS and such are concerns of a more higher level or lateral parts of the application. Say a traffic monitor. The traffic monitor (like New Relic or App Dynamics) is the right candidate to monitor, inspect and stop further requests with invalid ids from reaching the service implementation, and alert that somebody is trying to break in! Or whatever the scenario is that requires our attention.

What’s wrong in logging /user/info?id={invalid_id} as ERROR? It still works, doesn’t it? It does, no doubt about it. God forbid but when the only way to diagnose problems is to read logs, I don’t like to see false alarms (ERRORs). If you are thinking, Had this been logged as an ERROR, I would’ve spotted that bug. A bug in your code has got nothing to do with LogLevels. A bug is identified from the application behavior, which in most of the cases, can be traced back to the piece of code or library. LogLevels relate the severity of an event that you like to report. It could be informational, bread crumbs that you leave to track flow of control. Or it could be reported as errors indicating that the flow ran into an unexpected situation.

As a side note,

  • Following are candidates for DEBUG – request/response bodies, detailed call stack (entry/exit) logging etc.
  • Here is a scenario for WARN. Suppose the format of timestamp in your log messages is dictated by configuration, also say based on locale. Consider it to mandatory to have these formats defined and available to the application, although the application should assume a default for the worst case scenario. When the application is not able to find the timestamp format because it is not configured, it is good to log a warning that it is going to use the default format.

I chose specific examples, although not extensive, to illustrate how I would go about choosing LogLevels.

logger.info("Bye for now!");

  1. I prefer to guard only the entry point with a trycatch. I wouldn’t catch exceptions somewhere midway in the call stack unless and until I may want to perform specific handling of that exception – retries, translate to a different exception, call another method/end point to continue processing etc. Otherwise, the exception, which could be thrown anywhere in the stack, should be left to bubble up smoothly to wherever it is handled, which I like to keep it at the entry point. This is orthogonal to the exception handlers setup in the request handling pipeline. 
  2. When I say it is not expected, it does not mean such scenarios are not factored into the design. It means the application has encountered a scenario where a pact has been broken. Concerned parties, human and machine, have to come together to resolve the issue. 
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s