Another point from the trenches of the real world:

I've had great success with metrics over logging in many situations.

Logs are really nice for knowing what went wrong in a system. If the logs have some kind of structure (JSON, S-exps, Erlang terms, ...), you can also dump all the context information relevant to each log entry. Modern stacks such as Elasticsearch with Kibana can parse that structure and build a search index over it, which is very useful when trying to figure out what went wrong. In a distributed system, you keep a unique request id in the structured log so you can easily join log entries from multiple subsystems in the centralized logging platform.
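As a rough illustration of a structured log entry that carries a request id, here is a minimal OCaml sketch using only the standard library. The function names (`json_escape`, `log_event`) and the field names are made up for the example; a real system would more likely use a JSON library and a proper logging backend.

```ocaml
(* Sketch of a structured (JSON) log line carrying a request id.
   [json_escape], [log_event] and the field names are hypothetical. *)

(* Escape the characters JSON requires to be escaped inside strings. *)
let json_escape s =
  let b = Buffer.create (String.length s + 8) in
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 ->
          Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Render one log entry as a single JSON object. Every entry carries the
   request id, so a central index can join entries across subsystems. *)
let log_event ~request_id ~subsystem ~msg (ctx : (string * string) list) =
  let field (k, v) =
    Printf.sprintf "\"%s\":\"%s\"" (json_escape k) (json_escape v)
  in
  let fields =
    ("request_id", request_id) :: ("subsystem", subsystem) :: ("msg", msg) :: ctx
  in
  "{" ^ String.concat "," (List.map field fields) ^ "}"

let () =
  print_endline
    (log_event ~request_id:"req-42" ~subsystem:"billing" ~msg:"charge failed"
       [ ("user", "alice"); ("amount", "12.50") ])
```

Because each line is a self-describing object with a `request_id` field, the indexer can group all entries for one request regardless of which subsystem emitted them.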

With metrics, by contrast, whenever a given event occurs you bump a counter or add an entry to a histogram (HdrHistogram comes to mind). You then export the counts to an external system that plots how the system fares internally. This is much cheaper than rendering a log line, and it is almost as informative when you don't need the log line itself, only the fact that the event occurred. Live timing histograms have the added advantage that problems tend to show up in the small before a catastrophe, so you can alter the operating point of the system dynamically long before the catastrophe occurs in the real world.
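A minimal sketch of the counter-plus-histogram idea, again in stdlib-only OCaml. It uses simple power-of-two buckets rather than HdrHistogram (which offers configurable precision), and the names `record`, `handle_request` and `export` are hypothetical. The point is that recording an event costs a few integer operations and no string formatting at all.

```ocaml
(* Sketch only: an event counter plus a latency histogram with power-of-two
   buckets. This is NOT HdrHistogram; it just shows that recording an event
   is a few integer operations, with no string formatting involved. *)

(* Bucket i counts values v with 2^i <= v < 2^(i+1); bucket 0 also absorbs
   values <= 1. 62 buckets cover every non-negative 64-bit OCaml int. *)
type histogram = { buckets : int array }

let create_histogram () = { buckets = Array.make 62 0 }

(* floor (log2 v) for v >= 1, and 0 for v <= 1. *)
let bucket_of v =
  let rec go v i = if v <= 1 then i else go (v lsr 1) (i + 1) in
  go v 0

let record h v =
  let i = bucket_of v in
  h.buckets.(i) <- h.buckets.(i) + 1

(* Hypothetical instrumentation for a request handler. *)
let requests = ref 0
let latency_us = create_histogram ()

let handle_request elapsed_us =
  incr requests;
  record latency_us elapsed_us

(* Export the non-empty buckets as (lower bound, count) pairs in ascending
   order, e.g. for a metrics collector to scrape and plot. *)
let export h =
  let acc = ref [] in
  for i = Array.length h.buckets - 1 downto 0 do
    let n = h.buckets.(i) in
    if n > 0 then acc := (1 lsl i, n) :: !acc
  done;
  !acc

let () =
  List.iter handle_request [ 120; 130; 900; 15_000 ];
  Printf.printf "requests = %d\n" !requests;
  List.iter (fun (lo, n) -> Printf.printf ">= %d us: %d\n" lo n) (export latency_us)
```

Coarse buckets like these are enough to see a latency tail start to grow; HdrHistogram trades a bit more memory for much finer resolution.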

I almost never have debug-style logging lines in my code anymore. I much prefer adding live tracing to the running system instead (Erlang has tracing facilities, DTrace is useful on Illumos and FreeBSD, etc.). Granted, you can't do this after an error has already happened, but on the other hand, you can tailor the tracing to the situation at hand. It also removes the need to recompile and redeploy with debug logging (which can't help with an error that has already happened either).

The right way to handle system failure is to take a snapshot of the system's state (or of the part of the system to which the problem pertains). Store these snapshots somewhere (in the cloud) and index them, so you can go back later with a debugger, attach to the core dump, and inspect what went wrong. Post-mortem debugging is needed because many of the errors that occur lie outside the imagination of the programmers: people abuse systems in ways nobody thought possible.
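OCaml programs are not usually debugged from core dumps, so the sketch below swaps in a lighter technique: when an unexpected exception escapes, it marshals the relevant piece of state together with the backtrace into a file and hands the path to a callback (where one might upload and index it). `with_snapshot`, `load_snapshot` and the record fields are hypothetical names, and `Marshal` output should only be read back by the same program version with the same type.

```ocaml
(* Sketch, not a core dump: persist the relevant state and the backtrace when
   an unexpected exception escapes, then re-raise it unchanged.
   [with_snapshot], [load_snapshot] and the record fields are hypothetical. *)

let () = Printexc.record_backtrace true

type snapshot = {
  exn : string;                 (* the exception, as text *)
  backtrace : string;           (* where it was raised *)
  state : (string * int) list;  (* the part of the system the failure pertains to *)
}

(* Run [f]; on failure, write a snapshot file under [dir], pass its path to
   [on_snapshot] (e.g. to upload and index it), then re-raise. *)
let with_snapshot ~dir ~state ~on_snapshot f =
  try f ()
  with e ->
    (* Grab the backtrace first, before anything else can overwrite it. *)
    let raw_bt = Printexc.get_raw_backtrace () in
    let snap =
      {
        exn = Printexc.to_string e;
        backtrace = Printexc.raw_backtrace_to_string raw_bt;
        state = state ();
      }
    in
    let path = Filename.temp_file ~temp_dir:dir "snapshot-" ".bin" in
    let oc = open_out_bin path in
    Fun.protect ~finally:(fun () -> close_out oc) (fun () ->
        Marshal.to_channel oc snap []);
    on_snapshot path;
    Printexc.raise_with_backtrace e raw_bt

(* Marshal is untyped: the annotation below is trusted, not checked. *)
let load_snapshot path : snapshot =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      (Marshal.from_channel ic : snapshot))

let () =
  try
    with_snapshot ~dir:(Filename.get_temp_dir_name ())
      ~state:(fun () -> [ ("queue_len", 7); ("workers", 3) ])
      ~on_snapshot:(fun p -> prerr_endline ("snapshot written to " ^ p))
      (fun () -> failwith "unexpected input")
  with Failure msg -> prerr_endline ("request failed: " ^ msg)
```

Handing the path to a callback keeps the storage and indexing policy out of the failure handler itself.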

On Mon, Jan 16, 2017 at 4:15 PM Anil Madhavapeddy <anil@recoil.org> wrote:

> On 9 Jan 2017, at 18:52, Chet Murthy <chetsky@gmail.com> wrote:
>
> All,
>
> I hope this is the right place to ask this question. I've been
> writing a nontrivial distributed system (well, a number of them over
> the last few years) and have had need of a robust and flexible logging
> framework. Specifically, I've been using "bolt" and its descendant,
> "volt", which provide camlp4 syntax extensions. These extensions make
> the syntax of the logging statements significantly less verbose, and
> that in itself is a valuable thing.
>
> With the arrival of ppx rewriters, I realize that the camlp4/camlp5
> method of adding syntax to ocaml is deprecated. So I wonder: is there
> some really good logging toolkit out there that I've overlooked?
>
> I'm aware of a number of different packages, but only bolt/volt have
> syntax extensions, and it's my belief that they're essential to making
> effortless pervasive log-line instrumentation.
>
> But perhaps I just haven't looked hard enough .... So .... before I
> go write my own, I figured I'd ask the list if there were such a
> thing.

Dear Chet,

In MirageOS, we've been moving away from syntax extensions and
towards placing the logging directives as closures directly within the
code. This is slightly slower in the case of debug logging, but in
practice for distributed systems we are finding that having "permanent"
logging at different levels is more valuable than the "recompile with
debug logging" that we used to use.
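The closure style can be illustrated with a tiny stdlib-only logger that mimics the call shape used by Logs (`Logs.debug (fun m -> m "..." args)`). This is not the Logs implementation; `log`, `sink` and `current_level` are hypothetical names. It shows why permanent debug lines stay cheap: when the level is disabled, the closure is never called, so its arguments are never computed or formatted.

```ocaml
(* Hypothetical stdlib-only logger imitating the closure call style of the
   Logs library: [debug (fun m -> m "fmt" args)]. Not the Logs API itself. *)

type level = Error | Warning | Info | Debug

let rank = function Error -> 0 | Warning -> 1 | Info -> 2 | Debug -> 3

(* Messages more verbose than this level are dropped. *)
let current_level = ref Info

(* Where rendered lines go; a stand-in for a pluggable backend. *)
let sink : (string -> unit) ref = ref prerr_endline

(* The message closure [msgf] is only invoked when [lvl] is enabled, so a
   disabled debug line costs little more than a level check: its arguments
   are never evaluated and its format string is never rendered. *)
let log lvl msgf =
  if rank lvl <= rank !current_level then
    msgf (fun fmt -> Printf.ksprintf (fun s -> !sink s) fmt)

let debug msgf = log Debug msgf
let info msgf = log Info msgf

(* Stand-in for something costly to compute just for a log message. *)
let calls = ref 0
let expensive_summary () = incr calls; "state summary"

let () =
  debug (fun m -> m "state: %s" (expensive_summary ()));  (* skipped at Info *)
  info (fun m -> m "served %d requests" 42);
  Printf.printf "expensive_summary called %d times\n" !calls
```

With the real library, the equivalent call site reads `Logs.debug (fun m -> m "state: %s" (expensive_summary ()))`, and where the output goes is decided by the reporter installed with `Logs.set_reporter`.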

The basis library for this that our libraries are mostly using now in the
forthcoming MirageOS3 is Daniel Buenzli's Logs library:
http://erratique.ch/software/logs

It does not have a syntax extension out of the box, but it does provide
flexible support for multiple backends, and has an Lwt module included.

Hope that helps!

regards,
Anil

--
Caml-list mailing list. Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs