To run this example locally, install Ploomber and execute: ploomber examples -n guides/logging

To start a free, hosted JupyterLab: binder-logo

Found an issue? Let us know.

Have questions? Ask us anything on Slack.

Logging

Tutorial showing how to add logging to a pipeline.

Sample pipeline

The pipeline we’ll be using for this guide contains two tasks (a script and a function):

# Content of basic/pipeline.yaml
tasks:
  - source: script.py
    product: output/nb-log.ipynb
    papermill_params:
      log_output: True

  - source: tasks.function
    product: output/fn-log.txt

Note that the script task contains:

papermill_params:
    log_output: True

This extra configuration is required on each script/notebook task in your pipeline to enable logging. The code on each task isn’t important; they contain a for loop and log a message on each iteration. Let’s see it in action:

[1]:
%%sh
cd basic
ploomber build --log info --force
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.8011          61.49
function  True          3.00683         38.51
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/Grammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpphvqot_8'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/PatternGrammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpdnjgogxt'
INFO:ploomber.dag.dag:Building DAG DAG("No name")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.ipynb')
INFO:papermill:Input Notebook:  /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpopci6mj4.ipynb
INFO:papermill:Output Notebook: /Users/Edu/dev/projects-ploomber/guides/logging/basic/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:01<00:06,  1.31s/cell]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------

Executing:  67%|██████▋   | 4/6 [00:01<00:00,  3.62cell/s]INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.27cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.8 seconds
Building task 'function': 100%|██████████| 2/2 [00:11<00:00,  5.98s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.8011          61.49
function  True          3.00683         38.51

We can see that the logging statements appear in the console. If you want to take a look at the code, click here.

Why not print?

Note that the snippets above use the logging module instead of print. Although print is a quick and easy way to display messages in the console, the logging module is more flexible. Hence, it is the recommended option.

Logging to a file

It’s common to send all your log records to a file. You can do so with the --log-file/-F option:

ploomber build --log info --log-file my.log

Logging to a file from Python

Alternatively, you can configure logging from Python, which gives you more flexibility:

[2]:
# you may store the contents of this cell in a .py file and then call it from the command line
# e.g., python run_pipeline.py
import logging
from pathlib import Path
import sys

from ploomber.executors import Serial
from ploomber.spec import DAGSpec

if Path('my.log').exists():
    Path('my.log').unlink()

logging.basicConfig(filename='my.log', format='%(levelname)s:%(message)s', level=logging.INFO)

# make sure we can import basic/tasks.py since basic/pipeline.yaml uses it
sys.path.append('basic')

dag = DAGSpec('basic/pipeline.yaml').to_dag()
dag.build(force=True)
[2]:
name Ran? Elapsed (s) Percentage
script True 5.35753 64.0298
functionTrue 3.00971 35.9702

Let’s look at the file contents:

[3]:
print(Path('my.log').read_text())
INFO:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/Grammar.txt
INFO:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/Grammar3.9.7.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmp66_f8p7u'
INFO:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/PatternGrammar.txt
INFO:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/PatternGrammar3.9.7.final.0.pickle
INFO:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpmmlne02w'
INFO:Building DAG DAG("No name")
INFO:Starting execution: NotebookRunner: script -> File('basic/output/nb-log.ipynb')
INFO:Input Notebook:  /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpokdgmfti.ipynb
INFO:Output Notebook: /Users/Edu/dev/projects-ploomber/guides/logging/basic/output/nb-log.ipynb
INFO:Executing notebook with kernel: python3
INFO:Executing Cell 1---------------------------------------
INFO:Ending Cell 1------------------------------------------
INFO:Executing Cell 2---------------------------------------
INFO:Ending Cell 2------------------------------------------
INFO:Executing Cell 3---------------------------------------
INFO:Ending Cell 3------------------------------------------
INFO:Executing Cell 4---------------------------------------
INFO:Ending Cell 4------------------------------------------
INFO:Executing Cell 5---------------------------------------
INFO:Ending Cell 5------------------------------------------
INFO:Executing Cell 6---------------------------------------
INFO:INFO:__main__:[script log] Finished step 1...

INFO:INFO:__main__:[script log] Finished step 2...

INFO:INFO:__main__:[script log] Finished step 3...

INFO:INFO:__main__:[script log] Done.

INFO:Ending Cell 6------------------------------------------
INFO:Done. Operation took 5.4 seconds
INFO: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          5.35753       64.0298
function  True          3.00971       35.9702

Controlling logging level

The Python’s logging module allows you to filter messages depending on their priority. For example, when running your pipeline, you may only want to display regular messages, but you may allow regular and debugging messages for more granularity when debugging. Since Ploomber runs tasks differently depending on their type (i.e., functions vs. scripts/notebooks), controlling the logging level requires a bit of extra work. Let’s use the same pipeline in the parametrized directory:

[4]:
%%sh
cd parametrized
ploomber build --log info --env--logging_level info --force
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.94898       62.1819
function  True          3.00989       37.8181
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/Grammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpwx_chahw'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/PatternGrammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpcgm1fy1j'
INFO:ploomber.dag.dag:Building DAG DAG("No name")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.ipynb')
INFO:papermill:Input Notebook:  /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpnxr4nwnf.ipynb
INFO:papermill:Output Notebook: /Users/Edu/dev/projects-ploomber/guides/logging/parametrized/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:01<00:07,  1.41s/cell]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------

Executing:  67%|██████▋   | 4/6 [00:01<00:00,  3.34cell/s]INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.23cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.9 seconds
Building task 'function':  50%|█████     | 1/2 [00:04<00:04,  4.95s/it]INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
Building task 'function': 100%|██████████| 2/2 [00:11<00:00,  5.75s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.94898       62.1819
function  True          3.00989       37.8181

Let’s now run the pipeline but switch the logging level to debug, this will print the records we saw above, plus the ones with debug level:

[5]:
%%sh
cd parametrized
ploomber build --log debug --env--logging_level debug --force
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.8555        61.7117
function  True          3.01254       38.2883
DEBUG:ploomber.spec.dagspec:DAGSpec enviroment:
EnvDict({'cwd': '/Users/Edu/d.../parametrized', 'here': '/Users/Edu/d.../parametrized', 'logging_level': 'debug', 'now': '2021-10-27T21:25:28.348284', ...})
DEBUG:ploomber.spec.dagspec:Expanded DAGSpec:
{   'tasks': [   {   'papermill_params': {'log_output': True},
                     'params': {'logging_level': '{{logging_level}}'},
                     'product': 'output/nb-log.ipynb',
                     'source': 'script.py'},
                 {   'params': {'logging_level': '{{logging_level}}'},
                     'product': 'output/fn-log.txt',
                     'source': 'tasks.function'}]}
DEBUG:ploomber.tasks.abc.NotebookRunner:Setting "script" status to TaskStatus.WaitingRender
DEBUG:ploomber.tasks.abc.PythonCallable:Setting "function" status to TaskStatus.WaitingRender
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task script: None
DEBUG:ploomber.spec.dagspec:Extracted upstream dependencies for task function: None
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/Grammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpyzeel0ke'
INFO:blib2to3.pgen2.driver:Generating grammar tables from /Users/Edu/miniconda3/envs/projects/lib/python3.9/site-packages/blib2to3/PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to /Users/Edu/Library/Caches/black/21.9b0/PatternGrammar3.9.7.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/Users/Edu/Library/Caches/black/21.9b0/tmpihkv4obh'
INFO:ploomber.dag.dag:Building DAG DAG("No name")
Building task 'script':   0%|          | 0/2 [00:00<?, ?it/s]INFO:ploomber.tasks.abc.NotebookRunner:Starting execution: NotebookRunner: script -> File('output/nb-log.ipynb')
INFO:papermill:Input Notebook:  /var/folders/3h/_lvh_w_x5g30rrjzb_xnn2j80000gq/T/tmpevgv_f3k.ipynb
INFO:papermill:Output Notebook: /Users/Edu/dev/projects-ploomber/guides/logging/parametrized/output/nb-log.ipynb

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
INFO:papermill:Executing Cell 1---------------------------------------
INFO:papermill:Ending Cell 1------------------------------------------

Executing:  17%|█▋        | 1/6 [00:01<00:06,  1.37s/cell]INFO:papermill:Executing Cell 2---------------------------------------
INFO:papermill:Ending Cell 2------------------------------------------
INFO:papermill:Executing Cell 3---------------------------------------
INFO:papermill:Ending Cell 3------------------------------------------
INFO:papermill:Executing Cell 4---------------------------------------
INFO:papermill:Ending Cell 4------------------------------------------

Executing:  67%|██████▋   | 4/6 [00:01<00:00,  3.45cell/s]INFO:papermill:Executing Cell 5---------------------------------------
INFO:papermill:Ending Cell 5------------------------------------------
INFO:papermill:Executing Cell 6---------------------------------------
INFO:papermill:DEBUG:__main__:[script log] This is a message for debugging

INFO:papermill:INFO:__main__:[script log] Finished step 1...

INFO:papermill:INFO:__main__:[script log] Finished step 2...

INFO:papermill:INFO:__main__:[script log] Finished step 3...

INFO:papermill:INFO:__main__:[script log] Done.

INFO:papermill:Ending Cell 6------------------------------------------

Executing: 100%|██████████| 6/6 [00:04<00:00,  1.25cell/s]
INFO:ploomber.tasks.abc.NotebookRunner:Done. Operation took 4.9 seconds
Building task 'function':  50%|█████     | 1/2 [00:04<00:04,  4.86s/it]DEBUG:tasks:[function log] This is a message for debugging
INFO:tasks:[function log] Finished step 1...
INFO:tasks:[function log] Finished step 2...
INFO:tasks:[function log] Finished step 3...
INFO:tasks:[function log] Done.
INFO:ploomber.tasks.abc.PythonCallable:Done. Operation took 3.0 seconds
DEBUG:ploomber.tasks.abc.PythonCallable:Setting "function" status to TaskStatus.Executed
Building task 'function': 100%|██████████| 2/2 [00:11<00:00,  5.75s/it]
INFO:ploomber.dag.dag: DAG report:
name      Ran?      Elapsed (s)    Percentage
--------  ------  -------------  ------------
script    True          4.8555        61.7117
function  True          3.01254       38.2883

To get the code for the previous example, click here.

Implementation details

To keep the tutorial short, we overlooked some technical details. However, if you want to customize logging, they are essential to know.

Function tasks and sub-processes

By default, Ploomber runs function tasks in a child process. However, beginning on version 3.8, Python 3.8 switched to use spawn instead of fork on macOS, this implies that child processes do not inherit the logging configuration of their parents. That’s why you must configure a logger inside the function’s body:

import logging


def some_task(product):
    # the following line is required on Python>=3.8 if using macOS
    logging.basicConfig(level=logging.INFO)

    logger = logging.getLogger(__name__)

    # to log a message, call logger.info
    logger.info('Some message')

Scripts and notebooks

Unlike function tasks, which can run in the same process as Ploomber or in a child process, scripts and notebooks execute independently. Hence, any logging configuration made in the main process is lost, and We have to configure a separate logger at the top of the script/notebook.

Parallel execution

Logging is currently unavailable when using the Parallel executor.