===========================
Notes on ``sys.monitoring``
===========================

.. note:: This documentation was written at the advent of Python 3.12. Future
          versions of Python may behave differently. It is however hoped that
          most of the concepts herein will remain relevant.

Python 3.12 introduced a new monitoring system under ``sys.monitoring``. This
system lets users monitor a selection of events that may be interesting for e.g.
performance profiling or debugging purposes. Event monitoring is set "per tool",
so that multiple tools can be running at the same time. For each tool the events
can be monitored globally per thread or locally per code object (or a mixture of
both). For each tool-event combination a callback can be registered that will be
called on the occurrence of the event. The callbacks are just regular
functions and can do most of the things supported by Python, they also have the
ability to return a special value to tell the monitoring system to disable
triggering future events for the current code location.

What does this mean for Numba?
------------------------------

When the interpreter "encounters" a monitoring event (it actually issues them)
it triggers any callbacks that are associated with that event across all tools
that have registered monitoring for said event. In the case of Numba there are
problems...

Numba has made it so that there's no Python interpreter involved in the
execution of a function, the function is compiled and its execution path
exists only in machine code. To get to the machine code from the interpreter
the Numba dispatcher is invoked, this is the last place in the stack where
(in ``nopython`` mode) the Python interpreter is readily available. The
dispatcher is also in some way part of the execution of the function, without
the dispatcher the call to the machine code cannot easily happen from user
space. As a result of this, the monitoring types and event types that Numba
can support are somewhat limited as there's such limited interpreter
involvement in execution!

Looking at monitoring types in turn. Local monitoring is requested by setting
monitoring on a code object. In practice this instructs the interpreter to
augment the bytecode at runtime by switching certain opcodes for
"instrumented" opcodes. These instrumented opcodes go via a special path in the
interpreter loop whereby they will issue an "event" in association with a
particular instruction at a particular offset. For example, a ``RETURN`` opcode
might be replaced by an ``INSTRUMENTED_RETURN`` and a ``PY_RETURN`` event
would by issued when the instrumented instruction is interpreted. This event and
the offset at which it occurred being forwarded to the monitoring system.
Unfortunately this presents an issue for Numba, there is no interpreter involved
with execution and so events will not be emitted. It does seem like it would be
possible to handle a few types of event, such as ``PY_START`` and ``PY_RETURN``
by analysing the code object at dispatch time. However, it's possible for a user
to de-instrument the code object and/or dynamically disable monitoring at a
particular code location whilst executing, and as a result emulating the
semantics of this would be prohibitively challenging and would likely require
constant interaction with the interpreter. As a result, Numba does not support
local event monitoring, the compiled function will still execute correctly if it
has been set, it just has no effect on ``sys.monitoring``.

Considering per-thread global monitoring, this manifests as the user setting
some global state on the interpreter for a given thread. This state can be
accessed via the ``sys.monitoring`` Python API, it's also accessible via
CPython internals. This kind of monitoring is a little more amenable to working
with Numba as there's no code object involved and state mutation during
execution can only occur via object mode calls.

What does Numba do in practice?
-------------------------------
As there's no Python or C API to issue events (the concept is heavily linked to
the interpreter itself), Numba has to look for tool-event combinations at
appropriate locations in the dispatch sequence and then manually call the
associated callbacks (essentially doing what the interpreter does when it issues
an event). In the case of the Numba dispatcher, only a few events are relevant
and only four are supported, namely

* ``sys.monitoring.events.PY_START`` (Python function starting).
* ``sys.monitoring.events.PY_RETURN`` (Python function returning).
* ``sys.monitoring.events.RAISE`` (Python function raised an exception).
* ``sys.monitoring.events.PY_UNWIND`` (Python function exiting during exception
  unwinding).

These events don't really exist in the machine code, but would exist had the
interpreter interpreted the equivalent bytecode. The dispatcher therefore checks
for monitoring of ``PY_START`` just before control is transferred to the machine
code and calls any associated callbacks. The same is done for ``PY_RETURN`` just
after control is transferred back to the dispatcher from the machine code. This
behaviour essentially emulates the interpreter executing bytecode and lets
tools such as ``cProfile`` be able to "see" the Numba compiled function as part
of the standard interpreted execution. In the case of an exception being raised
in the machine code, the associated error state is handled just after control is
transferred back to the dispatcher, at this point ``RAISE`` and ``PY_UNWIND``
event monitoring is checked and registered callbacks are invoked.

A note on offsets. The callback functions often take an "offset" argument which
is the bytecode offset at which the event triggering the callback was
encountered. In the case of ``PY_START`` this seems to be associated with the
offset of the ``RESUME`` bytecode. In the case of ``PY_RETURN`` this is
associated with the offset of one of the ``RETURN`` bytecodes, most generally
this would only be known at runtime as there could be multiple return paths. As
a result, Numba elects to just set all offsets to zero. It may eventually be
possible to do some analysis and transfer the appropriate runtime information to
the dispatcher from the machine code, however, at the present time the effort to
do this vastly outweighs the gain.