
Commit 8e9de79

gh-137122: Improve the profiling section in the 3.15 what's new document
Parent: 572c780

1 file changed: +50 additions, -84 deletions

Doc/whatsnew/3.15.rst

@@ -96,94 +96,60 @@ performance issues in production environments.
 Key features include:
 
 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
-
-Profile process 1234 for 10 seconds with default settings:
-
-.. code-block:: shell
-
-   python -m profiling.sampling 1234
-
-Profile with custom interval and duration, save to file:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
-
-Generate collapsed stacks for flamegraph:
-
-.. code-block:: shell
-
-   python -m profiling.sampling --collapsed 1234
-
-Profile all threads and sort by total time:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -a --sort-tottime 1234
-
-The profiler generates statistical estimates of where time is spent:
-
-.. code-block:: text
-
-   Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
-   Captured 498841 samples in 5.00 seconds
-   Sample rate: 99768.04 samples/sec
-   Error rate: 0.72%
-   Profile Stats:
-   nsamples       sample%  tottime (s)  cumul%  cumtime (s)  filename:lineno(function)
-   43/418858          0.0        0.000    87.9        4.189  case.py:667(TestCase.run)
-   3293/418812        0.7        0.033    87.9        4.188  case.py:613(TestCase._callTestMethod)
-   158562/158562     33.3        1.586    33.3        1.586  test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-   129553/129553     27.2        1.296    27.2        1.296  ast.py:46(parse)
-   0/128129           0.0        0.000    26.9        1.281  test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
-   7/67446            0.0        0.000    14.2        0.674  test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
-   6/60380            0.0        0.000    12.7        0.604  test_ast.py:888(AST_Tests.test_ast_recursion_limit)
-   3/50020            0.0        0.000    10.5        0.500  test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
-   1/38011            0.0        0.000     8.0        0.380  test_ast.py:886(AST_Tests.test_ast_recursion_limit)
-   1/25076            0.0        0.000     5.3        0.251  test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
-   22361/22362        4.7        0.224     4.7        0.224  test_compile.py:1368(TestSpecifics.test_big_dict_literal)
-   4/18008            0.0        0.000     3.8        0.180  test_ast.py:889(AST_Tests.test_ast_recursion_limit)
-   11/17696           0.0        0.000     3.7        0.177  subprocess.py:1038(Popen.__init__)
-   16968/16968        3.6        0.170     3.6        0.170  subprocess.py:1900(Popen._execute_child)
-   2/16941            0.0        0.000     3.6        0.169  test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
-
-   Legend:
-     nsamples: Direct/Cumulative samples (direct executing / on call stack)
-     sample%: Percentage of total samples this function was directly executing
-     tottime: Estimated total time spent directly in this function
-     cumul%: Percentage of total samples when this function was on the call stack
-     cumtime: Estimated cumulative time (including time in called functions)
-     filename:lineno(function): Function location and name
-
-   Summary of Interesting Functions:
-
-   Functions with Highest Direct/Cumulative Ratio (Hot Spots):
-     1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-     1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
-     1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
-
-   Functions with Highest Call Frequency (Indirect Calls):
-     418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
-     415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
-     159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-
-   Functions with Highest Call Magnification (Cumulative/Direct):
-     12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-     10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
-     9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
-
-The profiler automatically identifies performance bottlenecks through statistical
-analysis, highlighting functions with high CPU usage and call frequency patterns.
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
+
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``attach``): attach to already-running applications
+  * Run and profile scripts directly (``run``): profile from the very start of execution
+  * Execute and profile modules (``run -m``): profile packages run as ``python -m module``
+
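As an editorial illustration of the three target modes named above, invocations might look like the following. This is a sketch only: the subcommand names come from the text, but the PID ``1234``, ``myscript.py``, and ``mypackage`` are placeholders, and exact argument ordering may differ.

```shell
# Sketch: subcommand names from the text above; targets are placeholders.
python -m profiling.sampling attach 1234        # attach to a running process by PID
python -m profiling.sampling run myscript.py    # run and profile a script from the start
python -m profiling.sampling run -m mypackage   # run and profile a module
```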
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
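For illustration, the ``--mode`` values listed above might be combined with a target like this. The mode names are taken from the text; the PID and the placement of the flag relative to the target are assumptions.

```shell
# Sketch: --mode values from the text above; PID 1234 is a placeholder.
python -m profiling.sampling --mode wall attach 1234   # default: wall-clock time
python -m profiling.sampling --mode cpu attach 1234    # CPU-bound bottlenecks only
python -m profiling.sampling --mode gil attach 1234    # GIL-holding time per thread
```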
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy, where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
+    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
+    at the source code level.
+
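The output-format flags above might be selected per run, roughly as follows. The format flags come from the text and ``-o`` appears in the examples this commit removes; combining them with an ``attach`` target and these output filenames is an illustrative assumption, not confirmed syntax.

```shell
# Sketch: format flags from the text above; -o and PID usage are assumed.
python -m profiling.sampling --pstats attach 1234                     # tabular stats
python -m profiling.sampling --collapsed -o stacks.txt attach 1234    # for flamegraph tools
python -m profiling.sampling --flamegraph -o profile.html attach 1234 # interactive HTML
```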
+* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
+  Monitor performance as your application runs with interactive sorting and filtering.
+
+* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
+  (``--async-aware``). See which coroutines are consuming time, with options to show only
+  running tasks or all tasks including those waiting.
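The live and async-aware features above might be enabled like this; the flags are from the text, while the PID and flag placement are placeholders.

```shell
# Sketch: flags from the text above; PID 1234 is a placeholder.
python -m profiling.sampling --live attach 1234          # top-like real-time TUI
python -m profiling.sampling --async-aware attach 1234   # coroutine/task-level stacks
```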
 
 This capability is particularly valuable for debugging performance issues in
 production systems where traditional profiling approaches would be too intrusive.
 
-.. seealso:: :pep:`799` for further details.
+.. seealso:: :pep:`799` for further details.
 
 (Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
 