
Node.js --prof and --prof-process flags

Introduction

The Node.js --prof and --prof-process flags allow you to profile your application and identify performance bottlenecks.

The --prof and --prof-process flags in Node.js provide a streamlined way to profile the performance of a JavaScript program. Here are some scenarios and advantages of using these flags:

  1. Performance Analysis:
  • Identify Bottlenecks: Easily find performance bottlenecks by pinpointing the functions or operations that consume the most CPU time.
  • Optimize Code: By identifying hot spots, you can focus your optimization efforts where they will have the most impact.
  2. Simplicity:
  • Easy to Use: The --prof and --prof-process flags are straightforward to use, making it easy to generate and process profiling data without a steep learning curve.
  • Quick Setup: There's no need for additional setup or external tools; you can start profiling your application with just a couple of commands.
  3. Low Overhead:
  • Minimal Impact: The --prof flag enables a sampling profiler, which has relatively low overhead compared to instrumenting profilers. This allows for profiling in scenarios where minimizing performance impact is crucial.
  4. Cross-platform Profiling:
  • Platform Agnostic: The profiler is built into the Node.js runtime, making it a cross-platform solution that works across operating systems and environments.
  5. Offline Analysis:
  • Post-Execution Analysis: The --prof-process flag allows for offline processing of profiling data, which is useful for analyzing performance after execution, sharing profiling data with others, or archiving it for later analysis.
  6. Built-in Tooling:
  • Integrated with V8: The profiling flags leverage the built-in V8 profiler, ensuring tight integration with the JavaScript engine powering Node.js.
  7. Baseline Profiling:
  • Initial Assessment: It's a good starting point for performance analysis before moving to more sophisticated profiling tools if necessary.
  8. Learning and Education:
  • Understanding Performance: It's a useful tool for learning about the performance characteristics of JavaScript and the Node.js runtime, especially for educational purposes or self-study.

However, while --prof and --prof-process provide valuable insights, they might lack some of the more advanced features, user-friendly interfaces, and detailed visualizations offered by other profiling tools and environments such as Chrome DevTools. Depending on the depth of analysis and the level of detail required, developers might choose to use these flags for quick performance assessments or opt for more advanced profiling tools for deeper analysis.

It is important to note that these flags are not the same as the --inspect flag, which opens a debugging session that allows for more interactive profiling and debugging using DevTools.

Profiling Using --prof and --prof-process Flags

The --prof flag enables V8's built-in sampling CPU profiler. Here's how you use the --prof and --prof-process flags to profile a Node.js application:

  1. Enable Profiling: Run your Node.js script with the --prof flag to generate a log file containing profiling data.

NODE_ENV=production node --prof script.js

This will create a file named isolate-0xnnnnnnnnnnnn-v8.log (the nnnnnnnnnnnn part will be different each time).

Note: it's recommended to run your script in production mode to avoid unnecessary overhead from development-related features.

  2. Process the Profiling Data: Use the --prof-process flag to process the log file and generate a human-readable report.

node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > processed-profile.txt

This will create a file named processed-profile.txt containing the processed profiling data.

  3. Analyze the Data: Open processed-profile.txt and review the profiling data to understand where most of the CPU time is being spent in your script. You'll see function names, file names, and the number of ticks spent in each function.

[Summary]:
 ticks  total  nonlib   name
   100  10.0%   10.5%  JavaScript
    90   9.0%    9.5%  C++
    10   1.0%    1.1%  GC
    20   2.0%          Shared libraries
   780  78.0%          Unaccounted

[Bottom up (heavy) profile]:
 Note: percentage shows a share of a particular caller in the
 total amount of its parent calls.
 Callers occupying less than 1.0% are not shown.

  ticks parent  name
  ^^^^^ ^^^^^^  ^^^^^^^^
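End to end, the round trip can be sketched like this. Note the script contents and file names here are illustrative stand-ins, and the isolate log name varies per run:

```shell
# Create a trivial script to profile (illustrative stand-in for your app)
echo 'console.log([...new Set([1, 1, 2, 3])].length)' > app.js

# 1. Generate the V8 log (produces isolate-0x...-v8.log; the name varies per run)
node --prof app.js

# 2. Process the log into a human-readable report
#    (the glob assumes a single isolate log in the directory)
node --prof-process isolate-*.log > processed-profile.txt

# 3. Skim the summary at the top of the report
head -n 20 processed-profile.txt
```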

Comparison with --inspect Profiling

  • Type of Profiling:

    • --prof and --prof-process: These flags enable and process sampling CPU profiling.
    • --inspect: This flag opens a debugging session which allows for more interactive profiling and debugging using DevTools.
  • Ease of Use:

    • --prof and --prof-process: These flags are straightforward for generating and processing profiling data quickly.
    • --inspect: This may require a bit more setup, but offers a richer, more interactive profiling experience through DevTools.
  • Detail Level:

    • --prof and --prof-process: These provide a high-level overview of where time is being spent in your script.
    • --inspect: This provides a more detailed and interactive view, allowing you to drill down into call stacks, view memory allocation, and more.
  • Output:

    • --prof and --prof-process: Output is a text-based report.
    • --inspect: Output is viewed in a graphical interface, providing visual insights and allowing for more in-depth analysis.
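For reference, an --inspect session can be started like this (a sketch: =0 asks for a random free port, and the inline script is just a placeholder for your application):

```shell
# Start the inspector on a random free port; attach via chrome://inspect in Chrome
# (with no debugger attached, the script simply runs to completion)
node --inspect=0 -e "console.log('hello from an inspectable process')"
```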

Example: Profiling a Node.js Application

We are going to use the --prof and --prof-process approach with a de-optimized getUniqueElements function example.

To get started, we will create two files script-slow.js and script-fast.js to compare profile results.

touch script-slow.js script-fast.js

1. Create Your Script:

Add the following to the script-slow.js file:

function getUniqueElements(arr) {
  let uniqueArr = [];
  for (let i = 0; i < arr.length; i++) {
    let isUnique = true;
    for (let j = 0; j < uniqueArr.length; j++) {
      if (arr[i] === uniqueArr[j]) {
        isUnique = false;
        break;
      }
    }
    if (isUnique) {
      uniqueArr.push(arr[i]);
    }
  }
  return uniqueArr;
}

// Create a large array to amplify the issue
const largeArr = Array(10000)
  .fill()
  .map((_, idx) => idx % 100);

const uniqueArr = getUniqueElements(largeArr);
console.log(uniqueArr.length); // Output: 100

2. Run the Profiler:

Run the script with the --prof flag to generate a profiling log:

node --prof script-slow.js

3. Process the Profiling Data:

After running the script, you'll find a log file named isolate-0xnnnnnnnnnnnn-v8.log in your directory. Process this log file using the --prof-process flag:

node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > processed-profile.txt

4. Analyze the Data:

Open the processed-profile.txt file and look for the getUniqueElements function. You'll be able to see the number of ticks spent in this function, which represents the amount of time spent executing it.

// ... other parts of the profile

  ticks parent  name
  ^^^^^ ^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...   ...%  getUniqueElements

// ... other parts of the profile

This simplified analysis should give you an idea of the time spent in the getUniqueElements function, indicating a performance bottleneck.

The biggest takeaway we are going to demonstrate with the slow script is that getUniqueElements shows up in this profile at all.

In this de-optimized version of getUniqueElements, the logic for checking uniqueness is implemented in JavaScript within your function. Before elaborating more on this, let's add the optimized version.

Optimized Version

Add the following to the script-fast.js file:

function getUniqueElements(arr) {
  return [...new Set(arr)];
}

// Create a large array to amplify the issue
const largeArr = Array(10000)
  .fill()
  .map((_, idx) => idx % 100);

const uniqueArr = getUniqueElements(largeArr);
console.log(uniqueArr.length); // Output: 100

In this code, we use the Set data structure to get the unique elements of the array. Both Set and the array spread are native JavaScript features, so the deduplication logic runs inside the engine's optimized C++ implementation rather than in our own JavaScript loops.
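As a side note, Set deduplicates using SameValueZero comparison and preserves insertion order, which also means it handles NaN correctly, something the === check in the nested-loop version cannot do. A small illustration:

```javascript
// Set compares with SameValueZero, so NaN is deduplicated
// (NaN === NaN is false, so the nested-loop version would keep both NaNs)
const input = [3, 1, 3, NaN, NaN, 1];
const unique = [...new Set(input)];
console.log(unique); // [ 3, 1, NaN ]
```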

Follow a similar process from steps 2 to 4 to profile the optimized version of the script.

Once this is done, you'll notice something peculiar in the processed-profile.txt file - the getUniqueElements function is no longer present in the profiling data.

This is because the logic for getting unique elements is implemented in C++ and executed by the JavaScript engine, so it doesn't show up in the profiling data.

In fact, JavaScript is 0% of the total ticks in the optimized version of the script:

[Summary]:
 ticks  total  nonlib   name
     0   0.0%    0.0%  JavaScript
   137  68.8%   99.3%  C++
     1   0.5%    0.7%  GC
    61  30.7%          Shared libraries
     1   0.5%          Unaccounted

Compare that to what happened prior:

[Summary]:
 ticks  total  nonlib   name
     1   0.9%    1.2%  JavaScript
    81  73.0%   96.4%  C++
     1   0.9%    1.2%  GC
    27  24.3%          Shared libraries
     2   1.8%          Unaccounted

A More Detailed Analysis

When you optimize the getUniqueElements function by using the Set object, the bulk of the work is offloaded to the underlying Node.js or V8 engine implementations which are highly optimized. The Set object is part of the ECMAScript standard and its operations are implemented in native code within the V8 engine, which powers Node.js. This native code is optimized for performance and is compiled to machine code, making operations like adding elements to a set or checking for uniqueness very fast.

In the de-optimized version of getUniqueElements, the logic for checking uniqueness is implemented in JavaScript within your function. This logic involves nested loops, which have a time complexity of O(n^2) in the worst case. This is why the function shows up in the profiling data when using the --prof and --prof-process flags - the JavaScript engine spends a significant amount of time executing this function.

On the other hand, in the optimized version of getUniqueElements, the logic for checking uniqueness is handled by the Set object, which likely has a time complexity closer to O(n) or even better for these operations due to its native, optimized implementation. As a result, the getUniqueElements function itself does very little work – it merely creates a new Set object and then creates an array from that set. The heavy lifting is done by native code within the V8 engine, outside of the JavaScript execution environment.

When profiling, the --prof and --prof-process flags are capturing the CPU time spent executing JavaScript code. Since the optimized getUniqueElements function offloads most of the work to native code, the function itself consumes very little CPU time in the JavaScript execution environment, and hence, may not show up in the profiling data or may show up with a significantly reduced number of ticks compared to the de-optimized version.

A Contrived Comparison

I used hyperfine to run a quick comparison.

hyperfine --warmup 10 --runs 100 \
  'NODE_ENV=production node ./script-slow' \
  'NODE_ENV=production node ./script-fast'

Benchmark 1: NODE_ENV=production node ./script-slow
  Time (mean ± σ):      84.0 ms ±  42.8 ms    [User: 51.8 ms, System: 13.8 ms]
  Range (min … max):    54.5 ms … 343.4 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: NODE_ENV=production node ./script-fast
  Time (mean ± σ):      71.7 ms ±  18.9 ms    [User: 47.2 ms, System: 13.3 ms]
  Range (min … max):    49.7 ms … 238.2 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  'NODE_ENV=production node ./script-fast' ran
    1.17 ± 0.67 times faster than 'NODE_ENV=production node ./script-slow'

Take this with a grain of salt. The optimization only makes a significant impact at larger input sizes or over many iterations. This is just to show that the optimized version is faster.

Conclusion

In summary, the --prof and --prof-process flags provide a quick and easy way to identify performance bottlenecks, while --inspect provides a more interactive and detailed profiling experience.

We did this using two versions of a getUniqueElements function, one that was de-optimized and one that was optimized. The de-optimized version showed up in the profiling data due to its implementation details, while the optimized version did not.

At the very least, it is a good reminder that we should prefer the built-in features of JavaScript and Node.js where possible, as they are likely to be more performant than our own implementations.

For a more real-world scenario, you may want to consider exploring other profiling tools or libraries that provide a more detailed and visual representation of the profiling data, such as the Chrome DevTools, node-inspect, or others. Depending on your needs and the level of detail required, you may choose one approach over the other.

Dennis O'Keeffe