Den Dribbles

Build a CLI Puppeteer tool to screenshot all the things!

July 01, 2020

If you like this project, be sure to follow me on GitHub and Twitter for other sweet projects and isolation banter.

In tonight’s example, we’re going to use Puppeteer to power a CLI tool that takes valid LaTeX as an argument and outputs a PNG screenshot of that LaTeX rendered as math!

LaTeX output

This particular example focuses on mathematics, but the same approach applies to any valid HTML that you wish to pass through the CLI tool. Get creative!

The only expectation for today’s project is that you have the usual suspects for Node.js set up.

We are going to install both Puppeteer and Yargs Parser. Don’t worry if you haven’t used them before; no prior knowledge is required. However, you may want to check the documentation in each repo for more information or a better understanding.

Initialising the project

Create a new project, jump in to initialise it and install the following:

mkdir puppeteer-katex
cd puppeteer-katex
# begin project with basic settings
yarn init -y
yarn add puppeteer yargs-parser

We are using yargs-parser to convert our CLI arguments into something useful.

Puppeteer will be used to run a headless version of Chrome which we can then use to take a screenshot.

The code in its full glory

Unlike other tutorials, I am going to share the full code first and then break it down to explain what is happening.

In the root of the project directory, add this to index.js:

const puppeteer = require("puppeteer")
const argv = require("yargs-parser")(process.argv.slice(2))

if (!argv.math) {
  console.error("Error: --math value required")
  process.exit(1)
}

const html = `<!DOCTYPE html>
<!-- KaTeX requires the use of the HTML5 doctype. Without it, KaTeX may not render properly -->
<html>
  <head>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" crossorigin="anonymous">

    <!-- KaTeX is loaded without "defer" so the inline katex.render call below can use it immediately -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" integrity="sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" crossorigin="anonymous"></script>

    <!-- To automatically render math in text elements, include the auto-render extension: -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" crossorigin="anonymous"
        onload="renderMathInElement(document.body);"></script>
    <style>
      .katex { font-size: 48px !important; } 
    </style>
  </head>
  <span id="mykatex" style="display: inline-block;">...</span>
  <script>
    katex.render(String.raw\`${argv.math}\`, mykatex);
  </script>
</html>`

const main = async () => {
  let browser
  try {
    browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto(`data:text/html,${encodeURIComponent(html)}`, { waitUntil: "networkidle0" })
    const element = await page.$("#mykatex")
    await element.screenshot({ path: "math.png" })
  } catch (err) {
    console.error(err)
    process.exitCode = 1
  } finally {
    if (browser) await browser.close()
  }
}

main()

So what’s happening here? Let’s take it part by part.

Requiring our npm packages

const puppeteer = require("puppeteer")
const argv = require("yargs-parser")(process.argv.slice(2))

These two lines are fairly straightforward. The first pulls in our Puppeteer package, while the second does the heavy lifting of parsing the CLI arguments.

The yargs-parser docs go into more detail, but here is all you need to know.

If we run node index.js, argv will equal the following:

{ _: [] }

If we ran node index.js hello --flag=3 --test friend, argv would equal:

{ _: [ 'hello' ], flag: 3, test: 'friend' }

I use Yargs Parser in a few other blog posts, so I won’t go too deep. You can see the original post here.

Validating our flags

if (!argv.math) {
  console.error("Error: --math value required")
  process.exit(1)
}

Here, we are validating that our CLI call has a --math flag passed to it.

Note: We are not validating that the math itself is valid KaTeX in tonight’s example.
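The `!argv.math` check has two edge cases worth knowing about: yargs-parser turns a bare `--math` (no value) into the boolean `true`, which passes the check, and turns `--math=0` into the number `0`, which fails it. A minimal sketch of a stricter check, using a hypothetical `getMath` helper that is not part of the original project:

```javascript
// Hypothetical helper (not part of the original project): returns the
// --math value as a string, or throws if it is missing or empty.
function getMath(argv) {
  const value = argv.math
  // A missing flag is undefined; a bare `--math` is parsed as the boolean true
  if (value === undefined || typeof value === "boolean") {
    throw new Error("--math value required")
  }
  // yargs-parser turns numeric values such as `--math=0` into numbers,
  // so coerce back to a string before checking for emptiness
  const math = String(value).trim()
  if (math === "") {
    throw new Error("--math value required")
  }
  return math
}

console.log(getMath({ _: [], math: "x^2" })) // x^2
console.log(getMath({ _: [], math: 0 })) // 0
```

In index.js you could then wrap a call to `getMath(argv)` in a try...catch that prints the error and calls `process.exit(1)`, in place of the `if` block.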

The HTML string to be used for Puppeteer

const html = `<!DOCTYPE html>
<!-- KaTeX requires the use of the HTML5 doctype. Without it, KaTeX may not render properly -->
<html>
  <head>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" crossorigin="anonymous">

    <!-- KaTeX is loaded without "defer" so the inline katex.render call below can use it immediately -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" integrity="sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" crossorigin="anonymous"></script>

    <!-- To automatically render math in text elements, include the auto-render extension: -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" crossorigin="anonymous"
        onload="renderMathInElement(document.body);"></script>
    <style>
      .katex { font-size: 48px !important; } 
    </style>
  </head>
  <span id="mykatex" style="display: inline-block;">...</span>
  <script>
    katex.render(String.raw\`${argv.math}\`, mykatex);
  </script>
</html>`

This part creates our HTML document as a string.

The important thing to note is that we are using a template literal, which lets us interpolate the argv.math variable into the string.

This variable will be our --math flag value, where we will pass in valid LaTeX math equations.

Note: Reiterating that we won’t be validating the LaTeX (please don’t hate me).
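If you’d rather not read argv.math inside the template, the HTML can be built by a function that takes the LaTeX as a parameter, which also makes the generated string easy to check without launching a browser. A trimmed-down sketch; the `buildHtml` name is hypothetical, and the KaTeX `<link>`/`<script>` tags are omitted for brevity:

```javascript
// Hypothetical helper: builds the page HTML for a given LaTeX string.
// The KaTeX <link> and <script> tags from the full example are left out.
function buildHtml(math) {
  // Each \` produces a literal backtick in the output, so the browser
  // receives: katex.render(String.raw`<your math>`, mykatex);
  return `<!DOCTYPE html>
<html>
  <span id="mykatex" style="display: inline-block;">...</span>
  <script>
    katex.render(String.raw\`${math}\`, mykatex);
  </script>
</html>`
}

const html = buildHtml("f(a,b,c) = (a^2+b^2+c^2)^3")
console.log(html)
```

With the KaTeX tags added back, index.js would pass `buildHtml(argv.math)` to Puppeteer instead of a module-level `html` constant.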

Let’s break the HTML down part-by-part.

Loading KaTeX

<link
  rel="stylesheet"
  href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css"
  integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq"
  crossorigin="anonymous"
/>

<!-- KaTeX is loaded without "defer" so the inline katex.render call below can use it immediately -->
<script
  src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js"
  integrity="sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz"
  crossorigin="anonymous"
></script>

<!-- To automatically render math in text elements, include the auto-render extension: -->
<script
  src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js"
  integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI"
  crossorigin="anonymous"
  onload="renderMathInElement(document.body);"
></script>

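These tags come almost straight from the KaTeX starter template. The one difference is that the starter template loads the scripts with defer; here they load synchronously, because our inline script calls katex.render as soon as the page reaches it and needs KaTeX to exist by then.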

It pulls in the required JS and stylesheet from a CDN, which Puppeteer loads when it opens the page. This saves us a bunch of hassle, and the same strategy works for anything else you may want to build in future (e.g. loading D3 from a CDN)!
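The same trick works for other browser libraries. A sketch of a hypothetical `pageWithScripts` helper that wraps any body HTML with script tags for a list of CDN URLs; the D3 URL in the usage line is an assumption, so check the library’s docs for the exact path and version:

```javascript
// Hypothetical helper: wraps body HTML with one <script> tag per CDN URL.
// The tags sit in <head> without defer, so the libraries are loaded
// before any inline scripts in the body run.
function pageWithScripts(scriptUrls, bodyHtml) {
  const scripts = scriptUrls
    .map((url) => `<script src="${url}"></script>`)
    .join("\n    ")
  return `<!DOCTYPE html>
<html>
  <head>
    ${scripts}
  </head>
  <body>
    ${bodyHtml}
  </body>
</html>`
}

// Usage (the D3 CDN URL is an assumption)
const html = pageWithScripts(
  ["https://cdn.jsdelivr.net/npm/d3@5"],
  `<svg width="100" height="100"></svg>
    <script>d3.select("svg").append("circle").attr("cx", 50).attr("cy", 50).attr("r", 40)</script>`
)
console.log(html)
```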

Upping the font size

<style>
  .katex {
    font-size: 48px !important;
  }
</style>

I’ve added a style block to the HTML to bump up the font size of our output. Nothing special here!

The important KaTeX HTML

<span id="mykatex" style="display: inline-block;">...</span>
<script>
  katex.render(String.raw\`${argv.math}\`, mykatex);
</script>

This last part of the HTML is where the magic happens. I simply add a span with the ID mykatex.

Then I give that span an inline display: inline-block style. This makes the span’s box wrap the rendered equation, so the element screenshot captures the whole thing.

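Finally, I call katex.render inside a script tag, following the KaTeX API docs. The second argument, mykatex, works because browsers expose elements that have an id as global variables; document.getElementById("mykatex") is the more explicit equivalent.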

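As mentioned earlier, this is the most complex part. Because the whole HTML document lives inside a JS template literal, I can interpolate ${argv.math} inside the String.raw template tag. The backticks for String.raw have to be escaped (\`) so they don’t close the outer template literal, which is why it looks a little funky. In the browser, String.raw then keeps the backslashes in commands such as \frac intact instead of treating them as escape sequences.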

The value of argv.math comes from the --math flag that we pass on the command line, e.g. node index.js --math="f(a,b,c) = (a^2+b^2+c^2)^3"
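To see why String.raw matters once the script reaches the browser, compare an ordinary template literal with a String.raw one holding the same LaTeX. This is plain JavaScript, so it behaves the same way in Node:

```javascript
// In an ordinary template literal, "\f" is an escape sequence (a form
// feed character), so the backslash in \frac is lost.
const cooked = `\frac{1}{2}`

// String.raw keeps every backslash exactly as it was written.
const raw = String.raw`\frac{1}{2}`

console.log(cooked.startsWith("\\")) // false
console.log(raw) // \frac{1}{2}
console.log(cooked.length, raw.length) // 10 11
```

Commands starting with \u or \x, such as \underline, are worse still: in an untagged template literal they are a SyntaxError, while String.raw accepts them.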

Our main function

const main = async () => {
  let browser
  try {
    // Launch Puppeteer and set up a new page
    browser = await puppeteer.launch()
    const page = await browser.newPage()

    // Tell Puppeteer to load our HTML variable.
    // Wait until "networkidle0", which from the
    // docs means "consider navigation to be finished
    // when there are no more than 0 network connections
    // for at least 500 ms."
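    // encodeURIComponent stops characters such as # and % in the
    // HTML from being read as URL syntax.
    await page.goto(`data:text/html,${encodeURIComponent(html)}`, { waitUntil: "networkidle0" })

    // Select the <span id="mykatex" /> element (KaTeX has
    // rendered into it by now) and assign it to "element".
    const element = await page.$("#mykatex")

    // Create a screenshot and save it locally to "math.png"
    await element.screenshot({ path: "math.png" })
  } catch (err) {
    console.error(err)
    // Exit with a non-zero status so scripts can detect the failure
    process.exitCode = 1
  } finally {
    // browser is undefined if puppeteer.launch() itself failed
    if (browser) await browser.close()
  }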
}

main()

Finally, we have a main function. Once called, the following happens in main:

  1. Launch Puppeteer and set up a new page.
  2. Tell Puppeteer to load our HTML variable. Wait until “networkidle0”, which from the docs means “consider navigation to be finished when there are no more than 0 network connections for at least 500 ms”.
  3. Select the <span id="mykatex" /> element (KaTeX has already rendered into it by this point) and assign it to “element”.
  4. Create a screenshot and save it locally to “math.png”.
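Step 2 hands Puppeteer the HTML through a data: URL. Because # starts a URL fragment and % starts a percent-escape, LaTeX containing \# or \% could be cut off or misread, so it is safer to encode the HTML with encodeURIComponent first. A small sketch; the `toDataUrl` name is hypothetical:

```javascript
// Hypothetical helper: turns an HTML string into a data: URL. encodeURIComponent
// escapes characters such as "#" and "%" that would otherwise be read as URL syntax.
function toDataUrl(html) {
  return `data:text/html,${encodeURIComponent(html)}`
}

const html = "<p>Item #1 is 50% done</p>"

// Unencoded, everything after "#" becomes the URL fragment instead of page content
console.log(new URL(`data:text/html,${html}`).hash)

// Encoded, there is no fragment and the whole document stays in the URL body
console.log(new URL(toDataUrl(html)).hash === "") // true
```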

Easy, right? It all comes together in around 50 lines of JavaScript!

We keep all of this inside a try...catch...finally block. If you haven’t seen finally before, it is a block of code that always runs once the try (and any catch) has finished, whether or not an error was thrown. It allows us to close the Puppeteer browser every time, regardless of any errors.
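One subtlety: if puppeteer.launch() itself throws, browser is still undefined when finally runs, so an unguarded browser.close() throws a TypeError of its own from inside finally. A minimal sketch of the same try...finally pattern with a guard, using a hypothetical `withResource` helper and a fake resource so it runs without Puppeteer:

```javascript
// Hypothetical helper: opens a resource, uses it, and always closes it.
async function withResource(open, use) {
  let resource
  try {
    resource = await open()
    return await use(resource)
  } finally {
    // Runs whether use() succeeded or threw. The guard skips close()
    // when open() failed and there is nothing to close.
    if (resource) await resource.close()
  }
}

// A fake resource standing in for a Puppeteer browser
function makeFake() {
  return {
    closed: false,
    async close() {
      this.closed = true
    },
  }
}

withResource(makeFake, async () => "done").then(console.log) // done
```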

Running the code

Here is the fun part! Let’s run some code.

I’ve added a few examples of valid LaTeX that we can try out! It doesn’t matter if you’ve never used LaTeX before. The syntax can be funky, but you can simply copy the examples below.

node index.js

When we try running the program without the --math flag, we get back Error: --math value required. Great!

Let’s try an example with a math flag.

node index.js --math="f(a,b,c) = (a^2+b^2+c^2)^3"

Katex one

Eureka!

“Great success.” - Borat.

Let’s run a few more examples and see what comes out.

node index.js --math="u=\frac{-y}{x^2+y^2}\,,\quad v=\frac{x}{x^2+y^2}\,,\quad w=0\,."

Katex two

node index.js --math="e^x=1+x+\frac{x^2}{2}+\frac{x^3}{6}+\cdots=\sum_{n\geq0}\frac{x^n}{n!}"

Katex three

node index.js --math="\int_a^bu\frac{d^2v}{dx^2}\,dx=\left.u\frac{dv}{dx}\right|_a^b-\int_a^b\frac{du}{dx}\frac{dv}{dx}\,dx."

Katex four

BAM! We have successfully used Puppeteer to load valid HTML and screenshot some sweet mathematics!

Conclusion

In today’s post, we used Puppeteer and Yargs Parser to build a small CLI tool to screenshot the output of valid LaTeX for us!

This approach can be applied to pretty much anything else you want. I use it for D3 graphs, generating static maps, generative art… the options are limited only by your creativity!

The final code can be found on my GitHub.

Final Code

/**
 * Usage from CLI:
 *
 * node index.js --math="f(a,b,c) = (a^2+b^2+c^2)^3"
 * node index.js --math="u=\frac{-y}{x^2+y^2}\,,\quad v=\frac{x}{x^2+y^2}\,,\quad w=0\,."
 * node index.js --math="e^x=1+x+\frac{x^2}{2}+\frac{x^3}{6}+\cdots=\sum_{n\geq0}\frac{x^n}{n!}"
 * node index.js --math="\int_a^bu\frac{d^2v}{dx^2}\,dx=\left.u\frac{dv}{dx}\right|_a^b-\int_a^b\frac{du}{dx}\frac{dv}{dx}\,dx."
 */

const puppeteer = require("puppeteer")
const argv = require("yargs-parser")(process.argv.slice(2))

if (!argv.math) {
  console.error("Error: --math value required")
  process.exit(1)
}

const html = `<!DOCTYPE html>
<!-- KaTeX requires the use of the HTML5 doctype. Without it, KaTeX may not render properly -->
<html>
  <head>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.css" integrity="sha384-zB1R0rpPzHqg7Kpt0Aljp8JPLqbXI3bhnPWROx27a9N0Ll6ZP/+DiW/UqRcLbRjq" crossorigin="anonymous">

    <!-- KaTeX is loaded without "defer" so the inline katex.render call below can use it immediately -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/katex.min.js" integrity="sha384-y23I5Q6l+B6vatafAwxRu/0oK/79VlbSz7Q9aiSZUvyWYIYsd+qj+o24G5ZU2zJz" crossorigin="anonymous"></script>

    <!-- To automatically render math in text elements, include the auto-render extension: -->
    <script src="https://cdn.jsdelivr.net/npm/katex@0.11.1/dist/contrib/auto-render.min.js" integrity="sha384-kWPLUVMOks5AQFrykwIup5lo0m3iMkkHrD0uJ4H5cjeGihAutqP0yW0J6dpFiVkI" crossorigin="anonymous"
        onload="renderMathInElement(document.body);"></script>
    <style>
      .katex { font-size: 48px !important; } 
    </style>
  </head>
  <span id="mykatex" style="display: inline-block;">...</span>
  <script>
    katex.render(String.raw\`${argv.math}\`, mykatex);
  </script>
</html>`

const main = async () => {
  let browser
  try {
    // Launch Puppeteer and set up a new page
    browser = await puppeteer.launch()
    const page = await browser.newPage()

    // Tell Puppeteer to load our HTML variable.
    // Wait until "networkidle0", which from the
    // docs means "consider navigation to be finished
    // when there are no more than 0 network connections
    // for at least 500 ms."
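    // encodeURIComponent stops characters such as # and % in the
    // HTML from being read as URL syntax.
    await page.goto(`data:text/html,${encodeURIComponent(html)}`, { waitUntil: "networkidle0" })

    // Select the <span id="mykatex" /> element (KaTeX has
    // rendered into it by now) and assign it to "element".
    const element = await page.$("#mykatex")

    // Create a screenshot and save it locally to "math.png"
    await element.screenshot({ path: "math.png" })
  } catch (err) {
    console.error(err)
    // Exit with a non-zero status so scripts can detect the failure
    process.exitCode = 1
  } finally {
    // browser is undefined if puppeteer.launch() itself failed
    if (browser) await browser.close()
  }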
}

main()

Resources and Further Reading

  1. Final project code
  2. KaTeX from example
  3. Yargs Parser
  4. Original Yargs Parser blog post
  5. Puppeteer
  6. SixthForm - KaTeX examples
  7. CSS in HTML tags - w3schools
  8. Puppeteer Docs - page.goto

A personal blog on all things of interest. Written by Dennis O'Keeffe. Follow me on Twitter.