Save requests and responses with puppeteer

Apify’s blog post intercepts Puppeteer requests with the following code snippet:

// On top of your code
const cache = {};

// The code below should go between newPage function and goto function

await page.setRequestInterception(true);
page.on('request', async(request) => {
    const url = request.url();
    if (cache[url] && cache[url].expires > Date.now()) {
        await request.respond(cache[url]);
        return;
    }
    request.continue();
});
page.on('response', async(response) => {
    const url = response.url();
    const headers = response.headers();
    const cacheControl = headers['cache-control'] || '';
    const maxAgeMatch = cacheControl.match(/max-age=(\d+)/);
    const maxAge = maxAgeMatch && maxAgeMatch.length > 1 ? parseInt(maxAgeMatch[1], 10) : 0;
    if (maxAge) {
        // Skip responses we already hold a fresh copy of
        if (cache[url] && cache[url].expires > Date.now()) return;
        
        let buffer;
        try {
            buffer = await response.buffer();
        } catch (error) {
            // Some responses have no body available, so there is nothing to cache
            return;
        }

        cache[url] = {
            status: response.status(),
            headers: response.headers(),
            body: buffer,
            expires: Date.now() + (maxAge * 1000),
        };
    }
});
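
The max-age extraction in the response handler above can also be pulled out into a small function so it can be checked without a browser. This is only a sketch: `parseMaxAge` is a hypothetical name, and the case-insensitive match and the optional quote are assumptions added here, not part of the original snippet.

```javascript
// Hypothetical helper: returns max-age in seconds, or 0 when it is absent.
function parseMaxAge(cacheControl) {
    if (!cacheControl) return 0;
    // Directive names are case-insensitive, hence the "i" flag.
    // The leading group requires max-age to start a directive, so a made-up
    // name such as "x-max-age" is not picked up by accident.
    const match = cacheControl.match(/(?:^|[\s,])max-age="?(\d+)/i);
    return match ? parseInt(match[1], 10) : 0;
}

console.log(parseMaxAge('public, max-age=3600')); // 3600
console.log(parseMaxAge('Max-Age=60'));           // 60
console.log(parseMaxAge('no-store'));             // 0
```

In the handler, `const maxAge = parseMaxAge(headers['cache-control']);` would replace the three lines that build `maxAge`.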

We took the code and modified it to save requests, responses, and files to our local cache.
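
Our modified code is not shown here, so the following is only a minimal sketch of one way to persist cache entries to disk, not the actual implementation. The directory `CACHE_DIR` and the functions `cacheFileFor`, `saveEntry`, and `loadEntry` are hypothetical names; it assumes each entry has the same shape as the in-memory cache objects above, with `body` as a Buffer.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical cache location; any writable directory would work.
const CACHE_DIR = path.join(os.tmpdir(), 'puppeteer-cache');

// URLs are not safe file names, so hash them into a fixed-length hex string.
function cacheFileFor(url) {
    const hash = crypto.createHash('sha256').update(url).digest('hex');
    return path.join(CACHE_DIR, `${hash}.json`);
}

// Write one entry ({ status, headers, body, expires }) to its own JSON file.
function saveEntry(url, entry) {
    fs.mkdirSync(CACHE_DIR, { recursive: true });
    // Buffers do not survive JSON cleanly, so store the body as base64.
    const record = { ...entry, body: entry.body.toString('base64') };
    fs.writeFileSync(cacheFileFor(url), JSON.stringify(record));
}

// Read an entry back, or return null if it is missing or expired.
function loadEntry(url) {
    const file = cacheFileFor(url);
    if (!fs.existsSync(file)) return null;
    const record = JSON.parse(fs.readFileSync(file, 'utf8'));
    if (record.expires <= Date.now()) return null;
    return { ...record, body: Buffer.from(record.body, 'base64') };
}
```

With helpers like these, the response handler could call `saveEntry(url, cache[url])` after filling the in-memory cache, and the request handler could fall back to `loadEntry(url)` when `cache[url]` is empty.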

Intercept requests to backend with puppeteer

Based on a Stack Overflow answer, we could do something like this to respond to requests that we have cached:

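await page.setRequestInterception(true);
page.on('request', (req) => {
  // cache is the object defined at the top of the first snippet
  const cached = cache[req.url()];
  // 'your_url' is a placeholder for the backend URL you want to answer from the cache
  if (req.url() === 'your_url' && cached) {
    return req.respond({
      status: 200,
      headers: cached.headers,
      body: cached.body,
    });
  }
  // Every other request must be continued, otherwise it hangs
  req.continue();
});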
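
The respond-or-continue decision can also be separated from Puppeteer so it can be exercised with a fake request object. This is a sketch under assumptions: `makeRequestHandler` is a hypothetical name, the cache is assumed to be a plain object keyed by URL with `status`, `headers`, `body`, and `expires` fields as in the first snippet, and the injectable `now` parameter exists only to make expiry testable.

```javascript
// Hypothetical factory: builds a 'request' listener around a cache object.
function makeRequestHandler(cache, now = Date.now) {
    return async (request) => {
        const entry = cache[request.url()];
        if (entry && entry.expires > now()) {
            // Pass only the fields that respond() needs, not "expires".
            const { status, headers, body } = entry;
            await request.respond({ status, headers, body });
            return;
        }
        // Not cached or expired: let the request go to the network.
        await request.continue();
    };
}

// Usage with Puppeteer (not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', makeRequestHandler(cache));
```

Keeping the handler free of any direct reference to `page` means the same function can be reused for every page that shares the cache.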

Tasks completed

  • Typed on a gigantic keyboard (thanks xmetrix)
  • Wrote blog in colemak
  • Saved requests, responses, and files to our local cache with puppeteer.
  • Made progress on intercepting and responding to all requests with the puppeteer interceptor; however, more work is required. I believe the regex is broken.

Future tasks

  • Intercept and respond to all requests with the puppeteer interceptor. Please clean up the code and fix the regex.
  • Chromium and chromedriver downloader.