What’s Next For Speedometer

I have some thoughts about what’s next for the Speedometer benchmark, after the Speedometer 3 launch a few months back.

First: I’m proud of the role Mozilla played on Speedometer 3. We made significant contributions to the benchmark itself as well as to the structure of the group, enabling it to work towards consensus on many difficult decisions with principles aimed at getting the incentives right, such that competing on score would make real-world pages more responsive. With buy-in from each major engine, browsing is now a bit more pleasant for everyone regardless of their browser choice. This is a practical example of the strength of the Web as an open platform with multiple competitive implementations.

In parallel to developing the benchmark itself, there’s a much larger engineering effort still ongoing within Gecko — fixing nearly 500 bugs and counting, with dozens of engineers sweating the details to make typical web pages in Firefox Desktop and Android more responsive for users.

Incremental improvements for the next version

I don’t think we should change the underlying goals for the project, and largely want to build on the recent success. But we could make the benchmark better reflect the real-world Web and continue pushing forward engine performance by making some changes. Here’s what I’d like to see.

Faster releases

The longer the benchmark stands still, the worse it becomes at making the web faster for users. It needs to evolve over time, because:

  • The ecosystem changes, which makes the current set of tests less representative.
  • Engines run out of low-hanging fruit, and end up needing to squeeze wins out of changes tuned to quirks of the tests or the benchmark methodology, which are less generally applicable to the web.

While there’s no release date set for Speedometer 4, the gap should be significantly shorter than the six years between versions 2 and 3.

More tests

Speedometer 2 had 15 tests, and Speedometer 3 has 20. More importantly, they test a much more diverse set of content than before — the posts on the Chromium Blog and the MS Edge blog do a good job outlining this in more detail.

Speedometer 4 should have a bunch more, and they should be as distinct from each other as possible to cast a wide net for engine improvements. Practically speaking, at this stage it would be best to have a near-zero-friction process for engines and open source contributors to create and share experimental tests, even if they’ll never be suitable for inclusion in an official release. That’ll help iterate on ideas and tests to include, and provide internal targets for engines that want to find optimizations outside of what’s measured in the official score.

For example, an engine could turn test cases in bug reports into Speedometer tests, which they could then run in their own CI as a regression test, and which may be useful for other engines to optimize. A mechanism for sharing these unofficial test cases (similar to Web Platform Tests) could facilitate a broader ecosystem of Speedometer tools and tests, in parallel with an official release cadence.

Asynchronous measurements

The core measurement loop in Speedometer 3 does a better job at capturing layout and paint than previous versions (see more detail on the WebKit blog), but the actual work being measured still has to be synchronous. This makes it impossible to measure functionality that relies on Promises or asynchronous events, which are very common on the web. For example, when I researched text editing it was clear that it’d be good to add a test for the popular Monaco editor, but we were unable to do so because it had some initialization happen in Workers that we couldn’t deterministically pull into measured time.

Making the measurement loop asynchronous will open up all sorts of technical questions, like the effects of CPU throttling, that we’ll need to carefully evaluate. But we can add the ability to the core runner as a developerMode feature without affecting current tests. Combined with a low-friction development process for experiments, we can build out a set of tests that use it to inform the evaluation.
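
As a rough illustration of the shape this could take (the function and names here are hypothetical, not the actual Speedometer runner API), an asynchronously measured step would await the work it kicks off before closing the measurement window:

// Hypothetical sketch of an async measured step; the real runner
// API may end up looking quite different.
async function measureAsyncStep(name, runStep) {
  const start = performance.now();
  // Run the step and wait for the async work it triggers to settle,
  // e.g. a Promise resolved from a Worker message.
  await runStep();
  // Yield once more so queued rendering work lands inside the window.
  await new Promise((resolve) =>
    requestAnimationFrame(() => setTimeout(resolve, 0))
  );
  return { name, duration: performance.now() - start };
}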

Test against remote content

All tests are currently embedded in the main repo and hosted in iframes on the same domain, with the test logic defined in the parent frame and accessing elements in the child frames.

However, since engines with site isolation separate subframes from different domains into different processes, there are all sorts of real-world performance considerations that are not exercised today. While we may not ever ship cross-origin hosting to browserbench.org, we’d like the ability to test this ourselves in a variety of configurations (e.g., separate origin per test) in order to optimize Gecko.

There are technical details to figure out with the runner design here — the Chrome team has a promising suggestion which pushes the test logic into each individual test that’s worth exploring further. A design like this would also make it easier to include additional metrics like networking and page load — which aren’t feasible to include in official scores due to variability, but would be extremely useful signals for measurement in Firefox.
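
To make the difference concrete: a same-origin runner can reach directly into a test iframe, while a cross-origin design has to coordinate through postMessage. Here’s an illustrative sketch with made-up message names, not the actual proposed runner design:

// Same-origin today: the runner can touch the test page directly, e.g.
//   iframe.contentDocument.querySelector("button").click();
// Cross-origin: the test logic lives inside the frame and reports back.
function runCrossOriginTest(iframe, testName) {
  return new Promise((resolve) => {
    window.addEventListener("message", function onMessage(event) {
      if (event.data?.type === "test-complete" && event.data.name === testName) {
        window.removeEventListener("message", onMessage);
        resolve(event.data.duration);
      }
    });
    // The frame itself runs the workload and measures its own duration.
    iframe.contentWindow.postMessage({ type: "run-test", name: testName }, "*");
  });
}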

Writing JS Functions in BigQuery

While querying Custom Metrics in the HTTP Archive, it can be convenient to use JavaScript instead of SQL. Luckily there's a way to do this with BigQuery, which I learned while reading the queries for the Web Almanac (link to all the SQL used to generate reports for the 2022 edition).

Here's a simple example that takes in a JSON string and returns an array of strings:

CREATE TEMPORARY FUNCTION
  helloWorld(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS '''
try {
  input = JSON.parse(input);
  return ["hello", input.str];
} catch (e) {
  return [];
}
''';

SELECT helloWorld(input) AS result
FROM
  UNNEST([
    '{"str": "world"}'
  ]) AS input;

This will return:

-----------------
| Row  | result |
|------|--------|
| 1    | hello  |
|      | world  |
-----------------

Putting this together with some of the HTTP Archive datasets, here's a more real-world example based on the previous HTTP Archive Analysis of News Homepages:

-- #standardSQL
-- Taking an object like { html: 1, body: 1, div: 2 } and returning 4
CREATE TEMPORARY FUNCTION
  getTotalElements(element_count STRING)
  RETURNS FLOAT64
  LANGUAGE js AS '''
try {
  element_count = JSON.parse(element_count);
  if (Array.isArray(element_count)) {
    return -1;
  }
  let c = 0;
  for (let k of Object.keys(element_count)) {
    c += element_count[k];
  }
  return c;
} catch (e) {
  return -1;
}
''';

-- Taking a JSON object like { "summary": { "--hjFeedbackAccentTextColor": { "get": [...], "set": [...] }, ... } }
-- and returning an array of strings like [ "--hjFeedbackAccentTextColor", ... ]
CREATE TEMPORARY FUNCTION
  getCssVariables(css_variables STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS '''
try {
  css_variables = JSON.parse(css_variables).summary;
  return Object.keys(css_variables);
} catch (e) {
  return [];
}
''';

SELECT
  url,
  _TABLE_SUFFIX AS client,
  getTotalElements(JSON_EXTRACT_SCALAR(payload, '$._element_count')) AS total_elements,
  getCssVariables(JSON_EXTRACT_SCALAR(payload, '$."_css-variables"')) AS css_variables,
  payload,
FROM
  -- TABLESAMPLE SYSTEM helps make testing the query cheap, since the pages tables are very large
  `httparchive.pages.2023_03_01_*` TABLESAMPLE SYSTEM (0.1 PERCENT)

-- Restrict to only relatively popular hosts
WHERE NET.HOST(url) IN (
  SELECT DISTINCT NET.HOST(origin)
  FROM
    `chrome-ux-report.all.202303`
  WHERE experimental.popularity.rank <= 10000
)

This is querying the payload field (see an example row at this gist) by extracting keys off the JSON string and passing the value into the JS function, running some field-specific logic, and then returning the results (in one case as a number and in another as an array of strings).

HTTP Archive Analysis of News Homepages

Here's a quick analysis I did for Speedometer 3 about the size of the DOM for some common news sites, in order to support the development of a test covering interactions within a complex news-like site.

How this was done

  • Gather a list of sites from https://en.wikipedia.org/wiki/Wikipedia:News_sources and child pages.
  • Run some BigQuery SQL against the HTTP Archive desktop and mobile crawls from 2023_03_01 to generate the average DOM depth, total element count, number of CSS variables, and a few other data points. These rely heavily on Custom Metrics (with links to the specific metric in the table caption below); a sketch of the DOM depth measurement follows this list.
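
For reference, here's a minimal sketch of how a page's DOM depth can be measured, in the spirit of the Custom Metrics (illustrative only, not the exact code HTTP Archive runs):

// Illustrative only: walk the tree and track the deepest element,
// similar in spirit to the DOM depth Custom Metric.
function maxDomDepth(node = document.documentElement, depth = 1) {
  let max = depth;
  for (const child of node.children) {
    max = Math.max(max, maxDomDepth(child, depth + 1));
  }
  return max;
}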

Some more details and the source code to reproduce the results are available at this gist.

The raw and aggregated data is available in this spreadsheet.

Results

DOM Depth

| Client  | Average | Median | Max | Standard Deviation |
|---------|---------|--------|-----|--------------------|
| mobile  | 11.99   | 11     | 128 | 7.17               |
| desktop | 12.15   | 11     | 131 | 7.36               |

Total Elements

| Client  | Average | Median | Max   | Standard Deviation |
|---------|---------|--------|-------|--------------------|
| mobile  | 1856.04 | 1621   | 10444 | 1357.67            |
| desktop | 1911.26 | 1714   | 10444 | 1351.03            |

Total CSS Variables

| Client  | Average | Median | Max | Standard Deviation |
|---------|---------|--------|-----|--------------------|
| mobile  | 31.54   | 0      | 887 | 99.59              |
| desktop | 34.43   | 0      | 887 | 101.19             |

Automatically Publishing a Vite Project to GitHub Pages

Here are my notes from publishing https://bgrins.github.io/editor-tests/ from https://github.com/bgrins/editor-tests. There's an action at https://github.com/peaceiris/actions-gh-pages that makes this easy without setting up additional API tokens, but there are a few Vite-specific steps worth noting.

Create the project

If you don't already have one, run npm create vite@latest and cd into the directory. Then:

git init
git add .
git commit -m "initial commit"
git branch -M main
git remote add origin https://github.com/username/repo-name.git
git push -u origin main

Create gh-pages branch

If you don't already have a gh-pages branch you can create an empty one like so:

git switch --orphan gh-pages
git commit --allow-empty -m "gh-pages"
git push origin gh-pages

Add files to the repo

Create a vite.config.js with the following. Setting base to a relative path makes the built asset URLs resolve when the site is served from a subpath like /repo-name/ rather than the domain root:

import { defineConfig } from "vite";
export default defineConfig({
  base: "./",
});

Create a .github/workflows/node.js.yml with:

name: Node.js CI

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js 18.x
        uses: actions/setup-node@v3
        with:
          node-version: 18.x
          cache: "npm"
      - run: npm install
      - run: npm run build --if-present

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./dist

Update GitHub settings

Finally, in the repository's Settings → Pages, make sure the site is being served from the root of the gh-pages branch. At this point there's an action running at https://github.com/bgrins/editor-tests/actions which builds the project from scratch and publishes it to the root of the gh-pages branch, making the site available at something like https://bgrins.github.io/editor-tests/.

Fetch the latest copy of a page from the Wayback Machine

A simple way to fetch the latest copy of a given URL from the Wayback Machine, without including the banner:

  1. Fetch the metadata (like https://archive.org/wayback/available?url=https://example.com)
  2. Insert id_ after the timestamp in the returned URL to remove the banner (http://web.archive.org/web/20221212143457/https://example.com/ becomes http://web.archive.org/web/20221212143457id_/https://example.com/).

Here's an example function with vanilla JS:

/**
 * Given a URL, return the latest copy of that URL in the Wayback machine.
 * To do this, find the last available response for the URL, then modify it
 * to omit the default banner in the UI.
 *
 * To do this for https://example.com, fetch https://archive.org/wayback/available?url=https://example.com
 * and get a response like:
 *
 *    url: "https://example.com",
 *    archived_snapshots: {
 *      closest: {
 *        status: "200",
 *        available: true,
 *        url: "http://web.archive.org/web/20221212143457/https://example.com/",
 *        timestamp: "20221212143457",
 *      },
 *    }
 *
 *  Then insert "id_" after the timestamp and fetch that. In this case
 *  http://web.archive.org/web/20221212143457id_/https://example.com/
 */
async function get_wayback_response(url) {
  if (!url) {
    throw new Error("No URL provided");
  }
  const result = {
    timings: {},
  };
  const start = performance.now();
  const resp = await fetch(
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`
  );
  result.metadata = await resp.json();
  result.timings.fetch_metadata = performance.now() - start;

  const closest = result.metadata?.archived_snapshots?.closest;
  if (!closest) {
    throw new Error(`No snapshot available from wayback server for ${url}`);
  }

  // Inserting "id_" after the timestamp excludes the banner.
  const constructed_url = closest.url.replace(/(.*\/web\/[0-9]*)/, "$1id_");
  result.response = await fetch(constructed_url);
  result.text = await result.response.text();
  result.timings.fetch_wayback = performance.now() - start;

  return result;
}
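
And a minimal usage example (run from a context where cross-origin fetches to archive.org are permitted, e.g. a server-side script):

// Fetch the archived copy of example.com and inspect the timings
// and the start of the returned HTML.
get_wayback_response("https://example.com").then((result) => {
  console.log(result.metadata.archived_snapshots.closest.timestamp);
  console.log(result.timings);
  console.log(result.text.slice(0, 100));
});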