Shrinking Visual Web Arena: A 93% Size Reduction With Image Optimization

July 2025

The Visual Web Arena classifieds app is a comprehensive reproduction of a classifieds website, with realistic content and a functional backend. These sorts of web-environments-in-a-box are useful for both agentic evaluations and browser testing, where test cases often resist isolation into a static page, and testing on the live web is not reliable or reproducible.

But the first thing I noticed when I set up the environment: it's 78GB! I've managed to shrink this down to 5.7GB while maintaining functionality.

Why So Big?

As the paper explains, there's a lot of stuff inside of it:

The Classifieds environment contains 65,955 listings, each with a title, text description, and a product image of the item being sold. To populate the site with realistic content, we scraped data across a variety of categories on Craigslist over 3 weeks, focusing on the Northeastern States of the US (similar to the geographic region in the Reddit site).

Here's what the site looks like:

A screenshot of a classifieds webpage

Unfortunately the images are mostly PNG, and the application stores 4 copies of each (preview, thumbnail, etc). By my count there are 84,141 posts in the latest docker image, with 336,564 total images. So when following the instructions here to pull the image, you end up with this:

An in-progress docker pull showing '1.654GB/76.64GB'

PNG → AVIF

PNG is terribly inefficient for this photos-of-stuff use case, but I was still surprised by just how much compression AVIF was able to achieve. Converting these images to AVIF led to dramatic space savings while maintaining visual quality and cross-browser support.

  • Original PNG files: 73GB
  • Converted AVIF files: 4.9GB
  • Space saved: 68.1GB (a 93.3% reduction, to roughly 1/15th of the original size)
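
For anyone who wants to check the arithmetic, here's a quick sketch that derives the savings from the two sizes above. The variable names are just for illustration.

```python
# Sizes in GB, as reported in the list above.
original_gb = 73.0
converted_gb = 4.9

saved_gb = original_gb - converted_gb
reduction_pct = 100 * saved_gb / original_gb  # share of the original that went away
shrink_factor = original_gb / converted_gb    # how many times smaller the result is

print(f"saved: {saved_gb:.1f} GB")
print(f"reduction: {reduction_pct:.1f}%")
print(f"shrink factor: {shrink_factor:.1f}x")
```

The shrink factor comes out just under 15, which is where the "roughly 1/15th" figure comes from.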

Here's an overview of the conversion process:

  1. Extraction & conversion: Copied all images from the running container to the host system, then ran libvips with default settings to convert PNG files to AVIF format (this took a while, so I just let it run overnight)
  2. Database updates: In addition to the files themselves (e.g., oc-content/uploads/1/1_thumbnail.png), there were two necessary database updates:
    • UPDATE oc_t_preference SET s_value = 'png,gif,jpg,jpeg,avif' WHERE s_section = 'osclass' AND s_name = 'allowedExt';
    • UPDATE oc_t_item_resource SET s_extension = 'avif', s_content_type = 'image/avif' WHERE s_extension IN ('png', 'jpg', 'jpeg');
  3. Create new Docker images: vwa_classifieds_web is based on metadata from the official web image (via docker history) — but with the much smaller upload directory and a bit more cleanup. The vwa_classifieds_db container needed more work, detailed below.
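
To make the conversion step a bit more concrete, here's a rough sketch of what a batch conversion over the copied uploads directory could look like. This is not the script from the repo: the function names are hypothetical, and it assumes the `vips` command-line tool is installed with AVIF support (libvips built against libheif with an AV1 encoder), in which case `vips copy` picks the output format from the file extension. It leaves the original files in place, so cleaning those up is a separate step.

```python
import subprocess
from pathlib import Path

# Matches the extensions rewritten by the SQL update in step 2.
SOURCE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def avif_target(src: Path) -> Path:
    """Same directory and file stem, with an .avif extension."""
    return src.with_suffix(".avif")


def vips_command(src: Path) -> list[str]:
    """Build a `vips copy` call with default settings (no quality flags)."""
    return ["vips", "copy", str(src), str(avif_target(src))]


def convert_tree(root: Path, dry_run: bool = True) -> list[list[str]]:
    """Walk root and convert every matching image, or only list the commands."""
    commands = []
    # sorted() materializes the listing before any new .avif files are written.
    for src in sorted(root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SOURCE_EXTENSIONS:
            cmd = vips_command(src)
            commands.append(cmd)
            if not dry_run:
                # Assumption: `vips` is on PATH; check=True stops on the first failure.
                subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_tree(Path("uploads"))` with the default `dry_run=True` only returns the commands it would run, which is a cheap way to sanity-check the file list before committing to an overnight job. The `uploads` path is a placeholder for wherever the images were copied to.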

As far as I can tell, the environment maintains compatibility with the original VWA app (allowing login, comments, posting with images, etc), and I expect existing scripts and automation should work without changes. One caveat: I haven't attempted to reproduce the original paper's results, so it's possible this changes the behavior. This seems unlikely, since there's no obvious perceptual difference and the agents are likely screenshotting the entire page before processing it. If it does affect performance, it's probably a bug worth fixing in agents, since the web contains many lossy images.

Additional Changes

For more details and code, check out the vwa_classifieds_optimized repo.

Reproducible Database State

The original environment required manual reset scripts to return to a known state and required moving around SQL files for it to boot correctly the first time. I've baked the initial database state directly into the MySQL container image, so it's easier to set up without a volume and automatically resets to a clean state on restart.

There was some ambiguity about what was supposed to happen here, since the reset script modifies the environment slightly from the initial state. As far as I can tell, the reset script does not run on init in the recommended docker compose environment, so it wasn't clear whether it's meant to run both at startup and when the web endpoint is hit, or just the latter. It made sense to me that it should run at startup (i.e., the initial state should match the restored state exactly), so I went ahead and baked it into the container.

Pre-configured Containers

Both the web and database containers are available as pre-built images on GitHub Container Registry (with an example docker compose setup that runs them):

  • ghcr.io/bgrins/vwa_classifieds_web:1
  • ghcr.io/bgrins/vwa_classifieds_db:1

Compression Stats

See stats below. I also put a few samples of the most compressed images in the repo.

=== BEST 10 FILES BY COMPRESSION RATIO ===

AVIF: 39368.png - 233.39 KB → 3.19 KB (73.24x compression)
AVIF: 23183.png - 315.76 KB → 4.37 KB (72.22x compression)
AVIF: 33699.png - 203.29 KB → 2.86 KB (71.05x compression)
AVIF: 14284.png - 279.91 KB → 4.00 KB (69.91x compression)
AVIF: 73976.png - 149.40 KB → 2.16 KB (69.19x compression)
AVIF: 50581.png - 174.40 KB → 2.54 KB (68.69x compression)
AVIF: 65885.png - 131.10 KB → 1.93 KB (68.07x compression)
AVIF: 15820.png - 136.12 KB → 2.01 KB (67.76x compression)
AVIF: 53833.png - 300.25 KB → 4.43 KB (67.72x compression)
AVIF: 75187.png - 142.86 KB → 2.12 KB (67.42x compression)

=== WORST 10 FILES BY COMPRESSION RATIO ===

AVIF: 84145_thumbnail.jpg - 14.81 KB → 9.71 KB (1.53x compression)
AVIF: 84146_thumbnail.jpg - 13.73 KB → 8.31 KB (1.65x compression)
AVIF: 84147_original.jpg - 184.44 KB → 110.48 KB (1.67x compression)
AVIF: 84147_thumbnail.jpg - 11.14 KB → 6.42 KB (1.73x compression)
AVIF: 84145_preview.jpg - 42.69 KB → 24.70 KB (1.73x compression)
AVIF: 84147.jpg - 64.89 KB → 36.82 KB (1.76x compression)
AVIF: 84145.jpg - 76.21 KB → 42.79 KB (1.78x compression)
AVIF: 84152_thumbnail.jpg - 10.92 KB → 6.07 KB (1.80x compression)
AVIF: 84147_preview.jpg - 32.97 KB → 17.72 KB (1.86x compression)
AVIF: 84146.jpg - 69.06 KB → 36.85 KB (1.87x compression)

=== COMPRESSION RATIO HISTOGRAM ===

AVIF Distribution:
 1.5- 5.1x:                                              60
 5.1- 8.7x:                                           1,638
 8.7-12.3x: ███████████████████                      44,603
12.3-15.9x: ████████████████████████████████████████ 90,647
15.9-19.5x: ███████████████████████████████████      81,017
19.5-23.0x: ███████████████████████                  53,569
23.0-26.6x: ██████████████                           31,848
26.6-30.2x: ███████                                  17,254
30.2-33.8x: ███                                       8,728
33.8-37.4x: █                                         3,970
37.4-41.0x:                                           1,846
41.0-44.6x:                                             714
44.6-48.1x:                                             367
48.1-51.7x:                                             180
51.7-55.3x:                                              94
55.3-58.9x:                                              35
58.9-62.5x:                                              25
62.5-66.1x:                                               6
66.1-69.7x:                                               9
69.7-73.2x:                                               4
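
The bin edges above are consistent with 20 equal-width bins spanning the smallest (1.53x) and largest (73.24x) ratios, with bars scaled against the tallest bin and rounded down. Here's a minimal sketch of how stats like these could be computed. It is not the script that produced the output above, and the function names are made up for illustration.

```python
def compression_ratio(original_bytes: int, converted_bytes: int) -> float:
    """How many times smaller the converted file is."""
    return original_bytes / converted_bytes


def histogram(ratios: list[float], bins: int = 20) -> list[tuple[float, float, int]]:
    """Equal-width bins from min(ratios) to max(ratios), as (low, high, count)."""
    lo, hi = min(ratios), max(ratios)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all ratios match
    counts = [0] * bins
    for r in ratios:
        # The maximum value goes into the last bin instead of overflowing.
        index = min(int((r - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


def render(rows: list[tuple[float, float, int]], bar_width: int = 40) -> str:
    """Text bars scaled so the fullest bin spans bar_width characters."""
    peak = max(count for _, _, count in rows)
    lines = []
    for low, high, count in rows:
        bar = "█" * int(bar_width * count / peak)  # rounded down
        lines.append(f"{low:4.1f}-{high:4.1f}x: {bar:<{bar_width}} {count:>7,}")
    return "\n".join(lines)
```

Feeding it one ratio per converted file (original size divided by AVIF size) and printing `render(histogram(ratios))` gives output in the same general shape as the distribution above.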