Shrinking Visual Web Arena: Reddit
September 2025
The Visual Web Arena Reddit app is a comprehensive reproduction of a forum website, with realistic content and a functional backend using Postmill. These sorts of web-environments-in-a-box are useful for both agentic evaluations and browser testing, where test cases often resist isolation into a static page, and testing on the live web is neither reliable nor reproducible.
As with my reproduction of the VWA classifieds app, there was some room to optimize images. This time, though, image compression offered much less low-hanging fruit, while some additional Docker build optimizations were available.
To run the original environment, you download a 49.8 GB tar file and load it into Docker.
The reproduction is available as a package at ghcr.io/bgrins/vwa-reddit-optimized-bundled:latest.
Here’s what the site looks like:

Why So Big?
As the paper explains, the environment contains a lot of content, and most of the disk space goes to images:
The Reddit site also follows the same environment from WebArena, and represents a social forum platform. The site contains 31,464 posts containing a diverse set of images across different subreddits and forums, such as natural images, memes, consumer electronics, and charts.
Images
Most of the space within the container (38.5 GB) is in /var/www/html/public/submission_images, across over 31K images (26,169 JPG, 4,130 PNG, 1,166 GIF). There's not nearly as much low-hanging fruit as in Classifieds (which shrank from 73 GB → 4.9 GB), in part because Postmill doesn't maintain multiple copies of each image, instead generating thumbnails dynamically with LiipImagineBundle. Changing image formats felt too invasive, so I used gifsicle, optipng, and jpegoptim to process the images in place. The filenames are content-hashed from the original upload, but changing the files doesn't seem to cause any problems.
This shrank the image directory, but only a bit, to 34 GB.
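As a rough illustration of that in-place pass, here is a minimal Python sketch that picks an optimizer by file extension and keeps filenames unchanged. The post names the tools but not the options, so the flags below (`--strip-all`, `-o2`, `--batch -O3`) and the helper names `optimizer_command` and `optimize_tree` are assumptions for the example, not the exact commands used for the published image.

```python
from pathlib import Path
from typing import List, Optional
import shutil
import subprocess

# Tool per extension. The flags are illustrative assumptions, not the
# options actually used for the published image.
OPTIMIZERS = {
    ".jpg": ["jpegoptim", "--strip-all"],
    ".jpeg": ["jpegoptim", "--strip-all"],
    ".png": ["optipng", "-o2"],
    ".gif": ["gifsicle", "--batch", "-O3"],
}


def optimizer_command(path: Path) -> Optional[List[str]]:
    """Return the in-place optimizer command for a file, or None if unsupported."""
    base = OPTIMIZERS.get(path.suffix.lower())
    return [*base, str(path)] if base else None


def optimize_tree(root: Path, dry_run: bool = True) -> List[List[str]]:
    """Collect (and optionally run) optimizer commands for every image under root."""
    commands = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        cmd = optimizer_command(path)
        if cmd is None:
            continue
        commands.append(cmd)
        # Only execute when asked to, and only if the tool is installed.
        if not dry_run and shutil.which(cmd[0]):
            subprocess.run(cmd, check=False)
    return commands
```

Calling `optimize_tree(Path("/var/www/html/public/submission_images"))` with the default `dry_run=True` only lists the commands, which makes it easy to review them before rewriting roughly 31K files.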
Other Cleanup
Everything else was fairly minor: ignoring caches, node modules, and other folders not needed at runtime, and using a minimal base image.
Docker layer caching misbehaved when I tried to build everything in a single multistage Dockerfile: the entire app directory was re-copied even when none of the files had changed, which filled the hard drive. It turned out to be easiest to create a separate base image with the large app directory, and two separate images for the container, with and without Postgres bundled. The published tar includes the database in the container, which is convenient, but in some test environments (like The Zoo) it's better to use a shared database server.
Current sizes are 38.2 GB with the database bundled, and 36.3 GB without.
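To put the numbers in this post side by side, the short sketch below computes the percentage saved for each pair of sizes quoted above. The helper name `percent_reduction` is made up for the example, and the comparison against the original is only approximate, since 49.8 GB is the size of a tar file while the current figures are image sizes.

```python
def percent_reduction(before_gb: float, after_gb: float) -> float:
    """Percentage saved going from before_gb to after_gb, rounded to one decimal."""
    return round(100 * (before_gb - after_gb) / before_gb, 1)


# Sizes in GB, as quoted in this post.
comparisons = {
    "Classifieds image directory": (73.0, 4.9),
    "Reddit image directory": (38.5, 34.0),
    "Reddit container, database bundled": (49.8, 38.2),
    "Reddit container, no database": (49.8, 36.3),
}

for label, (before, after) in comparisons.items():
    saved = percent_reduction(before, after)
    print(f"{label}: {before} GB -> {after} GB ({saved}% smaller)")
```

The image directory accounts for only about a 12% saving here, compared with roughly 93% for Classifieds, so a good share of the overall reduction has to come from the other cleanup.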
New Tests and Observations
I've added integration tests to make sure basic functionality (logging in, creating posts, etc.) works consistently in both the original and the reproduction.
A bug I noticed in both environments is that when upvoting a post, the score flips back to 1. I may have missed something, but it seems to be an issue with the data import where individual votes (submission_votes) were not counted but the score (submissions.net_score) was. Then once a vote is cast, Postmill recalculates the net score based on the single recorded vote.
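The hypothesis above can be modeled with a small in-memory SQLite sketch. This is not Postmill's code: only the names `submission_votes` and `submissions.net_score` come from the post, while the other columns (`user_id`, `choice`), the helper names, and the recalculation query are assumptions chosen to reproduce the described behavior.

```python
import sqlite3


def make_db(imported_score: int) -> sqlite3.Connection:
    """Create a toy schema with an imported score but no imported vote rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE submissions (id INTEGER PRIMARY KEY, net_score INTEGER NOT NULL);
        CREATE TABLE submission_votes (
            submission_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            choice INTEGER NOT NULL,  -- +1 upvote, -1 downvote (assumed encoding)
            PRIMARY KEY (submission_id, user_id)
        );
        """
    )
    conn.execute("INSERT INTO submissions (id, net_score) VALUES (1, ?)", (imported_score,))
    return conn


def get_score(conn: sqlite3.Connection, submission_id: int) -> int:
    row = conn.execute(
        "SELECT net_score FROM submissions WHERE id = ?", (submission_id,)
    ).fetchone()
    return row[0]


def upvote(conn: sqlite3.Connection, submission_id: int, user_id: int) -> int:
    """Record a vote, then recompute net_score from the vote rows alone."""
    conn.execute(
        "INSERT OR REPLACE INTO submission_votes VALUES (?, ?, 1)",
        (submission_id, user_id),
    )
    conn.execute(
        "UPDATE submissions SET net_score = ("
        " SELECT COALESCE(SUM(choice), 0) FROM submission_votes WHERE submission_id = ?"
        ") WHERE id = ?",
        (submission_id, submission_id),
    )
    return get_score(conn, submission_id)


conn = make_db(imported_score=250)
before = get_score(conn, 1)
after = upvote(conn, 1, user_id=42)
print(before, after)
```

Because the only vote row is the one just cast, the recomputed score is 1 no matter what value was imported, which matches the behavior seen in both environments.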
This isn’t a huge deal for basic testing, but it does mean some research use cases are limited (e.g., test cases requiring accurate interaction counts or reconstructing synthetic user profiles). Regardless, I've captured this behavior as an expected failure in the test case.
