The simplest way to mirror Wikipedia is to use the Kiwix tools and a ZIM-format snapshot.
Getting a snapshot
Downloading one of these snapshots is fairly simple: the Kiwix site maintains pages with links to “zim” files for various facets of Wikipedia. They’re broken down by language (“en” in the filename), topic (e.g. Chemistry) and other variations (options like “mini”, “maxi” and “nopic” let us further control how much data we’re willing to download).
So to download that Chemistry snapshot, we could use a command like this:
curl -L https://download.kiwix.org/zim/wikipedia/wikipedia_en_chemistry_mini_2024-06.zim -o wikipedia_en_chemistry_mini_2024-06.zim
(The -L flag tells curl to follow redirects from one URL to another, which the Kiwix links are, and the -o flag lets us specify the filename for the downloaded file.)
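The way the filename is put together (language, topic, size variant, date) can be sketched as a small shell helper. This is only an illustration: zim_url is a hypothetical function, and the naming pattern is inferred from the single example filename above, so it's worth checking the Kiwix download pages for the exact names actually on offer.

```shell
# Hypothetical helper (not part of Kiwix): builds a download URL from the
# naming pattern seen in the example above, i.e.
#   wikipedia_<language>_<topic>_<variant>_<date>.zim
# Arguments: language, topic, variant, date.
zim_url() {
    printf 'https://download.kiwix.org/zim/wikipedia/wikipedia_%s_%s_%s_%s.zim\n' \
        "$1" "$2" "$3" "$4"
}

url="$(zim_url en chemistry mini 2024-06)"
echo "$url"

# To actually download it, keeping the remote filename (everything after
# the last slash in the URL):
# curl -L "$url" -o "${url##*/}"
```

The curl line is left commented out so the snippet can be run without starting a download.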
Serving the snapshot
On a Debian-based system (like Ubuntu or Raspberry Pi OS) you can install the Kiwix Tools (including their server) like this:
sudo apt-get install kiwix-tools
…and then run the serving app with our snapshot like this:
kiwix-serve -p 8080 wikipedia_en_chemistry_mini_2024-06.zim
…which will print out a URL (something like “http://192.168.1.223:8080/”) where we can view our site.
(The -p flag lets us specify the port we’ll find our site on. If we omit this option, kiwix-serve will try to run on the normal HTTP port (80), which will fail unless we also run the command as root by prepending sudo to it.)
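The port rule can be sketched in shell too. On Linux, ports below 1024 normally need root privileges to bind, which is why 8080 works as an ordinary user and 80 doesn't. The needs_root helper below is hypothetical (it isn't part of kiwix-tools), and it only assembles and prints the command rather than running it.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given port is in the
# privileged range (below 1024), which normally requires root to bind.
needs_root() {
    [ "$1" -lt 1024 ]
}

port=8080
zim=wikipedia_en_chemistry_mini_2024-06.zim

# Prepend sudo only when the chosen port needs it.
if needs_root "$port"; then
    cmd="sudo kiwix-serve -p $port $zim"
else
    cmd="kiwix-serve -p $port $zim"
fi

# Print the command we would run.
echo "$cmd"
```

With port set to 8080 this prints the same kiwix-serve command used above; setting it to 80 would add the sudo prefix.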