How to use zim-tools
This guide explains how to compile zim-tools on an Ubuntu workstation and use zimdump to extract the files from a ZIM archive.
The Kiwix project is an open-source initiative that makes online content available for offline use. Kiwix allows users to download entire websites, such as Project Gutenberg, in a compressed file format known as ZIM. Kiwix also maintains OpenZIM, a set of open-source tools used to manipulate ZIM archives.
Install prerequisites
On a freshly installed Ubuntu 22.04 server, install the prerequisite packages.
$ sudo apt-get install liblzma-dev \
libicu-dev \
libzstd-dev \
libxapian-dev \
meson \
libdocopt-dev \
libkainjow-mustache-dev \
libmagic-dev \
zlib1g-dev \
libgumbo-dev \
cmake
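Before building, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch, not part of zim-tools: the function name missing_tools is hypothetical, and the list of tools passed to it in the usage example is an assumption about what the later steps rely on.

```shell
# missing_tools: print each named command that is not found on PATH.
# Returns 0 when every command is present, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, after defining the function in the current shell:
$ missing_tools git meson ninja pkg-config c++
No output means that all of the listed commands were found.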
Install libzim
Create a project directory and clone the libzim repo.
$ mkdir ~/OpenZIM
$ cd ~/OpenZIM
$ git clone https://github.com/openzim/libzim.git
$ cd libzim
Compile and install libzim.
$ meson setup build
$ ninja -C build
$ sudo ninja -C build install
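Because libzim is installed from source, a later build that depends on it has to locate it through pkg-config. A minimal sketch for checking this follows. The helper name check_pc_module is hypothetical, and the module name libzim used in the usage example is assumed to match the .pc file that libzim installs.

```shell
# check_pc_module: report whether pkg-config can find the named module.
# Returns 0 and prints the version when found, 1 otherwise.
check_pc_module() {
    if pkg-config --exists "$1" 2>/dev/null; then
        echo "$1 found, version $(pkg-config --modversion "$1")"
    else
        echo "$1 not found by pkg-config"
        return 1
    fi
}
```

For example:
$ check_pc_module libzim
If the library is installed but programs linked against it fail to start, refreshing the linker cache with sudo ldconfig may help; whether that is needed depends on the system.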
Install zim-tools
Clone the zim-tools repo.
$ cd ~/OpenZIM
$ git clone https://github.com/openzim/zim-tools.git
$ cd zim-tools
Compile and install zim-tools.
$ meson setup build
$ ninja -C build
$ sudo ninja -C build install
Test zimdump.
$ zimdump --version
zim-tools 3.2.0
libzim 8.2.0
+ libzstd 1.4.8
+ liblzma 5.2.5
+ libxapian 1.4.18
+ libicu 70.1.0
Dump a ZIM archive
As a test, download the Project Gutenberg ZIM file from Kiwix and verify that the SHA-256 checksum matches.
This example is from May 2023. See the Kiwix website for the latest version.
$ curl -O --progress-bar https://download.kiwix.org/zim/gutenberg/gutenberg_en_all_2023-05.zim
$ sha256sum gutenberg_en_all_2023-05.zim
c57133c971c7cf82df907e8fe037e84d7ee2d54ec6bd72af97b6ba509e33d9cf gutenberg_en_all_2023-05.zim
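Comparing a 64-character hash by eye is error-prone, so the comparison can be scripted. This is a minimal sketch; the function name verify_sha256 is hypothetical, and it assumes sha256sum prints the hash as the first space-separated field, as in the output above.

```shell
# verify_sha256 FILE EXPECTED: succeed only if FILE hashes to EXPECTED.
verify_sha256() {
    # Keep only the hash, dropping the filename that sha256sum appends.
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)"
        return 1
    fi
}
```

For example, using the checksum shown above:
$ verify_sha256 gutenberg_en_all_2023-05.zim c57133c971c7cf82df907e8fe037e84d7ee2d54ec6bd72af97b6ba509e33d9cf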
Dump the ZIM to a dump directory.
$ mkdir dump
$ zimdump dump --dir=dump gutenberg_en_all_2023-05.zim
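Wait about 20 minutes for the dump to complete, then check the directory. The count is 1,423,653 in this example. Note that ls -l prints a leading total line, so the number of directory entries is one less than the count shown.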
$ ls -l dump | wc -l
1423653
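The ls -l listing covers only the top level of the directory and includes a total line in its output. If the dump contains subdirectories, a recursive count of regular files can be sketched with find. The function name count_files is hypothetical.

```shell
# count_files DIR: count regular files under DIR, including subdirectories.
# Assumes no filename contains a newline, since wc -l counts lines.
count_files() {
    # tr strips the padding that some wc implementations add.
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example:
$ count_files dump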