How did I make the interpolation at the end? A magician never reveals his secrets. Luckily, I'm no magician. The answer is quite simple: I trained a StyleGAN2 model. When I trained the model I had, and as of writing still have, a single 1080 GTX. The 8 GB of memory on this card has caused me quite some trouble with training. The stats on Nvidia's original GitHub page aren't very encouraging either: they estimate that training on the FFHQ dataset in configuration-f would take 69 days and 23 hours with a single Tesla V100 GPU. The 1080 GTX has less memory, and 69 days is a bit much. So what to do? How about transfer learning?
The concept of transfer learning is simple. A network spends most of its time learning low-level features, and these low-level features are similar across many imaging tasks. So instead of training a network from scratch, we leverage the effort that has already gone into learning low-level features on another dataset, in this case FFHQ, and continue training on our own dataset. This dataset can be anything you like, but keep in mind that you'll need quite a bit of data: FFHQ has 70k images! Recently a new paper called 'Training Generative Adversarial Networks with Limited Data' came out, which might help if you don't have a lot of data. This will likely be the topic of a future blog.
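In code, the pattern is nothing more exotic than loading an existing state dict and continuing to train. Here is a minimal PyTorch sketch with a toy stand-in network and a placeholder file name; the real StyleGAN2 loading happens through the training script further down.

```python
import torch
from torch import nn, optim

# Stand-in network; in the real case this is the StyleGAN2 generator/discriminator pair.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 1024))

# 1. Load weights that were already trained on a large dataset (FFHQ in our case).
#    "pretrained_weights.pt" is a placeholder file name.
state_dict = torch.load("pretrained_weights.pt", map_location="cpu")
model.load_state_dict(state_dict)

# 2. Keep training the very same network on the new, smaller dataset.
#    The low-level features are already in place, so this converges much faster
#    than starting from random initialization.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... the usual training loop over the new dataset goes here ...
```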
In my case, the dataset came from a paper called ‘A shell dataset, for shell features extraction and recognition‘ by Zhang et al. Hats of to Springer Nature for making science publicly accessible for a change. The data can be downloaded here. The dataset needs some cleaning. Using some simple thresholding I centered each image and removed the background. There’s another problem: FFHQ images have a resolution of 1024×1024, but these images are way smaller. Even in this day and age, people are taking low-resolution photos, unfortunately, presumably to save disk space or to annoy data scientists. I ended up upscaling the images with an AI technique, I don’t remember which one but any will do. Now that we have the data I’ll introduce the code.
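My exact cleaning script is lost to time, but the thresholding-and-centering step could look roughly like the sketch below. It uses OpenCV; the threshold value, the white-background assumption, and the output size are guesses rather than the exact settings I used, and the final bicubic resize stands in for the AI upscaler.

```python
import cv2
import numpy as np

def clean_shell_image(path, out_size=1024):
    """Threshold the shell, crop it to its bounding box, and center it on a white canvas."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Anything brighter than the threshold is treated as background.
    _, mask = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY_INV)

    # Bounding box of the foreground (the shell).
    ys, xs = np.where(mask > 0)
    if len(xs) == 0:
        return None  # nothing detected in this image
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    shell = img[y0:y1 + 1, x0:x1 + 1]

    # Remove the background by painting non-shell pixels white.
    shell_mask = mask[y0:y1 + 1, x0:x1 + 1]
    shell[shell_mask == 0] = 255

    # Paste the crop into the center of a square white canvas, then resize.
    side = max(shell.shape[:2])
    canvas = np.full((side, side, 3), 255, dtype=np.uint8)
    oy = (side - shell.shape[0]) // 2
    ox = (side - shell.shape[1]) // 2
    canvas[oy:oy + shell.shape[0], ox:ox + shell.shape[1]] = shell
    return cv2.resize(canvas, (out_size, out_size), interpolation=cv2.INTER_CUBIC)
```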
Nvidia's ProGAN/StyleGAN papers are brilliantly executed but an eyesore to look at code-wise (to me at least). The codebase is fairly involved compared to other ML projects: it's long, has some custom CUDA functions, and is written in TensorFlow. I tried TensorFlow in 2016, had a terrible time, switched to PyTorch, and never looked back. If TensorFlow is your thing, go to Nvidia's official repository (you will need to clone this repository anyway, so you might as well check it out) and follow the instructions there. I will be using rosinality's implementation. You can read through rosinality's instructions instead if you want; apart from some minor tweaks, they say the same thing.
First, you need to create a Lightning Memory-Mapped Database (LMDB) file from your data. As described in the repository, you can generate multiple image sizes; I only need 1024×1024 images. The dataset path is the path to your dataset (who would have thought) and the LMDB path is where the resulting file will be stored. All of this is also explained in rosinality's repository. One thing it doesn't mention, though, is that your dataset folder must contain subfolders with the images inside. If you pass a folder without another folder inside it, the script will think it has no images.
python prepare_data.py --out LMDB_PATH --size 1024 DATASET_PATH
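If your shell images sit directly in DATASET_PATH, a quick way to satisfy that requirement is to move them into a single subfolder first. A small sketch; the folder name and file extension are arbitrary choices, not something prepare_data.py requires by name:

```python
import shutil
from pathlib import Path

dataset = Path("DATASET_PATH")       # same path you pass to prepare_data.py
subfolder = dataset / "shells"       # any name works; it just has to be a subfolder
subfolder.mkdir(exist_ok=True)

for img in dataset.glob("*.jpg"):    # adjust the extension to your data
    shutil.move(str(img), str(subfolder / img.name))
```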
Download the FFHQ config-f .pkl file from the Nvidia repository and clone that repository as well. Use the following command to convert the weights from pickle to PyTorch format. ~/stylegan2 refers to the Nvidia repository.
python convert_weight.py --repo ~/stylegan2 stylegan2-ffhq-config-f.pkl
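Before committing to a long training run, it doesn't hurt to sanity-check the converted file. A small sketch, assuming the .pt file is a dictionary of state dicts; the exact key names (such as g_ema) depend on the version of the conversion script, so check them against the printed output:

```python
import torch

ckpt = torch.load("stylegan2-ffhq-config-f.pt", map_location="cpu")

# Top-level entries, typically the generator, discriminator, and EMA weights.
print(list(ckpt.keys()))

# Rough plausibility check: count the parameters of one of the networks.
if "g_ema" in ckpt:
    n_params = sum(t.numel() for t in ckpt["g_ema"].values())
    print(f"g_ema parameters: {n_params / 1e6:.1f}M")
```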
Normally you'd now start generating faces, and nothing is holding you back from doing so, but I want shells, not faces. To get shells, we train on the shell LMDB file. Compared to the training command from rosinality's repository, you only need to change three things: the checkpoint, the model size, and the batch size. You can also choose to augment your dataset by mirroring the images over the y-axis. Shells are mostly dextral, or right-handed, so mirroring the images may bias the data a little, but hopefully not many people will notice.
python train.py --batch 2 --size 1024 --ckpt stylegan2-ffhq-config-f.pt LMDB_PATH
I set the batch size to 2 because that is all my poor 1080 GTX can handle before it runs out of memory. From here on out it's a waiting game: the longer you train, the better the model becomes, up to a certain point. After training, you can generate samples with the following command. Note that the checkpoint is not stylegan2-ffhq-config-f.pt but your own checkpoint!
python generate.py --sample N_FACES --pics N_PICS --ckpt PATH_CHECKPOINT
This video shows what happens during training. It was trained on a dataset of shell illustrations. Some of you might have noticed Richard Dawkins pop up at the start; this wasn't a coincidence. Take a look at projection.py to project your own images into the latent space.
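For completeness, the interpolation itself is just a straight line walked through latent space: pick two latent vectors (random ones, or ones obtained by projecting real images), blend them in small steps, and render a frame at each step. A minimal sketch with a placeholder generator call; adapt it to the repository's actual Generator API:

```python
import torch

@torch.no_grad()
def interpolate(generator, z_start, z_end, n_frames=60):
    """Linearly interpolate between two latent vectors and render one image per step."""
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        z = (1 - t) * z_start + t * z_end   # straight line in latent space
        img = generator(z)                  # placeholder call; the real forward pass differs
        frames.append(img)
    return frames

# Two random 512-dimensional latents (StyleGAN2's default latent size).
z_a = torch.randn(1, 512)
z_b = torch.randn(1, 512)
# frames = interpolate(trained_generator, z_a, z_b)
```

In practice the transitions tend to look smoother when you interpolate in the intermediate W space rather than in Z, but plain linear interpolation is already enough to get a recognizable morph.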