I was recently training a CNN on the Open Images dataset using Keras. I was using a custom batch generator together with Keras's .fit_generator() method, and observed surprisingly slow training progress.
My code looks something like this:
from keras.utils import Sequence
import numpy as np

class BatchGenerator(Sequence):
    def __init__(self, image_filepaths, y, batch_size):
        self.image_filepaths = image_filepaths
        self.y = y
        self.batch_size = batch_size
        # do something

    def __len__(self):
        # number of batches per epoch, required by Sequence
        return len(self.y) // self.batch_size

    def __getitem__(self, idx):
        indices = np.random.choice(range(len(self.y)), size=self.batch_size)
        X = np.array([self._read_and_aug_image(self.image_filepaths[i]) for i in indices])
        y = self.y[indices]
        return X, y

generator = BatchGenerator(image_filepaths=image_filepaths, y=y, batch_size=32)
history = model.fit_generator(generator, workers=8, use_multiprocessing=True)
I wasted a lot of time debugging the model structure, the loss, and the optimizer, but the problem turned out to be much simpler. I eventually found it by printing out the indices being sampled.
The problem with the code is that when the generator is copied into multiple worker processes, NumPy's random state is copied along with it, so all 8 workers end up with the same random state. As a result, during training, the model sees the exact same batch 8 times before it sees a new one. The fix is easy: just call np.random.seed() (with no argument, so NumPy reseeds itself from OS entropy) at the top of __getitem__, before sampling the indices.
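The duplicated-state behaviour is easy to reproduce outside Keras. The sketch below (the names _draw and run are mine, not Keras APIs) forks a few worker processes and has each draw a batch of indices: without reseeding, every child inherits the parent's NumPy random state and draws identical indices; calling np.random.seed() in each child fixes it. It assumes a POSIX system where the fork start method is available, which is also the setting in which Keras workers exhibit the bug.

```python
import multiprocessing as mp
import numpy as np

def _draw(reseed, queue):
    if reseed:
        # Reseeding with no argument pulls fresh entropy from the OS,
        # giving this worker process its own random state.
        np.random.seed()
    queue.put(tuple(np.random.choice(100, size=4)))

def run(reseed, n_workers=4):
    """Draw 'random' indices in n_workers forked processes."""
    ctx = mp.get_context("fork")  # fork copies the parent's NumPy RNG state
    queue = ctx.Queue()
    procs = [ctx.Process(target=_draw, args=(reseed, queue)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in range(n_workers)]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print("without reseed:", run(False))  # every worker returns the same tuple
    print("with reseed:   ", run(True))   # workers return different tuples
```

Running it shows the failure mode directly: the un-reseeded workers all report the same "random" index tuple, exactly like the 8 Keras workers serving the same batch.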