Data Pipeline

class DataPipeline(hr_img_path, scale, resize_filter=None, antialias=True, train_val_split=0.1, validationset_path=None, batch_size=8, augmentations=None, test_img_paths=None, crop=True, crop_size=(80, 80, 3), num_crops=8, crop_naive=True, minimum_variation_patch=0.8, minimum_variation_batch=0.05, random_seed=None, shuffle_buffer_size=4096, jpg_noise=False, jpg_noise_level=50)

Data pipeline based on tensorflow.data API. The high-resolution images from the supplied data path are read into a tf.data dataset, augmented according to supplied augmentations and then paired with a downscaled low-resolution version.

Optionally, instead of using each image as a whole, a number of patches can be cropped out of each HR sample. With regard to efficiency, cropping patches out of a smaller high-resolution dataset (a few thousand samples) seems to yield better performance.

One drawback of randomly cropping patches from images is that some batches may contain many simple structures that are not very helpful for training. An example would be an image with a blue sky in the background and a more complex structure in the foreground. Random cropping might yield a batch of nothing but simple blue samples cropped from the sky. To mitigate this, there is an option to validate the diversity within a cropped patch and the diversity across a batch of crops. For now, this feature is highly experimental and will negatively affect efficiency.
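The diversity check described above can be pictured with a minimal sketch. The metric here is an assumption, not the library's actual implementation: variation inside a patch is taken as the standard deviation of its pixel values, and variation across a batch as the standard deviation of the per-patch means. The function names and the thresholds in the usage lines are hypothetical.

```python
from statistics import mean, pstdev


def flatten(patch):
    """Flatten a 2D patch (list of rows of floats in [0, 1]) into one list."""
    return [value for row in patch for value in row]


def patch_variation(patch):
    """Assumed metric: standard deviation of all pixel values in one patch."""
    return pstdev(flatten(patch))


def batch_variation(patches):
    """Assumed metric: standard deviation of the per-patch mean intensities."""
    return pstdev([mean(flatten(patch)) for patch in patches])


def accept_batch(patches, min_patch, min_batch):
    """Reject a batch if any patch is too flat or the patches are too alike."""
    if any(patch_variation(patch) < min_patch for patch in patches):
        return False
    return batch_variation(patches) >= min_batch


# A flat "blue sky" patch and two textured patches (toy 2x2 grayscale values).
sky = [[0.5, 0.5], [0.5, 0.5]]
textured_a = [[0.0, 1.0], [1.0, 0.0]]
textured_b = [[0.0, 0.4], [0.4, 0.0]]

print(accept_batch([textured_a, textured_b], min_patch=0.1, min_batch=0.05))
print(accept_batch([sky, textured_a], min_patch=0.1, min_batch=0.05))
```

In this sketch a rejected batch would have to be cropped again, which is one plausible reason a check of this kind costs throughput.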

Parameters
  • hr_img_path – Path to high-resolution training images.

  • scale – Resize factor for obtaining low-resolution images from supplied high-resolution images.

  • resize_filter – Resize filter to use for downsampling high-resolution images, defaults to bicubic. See tensorflow.image.ResizeMethod for available methods.

  • antialias – Whether to use antialiasing during downsampling.

  • train_val_split – Fraction of the supplied training images to hold back for validation. E.g. 0.1 means that 10% of the training images will be held back in the validation set.

  • validationset_path – Optional path to validation data; overrides train_val_split, so no splitting will occur.

  • batch_size – Number of samples per batch.

  • augmentations – List of augmentations to perform, see simple_sr.utils.image.image_transforms for available augmentations.

  • test_img_paths – Path to test image data.

  • crop – Whether patches should be cropped from HR training images, only applies to training and validation sets.

  • crop_size – Tuple of (height, width, channels) to specify dimensions of cropped patches.

  • num_crops – Number of patches to crop for each HR sample.

  • crop_naive – If True, cropped patches are always accepted, which might yield batches of very similar patches or batches containing only very simple structures. If crop_naive is False, batches will only be accepted if their diversity is above the supplied thresholds. This feature is currently experimental and comes with a speed penalty.

  • minimum_variation_patch – Threshold for variation inside one patch to be accepted into the batch. Only applies if crop_naive is False.

  • minimum_variation_batch – Threshold for variation across batch of patches for batch to be accepted. Only applies if crop_naive is False.

  • random_seed – Random seed for cropping, should only be used for testing as every cropped patch will be the same.

  • shuffle_buffer_size – Size of buffer for tf.data shuffling mechanism.

  • jpg_noise – Whether to apply jpg noise to LR samples.

  • jpg_noise_level – JPG noise level; 100 means maximum JPG degradation, 0 means no degradation.
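As a rough illustration of how train_val_split, scale, crop_size and num_crops relate to each other, the sketch below works through the arithmetic in plain Python. It is not the library's code: the helper names are hypothetical, rounding the validation share down and taking it from the end of the list are assumptions, and scale=4 is only an example value.

```python
def split_paths(paths, train_val_split):
    """Hold back a fraction of the paths for validation.

    Assumption: the validation share is rounded down and taken from the end.
    """
    num_val = int(len(paths) * train_val_split)
    cut = len(paths) - num_val
    return paths[:cut], paths[cut:]


def lr_shape(crop_size, scale):
    """Shape of the low-resolution patch paired with an HR patch of crop_size.

    Assumption: height and width are divided by scale, channels are unchanged.
    """
    height, width, channels = crop_size
    return (height // scale, width // scale, channels)


paths = [f"img_{i}.png" for i in range(20)]  # hypothetical file names
train, val = split_paths(paths, train_val_split=0.1)

print(len(train), len(val))                    # 18 training and 2 validation images
print(lr_shape((80, 80, 3), scale=4))          # LR patch shape for the default crop_size
print(len(train) * 8)                          # HR patches per pass with num_crops=8
```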

train_batch_generator()

Yields a prefetched tf.data dataset containing batched tuples of (lr, hr) training images.

validation_batch_generator()

Yields a prefetched tf.data dataset containing batched tuples of (lr, hr) validation images.

test_batch_generator(batch_size=8)

Yields a tf.data dataset containing batched tuples of (test image, test image path). If no path for test images was supplied, an empty list is returned.

File paths will be used for matching crops with their whole original images when evaluating models.
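One way to picture this matching step is to group everything that shares a file path. The sketch below is only an illustration under stated assumptions: plain Python lists and strings stand in for the batched image tensors and path tensors, and group_by_source is a hypothetical helper, not part of the library.

```python
from collections import defaultdict


def group_by_source(batches):
    """Collect images by the file path they came from.

    `batches` is an iterable of (images, paths) pairs, where both entries
    are equally long sequences, mirroring batched (image, path) tuples.
    """
    grouped = defaultdict(list)
    for images, paths in batches:
        for image, path in zip(images, paths):
            grouped[path].append(image)
    return dict(grouped)


# Toy stand-ins: strings instead of image tensors, two batches of two items.
batches = [
    (["a_crop0", "a_crop1"], ["a.png", "a.png"]),
    (["b_crop0", "a_crop2"], ["b.png", "a.png"]),
]

by_source = group_by_source(batches)
print(by_source["a.png"])
```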

Parameters

batch_size – Number of (test image, test image path) tuples per batch.

static from_config(config)

Convenience method to initialize a DataPipeline from a config.

Parameters

config – Initialized ConfigUtil object from simple_sr.utils.config module

Returns

Initialized DataPipeline object

static eval_pipeline(config)

Convenience method to initialize a DataPipeline in evaluation mode.

Evaluation mode means that images will be read from the paths supplied in config, and tuples of (downsampled, ground truth) images will be available via DataPipeline.validation_batch_generator().

Parameters

config – Initialized ConfigUtil object from simple_sr.utils.config module

Returns

Initialized DataPipeline object with supplied images available in validation batch generator

static inference_pipeline(config)

Convenience method to initialize a DataPipeline in inference mode.

Inference mode means that images will be read from the paths supplied in config, and tuples of (image, image_path) will be available via DataPipeline.test_batch_generator().

Parameters

config – Initialized ConfigUtil object from simple_sr.utils.config module

Returns

Initialized DataPipeline object with supplied images available in test batch generator