Data Pipeline
-
class DataPipeline(hr_img_path, scale, resize_filter=None, antialias=True, train_val_split=0.1, validationset_path=None, batch_size=8, augmentations=None, test_img_paths=None, crop=True, crop_size=(80, 80, 3), num_crops=8, crop_naive=True, minimum_variation_patch=0.8, minimum_variation_batch=0.05, random_seed=None, shuffle_buffer_size=4096, jpg_noise=False, jpg_noise_level=50)
Data pipeline based on the tensorflow.data API. The high-resolution images from the supplied data path are read into a tf.data dataset, augmented according to the supplied augmentations, and then paired with a downscaled low-resolution version.
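As a rough illustration of the pairing step, the sketch below builds one (lr, hr) pair from a toy grayscale image. It is not the simple_sr implementation: the pipeline uses TensorFlow resize filters (bicubic by default), whereas this stand-in uses a plain box filter on nested lists. The names downscale and make_pair are hypothetical.

```python
def downscale(image, scale):
    """Box-filter downscale of a 2D grayscale image by an integer factor.

    Stand-in for the resize filter used by the real pipeline.
    """
    height, width = len(image), len(image[0])
    if height % scale or width % scale:
        raise ValueError("image dimensions must be divisible by scale")
    lr = []
    for top in range(0, height, scale):
        row = []
        for left in range(0, width, scale):
            # Average all pixels in the scale x scale block.
            block = [
                image[y][x]
                for y in range(top, top + scale)
                for x in range(left, left + scale)
            ]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr


def make_pair(hr, scale):
    """Return the (lr, hr) tuple for one high-resolution image."""
    return downscale(hr, scale), hr


# Toy 8x8 "HR image" with a simple gradient; scale=4 gives a 2x2 LR image.
hr = [[float(x + y) for x in range(8)] for y in range(8)]
lr, hr_out = make_pair(hr, scale=4)
```

The tuple order (lr, hr) mirrors the order documented for the training and validation batch generators.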
Optionally, instead of using each image as a whole, a number of patches can be cropped out of each HR sample. In terms of efficiency, cropping patches out of a smaller high-resolution dataset (a few thousand samples) seems to yield better performance.
One drawback of randomly cropping patches from images is that some batches may contain many simple structures that are not very helpful for training. An example would be an image with a blue sky in the background and a more complex structure in the foreground. Random cropping might yield a batch of nothing but simple blue samples cropped from the sky. To mitigate this, there is an option to validate the diversity within a cropped patch and the diversity across a batch of crops. This feature is currently highly experimental and will negatively affect efficiency.
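The blue-sky problem can be sketched in a few lines. The source does not say how variation is measured, so the example assumes, purely for illustration, that it is the population standard deviation of pixel values. The functions patch_variation, batch_variation and random_crop, as well as the 0.2 threshold, are hypothetical and not part of simple_sr.

```python
import random
import statistics


def patch_variation(patch):
    """Hypothetical metric: population std-dev of all pixel values in a patch."""
    return statistics.pstdev([value for row in patch for value in row])


def batch_variation(patches):
    """Hypothetical metric: population std-dev of all pixel values in a batch."""
    return statistics.pstdev(
        [value for patch in patches for row in patch for value in row]
    )


def random_crop(image, size, rng):
    """Cut a size x size patch at a random position."""
    top = rng.randrange(len(image) - size + 1)
    left = rng.randrange(len(image[0]) - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 16x16 grayscale image: flat "sky" on top, checkerboard texture below.
sky = [[0.5] * 16 for _ in range(8)]
texture = [[float((r + c) % 2) for c in range(16)] for r in range(8)]
image = sky + texture

rng = random.Random(0)
candidates = [random_crop(image, 4, rng) for _ in range(32)]

# Illustrative threshold, not a simple_sr default.
accepted = [p for p in candidates if patch_variation(p) >= 0.2]
rejected = [p for p in candidates if patch_variation(p) < 0.2]
```

Under this assumed metric, the rejected patches are exactly the crops that fall entirely inside the flat region, and a batch made up only of such crops would also score zero in batch_variation.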
- Parameters
hr_img_path – Path to high-resolution training images.
scale – Resize factor for obtaining low-resolution images from supplied high-resolution images.
resize_filter – Resize filter to use for downsampling high-resolution images, defaults to bicubic. See tensorflow.image.ResizeMethod for available methods.
antialias – Whether to use antialiasing during downsampling.
train_val_split – Fraction of the supplied training images to hold back as a validation set. E.g. 0.1 means that 10% of the training images will be held back in the validation set.
validationset_path – Optional path to validation data; overrides validationset_size, so no splitting will occur.
batch_size – Number of samples per batch.
augmentations – List of augmentations to perform, see simple_sr.utils.image.image_transforms for available augmentations.
test_img_paths – Path to test image data.
crop – Whether patches should be cropped from HR training images, only applies to training and validation sets.
crop_size – Tuple of (height, width, channels) to specify dimensions of cropped patches.
num_crops – Number of patches to crop for each HR sample.
crop_naive – If True, cropped patches are always accepted, which might yield batches of very similar patches or batches containing only very simple structures. If crop_naive is False, batches are only accepted if their diversity is above the supplied thresholds. This feature is currently experimental and comes with a speed penalty.
minimum_variation_patch – Threshold for variation inside one patch to be accepted into the batch. Only applies if crop_naive is False.
minimum_variation_batch – Threshold for variation across a batch of patches for the batch to be accepted. Only applies if crop_naive is False.
random_seed – Random seed for cropping. Should only be used for testing, as every cropped patch will be the same.
shuffle_buffer_size – Size of buffer for tf.data shuffling mechanism.
jpg_noise – Whether to apply jpg noise to LR samples.
jpg_noise_level – JPG noise level; 100 means maximum JPG degradation, 0 means no degradation.
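The interplay of train_val_split, num_crops, crop_size and scale comes down to simple arithmetic, sketched below. The dataset size of 1000 images and scale=4 are example values. Rounding the validation count down and treating scale as an integer divisor of the crop size are assumptions, as are the helper names split_counts and lr_patch_shape.

```python
def split_counts(num_images, train_val_split):
    """Return (num_train, num_val). Rounding down is an assumption."""
    num_val = int(num_images * train_val_split)
    return num_images - num_val, num_val


def lr_patch_shape(crop_size, scale):
    """Shape of the LR patch, assuming an integer scale that divides the crop."""
    height, width, channels = crop_size
    return (height // scale, width // scale, channels)


# 1000 HR images with the default train_val_split=0.1.
num_train, num_val = split_counts(1000, train_val_split=0.1)

# Default num_crops=8: eight patches are cropped from each training image.
patches_per_pass = num_train * 8

# Default crop_size=(80, 80, 3) with an example scale of 4.
lr_shape = lr_patch_shape((80, 80, 3), scale=4)
```

With these values, 900 images remain for training and 100 are held back, one pass over the training images produces 7200 HR patches, and each 80 x 80 patch is paired with a 20 x 20 LR patch.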
-
train_batch_generator()
Yields a prefetched tf.data dataset containing batched tuples of (lr, hr) training images.
-
validation_batch_generator()
Yields a prefetched tf.data dataset containing batched tuples of (lr, hr) validation images.
-
test_batch_generator(batch_size=8)
Yields a tf.data dataset containing batched tuples of (test image, test image path). If no path for test images was supplied, an empty list is returned.
The file paths are used to match crops with their whole original images when evaluating models.
- Parameters
batch_size – Number of (test image, test image path) tuples per batch.
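Matching crops back to their source images by file path could look roughly like the sketch below. It uses plain lists and strings as a stand-in for the batched output of test_batch_generator (a real tf.data dataset would deliver paths as byte-string tensors that need decoding first). The function name group_by_source and the sample paths are hypothetical.

```python
from collections import defaultdict


def group_by_source(batches):
    """Collect images under the path of the file they came from."""
    grouped = defaultdict(list)
    for images, paths in batches:
        for image, path in zip(images, paths):
            grouped[path].append(image)
    return dict(grouped)


# Stand-in for test_batch_generator output: two batches of (images, paths).
batches = [
    (["crop_a0", "crop_a1"], ["img/a.png", "img/a.png"]),
    (["crop_b0", "crop_a2"], ["img/b.png", "img/a.png"]),
]
grouped = group_by_source(batches)
```

Grouping by path keeps the association intact even when crops of the same image end up in different batches.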
-
static from_config(config)
Convenience method to initialize a DataPipeline from a config.
- Parameters
config – Initialized ConfigUtil object from simple_sr.utils.config module
- Returns
Initialized DataPipeline object
-
static eval_pipeline(config)
Convenience method to initialize a DataPipeline in evaluation mode. Evaluation mode means that images will be read from the paths supplied in the config, and tuples of (downsampled, ground truth) images will be available via DataPipeline::validation_batch_generator.
- Parameters
config – Initialized ConfigUtil object from simple_sr.utils.config module
- Returns
Initialized DataPipeline object with supplied images available in validation batch generator
-
static inference_pipeline(config)
Convenience method to initialize a DataPipeline in inference mode. Inference mode means that the supplied images will be read from the config, and tuples of (image, image_path) will be available in DataPipeline::test_batch_generator.
- Parameters
config – Initialized ConfigUtil object from simple_sr.utils.config module
- Returns
Initialized DataPipeline object with supplied images available in test batch generator