TY - GEN
T1 - Machine-Learning Based Solutions For Automatic Image Enhancement In Real Estate
AU - Vega, Juan Francisco Marin
PY - 2023/6/29
Y1 - 2023/6/29
N2 - The use of machine learning techniques for image enhancement is an active academic field of research that has been rapidly evolving thanks to the increase in computational capabilities, the improvement of neural approaches, and the use of ever-expanding datasets. This thesis aims to study and improve such techniques, focusing on real-world scenarios where human photo editors are required. The content of this thesis has been developed within Esoft Systems A/S, a company focused on delivering media solutions in the highly challenging and demanding domain of real estate. The initial part of this thesis centers on single image enhancement, with emphasis on high-resolution images. Two different approaches are proposed for this task: i) making use of Perceptual Losses for adjusting general image color and brightness; ii) extending this with Generative Adversarial Networks for image synthesis. For the first framework, a multi-step approach is proposed: first we operate on low-resolution representations, and later the low-resolution transformations are transferred to the original inputs, which are expected to be in a much higher resolution. For the second framework, the generative approach, in which new content can be hallucinated to complete missing parts of the input image, we take the opposite route: the original image is split and processed in independent tiles that are eventually combined to generate the high-resolution output. The second block of this work focuses on multi-frame fusion, in particular ghost-artifact-free approaches. Ghost artifacts are produced after blending multiple frames that suffer camera movement or capture objects in motion within frames. Special architectures and neural frameworks have been defined to deal with these complex situations.
This work approaches this area with two different proposals: i) we set the focus on efficient architectures for High Dynamic Range (HDR) imaging from Low Dynamic Range (LDR) inputs; ii) a framework for multi-frame fusion and retouching in the LDR domain is proposed. The experimentation carried out in this work demonstrates that for image enhancement, high quality and consistency can be achieved at moderate resolutions. Moreover, these already enhanced images can be utilized for transferring the transformations to full-resolution outputs, achieving state-of-the-art results. Based on the proposed solutions, images with a resolution beyond 8K can be processed with ease under moderate hardware requirements. For challenging inputs, when extreme image transformations are required or new content needs to be synthesized, Generative Adversarial Networks (GANs) can be utilized with promising results. This work also demonstrates that these techniques, despite being more resource-heavy, can be extended to achieve effective grid image processing. Our results also indicate the benefit of the right architectural decisions in order to obtain low-resource, high-quality multi-frame fusion with ghost-artifact prevention. This work suggests that different approaches must be taken when performing multi-frame fusion, especially when the target images contain additional human retouches. Lastly, this work demonstrates the feasibility of the different studied and developed techniques within a highly skilled and challenging domain such as real estate imaging. These techniques serve within Esoft Systems A/S to scale production without needing to increase human resources.
U2 - 10.21996/jwbw-d914
DO - 10.21996/jwbw-d914
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Naturvidenskabelige Fakultet
ER -