RMIT University
Enhancing Occupational Safety through Vision-based Integrative Technologies in the Construction and Building Industry

posted on 2024-06-03, 00:31 authored by Haosen Chen
This thesis investigates the role of vision-based integrative technologies such as Deep Learning (DL), Augmented Reality (AR)/Virtual Reality (VR), the Internet of Things (IoT), and Building Information Modelling (BIM) in enhancing occupational safety on construction sites and in building emergency management within the Architecture, Engineering, and Construction (AEC) industry. Construction activities and emergency responses in buildings each present distinct, complex arrays of risks, and traditional safety methods are no longer sufficient to manage them. The thesis argues that integrating advanced technologies can significantly improve safety measures, offering comprehensive, adaptable solutions that protect onsite workers and enhance the capabilities of emergency responders. The research aims to bridge the gap between outdated safety practices and the pressing need for real-time safety monitoring and response mechanisms tailored to the construction and building sectors.

The investigation begins by highlighting the role of advanced vision-based technologies in high-risk environments and their significance for occupational safety in the AEC industry. It outlines why these technologies are needed to address the challenges faced on construction sites and during emergency situations in buildings, and it advocates a shift from conventional safety practices towards real-time, technology-integrated safety management.

Chapter 2 is dedicated to a comprehensive review of the application and effectiveness of DL in AEC safety management from 2010 to 2020.
This analysis examines the methodologies, applications, and outcomes of DL in the AEC industry, pinpointing research gaps and potential future directions. It emphasises the need for more effective DL applications to tackle the prevailing challenges in AEC safety management. By examining DL technologies in depth, the review aims to show their potential to transform safety practices within the industry, addressing both current limitations and future possibilities.

The real-world application developments begin in Chapter 3. A lightweight DL model, YOLOv4-EfficientNet-B0 (YOLOv4-EFNB0), is proposed to reduce the high computational burden of DL implementation. To address the data scarcity problem, a context-guided data augmentation method is proposed, resulting in the augmented dataset MOCS-DA. When trained on MOCS-DA, the YOLOv4-EFNB0 model shows significant improvements in detection capability: the F1 score for worker detection increased from 0.68 to 0.85, and Mean Average Precision rose from 0.41 to 0.52. In terms of computational efficiency, YOLOv4-EFNB0 has a reduced weight size of 136 MB, a parameter count of 3.76 × 10⁷, and a computational load of 1.21 × 10¹⁰. The model also performs well in proximity detection, achieving an accuracy of 96.76% at an average processing speed of about 25 frames per second, which enables near real-time performance. Data augmentation and the MOCS-DA dataset were pivotal in achieving these results, which demonstrate the model’s effectiveness in occluded environments and its potential for real-time construction site safety monitoring.
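For readers less familiar with the F1 score quoted for worker detection, it is the harmonic mean of precision and recall. The sketch below is illustrative only: the function names and the detection counts are hypothetical and are not taken from the thesis. Only the before/after figures (0.68 to 0.85 for F1, 0.41 to 0.52 for Mean Average Precision) come from the abstract.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0  # no correct detections, so F1 is defined as zero here
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def relative_gain(before: float, after: float) -> float:
    """Return the improvement of `after` over `before` as a fraction of `before`."""
    return (after - before) / before


# Hypothetical detection counts, chosen only to illustrate the formula.
example_f1 = f1_from_counts(true_pos=85, false_pos=15, false_neg=15)

# Relative gains implied by the figures reported in the abstract.
f1_gain = relative_gain(0.68, 0.85)   # F1 score for worker detection
map_gain = relative_gain(0.41, 0.52)  # Mean Average Precision

print(f"example F1: {example_f1:.2f}")
print(f"relative F1 gain: {f1_gain:.1%}, relative mAP gain: {map_gain:.1%}")
```

Read this way, the reported figures correspond to relative improvements of roughly a quarter in both F1 and Mean Average Precision.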
Chapter 4 presents the Visual Construction Safety Query (VCSQ) system, an integration of immersive AR and generative DL technologies developed to enhance the safety knowledge of construction workers. The system is motivated by the need to provide real-time guidance for the complex and dynamic safety challenges on construction sites, where traditional safety measures often fall short. By leveraging AR and DL, the system offers a more interactive and engaging approach to safety queries, enabling workers to better understand and navigate the potential hazards of their surrounding environment. The VCSQ system features three core functionalities: real-time Image Captioning (IC), safety-centric Visual Question Answering (VQA), and keyword-based Image-Text Retrieval (ITR). These are powered by a vision-language model architecture, fine-tuned for accuracy in response to queries and integrated with a head-mounted AR device, providing an immersive experience that enhances situational awareness and decision-making in real time. Performance evaluations of the VCSQ system demonstrate its effectiveness: the ITR module achieved recall rates of 0.801, 0.835, 0.863, and 0.885 for Recall@5, @10, @50, and @100 respectively, and the VQA module recorded an average accuracy of 89%. These results indicate that the system can accurately interpret and respond to safety-related queries in a construction setting. Finally, the practical application and effectiveness of the VCSQ system are demonstrated through three use scenarios and survey feedback.

Chapter 5 presents a novel framework integrating BIM, IoT, and AR/VR to enhance fire safety and emergency response in modern buildings. The chapter outlines the development of a BIM-based fire alarming system, VR training modules, and an AR navigation system.
A pilot study in a simulated fire scenario shows the framework’s effectiveness in supporting decision-making and situational awareness. The study includes a controlled experiment comparing two groups: one with pathfinding assistance and one without. The quantitative data show significant improvements in training efficiency. The experimental group, aided by pathfinding indications, completed their training in an average of 436.1 seconds, compared with 828.5 seconds for the control group. This difference is statistically significant (p < 0.05) and highlights the effectiveness of the pathfinding technology in reducing the time required for planning rescue routes. Moreover, the standard deviation in the control group (166.1) was nearly twice that of the experimental group (90.7), indicating more consistent performance among trainees using the pathfinding system. This suggests that the digital pathfinding indications not only expedited the training process but also provided more intuitive and effective guidance, particularly in navigating smoke-filled environments.

Chapter 6 concludes the thesis by summarising its overarching findings and the contributions it makes in utilising vision-based technologies for safety enhancement, and by outlining recommendations for future research on occupational safety in the AEC industry.
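The pilot-study figures reported for Chapter 5 can be put in proportion with some simple arithmetic. The sketch below is a hedged illustration rather than a reproduction of the thesis analysis. The means and standard deviations are those quoted in the abstract, but the abstract does not state the group sizes or which significance test was used, so the group size of 10 and the choice of Welch's t statistic are assumptions made purely for demonstration.

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> float:
    """Welch's t statistic for two independent groups with unequal variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Summary statistics quoted in the abstract (completion times in seconds).
control_mean, control_sd = 828.5, 166.1      # without pathfinding assistance
assisted_mean, assisted_sd = 436.1, 90.7     # with pathfinding assistance

# Fraction of training time saved by the assisted group.
time_reduction = 1 - assisted_mean / control_mean

# How much more variable the control group was.
sd_ratio = control_sd / assisted_sd

# ASSUMPTION: the abstract does not report group sizes; 10 per group is hypothetical.
ASSUMED_GROUP_SIZE = 10
t_stat = welch_t(control_mean, control_sd, ASSUMED_GROUP_SIZE,
                 assisted_mean, assisted_sd, ASSUMED_GROUP_SIZE)

print(f"time reduction: {time_reduction:.1%}")
print(f"standard deviation ratio: {sd_ratio:.2f}")
print(f"Welch t (assumed n = {ASSUMED_GROUP_SIZE} per group): {t_stat:.2f}")
```

On the reported means, the assisted group needed a little under half the time of the control group, and the control group's spread was about 1.8 times larger. The t statistic depends on the assumed group size and should not be read as a result from the thesis.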


Degree Type

Doctorate by Research


© Haosen Chen 2024
