An instrument detection method for complex retinal microsurgery

Citation: He Y H, Chen Y W, Fan J Y, et al. An instrument detection method for complex retinal microsurgery[J]. Opto-Electron Eng, 2025, 52(2): 240269. doi: 10.12086/oee.2025.240269


  • Corresponding author: Shi Guohua, shigh@sibet.ac.cn
  • CLC number: TP391.4

  • CSTR: 32245.14.oee.2025.240269

  • Fund Project: National Key R&D Program of China (2021YFF0700503, 2022YFC2404201), CAS Project for Young Scientists in Basic Research (YSBR-067), Jiangsu Science and Technology Plan Program (BK20220263), Gusu Innovation and Entrepreneurship Leading Talents in Suzhou City (ZXL2021425), and Suzhou Basic Research Pilot Project (SSD2023018)
  • Abstract: To handle the complex interference encountered in retinal microsurgery, this paper proposes a deep-learning-based surgical instrument detection algorithm. First, the RET1 dataset was constructed and manually annotated. Building on the YOLO framework, the SGConv and RSGCSP feature extraction modules are proposed to strengthen the model's ability to extract fine image details under partial image degradation. To address the slow convergence of the IoU loss function and inaccurate bounding box regression, the DeltaIoU bounding box loss is proposed. Finally, an integration of a dynamic head and a decoupled head detects targets on the fused features. Experimental results show that the proposed method reaches an mAP50-95 of 72.4% on the RET1 dataset, a 3.8% improvement over the original algorithm, and can effectively detect instruments in complex surgical scenes, providing effective support for subsequent automatic microscope tracking and intelligent surgical navigation.

  • Overview: The integration of computer vision into ophthalmic surgical procedures, particularly in digital navigation microscopes, has opened new avenues for real-time instrument tracking. Accurate localization of surgical instruments during retinal surgery presents unique challenges, such as reflections, motion artifacts, and obstructions, which impede precise detection. To address these challenges, this study introduces RM-YOLO, a specialized deep-learning detection algorithm tailored for retinal microsurgery. The model is designed to ensure accurate instrument localization in real time, offering substantial advances over existing approaches.

    Given the scarcity of annotated data specific to retinal microsurgery, the RET1 dataset was constructed, derived from high-resolution surgical videos and manually annotated for three primary instruments: vitrectomy cutter, light pipe, and peeling forceps. This dataset encompasses various surgical conditions, including occlusions, low-light environments, and reflections, ensuring robust model training and evaluation.
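
    Concretely, YOLO-style datasets of this kind usually pair each frame with a text file of normalized boxes. The sketch below shows one way such annotations could be loaded in PyTorch; the directory layout, file naming, and class order here are assumptions for illustration, not published details of RET1.

```python
# Hypothetical loader for RET1-style YOLO annotations (layout and names assumed).
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset

CLASSES = ["vitrectomy_cutter", "light_pipe", "peeling_forceps"]  # the three RET1 classes

class RET1Dataset(Dataset):
    """Each label file holds lines of: class_id x_center y_center width height (normalized)."""

    def __init__(self, image_dir: str, label_dir: str):
        self.images = sorted(Path(image_dir).glob("*.png"))
        self.label_dir = Path(label_dir)

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int):
        img_path = self.images[idx]
        image = Image.open(img_path).convert("RGB")
        boxes = []  # one [class_id, xc, yc, w, h] row per annotated instrument
        label_file = self.label_dir / (img_path.stem + ".txt")
        if label_file.exists():
            for line in label_file.read_text().splitlines():
                cls_id, xc, yc, w, h = map(float, line.split())
                boxes.append([cls_id, xc, yc, w, h])
        return image, torch.tensor(boxes, dtype=torch.float32)
```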

    The proposed algorithm leverages a customized YOLO framework and incorporates novel modules to enhance performance. The SGConv and RSGCSP modules were specifically designed to improve feature extraction, addressing the limitations of conventional convolutional layers by employing channel shuffling and re-parameterization techniques to maximize feature diversity while minimizing parameter count. Additionally, a dynamic head architecture integrates multi-scale, spatial, and task-specific attention mechanisms, enhancing the model's ability to capture complex features across varying scales. For bounding box regression, the DeltaIoU loss was introduced as a refined metric that improves convergence speed and accuracy, particularly under ambiguous annotation scenarios.
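
    The channel-shuffle and re-parameterization ideas trace to GhostNet [19], GSConv [20], and RepVGG [21]. As a hedged sketch of that general pattern (not the paper's exact SGConv, whose kernel sizes and re-parameterized branches are defined in the paper itself): a dense convolution produces half of the output channels, a cheap depthwise convolution derives the other half, and a channel shuffle interleaves the two groups so they mix in later layers.

```python
# Sketch of an SGConv-like layer under the GhostNet/GSConv assumptions above.
import torch
import torch.nn as nn

class SGConvSketch(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        assert c_out % 2 == 0, "output channels must be even"
        c_half = c_out // 2
        # Dense convolution: produces the first half of the output channels.
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Cheap depthwise convolution: derives the second half from the first.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.cheap(a)
        y = torch.cat((a, b), dim=1)  # (N, c_out, H, W)
        # Channel shuffle: interleave the dense and cheap channel groups.
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```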

    Extensive experiments on the RET1 dataset demonstrate that RM-YOLO achieves an mAP50-95 of 72.4%, outperforming existing models in precision and recall with only 7.4 million parameters and 20.7 GFLOPs. Comparative analysis with traditional and modern detection models, including Faster R-CNN, YOLO series, and RT-DETR, reveals that RM-YOLO not only achieves superior accuracy but also addresses the high rate of missed detections common in retinal microsurgery applications.

    The ablation studies underscore the contributions of each module, with dynamic head and RSGCSP modules providing significant boosts in model performance by enhancing the robustness of feature representation. DeltaIoU loss further complements these improvements by ensuring precise bounding box regression in challenging visual conditions.
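
    For context, the quantity every IoU-family loss starts from is the overlap between predicted and ground-truth boxes; DeltaIoU's exact refinement is defined in the paper and is not reproduced here. A minimal baseline IoU loss over (x1, y1, x2, y2) boxes:

```python
# Baseline 1 - IoU loss; DeltaIoU refines this term (exact form in the paper).
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2). Returns mean 1 - IoU."""
    lt = torch.max(pred[:, :2], target[:, :2])  # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                 # zero if the boxes do not overlap
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]).clamp(min=0) * (pred[:, 3] - pred[:, 1]).clamp(min=0)
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()
```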

  • Figure 1.  Three main surgical instruments in the RET1 dataset. (a) Vitrectomy cutter; (b) Internal limiting membrane peeling forceps; (c) Light pipe

    Figure 2.  Overall network architecture of the proposed RM-YOLO model

    Figure 3.  Cascading of three attention mechanisms in the detection head

    Figure 4.  Integrated structure of the proposed dynamic head and decoupled head

    Figure 5.  Diagram of the SGConv convolution layer

    Figure 6.  RSGCSP feature extraction module

    Figure 7.  Transformation process of the reparameterization part

    Figure 8.  Schematic diagram of the proposed DeltaIoU

    Figure 9.  Results of the loss function experiments. (a) mAP at different values of α; (b) Bounding box regression loss corresponding to different IoUs; (c) mAP corresponding to different IoUs

    Figure 10.  Metrics of the three main surgical instruments in the experimental results

    Figure 11.  Detection results of different algorithms for a single surgical instrument in complex scenarios. (a) Severe instrument reflection; (b) Severe distortion in microscope imaging

    Figure 12.  Detection results of different algorithms for multiple surgical instruments in complex scenarios. (a) High instrument similarity; (b) Microscope out of focus; (c) Occlusion between instruments

    Figure 13.  Detection of surgical instruments during vitrectomy under different lighting conditions by the proposed algorithm

    Table 1.  Experimental environment configuration

    Configuration              Configuration parameter
    Operating system           Windows 11
    GPU                        NVIDIA GeForce RTX 4070 Super
    Programming language       Python 3.11
    Framework                  PyTorch 2.1
    GPU computing framework    CUDA 12.1
    GPU acceleration library   cuDNN 8.0
    Learning rate              0.001
    Momentum                   0.9
    Weight decay               0.0005
    Batch size                 32
    Epochs                     300
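    The optimizer settings in Table 1 (learning rate 0.001, momentum 0.9, weight decay 0.0005) map directly onto SGD in PyTorch; the model below is a stand-in for the detection network, which is not distributed with this page.

```python
# Optimizer configured with the Table 1 hyperparameters (model is a placeholder).
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in for the RM-YOLO network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,             # learning rate (Table 1)
    momentum=0.9,         # momentum (Table 1)
    weight_decay=0.0005,  # weight decay (Table 1)
)
```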

    Table 2.  Ablation experiments results

    Model   Dynamic head   RSGCSP (SGConv)   DeltaIoU loss   P       R       mAP50-95   FPS
    A                                                        0.960   0.925   0.686      206
    B                                                        0.966   0.923   0.706      128
    C                                                        0.977   0.929   0.691      210
    D                                                        0.970   0.928   0.680      216
    E                                                        0.960   0.931   0.702      136
    F                                                        0.980   0.926   0.707      198
    G                                                        0.985   0.930   0.711      115
    H                                                        0.975   0.941   0.724      143

    Table 3.  Comparison experiments results

    Model            P       R       mAP50-95   Parameters/M   GFLOPs   FPS
    Faster R-CNN     0.961   0.919   0.652      /              /        85
    YOLOv3s [24]     0.981   0.931   0.687      15.32          43.8     147
    YOLOv5s          0.965   0.927   0.683      9.11           23.8     194
    YOLOv6s [25]     0.960   0.914   0.681      16.3           44.0     192
    RT-DETR          0.964   0.895   0.623      10.56          23.9     131
    YOLOv9m [26]     0.952   0.927   0.685      20.02          76.5     89
    YOLOv10s [27]    0.947   0.876   0.684      8.04           24.5     182
    DBH-YOLO         0.975   0.918   0.643      20.86          47.9     128
    RM-YOLO (ours)   0.975   0.941   0.724      7.4            20.7     143
  • [1] Ma L, Fei B W. Comprehensive review of surgical microscopes: technology development and medical applications[J]. J Biomed Opt, 2021, 26(1): 010901. doi: 10.1117/1.JBO.26.1.010901
    [2] Ehlers J P, Dupps W J, Kaiser P K, et al. The prospective intraoperative and perioperative ophthalmic ImagiNg with optical CoherEncE TomogRaphy (PIONEER) study: 2-year results[J]. Am J Ophthalmol, 2014, 158(5): 999−1007.e1. doi: 10.1016/j.ajo.2014.07.034
    [3] Ravasio C S, Pissas T, Bloch E, et al. Learned optical flow for intra-operative tracking of the retinal fundus[J]. Int J Comput Assist Radiol Surg, 2020, 15(5): 827−836. doi: 10.1007/s11548-020-02160-9
    [4] Li Y Y, Fan J Y, Jiang T L, et al. Review of the development of optical coherence tomography imaging navigation technology in ophthalmic surgery[J]. Opto-Electron Eng, 2023, 50(1): 220027. doi: 10.12086/oee.2023.220027
    [5] Yang J W, Huang J J, He Y, et al. Image quality optimization of line-focused spectral domain optical coherence tomography with subsection dispersion compensation[J]. Opto-Electron Eng, 2024, 51(6): 240042. doi: 10.12086/oee.2024.240042
    [6] Bouget D, Allan M, Stoyanov D, et al. Vision-based and marker-less surgical tool detection and tracking: a review of the literature[J]. Med Image Anal, 2017, 35: 633−654. doi: 10.1016/j.media.2016.09.003
    [7] Allan M, Ourselin S, Thompson S, et al. Toward detection and localization of instruments in minimally invasive surgery[J]. IEEE Trans Biomed Eng, 2013, 60(4): 1050−1058. doi: 10.1109/TBME.2012.2229278
    [8] Alsheakhali M, Yigitsoy M, Eslami A, et al. Real time medical instrument detection and tracking in microsurgery[C]//Proceedings of Bildverarbeitung für die Medizin: Algorithmen-Systeme-Anwendungen, Lübeck, 2015: 185–190. https://doi.org/10.1007/978-3-662-46224-9_33
    [9] Sznitman R, Richa R, Taylor R H, et al. Unified detection and tracking of instruments during retinal microsurgery[J]. IEEE Trans Pattern Anal Mach Intell, 2013, 35(5): 1263−1273. doi: 10.1109/TPAMI.2012.209
    [10] Sun Y W, Pan B, Fu Y L. Lightweight deep neural network for articulated joint detection of surgical instrument in minimally invasive surgical robot[J]. J Digit Imaging, 2022, 35(4): 923−937. doi: 10.1007/s10278-022-00616-9
    [11] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, Santiago, 2015: 1440–1448. https://doi.org/10.1109/ICCV.2015.169
    [12] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
    [13] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. https://doi.org/10.1109/CVPR.2016.91
    [14] Sarikaya D, Corso J J, Guru K A. Detection and localization of robotic tools in robot-assisted surgery videos using deep neural networks for region proposal and detection[J]. IEEE Trans Med Imaging, 2017, 36(7): 1542−1549. doi: 10.1109/TMI.2017.2665671
    [15] Zhang B B, Wang S S, Dong L Y, et al. Surgical tools detection based on modulated anchoring network in laparoscopic videos[J]. IEEE Access, 2020, 8: 23748−23758. doi: 10.1109/ACCESS.2020.2969885
    [16] Pan X Y, Bi M R, Wang H, et al. DBH-YOLO: a surgical instrument detection method based on feature separation in laparoscopic surgery[J]. Int J Comput Assist Radiol Surg, 2024, 19(11): 2215−2225. doi: 10.1007/s11548-024-03115-0
    [17] Zhao Z J, Chen Z R, Voros S, et al. Real-time tracking of surgical instruments based on spatio-temporal context and deep learning[J]. Comput Assist Surg, 2019, 24(S1): 20−29. doi: 10.1080/24699322.2018.1560097
    [18] Dai X Y, Chen Y P, Xiao B, et al. Dynamic head: unifying object detection heads with attentions[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 7373–7382. https://doi.org/10.1109/CVPR46437.2021.00729
    [19] Han K, Wang Y H, Tian Q, et al. GhostNet: more features from cheap operations[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2020: 1580–1589. https://doi.org/10.1109/CVPR42600.2020.00165
    [20] Li H L, Li J, Wei H B, et al. Slim-neck by GSConv: a lightweight-design for real-time detector architectures[J]. J Real Time Image Process, 2024, 21(3): 62. doi: 10.1007/s11554-024-01436-6
    [21] Ding X H, Zhang X Y, Ma N N, et al. RepVGG: making VGG-style ConvNets great again[C]//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 2021: 13733–13742. https://doi.org/10.1109/CVPR46437.2021.01352
    [22] Ma S L, Xu Y. MPDIoU: a loss for efficient and accurate bounding box regression[Z]. arXiv: 2307.07662, 2023. https://doi.org/10.48550/arXiv.2307.07662
    [23] Zhao Y, Lv W Y, Xu S L, et al. DETRs beat YOLOs on real-time object detection[C]//Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 2024: 16965–16974. https://doi.org/10.1109/CVPR52733.2024.01605
    [24] Redmon J, Farhadi A. YOLOv3: an incremental improvement[Z]. arXiv: 1804.02767, 2018. https://doi.org/10.48550/arXiv.1804.02767
    [25] Li C Y, Li L L, Jiang H L, et al. YOLOv6: a single-stage object detection framework for industrial applications[Z]. arXiv: 2209.02976, 2022. https://doi.org/10.48550/arXiv.2209.02976
    [26] Wang C Y, Yeh I H, Liao H Y M. YOLOv9: learning what you want to learn using programmable gradient information[C]//Proceedings of the 18th European Conference on Computer Vision, Milan, 2025. https://doi.org/10.1007/978-3-031-72751-1_1
    [27] Wang A, Chen H, Liu L H, et al. YOLOv10: real-time end-to-end object detection[Z]. arXiv: 2405.14458, 2024. https://doi.org/10.48550/arXiv.2405.14458


Publication history
Received: 2024-11-20
Revised: 2025-01-14
Accepted: 2025-01-15
Published: 2025-02-28
