Segmentation of laryngeal endoscopic images has great potential for detecting diseases of the glottic area, which is essential for early diagnosis and proper treatment. This study proposes a modified U-Net architecture that integrates the VGG-16 model in order to segment the glottic area more accurately. VGG-16 is applied to the encoder and bridge sections so that the model can take advantage of previously learned knowledge. This modification is expected to improve segmentation performance over the standard U-Net, especially in handling variation in the complexity of laryngeal images. The dataset consists of 1,200 images selected at random from the BAGLS website, a richly varied collection of laryngeal endoscopic images. The training results show that the standard U-Net achieves an accuracy of 0.9995, an intersection over union (IoU) of 0.6744, and a Dice similarity coefficient (DSC) of 0.7814. The improved U-Net performs markedly better, with an accuracy of 0.9998, an IoU of 0.8223, and a DSC of 0.9153. This improvement indicates that modifying the U-Net architecture with VGG-16 yields more precise detection of the glottic area. VGG-16 also helps the model cope with a relatively small dataset. In addition, both models were tested using the relevant evaluation metrics, and the test results showed that the improved U-Net consistently outperformed other CNN-based segmentation methods. These results show that the proposed approach improves accuracy and contributes to the development of glottic disease detection methods based on laryngeal endoscopic image analysis, which can ultimately support clinical practice in detecting abnormalities of the glottis more effectively.
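The three metrics reported above (accuracy, IoU, and DSC) can be illustrated with a minimal sketch. It assumes binary masks stored as flat lists of 0/1 pixel labels; the function name `segmentation_metrics` and the toy data are hypothetical and are not taken from the study's code or dataset.

```python
def segmentation_metrics(pred, truth):
    """Return (accuracy, IoU, DSC) for two equal-length binary masks.

    `pred` and `truth` are flat sequences of 0/1 pixel labels, where 1 marks
    the glottic area. The name and input layout are illustrative assumptions.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    if not pred:
        raise ValueError("masks must not be empty")

    # Count the four outcomes of a pixel-wise comparison.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    accuracy = (tp + tn) / len(pred)
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dsc = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return accuracy, iou, dsc


# Toy image of 10,000 pixels with a 20-pixel glottis. The prediction
# finds 15 of those pixels, misses 5, and adds 5 false positives.
truth = [1] * 20 + [0] * 9980
pred = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 9975

accuracy, iou, dsc = segmentation_metrics(pred, truth)
print(f"accuracy={accuracy:.4f} IoU={iou:.4f} DSC={dsc:.4f}")
# prints: accuracy=0.9990 IoU=0.6000 DSC=0.7500
```

In this toy case the accuracy is 0.9990 even though a quarter of the glottis is missed, because background pixels dominate the count. When the target region is small, IoU and DSC are therefore the more informative measures of segmentation quality, which may explain why the two models reported above differ mainly in those two metrics.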