This paper addresses the video summarization problem: for a given input video, the goal is to select the subset of frames that captures its important events, producing a short, concise summary. We formulate video summarization as a sequence labeling problem, in which a subset of frames of the input video is selected to form the summary. The formulation is analogous to semantic segmentation, where each pixel within an image is assigned one of a set of labels; here, each frame is assigned a binary label indicating whether or not it is included in the summary video. We propose a SegNet-based sequence network (SegNetSN) for video summarization and extend this work by applying various feature fusion techniques to enrich the input representation. We evaluate our approach on the benchmark TVSum dataset.
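The binary sequence-labeling formulation can be sketched as follows. This is an illustrative example only, not the paper's SegNetSN model: the scores and threshold are hypothetical stand-ins for the output of any frame-labeling network.

```python
import numpy as np

def label_frames(frame_scores, threshold=0.5):
    """Assign each frame a binary label from its predicted importance score.

    A score >= threshold yields label 1 (include in summary), else 0.
    The threshold value here is an illustrative assumption.
    """
    return (np.asarray(frame_scores) >= threshold).astype(int)

def build_summary(frames, labels):
    """Form the summary video as the subsequence of frames labeled 1."""
    return [f for f, keep in zip(frames, labels) if keep == 1]

# Toy example: 6 frames with hypothetical per-frame importance scores.
frames = [f"frame_{i}" for i in range(6)]
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.8]
labels = label_frames(scores)          # [1, 0, 1, 0, 0, 1]
summary = build_summary(frames, labels)
```

In practice, the labeling network predicts these per-frame decisions jointly over the whole sequence, so temporal context influences which frames are kept.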