Remove Caption

How can captions be removed from video data?

Task

  • Goal
    • Remove the caption area from each video frame
    • The output can be a bounding box or a non-rectangular mask covering only the text; either type works, but the latter is preferred
  • Issues
    • Prefer a generalized pretrained model
    • No human in the loop (no manual prompt)
    • Speed

How?

OCR

First of all, optical character recognition (OCR) is a natural and reliable starting point, because it is a long-established computer vision task.
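As a rough sketch of how OCR output could feed the goal above: most OCR detectors return one box per word or line, so the boxes for a frame can be merged into a single padded caption box. The snippet below is only an illustration under assumptions. The helper name `caption_bbox`, the `(x_min, y_min, x_max, y_max)` pixel format, and the `pad` default are all hypothetical, and no particular OCR library is assumed.

```python
from typing import Iterable, Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def caption_bbox(
    word_boxes: Iterable[Box],
    frame_w: int,
    frame_h: int,
    pad: int = 4,  # hypothetical margin around the text, in pixels
) -> Optional[Box]:
    """Merge per-word OCR boxes into one padded box clamped to the frame."""
    boxes = list(word_boxes)
    if not boxes:
        return None  # no text detected in this frame

    # Smallest rectangle that encloses every detected word, plus padding.
    x0 = min(b[0] for b in boxes) - pad
    y0 = min(b[1] for b in boxes) - pad
    x1 = max(b[2] for b in boxes) + pad
    y1 = max(b[3] for b in boxes) + pad

    # Clamp so the box never leaves the frame.
    return (max(0, x0), max(0, y0), min(frame_w, x1), min(frame_h, y1))
```

For example, `caption_bbox([(100, 400, 180, 430), (190, 402, 300, 432)], 640, 480)` gives one box spanning both words. A single box like this is coarser than a per-character mask, which is the trade-off noted in the goal.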

Generalized Segmentation Tools

These days, “Segment Anything” models are a trending topic in this field.

  • Check how well SAM performs at masking text areas
  • Survey follow-up studies

Step 1. Let’s try SAM

Language Segment Anything (lang-segment-anything): https://github.com/luca-medeiros/lang-segment-anything

pip install torch torchvision
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git

Outputs: masks, boxes, phrases, logits.
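One possible way to turn these outputs into a single caption mask per frame is to drop low-confidence detections and take the union of the remaining masks. The sketch below assumes the masks and logits have already been converted to NumPy arrays, with masks of shape `(N, H, W)` and one logit per mask. The helper name `merge_text_masks` and the `min_logit` threshold of 0.3 are hypothetical and not part of lang-segment-anything.

```python
import numpy as np


def merge_text_masks(masks, logits, min_logit=0.3):
    """Union the per-detection masks whose confidence passes a threshold.

    masks:  array-like of shape (N, H, W), one binary mask per detection
            (assumed layout; convert tensors to NumPy before calling).
    logits: array-like of shape (N,), one confidence score per mask.
    Returns a boolean (H, W) mask; all False when nothing passes.
    """
    masks = np.asarray(masks, dtype=bool)
    logits = np.asarray(logits, dtype=float)
    if masks.ndim != 3 or logits.ndim != 1 or masks.shape[0] != logits.shape[0]:
        raise ValueError("expected masks of shape (N, H, W) and logits of shape (N,)")

    keep = logits >= min_logit  # hypothetical confidence filter
    # A pixel belongs to the caption if any kept mask covers it.
    return masks[keep].any(axis=0)
```

The merged mask can then be passed to whatever removal step follows. Note that the threshold is a guess and would need tuning on real frames.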

Written on June 23, 2023