Remove Caption

How can captions be removed from video data?

Task

Goal
- Remove the caption area from each video frame
- The output can be a bbox or a non-rectangular mask covering only the text area; either type works, but the latter is better
Issue
- Prefer a generalized pretrained model
- No human in the loop (no manual prompting)
- Processing speed

First of all, optical character recognition (OCR) is a natural and reliable approach because it is a long-established computer vision task.

Scene Text Detection (Localization): gets the bbox of each text region in the image
- e.g., CRAFT https://github.com/clovaai/CRAFT-pytorch
Scene Text Recognition: reads the detected text; I don’t need this part
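Since only the detection step is needed, its boxes can be turned directly into a removal mask. The following is a minimal sketch, not CRAFT's own API: it assumes the detector's output has already been reduced to axis-aligned `(x_min, y_min, x_max, y_max)` pixel boxes, and `boxes_to_mask` and `pad` are hypothetical names introduced here for illustration.

```python
import numpy as np


def boxes_to_mask(boxes, height, width, pad=0):
    """Rasterize axis-aligned text boxes into a binary mask.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixel coordinates
           (assumed format; adapt to whatever the detector returns).
    pad:   extra pixels added around each box to cover glyph outlines.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        # Grow the box by `pad` pixels and clip it to the frame.
        x0 = max(int(x_min) - pad, 0)
        y0 = max(int(y_min) - pad, 0)
        x1 = min(int(x_max) + pad, width)
        y1 = min(int(y_max) + pad, height)
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 255
    return mask


# Hypothetical detection near the bottom of a 1280x720 frame.
caption_mask = boxes_to_mask([(400, 640, 880, 690)], height=720, width=1280, pad=4)
```

A small `pad` is often useful because detectors tend to crop tightly and caption text usually has an outline or shadow.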

These days, “Segment Anything”-style models are a trending topic in this field.

Language Segment-Anything: https://github.com/luca-medeiros/lang-segment-anything

pip install torch torchvision
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git

Outputs: masks, boxes, phrases, logits.
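A rough sketch of how those four outputs might be consumed follows. It assumes the interface shown in the project's README around the time of writing (`LangSAM().predict(image, text_prompt)` returning torch tensors for `masks` and `logits`); later releases may differ, so check the installed version. The prompt `"text"`, the threshold, and the helper names `union_mask` and `segment_text` are hypothetical choices, not part of the library.

```python
import numpy as np


def union_mask(masks, scores, min_score=0.3):
    """Merge per-instance masks of shape (N, H, W) into one (H, W) boolean mask.

    Only instances with score >= min_score are kept.
    Returns None when there are no instances at all.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if masks.ndim != 3:
        return None
    keep = scores >= min_score
    return masks[keep].any(axis=0)


def segment_text(image_path, prompt="text"):
    """Run lang-segment-anything on one frame (assumed API, see note above)."""
    # Imported lazily so union_mask can be used without the model installed.
    from PIL import Image
    from lang_sam import LangSAM

    model = LangSAM()
    image = Image.open(image_path).convert("RGB")
    masks, boxes, phrases, logits = model.predict(image, prompt)
    return union_mask(masks.cpu().numpy(), logits.cpu().numpy())


# Synthetic stand-in for model output: two instances on a 4x4 frame.
demo_masks = np.zeros((2, 4, 4), dtype=bool)
demo_masks[0, 0, 0] = True
demo_masks[1, 3, 3] = True
demo = union_mask(demo_masks, [0.9, 0.1])
```

For video, the model would be constructed once and reused across frames rather than rebuilt per call as in `segment_text`.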

Written on June 23, 2023