Remove Caption
How to remove caption from the video data?
Task
- Goal
- I want to remove caption area
- bbox, or non-rectangular mask only for the text area… type doesn’t matter but the later one is better
- Issue
- prefer to use generalized pretrained model
- without human (without prompt)
- speed
How?
OCR
First of all, optical character recognition would be a nice and trustful approach because it has been a long-lasting computer vision task.
- Scene Text Detection (Localization): to get bbox area of the text in the image
- e.g., CRAFT https://github.com/clovaai/CRAFT-pytorch
- Scene Text Recognition: I don’t need this part
Generalized Segmentation Tools
These days, “Segment Anything” stuffs are trending issue in this field.
- check the performance of SAM for masking text area
- survey on follow-up studies
Step 1. Let’s try SAM
language segment anything link
pip install torch torchvision
pip install -U git+https://github.com/luca-medeiros/lang-segment-anything.git
Outputs: masks, boxes, phrases, logits.
References
Written on June 23, 2023