Transformer with OCR — From molecule to manga
I recently joined the Kaggle competition Bristol-Myers Squibb — Molecular Translation (the BMS competition). Unfortunately, I missed a solo gold medal, but I gained some interesting findings that are also generally useful for common OCR tasks. I would like to share them in this post, taking Manga OCR as the subject.

About BMS competition

In the BMS competition, participants predict InChI text, which is uniquely defined for each molecule, from an image of the printed molecule.

[Figure: Predict InChI from Image — BMS competition]

Some people call this task image captioning; others call it OCR. Either way, we predict a sequence from an image. Top solutions are listed here. The most commonly used architecture is a typical transformer encoder-decoder model, which is essentially a combination of Vision Transformer and BART. Swin Transformer in particular seems to have performed well as an encoder (a minimal code sketch of this setup follows at the end of this section).

[Figure: Vision Transformer + Autoregressive Decoder]

General OCR and Deep Learning

The typical e...
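To make the encoder-decoder idea above concrete, here is a minimal sketch that pairs a Vision Transformer encoder with a BART-style autoregressive decoder using Hugging Face's VisionEncoderDecoderModel. The checkpoint names, beam size, and the helper functions training_step and predict are illustrative assumptions, not the exact configuration of any competition solution; a Swin checkpoint could be substituted for the ViT encoder in the same way.

```python
# Hypothetical sketch of a ViT encoder + BART-style autoregressive decoder.
# Checkpoint names and hyperparameters are illustrative, not a competition setup.
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# Glue a pretrained ViT encoder to a pretrained BART decoder. The cross-attention
# weights connecting them are newly initialized, so the combined model has to be
# fine-tuned on (image, text) pairs before it produces useful output.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # encoder: Vision Transformer
    "facebook/bart-base",                 # decoder: autoregressive BART
)

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

# Special-token ids the seq2seq wrapper needs for loss computation and generation.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id


def training_step(images, texts):
    """One supervised step: image -> token sequence (InChI string, manga text, ...)."""
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    labels = tokenizer(texts, return_tensors="pt", padding=True).input_ids
    # Ignore padding positions in the cross-entropy loss.
    labels[labels == tokenizer.pad_token_id] = -100
    outputs = model(pixel_values=pixel_values, labels=labels)
    return outputs.loss


def predict(images, max_length=128):
    """Decode a text sequence from an image with beam search."""
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values, max_length=max_length, num_beams=4)
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

The structure mirrors the figure above: the encoder turns the image into a sequence of patch embeddings, and the decoder attends to them through cross-attention while generating output tokens one at a time.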