Abstract: Multi-modal relation extraction (MRE) aims to extract semantic relations between two textual entities with the help of visual information. Existing studies typically leverage visual ...