# MINIMA
**Repository Path**: weiwei16/MINIMA
## Basic Information
- **Project Name**: MINIMA
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-13
- **Last Updated**: 2026-02-13
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
### MINIMA: Modality Invariant Image Matching

[Jiangwei Ren](https://github.com/LSXI7)<sup>1</sup>,
[Xingyu Jiang](https://scholar.google.com/citations?user=h2W90MQAAAAJ&hl=en&oi=ao)<sup>1†</sup>,
[Zizhuo Li](https://scholar.google.com/citations?user=bxuEALEAAAAJ&hl=en&oi=ao)<sup>2</sup>,
[Dingkang Liang](https://dk-liang.github.io/)<sup>1</sup>,
[Xin Zhou](https://lmd0311.github.io/)<sup>1</sup>,
and [Xiang Bai](https://scholar.google.com/citations?user=UeltiQ4AAAAJ&hl=en)<sup>1</sup>

<sup>1</sup> Huazhong University of Science & Technology, <sup>2</sup> Wuhan University.
<sup>†</sup> Corresponding author.
[//]: # (This repository represents the official implementation of the paper titled "MINIMA: Modality Invariant Image Matching".)
## 📣 News
- **[19/Apr/2025]** [MINIMA Training Code](./train_orders/README.md) is released.
- **[14/Mar/2025]** [MINIMA Data Engine](./data_engine/README.md) is released.
- **[27/Feb/2025]** Our MINIMA is accepted to CVPR 2025.
- **[27/Dec/2024]** [arXiv version](https://arxiv.org/abs/2412.19412) is released.
- **[26/Dec/2024]** The code and checkpoints are released.
## Abstract
Image matching for both cross-view and cross-modality plays a critical role in multimodal perception. In practice, the
modality gap caused by different imaging systems/styles poses great challenges to the matching task. Existing works try
to extract invariant features for specific modalities and train on limited datasets, showing poor generalization. In
this paper, we present MINIMA, a unified image matching framework for multiple cross-modal cases. Without pursuing fancy
modules, our MINIMA aims to enhance universal performance from the perspective of data scaling up. For such purpose, we
propose a simple yet effective data engine that can freely produce a large dataset containing multiple modalities, rich
scenarios, and accurate matching labels. Specifically, we scale up the modalities from cheap but rich RGB-only matching
data, by means of generative models. Under this setting, the matching labels and rich diversity of the RGB dataset are
well inherited by the generated multimodal data. Benefiting from this, we construct MD-syn, a new comprehensive dataset
that fills the data gap for general multimodal image matching. With MD-syn, we can directly train any advanced matching
pipeline on randomly selected modality pairs to obtain cross-modal ability. Extensive experiments on in-domain and
zero-shot matching tasks, including 19 cross-modal cases, demonstrate that our MINIMA can significantly outperform the
baselines and even surpass modality-specific methods.
## Our Framework
## Online Demo
* Visit our online demo and test our MINIMA models.
* Special thanks to [Hugging Face](https://huggingface.co/) for providing the ZeroGPU support for this project!
## Full MegaDepth-Syn Dataset
* The MegaDepth-Syn Dataset is generated from the [MegaDepth dataset](https://www.cs.cornell.edu/projects/megadepth/)
  using our MINIMA data engine, and covers 6 extra modalities: infrared, depth, event, normal, sketch, and paint.
* The full dataset is released on OpenXLab.
* You can download the dataset using the following commands:
```bash
pip install openxlab --no-dependencies              # install the OpenXLab CLI
openxlab login                                      # log in with your AK/SK (see the user center)
openxlab dataset info --dataset-repo lsxi7/MINIMA   # view dataset information and the file list
openxlab dataset get --dataset-repo lsxi7/MINIMA    # download the full dataset
openxlab dataset download --dataset-repo lsxi7/MINIMA --source-path /README.md --target-path /path/to/local/folder  # download a single file
```
More details can be found on the OpenXLab dataset page.
* The MegaDepth-Syn Dataset is now uploaded to [Hugging Face](https://huggingface.co/datasets/lsxi77777/MegaDepth-Syn).
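If you prefer the Hugging Face mirror, a minimal download sketch with the `huggingface-cli` tool (the local target directory is only an example) is:

```bash
pip install -U "huggingface_hub[cli]"   # provides the huggingface-cli tool
# Download the MegaDepth-Syn dataset repository to a local folder (path is illustrative).
huggingface-cli download lsxi77777/MegaDepth-Syn --repo-type dataset --local-dir ./data/MegaDepth-Syn
```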
## MINIMA Multi-Modal Data Engine
See [Data Engine](./data_engine/README.md) for more details.
## Weights Download
* We provide our `minima_lightglue`, `minima_loftr`, `minima_roma`, `minima_eloftr`, and `minima_xoftr` model weights
  in [Google Drive](https://drive.google.com/drive/folders/16kZfehtXeIu6fJjUoYDzYb-3FkZBpkNd?usp=sharing).
* We also provide direct GitHub links for the
weights: [minima_lightglue](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_lightglue.pth),
[minima_loftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_loftr.ckpt),
[minima_roma](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_roma.pth),
[minima_eloftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_eloftr.ckpt) and
[minima_xoftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_xoftr.ckpt).
* Please download the weight files and put them in the `weights` folder.
* Or you can directly run:
```bash
bash weights/download.sh
```
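Alternatively, a minimal manual sketch that fetches the same release assets listed above into the `weights` folder (using `wget`; the URLs are exactly those given earlier):

```bash
mkdir -p weights && cd weights
# Fetch each MINIMA checkpoint from the release links listed above.
for f in minima_lightglue.pth minima_loftr.ckpt minima_roma.pth minima_eloftr.ckpt minima_xoftr.ckpt; do
  wget "https://github.com/LSXI7/storage/releases/download/MINIMA/${f}"
done
cd ..
```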
## Data Preparation for Evaluation
We are grateful to the authors for contributing the test datasets that cover real multimodal scenarios.
### MegaDepth-1500-Syn
We provide a bash command to download and organize the MegaDepth-1500-Syn dataset directly:
```bash
bash data/test_data_preparation.sh
```
Additionally, please download the
original [MegaDepth-1500](https://drive.google.com/drive/folders/1nTkK1485FuwqA0DbZrK2Cl0WnXadUZdc) test set and run:
```bash
tar xf megadepth_test_1500.tar
ln -s /path/to/megadepth_test_1500/Undistorted_SfM /path/to/MINIMA/data/megadepth/test
```
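As a quick sanity check (a sketch; depending on whether `data/megadepth/test` already existed, the scenes may sit one level deeper under `Undistorted_SfM/`), the link should expose the MegaDepth-1500 scene folders:

```bash
# List what the symlink exposes; you should see the MegaDepth-1500 scene folders.
ls -L data/megadepth/test | head
```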
### RGB-Infrared Test Dataset
The METU-VisTIR dataset comes from [XoFTR](https://github.com/OnderT/XoFTR?tab=readme-ov-file), and is available
at its official [Google Drive](https://drive.google.com/file/d/1Sj_vxj-GXvDQIMSg-ZUJR0vHBLIeDrLg/view).
For more information, please refer to the [XoFTR repository](https://github.com/OnderT/XoFTR?tab=readme-ov-file).
### MMIM Test Dataset
The MMIM dataset is sourced
from [Multi-modality-image-matching-database-metrics-methods](https://github.com/StaRainJ/Multi-modality-image-matching-database-metrics-methods).
We provide the necessary JSON files in the Multi-modality-image-matching-database-metrics-methods.zip file, located in
the `data` directory.
To set up the MMIM test dataset, please follow these steps:
```bash
cd data
git clone https://github.com/StaRainJ/Multi-modality-image-matching-database-metrics-methods.git
unzip -o Multi-modality-image-matching-database-metrics-methods.zip
```
### RGB-Depth Test Dataset
The depth test data comes from the [DIODE](https://diode-dataset.org/) dataset.
You can directly download its validation split from the
official [Amazon Web Services mirror](http://diode-dataset.s3.amazonaws.com/val.tar.gz)
or [Baidu Cloud Storage](https://pan.baidu.com/s/18IoX7f9W3F7acP0hjl7NSA).
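A minimal sketch for fetching and unpacking the validation split so that it matches the `data/DIODE/val/` layout shown in the next section (assuming the archive unpacks to a top-level `val/` directory):

```bash
mkdir -p data/DIODE
# Download the DIODE validation split and unpack it under data/DIODE/.
wget http://diode-dataset.s3.amazonaws.com/val.tar.gz -O data/DIODE/val.tar.gz
tar -xzf data/DIODE/val.tar.gz -C data/DIODE/
```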
### RGB-Event Test Dataset
The aligned RGB-Event test dataset is generated from [DSEC](https://dsec.ifi.uzh.ch/).
Our test data can be downloaded
from [Google Drive](https://drive.google.com/drive/folders/1rYKwI4Jmw1WAw_zRfHndph8AgyHZqdss?usp=sharing).
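Since the test data is shared as a Google Drive folder, one way to script the download is the third-party `gdown` tool (shown here only as a sketch; you can also download the folder manually from the link above):

```bash
pip install gdown
# Download the shared Google Drive folder into data/DSEC/ (folder URL as given above).
gdown --folder "https://drive.google.com/drive/folders/1rYKwI4Jmw1WAw_zRfHndph8AgyHZqdss?usp=sharing" -O data/DSEC
```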
### Data Structure
We recommend organizing the datasets in the following folder structure:
```
data/
├── METU-VisTIR/
│   ├── index/
│   └── ...
├── Multi-modality-image-matching-database-metrics-methods/
│   ├── Multimodal_Image_Matching_Datasets/
│   └── ...
├── megadepth/
│   ├── train/[modality]/Undistorted_SfM/
│   └── test/Undistorted_SfM/   # MegaDepth-1500
├── DIODE/
│   └── val/
└── DSEC/
    ├── vent_list.txt
    ├── thun_01_a/
    └── ...
```
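Before running the benchmarks, a small sketch to sanity-check that the expected top-level folders from the tree above are in place:

```bash
# Verify that each expected dataset directory exists under data/.
for d in METU-VisTIR Multi-modality-image-matching-database-metrics-methods/Multimodal_Image_Matching_Datasets \
         megadepth/test DIODE/val DSEC; do
  [ -d "data/$d" ] && echo "OK      data/$d" || echo "MISSING data/$d"
done
```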
## Installation and Environment Setup
* Clone the repository:
```bash
git clone https://github.com/LSXI7/MINIMA.git
cd MINIMA
conda env create -f environment.yaml
conda activate minima
```
* Initialize the external submodule dependencies with:
```bash
git submodule update --init --recursive
git submodule update --recursive --remote
sed -i '1s/^/from typing import Tuple as tuple\n/' third_party/RoMa_minima/romatch/models/model_zoo/__init__.py
```
* Run the demo code after downloading the [weights](#weights-download):
```bash
python demo.py --method sp_lg --fig1 demo/vis_test.png --fig2 demo/depth_test.png --save_dir ./demo
```
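To compare the MINIMA variants on the same pair, you can loop the demo over the supported methods (a sketch using only the flags shown above; the per-method output directories are illustrative):

```bash
# Run the same image pair through each MINIMA matcher and save results per method.
for m in sp_lg loftr roma xoftr; do
  python demo.py --method "$m" --fig1 demo/vis_test.png --fig2 demo/depth_test.png --save_dir "./demo/${m}"
done
```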
## Multimodal Image Matching Evaluation
We provide the multi-modality image matching benchmark commands for our MINIMA models.
Choose a method from `sp_lg`, `loftr`, `roma`, and `xoftr` for the multimodal evaluation.
### Test on Real Multimodal Datasets
```bash
python test_relative_pose_infrared.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Infrared-RGB
python test_relative_homo_depth.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Depth-RGB
python test_relative_homo_event.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Event-RGB
# --choose_model: 0 for medical test, 1 for remote sensing test
python test_relative_homo_mmim.py --method <method> [--ckpt model_path] --choose_model 0/1 [--save_figs] [--save_dir save_dir]
```
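For example, to sweep every matcher over the Infrared-RGB benchmark with the default checkpoints and keep per-method results separate (a sketch; the result directories are illustrative):

```bash
# Evaluate each MINIMA variant on the Infrared-RGB benchmark using the default weights/ checkpoints.
for m in sp_lg loftr roma xoftr; do
  python test_relative_pose_infrared.py --method "$m" --save_figs --save_dir "results/infrared/${m}"
done
```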
### Test on MD-syn Dataset
```bash
python test_relative_pose_mega_1500_syn.py --method <method> [--ckpt model_path] --multi_model [--save_figs] [--save_dir save_dir]
# --modality: choose the synthetic modality from [infrared, depth, event, normal, sketch, paint]
```
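To cover all six synthetic modalities in one run, a sketch that follows the usage above and passes `--modality` as described in the comment (the result directories are illustrative):

```bash
# Sweep one matcher over every synthetic modality of MD-syn.
for mod in infrared depth event normal sketch paint; do
  python test_relative_pose_mega_1500_syn.py --method sp_lg --multi_model --modality "$mod" --save_dir "results/md_syn/${mod}"
done
```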
### Test on Origin MegaDepth-1500 Dataset
```bash
python test_relative_pose_mega_1500.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]
```
Note: By default, the checkpoint is initialized from the MINIMA models in the `weights` folder, and you can specify a
custom checkpoint using the `--ckpt` argument.
## Training
See [Training](./train_orders/README.md) for details.
## TODO List
- [x] MD-Syn Full Dataset
- [x] Real Multimodal Evaluation Benchmark
- [x] Synthetic Multimodal Evaluation Benchmark
- [x] Training Code
- [x] Our MINIMA Data Engine for Multimodal Data Generation
- [ ] More Modalities Addition
## Acknowledgement
We sincerely thank
[SuperPoint](https://github.com/magicleap/SuperPointPretrainedNetwork),
[LightGlue](https://github.com/cvg/LightGlue),
[Glue Factory](https://github.com/cvg/glue-factory),
[LoFTR](https://github.com/zju3dv/LoFTR), and
[RoMa](https://github.com/Parskatt/RoMa)
for their contributions to methodological development.
Additionally, we appreciate the support of [MegaDepth](https://www.cs.cornell.edu/projects/megadepth/), as well as
[SCEPTER](https://github.com/modelscope/scepter),
[Depth-Anything-V2](https://github.com/DepthAnything/Depth-Anything-V2),
[DSINE](https://github.com/baegwangbin/DSINE),
[PaintTransformer](https://github.com/Huage001/PaintTransformer), and
[Anime2Sketch](https://github.com/Mukosame/Anime2Sketch), for their roles in data generation.
## Citation
If you find our work useful in your research, please consider giving a star ⭐ and a citation:
```bibtex
@inproceedings{ren2025minima,
title={MINIMA: Modality Invariant Image Matching},
author={Ren, Jiangwei and Jiang, Xingyu and Li, Zizhuo and Liang, Dingkang and Zhou, Xin and Bai, Xiang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
```
## License
This repository is under the [Apache-2.0 license](./LICENSE).
`minima_lightglue` uses `SuperPoint` as its feature extractor. `SuperPoint` follows a different, restrictive license
for academic or non-commercial use only. See
the [license here](https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/LICENSE) and
its [inference file](https://github.com/cvg/LightGlue/blob/main/lightglue/superpoint.py) for details.
Please review and comply with its license if you intend to use this component.