# MINIMA
**Repository Path**: weiwei16/MINIMA
## Basic Information
- **Project Name**: MINIMA
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-13
- **Last Updated**: 2026-02-13
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
### MINIMA: Modality Invariant Image Matching

[Jiangwei Ren](https://github.com/LSXI7)<sup>1</sup>,
[Xingyu Jiang](https://scholar.google.com/citations?user=h2W90MQAAAAJ&hl=en&oi=ao)<sup>1†</sup>,
[Zizhuo Li](https://scholar.google.com/citations?user=bxuEALEAAAAJ&hl=en&oi=ao)<sup>2</sup>,
[Dingkang Liang](https://dk-liang.github.io/)<sup>1</sup>,
[Xin Zhou](https://lmd0311.github.io/)<sup>1</sup>,
and [Xiang Bai](https://scholar.google.com/citations?user=UeltiQ4AAAAJ&hl=en)<sup>1</sup>

<sup>1</sup> Huazhong University of Science & Technology, <sup>2</sup> Wuhan University.
<sup>†</sup> Corresponding author.
[//]: # (This repository represents the official implementation of the paper titled "MINIMA: Modality Invariant Image Matching".)
## 📣 News
- **[19/Apr/2025]** [MINIMA Training Code](./train_orders/README.md) is released.
- **[14/Mar/2025]** [MINIMA Data Engine](./data_engine/README.md) is released.
- **[27/Feb/2025]** Our MINIMA is accepted to CVPR 2025.
- **[27/Dec/2024]** [arXiv version](https://arxiv.org/abs/2412.19412) is released.
- **[26/Dec/2024]** The code and checkpoints are released.
## Abstract
Image matching for both cross-view and cross-modality plays a critical role in multimodal perception. In practice, the
modality gap caused by different imaging systems/styles poses great challenges to the matching task. Existing works try
to extract invariant features for specific modalities and train on limited datasets, showing poor generalization. In
this paper, we present MINIMA, a unified image matching framework for multiple cross-modal cases. Without pursuing fancy
modules, our MINIMA aims to enhance universal performance from the perspective of data scaling up. For such purpose, we
propose a simple yet effective data engine that can freely produce a large dataset containing multiple modalities, rich
scenarios, and accurate matching labels. Specifically, we scale up the modalities from cheap but rich RGB-only matching
data, by means of generative models. Under this setting, the matching labels and rich diversity of the RGB dataset are
well inherited by the generated multimodal data. Benefiting from this, we construct MD-syn, a new comprehensive dataset
that fills the data gap for general multimodal image matching. With MD-syn, we can directly train any advanced matching
pipeline on randomly selected modality pairs to obtain cross-modal ability. Extensive experiments on in-domain and
zero-shot matching tasks, including 19 cross-modal cases, demonstrate that our MINIMA can significantly outperform the
baselines and even surpass modality-specific methods.
## Our Framework
## Online Demo
* Visit our online demo and test our MINIMA models.
* Special thanks to [Hugging Face](https://huggingface.co/) for providing the ZeroGPU support for this project!
## Full MegaDepth-Syn Dataset
* The MegaDepth-Syn Dataset is generated from the [MegaDepth dataset](https://www.cs.cornell.edu/projects/megadepth/)
  using our MINIMA data engine, and covers 6 extra modalities: infrared, depth, event, normal, sketch, and paint.
* The full dataset is released on OpenXLab.
* You can download the dataset using the following commands:
```bash
pip install openxlab --no-dependencies              # install the OpenXLab CLI
openxlab login                                      # log in with your AK/SK (see the user center)
openxlab dataset info --dataset-repo lsxi7/MINIMA   # view dataset information and the file list
openxlab dataset get --dataset-repo lsxi7/MINIMA    # download the full dataset
openxlab dataset download --dataset-repo lsxi7/MINIMA --source-path /README.md --target-path /path/to/local/folder  # download a single file
```
More details can be found on the OpenXLab dataset page.
* The MegaDepth-Syn Dataset is now uploaded to [Hugging Face](https://huggingface.co/datasets/lsxi77777/MegaDepth-Syn).
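If you prefer the Hugging Face mirror, a minimal download sketch with the `huggingface-cli` tool (the local target directory is only an example) is:

```bash
pip install -U "huggingface_hub[cli]"   # provides the huggingface-cli tool
# Download the MegaDepth-Syn dataset repository to a local folder (path is illustrative).
huggingface-cli download lsxi77777/MegaDepth-Syn --repo-type dataset --local-dir ./data/MegaDepth-Syn
```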
## MINIMA Multi-Modal Data Engine
See [Data Engine](./data_engine/README.md) for more details.
## Weights Download
* We provide our `minima_lightglue`, `minima_loftr`, `minima_roma`, `minima_eloftr`, and `minima_xoftr` model weights
  in [Google Drive](https://drive.google.com/drive/folders/16kZfehtXeIu6fJjUoYDzYb-3FkZBpkNd?usp=sharing).
* We also provide direct GitHub links for the
weights: [minima_lightglue](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_lightglue.pth),
[minima_loftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_loftr.ckpt),
[minima_roma](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_roma.pth),
[minima_eloftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_eloftr.ckpt) and
[minima_xoftr](https://github.com/LSXI7/storage/releases/download/MINIMA/minima_xoftr.ckpt).
* Please download the weight files and put them in the `weights` folder.
* Or you can directly run:
```bash
bash weights/download.sh
```
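Alternatively, a minimal manual sketch that fetches the same release assets listed above into the `weights` folder (using `wget`; the URLs are exactly those given earlier):

```bash
mkdir -p weights && cd weights
# Fetch each MINIMA checkpoint from the release links listed above.
for f in minima_lightglue.pth minima_loftr.ckpt minima_roma.pth minima_eloftr.ckpt minima_xoftr.ckpt; do
  wget "https://github.com/LSXI7/storage/releases/download/MINIMA/${f}"
done
cd ..
```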
## Data Preparation for Evaluation
We are grateful to the authors for contributing the test datasets that cover real multimodal scenarios.
### MegaDepth-1500-Syn
We provide a bash command to download and organize the MegaDepth-1500-Syn dataset directly:
```bash
bash data/test_data_preparation.sh
```
Additionally, please download the
original [MegaDepth-1500](https://drive.google.com/drive/folders/1nTkK1485FuwqA0DbZrK2Cl0WnXadUZdc) test set and run:
```bash
tar xf megadepth_test_1500.tar
ln -s /path/to/megadepth_test_1500/Undistorted_SfM /path/to/MINIMA/data/megadepth/test
```
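As a quick sanity check (a sketch; depending on whether `data/megadepth/test` already existed, the scenes may sit one level deeper under `Undistorted_SfM/`), the link should expose the MegaDepth-1500 scene folders:

```bash
# List what the symlink exposes; you should see the MegaDepth-1500 scene folders.
ls -L data/megadepth/test | head
```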
### RGB-Infrared Test Dataset
The METU-VisTIR dataset comes from [XoFTR](https://github.com/OnderT/XoFTR?tab=readme-ov-file), and is available
at its official [Google Drive](https://drive.google.com/file/d/1Sj_vxj-GXvDQIMSg-ZUJR0vHBLIeDrLg/view).
For more information, please refer to the [XoFTR repository](https://github.com/OnderT/XoFTR?tab=readme-ov-file).
### MMIM Test Dataset
The MMIM dataset is sourced
from [Multi-modality-image-matching-database-metrics-methods](https://github.com/StaRainJ/Multi-modality-image-matching-database-metrics-methods).
We provide the necessary JSON files in the Multi-modality-image-matching-database-metrics-methods.zip file, located in
the `data` directory.
To set up the MMIM test dataset, please follow these steps:
```bash
cd data
git clone https://github.com/StaRainJ/Multi-modality-image-matching-database-metrics-methods.git
unzip -o Multi-modality-image-matching-database-metrics-methods.zip
```
### RGB-Depth Test Dataset
The depth test data comes from the [DIODE](https://diode-dataset.org/) dataset.
You can directly download its validation split from the
official [Amazon Web Services mirror](http://diode-dataset.s3.amazonaws.com/val.tar.gz)
or [Baidu Cloud Storage](https://pan.baidu.com/s/18IoX7f9W3F7acP0hjl7NSA).
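A minimal sketch for fetching and unpacking the validation split so that it matches the `data/DIODE/val/` layout shown in the next section (assuming the archive unpacks to a top-level `val/` directory):

```bash
mkdir -p data/DIODE
# Download the DIODE validation split and unpack it under data/DIODE/.
wget http://diode-dataset.s3.amazonaws.com/val.tar.gz -O data/DIODE/val.tar.gz
tar -xzf data/DIODE/val.tar.gz -C data/DIODE/
```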
### RGB-Event Test Dataset
The aligned RGB-Event test dataset is generated from [DSEC](https://dsec.ifi.uzh.ch/).
Our test data can be downloaded
from [Google Drive](https://drive.google.com/drive/folders/1rYKwI4Jmw1WAw_zRfHndph8AgyHZqdss?usp=sharing).
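Since the test data is shared as a Google Drive folder, one way to script the download is the third-party `gdown` tool (shown here only as a sketch; you can also download the folder manually from the link above):

```bash
pip install gdown
# Download the shared Google Drive folder into data/DSEC/ (folder URL as given above).
gdown --folder "https://drive.google.com/drive/folders/1rYKwI4Jmw1WAw_zRfHndph8AgyHZqdss?usp=sharing" -O data/DSEC
```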
### Data Structure
We recommend organizing the datasets in the following folder structure:
```
data/
├── METU-VisTIR/
│   ├── index/
│   └── ...
├── Multi-modality-image-matching-database-metrics-methods/
│   ├── Multimodal_Image_Matching_Datasets/
│   └── ...
├── megadepth/
│   ├── train/[modality]/Undistorted_SfM/
│   └── test/Undistorted_SfM/   # MegaDepth-1500
├── DIODE/
│   └── val/
└── DSEC/
    ├── vent_list.txt
    ├── thun_01_a/
    └── ...
```
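Before running the benchmarks, a small sketch to sanity-check that the expected top-level folders from the tree above are in place:

```bash
# Verify that each expected dataset directory exists under data/.
for d in METU-VisTIR Multi-modality-image-matching-database-metrics-methods/Multimodal_Image_Matching_Datasets \
         megadepth/test DIODE/val DSEC; do
  [ -d "data/$d" ] && echo "OK      data/$d" || echo "MISSING data/$d"
done
```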
## Installation and Environment Setup
* Clone the repository:
```bash
git clone https://github.com/LSXI7/MINIMA.git
cd MINIMA
conda env create -f environment.yaml
conda activate minima
```
* Initialize the external submodule dependencies with:
```bash
git submodule update --init --recursive
git submodule update --recursive --remote
sed -i '1s/^/from typing import Tuple as tuple\n/' third_party/RoMa_minima/romatch/models/model_zoo/__init__.py
```
* Run the demo code after downloading the [weights](#weights-download):
```bash
python demo.py --method sp_lg --fig1 demo/vis_test.png --fig2 demo/depth_test.png --save_dir ./demo
```
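To compare the MINIMA variants on the same pair, you can loop the demo over the supported methods (a sketch using only the flags shown above; the per-method output directories are illustrative):

```bash
# Run the same image pair through each MINIMA matcher and save results per method.
for m in sp_lg loftr roma xoftr; do
  python demo.py --method "$m" --fig1 demo/vis_test.png --fig2 demo/depth_test.png --save_dir "./demo/${m}"
done
```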
## Multimodal Image Matching Evaluation
We provide the multi-modality image matching benchmark commands for our MINIMA models.
Choose a method from `sp_lg`, `loftr`, `roma`, and `xoftr` for the multimodal evaluation.
### Test on Real Multimodal Datasets
```bash
python test_relative_pose_infrared.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Infrared-RGB
python test_relative_homo_depth.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Depth-RGB
python test_relative_homo_event.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]  # Event-RGB
# --choose_model: 0 for medical test, 1 for remote sensing test
python test_relative_homo_mmim.py --method <method> [--ckpt model_path] --choose_model 0/1 [--save_figs] [--save_dir save_dir]
```
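For example, to sweep every matcher over the Infrared-RGB benchmark with the default checkpoints and keep per-method results separate (a sketch; the result directories are illustrative):

```bash
# Evaluate each MINIMA variant on the Infrared-RGB benchmark using the default weights/ checkpoints.
for m in sp_lg loftr roma xoftr; do
  python test_relative_pose_infrared.py --method "$m" --save_figs --save_dir "results/infrared/${m}"
done
```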
### Test on MD-syn Dataset
```bash
python test_relative_pose_mega_1500_syn.py --method <method> [--ckpt model_path] --multi_model [--save_figs] [--save_dir save_dir]
# --modality: choose the synthetic modality from [infrared, depth, event, normal, sketch, paint]
```
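To cover all six synthetic modalities in one run, a sketch that follows the usage above and passes `--modality` as described in the comment (the result directories are illustrative):

```bash
# Sweep one matcher over every synthetic modality of MD-syn.
for mod in infrared depth event normal sketch paint; do
  python test_relative_pose_mega_1500_syn.py --method sp_lg --multi_model --modality "$mod" --save_dir "results/md_syn/${mod}"
done
```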
### Test on Origin MegaDepth-1500 Dataset
```bash
python test_relative_pose_mega_1500.py --method <method> [--ckpt model_path] [--save_figs] [--save_dir save_dir]
```
Note: By default, the checkpoint is initialized from the MINIMA models in the `weights` folder, and you can specify a
custom checkpoint using the `--ckpt` argument.
## Training
See [Training](./train_orders/README.md) for details.
## TODO List
- [x] MD-Syn Full Dataset
- [x] Real Multimodal Evaluation Benchmark
- [x] Synthetic Multimodal Evaluation Benchmark
- [x] Training Code
- [x] Our MINIMA Data Engine for Multimodal Data Generation
- [ ] More Modalities Addition
## Acknowledgement
We sincerely thank
[SuperPoint](https://github.com/magicleap/SuperPointPretrainedNetwork),
[LightGlue](https://github.com/cvg/LightGlue),
[Glue Factory](https://github.com/cvg/glue-factory),
[LoFTR](https://github.com/zju3dv/LoFTR), and
[RoMa](https://github.com/Parskatt/RoMa)
for their contributions to methodological development.
Additionally, we appreciate the support of [MegaDepth](https://www.cs.cornell.edu/projects/megadepth/), as well as
[SCEPTER](https://github.com/modelscope/scepter),
[Depth-Anything-V2](https://github.com/DepthAnything/Depth-Anything-V2),
[DSINE](https://github.com/baegwangbin/DSINE),
[PaintTransformer](https://github.com/Huage001/PaintTransformer), and
[Anime2Sketch](https://github.com/Mukosame/Anime2Sketch), for their roles in data generation.
## Citation
If you find our work useful in your research, please consider giving a star ⭐ and a citation:
```bibtex
@inproceedings{ren2025minima,
title={MINIMA: Modality Invariant Image Matching},
author={Ren, Jiangwei and Jiang, Xingyu and Li, Zizhuo and Liang, Dingkang and Zhou, Xin and Bai, Xiang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
```
## License
This repository is under the [Apache-2.0 license](./LICENSE).
`minima_lightglue` uses `SuperPoint` as its feature extractor. `SuperPoint` follows a different, restrictive license
for academic or non-commercial use only. See
the [license here](https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/LICENSE) and
its [inference file](https://github.com/cvg/LightGlue/blob/main/lightglue/superpoint.py) for details.
Please review and comply with its license if you intend to use this component.