# Edit-Banana

**Repository Path**: corffee/Edit-Banana

## Basic Information

- **Project Name**: Edit-Banana
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-25
- **Last Updated**: 2026-02-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


🍌 Edit Banana

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.


---

Try It Now!

Try Online Demo

👆 Click above or visit https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image or PDF and get editable DrawIO (XML) or PPTX output in seconds.

Please note: our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using the web platform.

## 💬 Join WeChat Group

Join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

> 💡 If the QR code has expired, please submit an [Issue](https://github.com/XiangjianYi/Image2DrawIO/issues) to request an updated one.

---

## 📸 Effect Demonstration

### High-Definition Input-Output Comparison (3 Typical Scenarios)

To demonstrate the high-fidelity conversion, we provide one-to-one comparisons of the original static input and the editable reconstruction result across 3 scenarios. All elements can be individually dragged, styled, and modified.

#### Scenario 1: Figures to DrawIO (XML, SVG, PPTX)

| Example No. | Original Static Diagram (Input · Non-editable) | DrawIO Reconstruction Result (Output · Fully Editable) |
|--------------|-----------------------------------------------|--------------------------------------------------------|
| Example 1: Basic Flowchart | Original Diagram 1 | Reconstruction Result 1 |
| Example 2: Multi-level Architecture Diagram | Original Diagram 2 | Reconstruction Result 2 |
| Example 3: Technical Schematic | Original Diagram 3 | Reconstruction Result 3 |
| Example 4: Scientific Formula Diagram | Original Diagram 4 | Reconstruction Result 4 |

#### Scenario 2: PDF to PPTX

#### Scenario 3: Human in the Loop Modification

> ✨ Conversion Highlights:
> 1. Preserves the layout logic, color matching, and element hierarchy of the original diagram
> 2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
> 3. Accurate text recognition, supporting direct subsequent editing and format adjustment
> 4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization

## Key Features

* **Advanced Segmentation**: Uses our fine-tuned **SAM 3 (Segment Anything Model 3)** to segment diagram elements.
* **Fixed Multi-Round VLM Scanning**: An extraction process guided by **Multimodal LLMs (Qwen-VL/GPT-4V)**.
* **High-Quality OCR**:
    * **Azure Document Intelligence** for precise text localization.
    * **Fallback Mechanism**: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
    * **Mistral Vision/MLLM** for correcting text and converting mathematical formulas to **LaTeX** ($\int f(x) dx$).
    * **Crop-Guided Strategy**: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
* **User System**:
    * **Registration**: New users receive **10 free credits**.
    * **Credit System**: Pay-per-use model prevents resource abuse.
    * **Multi-User Concurrency**: Built-in support for concurrent user sessions using a **Global Lock** mechanism for thread-safe GPU access and an **LRU Cache** (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
* **Web Interface**: A React-based frontend + FastAPI backend for easy uploading and editing.

## Architecture Pipeline

1. **Input**: Image (PNG/JPG) or PDF.
2. **Segmentation (SAM3)**: Using our fine-tuned SAM3 mask decoder.
4. **Text Extraction (Parallel)**:
    * Azure OCR detects text bounding boxes.
    * High-res crops of text regions are sent to Mistral/LLM.
    * LaTeX conversion for formulas.
5. **XML/PPTX Generation**: Merging spatial data from our fine-tuned SAM3 and Text OCR.
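
The final merging step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `scripts/merge_xml.py`: the `Shape` and `TextBox` records, the `assign_text` and `to_drawio` helpers, the center-containment matching rule, and the style strings are all hypothetical. The sketch attaches each OCR text box to the smallest shape whose bounding box contains the text's center, then writes the result as uncompressed DrawIO XML using only the Python standard library.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Shape:
    # Hypothetical record for one segmented element (pixel coordinates).
    x: float
    y: float
    w: float
    h: float
    label: str = ""


@dataclass
class TextBox:
    # Hypothetical record for one OCR result (pixel coordinates).
    x: float
    y: float
    w: float
    h: float
    text: str


def assign_text(shapes, texts):
    """Attach each text box to the smallest shape containing its center.

    Mutates the matched shapes' labels and returns the unmatched text boxes.
    """
    orphans = []
    for t in texts:
        cx, cy = t.x + t.w / 2, t.y + t.h / 2
        hits = [s for s in shapes
                if s.x <= cx <= s.x + s.w and s.y <= cy <= s.y + s.h]
        if hits:
            target = min(hits, key=lambda s: s.w * s.h)
            target.label = f"{target.label} {t.text}".strip()
        else:
            orphans.append(t)
    return orphans


def to_drawio(shapes, orphans):
    """Serialize shapes and free-standing text as an uncompressed mxfile string."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", {"name": "Page-1"})
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Cells 0 and 1 are the root and default layer that every diagram carries.
    ET.SubElement(root, "mxCell", {"id": "0"})
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})

    items = [(s, s.label, "rounded=0;whiteSpace=wrap;html=1;") for s in shapes]
    items += [(t, t.text, "text;html=1;align=center;") for t in orphans]
    for i, (box, value, style) in enumerate(items, start=2):
        cell = ET.SubElement(root, "mxCell", {
            "id": str(i), "value": value, "style": style,
            "vertex": "1", "parent": "1",
        })
        ET.SubElement(cell, "mxGeometry", {
            "x": str(box.x), "y": str(box.y),
            "width": str(box.w), "height": str(box.h),
            "as": "geometry",
        })
    return ET.tostring(mxfile, encoding="unicode")


if __name__ == "__main__":
    shapes = [Shape(10, 10, 120, 60), Shape(200, 10, 120, 60)]
    texts = [TextBox(30, 30, 40, 12, "Start"), TextBox(400, 400, 30, 12, "note")]
    print(to_drawio(shapes, assign_text(shapes, texts)))
```

Picking the smallest containing shape is one simple way to handle nested containers, such as a node inside a group box. A real pipeline would likely also need overlap thresholds and arrow handling, which this sketch leaves out.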
## Project Structure

```
sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (SAM3)
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies
```

## Installation & Setup

Follow these steps to set up the project locally.

### 1. Prerequisites

* **Python 3.10+**
* **Node.js & npm** (for the frontend)
* **CUDA-capable GPU** (Highly recommended)

### 2. Clone Repository

```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git
# git names the checkout directory after the repository
cd Edit-Banana
```

### 3. Initialize Directory Structure

After cloning, you must **manually create** the following resource directories (ignored by Git):

```bash
# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output
```

### 4. Download Model Weights

Download the required models and place them in the correct paths:

| Model | Download | Target Path |
| :--- | :--- | :--- |
| **SAM 3** | https://modelscope.cn/models/facebook/sam3 | `models/sam3.pt` (or as configured) |

> **Note**: For SAM 3 (or the specific segmentation checkpoint used), place the `.pt` file in `models/` and update `config.yaml`.

### 5. Install Dependencies

**Backend:**

```bash
pip install -r requirements.txt
```

**Frontend:**

```bash
cd frontend
npm install
cd ..
```

### 6. Configuration

1. **Config File**: Copy the example config.

    ```bash
    cp config/config.yaml.example config/config.yaml
    ```

2. **Environment Variables**: Create a `.env` file in the root directory.
```env
AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_key
# Add other keys as needed
```

## Usage

### 1. Web Interface (Recommended)

Start the Backend:

```bash
python server_pa.py
# Server runs at http://localhost:8000
```

Start the Frontend:

```bash
cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173
```

Open your browser, upload an image, and view the result in the embedded DrawIO editor.

### 2. Command Line Interface (CLI)

To process a single image:

```bash
python main.py -i input/test_diagram.png
```

The output XML will be saved in the `output/` directory.

## Configuration `config.yaml`

Customize the pipeline behavior in `config/config.yaml`:

* **sam3**: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
* **paths**: Set input/output directories.
* **dominant_color**: Fine-tune color extraction sensitivity.

## 📌 Development Roadmap

| Feature Module | Status | Description |
|--------------------------|--------------|---------------------------------|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📝 Planned | Support custom template import |
| Batch Export Optimization | 📝 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📝 Planned | Support local VLM deployment, independent of APIs |

## 🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

1. Fork this repository
2. Create a feature branch (`git checkout -b feature/xxx`)
3. Commit your changes (`git commit -m 'feat: add xxx'`)
4. Push to the branch (`git push origin feature/xxx`)
5. Open a Pull Request

Bug Reports: [Issues](https://github.com/XiangjianYi/Image2DrawIO/issues)

Feature Suggestions: [Discussions](https://github.com/XiangjianYi/Image2DrawIO/discussions)

## 🤩 Contributors

Thanks to all developers who have contributed to the project and promoted its iteration!

| Name/ID | Email |
|---------|-------|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| An Minghao | |
| Yu Mingjie | |
| Yu Xinjiang | |
| Chen Zhuofan | |
| Li Xiangkun | |

## 📄 License

This project is open-source under the [Apache License 2.0](LICENSE), allowing commercial use and secondary development (with copyright notice retained).

---

## 🌟 Star History

🌟 If this project helps you, please star it to show your support!

[![Star History Chart](https://api.star-history.com/svg?repos=bit-datalab/edit-banana&type=date&legend=top-left)](https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left)