# rusty-pageindex

**Repository Path**: daoos_admin/rusty-pageindex

## Basic Information

- **Project Name**: rusty-pageindex
- **Description**: No description available
- **Primary Language**: Rust
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-26
- **Last Updated**: 2026-01-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🦀 RustyPageIndex

![Rusty Page Indexer Cover](assets/cover.png)

**RustyPageIndex** is a high-performance Rust implementation of the PageIndex pattern. It transforms complex documents into hierarchical "Table-of-Contents" (TOC) trees for vectorless, reasoning-based RAG.

This project is inspired by [VectifyAI/PageIndex](https://github.com/VectifyAI/PageIndex) but has diverged significantly with multi-repo support, parallel processing, and a unified tree architecture.

## 🚀 Key Features

### Java Integration

- **Complete Configuration API**: Set LLM API keys, models, and endpoints from Java
- **Multiple Providers**: Support for OpenAI, Ollama, and custom LLM APIs
- **Configuration Persistence**: Save and load configuration files programmatically
- **Native Performance**: Rust core with Java convenience

### Performance

- **Parallel Indexing**: Uses Rayon for parallel file parsing (238 files in ~0.04s)
- **Rust-Native Parsing**: `pdf-extract` and `pulldown-cmark` for fast document processing
- **Incremental Updates**: Hash-based caching skips unchanged files

### Multi-Repository Support

- **Index multiple repos**: Each indexed folder is tracked separately
- **Query across all**: Search spans all indexed repositories by default
- **Manage indices**: List, filter, and clean up indices easily

### Unified Tree Architecture

- **Folder → File → Section** hierarchy preserves document structure
- **Single tree per repo**: Efficient storage and navigation
- **Smart search**: Auto-unwraps folder roots for better LLM context

---

## 🔄 Divergence from Original PageIndex

| Feature | Original PageIndex | RustyPageIndex |
|---------|-------------------|----------------|
| Language | Python | Rust |
| Indexing | Per-file indices | Unified folder tree |
| Multi-repo | Not supported | Full support with `list`/`clean` |
| Parallelism | Sequential | Rayon parallel processing |
| Storage | Cloud-based (MCP) | Local filesystem |
| Tree Structure | Flat sections | Folder → File → Section hierarchy |
| Headerless Markdown | Empty tree | Auto-creates "Document" node |

---

## 🛠️ Getting Started

### Installation

**One-liner Install (Unix/macOS):**

```bash
curl -fsSL https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.sh | bash
```

**One-liner Install (Windows PowerShell):**

```powershell
irm https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.ps1 | iex
```

**Via Cargo:**

```bash
cargo install rusty-page-indexer
```

### 🧙 Use as an Agent Skill

```bash
npx skills add https://github.com/Algiras/rusty-pageindex --skill rusty-page-indexer
```

### 🔑 Authentication

```bash
# For OpenAI
rusty-page-indexer auth --api-key "your-key-here"

# For Ollama (local LLM)
rusty-page-indexer auth --api-key "ollama" --api-base "http://localhost:11434/v1" --model "llama3.2"
```

---

## 🌲 Usage

### Indexing Documents

```bash
# Index a repository
rusty-page-indexer index ./my-project

# Index with LLM-generated summaries
rusty-page-indexer index ./my-project --enrich

# Force re-index (ignores cache)
rusty-page-indexer index ./my-project --force

# Preview what would be indexed
rusty-page-indexer index ./my-project --dry-run
```

### Managing Multiple Repositories

```bash
# Index multiple repos
rusty-page-indexer index ./repo-a
rusty-page-indexer index ./repo-b

# List all indexed repositories
rusty-page-indexer list

# Example output:
# 📋 Indexed Repositories
# ────────────────────────────────────────────────────────────
# 📁 repo-a (125.3 KB)
#    /Users/you/projects/repo-a
# 📁 repo-b (89.7 KB)
#    /Users/you/projects/repo-b
# ────────────────────────────────────────────────────────────
# Total: 2 indices
```

### Querying

```bash
# Search across ALL indexed repositories
rusty-page-indexer query "how does authentication work"

# Search within a specific repository
rusty-page-indexer query "kafka messaging" --path repo-a
```

### Cleanup

```bash
# Remove a specific index
rusty-page-indexer clean repo-a

# Remove all indices
rusty-page-indexer clean --all
```

### Status Information

```bash
rusty-page-indexer info
```

---

## 🤖 Model Compatibility

### OpenAI Models (Remote)

| Model | Cost | Speed | Notes |
|-------|------|-------|-------|
| `gpt-4o` | $$$ | Fast | Best accuracy, recommended for complex queries |
| `gpt-4o-mini` | $ | Very Fast | Great balance of cost and quality ⭐ |
| `gpt-4.1-mini` | $ | Very Fast | Latest mini variant |
| `gpt-4-turbo` | $$ | Fast | Good for detailed reasoning |
| `gpt-3.5-turbo` | ¢ | Very Fast | Budget option, decent accuracy |

```bash
# Configure for OpenAI
rusty-page-indexer auth --api-key "sk-..." --model "gpt-4o-mini"

# Override model per query
rusty-page-indexer query "question" --model gpt-4o
```

### Local Models (Ollama)

| Model | Size | Works | Notes |
|-------|------|-------|-------|
| `gemma3:1b` | 1B | ✅ | Minimum recommended for local use |
| `llama3.2:latest` | 3B | ✅ | Good balance of speed and accuracy ⭐ |
| `qwen2.5:7b` | 7B | ✅ | Reliable, slightly conservative |
| `llama3.1:latest` | 8B | ✅ | Excellent accuracy |
| `mistral:7b` | 7B | ✅ | Fast and capable |
| `phi3:mini` | 3.8B | ✅ | Microsoft's compact model |
| `qwen2.5:0.5b` | 0.5B | ❌ | Too small, unreliable responses |
| `tinyllama:1.1b` | 1.1B | ❌ | Doesn't follow output format |

```bash
# Configure for Ollama
rusty-page-indexer auth --api-key "ollama" --api-base "http://localhost:11434/v1" --model "llama3.2"

# Make sure Ollama is running
ollama serve
```

### OpenAI-Compatible APIs

Works with any OpenAI-compatible endpoint:

```bash
# Azure OpenAI
rusty-page-indexer auth --api-key "your-key" --api-base "https://your-resource.openai.azure.com/v1" --model "gpt-4"

# Together AI
rusty-page-indexer auth --api-key "your-key" --api-base "https://api.together.xyz/v1" --model "meta-llama/Llama-3-70b-chat-hf"

# Groq
rusty-page-indexer auth --api-key "your-key" --api-base "https://api.groq.com/openai/v1" --model "llama3-70b-8192"
```

**Recommendation**: Use `gpt-4o-mini` for remote or `llama3.2` for local. Add `--enrich` during indexing for better search quality.

---

## 🔍 How Search Works

1. **Repository Selection**: Query matches against all indexed repos (or filtered by `--path`)
2. **Tree Navigation**: LLM navigates the Folder → File → Section hierarchy
3. **Content Retrieval**: Matching leaf nodes return full text content

The unified tree structure allows the LLM to see file names within folders, making navigation more accurate than flat file lists.

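
The three steps above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the project's actual implementation: the `Node`, `Repo`, `search`, and `pick_by_keyword` names are hypothetical, and a plain keyword match stands in for the LLM's navigation decision at each level.

```rust
/// One node of a hypothetical Folder → File → Section tree (illustrative names).
#[derive(Debug)]
struct Node {
    title: String,
    text: Option<String>, // only leaf nodes carry full text
    children: Vec<Node>,
}

impl Node {
    fn branch(title: &str, children: Vec<Node>) -> Node {
        Node { title: title.to_string(), text: None, children }
    }

    fn leaf(title: &str, text: &str) -> Node {
        Node { title: title.to_string(), text: Some(text.to_string()), children: Vec::new() }
    }
}

/// An indexed repository: a name plus its single unified tree.
struct Repo {
    name: String,
    root: Node,
}

/// Stand-in for the LLM (an assumption): keep children whose title contains a query word.
fn pick_by_keyword(query: &str, titles: &[&str]) -> Vec<usize> {
    let words: Vec<String> = query.split_whitespace().map(|w| w.to_lowercase()).collect();
    titles
        .iter()
        .enumerate()
        .filter(|(_, title)| {
            let title = title.to_lowercase();
            words.iter().any(|w| title.contains(w.as_str()))
        })
        .map(|(i, _)| i)
        .collect()
}

/// Steps 2 and 3: descend through the chosen children; leaves yield their full text.
fn collect<'a>(
    node: &'a Node,
    query: &str,
    pick: &dyn Fn(&str, &[&str]) -> Vec<usize>,
    out: &mut Vec<&'a str>,
) {
    if node.children.is_empty() {
        if let Some(text) = &node.text {
            out.push(text.as_str());
        }
        return;
    }
    // The chooser sees every child title at this level, e.g. all file names in a folder.
    let titles: Vec<&str> = node.children.iter().map(|c| c.title.as_str()).collect();
    for i in pick(query, &titles) {
        if let Some(child) = node.children.get(i) {
            collect(child, query, pick, out);
        }
    }
}

/// Step 1: search every repo, or only the one named by the optional path filter.
fn search<'a>(
    repos: &'a [Repo],
    query: &str,
    path: Option<&str>,
    pick: &dyn Fn(&str, &[&str]) -> Vec<usize>,
) -> Vec<&'a str> {
    let mut out = Vec::new();
    for repo in repos.iter().filter(|r| path.map_or(true, |p| r.name == p)) {
        collect(&repo.root, query, pick, &mut out);
    }
    out
}

/// A tiny sample index used for illustration only.
fn demo_repos() -> Vec<Repo> {
    let tokens = Node::branch(
        "tokens.md",
        vec![
            Node::leaf("Overview", "General notes."),
            Node::leaf("Refresh tokens", "Tokens are refreshed hourly."),
        ],
    );
    let billing = Node::branch("billing", vec![Node::leaf("invoices.md", "Invoice format.")]);
    let root = Node::branch("repo-a", vec![Node::branch("auth", vec![tokens]), billing]);
    vec![Repo { name: "repo-a".to_string(), root }]
}
```

Calling `search(&demo_repos(), "auth tokens", None, &pick_by_keyword)` walks `auth` → `tokens.md` → `Refresh tokens` and returns that section's text, while passing `Some("repo-b")` as the filter returns nothing because no such repository is indexed. Passing the chooser as a closure keeps the traversal independent of how the decision is made, so a real LLM call could replace the keyword match without touching the tree walk.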

---

## 📁 Storage Structure

```
~/.rusty-page-indexer/
├── config.toml          # API credentials and settings
├── manifest.json        # Index registry with all repos
└── indices/
    ├── {hash-a}.json    # Unified tree for repo-a
    └── {hash-b}.json    # Unified tree for repo-b
```

---

## 📝 Supported Document Types

### Markdown (`.md`)

- Parses heading structure (`#`, `##`, `###`) into hierarchical tree
- Headerless files auto-create a "Document" node with full content

### PDF (`.pdf`)

- Extracts text using `pdf-extract`
- Creates single root node with document text
- Works best with text-based PDFs (not scanned images)

---

## ☕ Java Integration

RustyPageIndexer now includes Java bindings via JNI, allowing you to use the high-performance Rust indexing engine from Java applications.

### Quick Start

```bash
# Build JAR with one command
./build-jar.sh      # Linux/macOS
# or
.\build-jar.ps1     # Windows
```

### Usage in Java

```java
import com.rusty.pageindexer.RustyPageIndexer;

RustyPageIndexer indexer = new RustyPageIndexer();
try {
    indexer.initialize(null);
    indexer.index("/path/to/documents", false, false);
    String results = indexer.query("how does authentication work", null, null);
    System.out.println(results);
} finally {
    indexer.destroy();
}
```

### Documentation

- **Build Instructions:** [BUILD_JAR.md](BUILD_JAR.md)
- **Java API Reference:** [JAVA_INTEGRATION.md](JAVA_INTEGRATION.md)
- **Implementation Details:** [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)

### Features

- ✅ Native performance with Java convenience
- ✅ Automatic native library loading from JAR
- ✅ Multi-platform support (Linux, macOS, Windows)
- ✅ One-click JAR packaging
- ✅ Complete Maven integration

---

## 📄 License

MIT