# embedius

**Repository Path**: liwen_test_sync_group/embedius

## Basic Information

- **Project Name**: embedius
- **Description**: Local vector database with document indexer
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-21
- **Last Updated**: 2026-02-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Embedius

[![GoReportCard](https://goreportcard.com/badge/github.com/viant/embedius)](https://goreportcard.com/report/github.com/viant/embedius)
[![GoDoc](https://godoc.org/github.com/viant/embedius?status.svg)](https://godoc.org/github.com/viant/embedius)

Embedius is a SQLite-backed vector indexing service for local files and upstream data, designed for semantic search and retrieval workflows.

## Features

- SQLite storage with `sqlite-vec` shadow tables
- Per-root SCN tracking and upstream sync
- File system indexing with include/exclude and size filters
- Config-driven multi-root indexing
- CLI and reusable service package

## Installation

```bash
go get github.com/viant/embedius
```

## SQLite Storage & CLI

Embedius stores vectors in SQLite via `sqlite-vec`. For the indexer package, the default database file is `/embedius.sqlite`.
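
To make the storage model concrete, the sketch below shows conceptually what a vector lookup does: it ranks stored chunk embeddings by similarity to a query embedding and returns the best matches. It is a brute-force, in-memory illustration only, not how Embedius or `sqlite-vec` is implemented. The `doc` type, the `cosine` and `topK` helpers, the sample IDs, and the choice of cosine similarity as the metric are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// doc is a hypothetical stand-in for a stored chunk and its embedding.
type doc struct {
	ID     string
	Vector []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the IDs of the k docs most similar to query, best first.
func topK(docs []doc, query []float64, k int) []string {
	type scored struct {
		id    string
		score float64
	}
	scores := make([]scored, 0, len(docs))
	for _, d := range docs {
		scores = append(scores, scored{d.ID, cosine(d.Vector, query)})
	}
	// Stable sort keeps insertion order for equal scores.
	sort.SliceStable(scores, func(i, j int) bool { return scores[i].score > scores[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(scores) {
		k = len(scores)
	}
	ids := make([]string, 0, k)
	for _, s := range scores[:k] {
		ids = append(ids, s.id)
	}
	return ids
}

func main() {
	// Toy 3-dimensional "embeddings"; real ones come from an embedding model.
	docs := []doc{
		{"auth.go#0", []float64{1, 0, 0}},
		{"db.go#0", []float64{0, 1, 0}},
		{"login.md#0", []float64{0.9, 0.1, 0}},
	}
	fmt.Println(topK(docs, []float64{1, 0, 0}, 2)) // prints [auth.go#0 login.md#0]
}
```

In Embedius itself the embeddings are persisted in SQLite and the lookup goes through `sqlite-vec`, so a scan like this is useful only as a mental model of what a search returns.
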

Schema DDLs:

- SQLite: `db/schema/sqlite/schema.ddl`
- MySQL upstream: `db/schema/mysql/schema.ddl`
- SCN triggers: `db/schema/sqlite/shadow_sync.ddl`, `db/schema/mysql/shadow_sync.ddl`

CLI (from `cmd/embedius`):

```bash
# Index a root (dataset) into SQLite
embedius index --root abc --path /abs/path/to/abc --db /tmp/vec.sqlite --progress

# Per-run include/exclude filters (comma-separated)
embedius index --root abc --path /abs/path/to/abc --db /tmp/vec.sqlite \
  --include "**/*.go,**/*.md" --exclude "**/vendor/**" --max-size 10485760

# Index multiple roots from config
embedius index --config /path/to/roots.yaml --all --db /tmp/vec.sqlite

# Index all roots from config
embedius index --config /path/to/roots.yaml --all

# --db overrides config store.dsn:
embedius index --config /path/to/roots.yaml --all --db /tmp/vec.sqlite

# roots.yaml (or default ~/embedius/config.yaml if present)
# store:
#   driver: sqlite
#   dsn: /tmp/vec.sqlite
# upstreamStore:
#   driver: postgres
#   dsn: postgres://user:pass@host:5432/db?sslmode=disable
# roots:
#   abc:
#     path: /abs/path/to/abc
#     include:
#       - "**/*.go"
#       - "**/*.md"
#     exclude:
#       - "**/node_modules/**"
#       - "**/.git/**"
#     max_size_bytes: 10485760
#     # Optional sync controls (when serving with upstreamStore/upstreams):
#     # syncEnabled: false
#     # upstreamRef: custom   # overrides default upstreamStore
#     # minIntervalSeconds: 60
#     # batch: 200
#     # shadow: shadow_vec_docs
#     # force: false
#   def:
#     path: /abs/path/to/def
#   etl:
#     path: /abs/path/to/etl
#     include:
#       - "**/*.go"
#       - "**/*.sql"
#       - "**/*.dql"
#       - "**/*.mod"
#     exclude:
#       - "**/*_test.go"
#     max_size_bytes: 1048576

# Query
embedius search --db /tmp/vec.sqlite --root abc --query "how does auth work?"

# Query using config (defaults to ~/embedius/config.yaml when present)
embedius search --config /path/to/roots.yaml --root abc --query "how does auth work?"
embedius search --root abc --query "how does auth work?"

# Debug hold (for gops/pprof attach)
embedius search --db /tmp/vec.sqlite --root abc --query "auth" --embedder simple --debug-sleep 60

# Show root metadata
embedius roots --db /tmp/vec.sqlite
embedius roots --config /path/to/roots.yaml

# Upstream sync (optional)
embedius index --root abc --path /abs/path/to/abc --db /tmp/vec.sqlite \
  --upstream-driver mysql --upstream-dsn 'user:pass@tcp(host:3306)/db' \
  --upstream-shadow shadow_vec_docs --sync-batch 200

# Dedicated sync
embedius sync --root abc --db /tmp/vec.sqlite \
  --upstream-driver mysql --upstream-dsn 'user:pass@tcp(host:3306)/db' \
  --upstream-shadow shadow_vec_docs --sync-batch 200

# Sync multiple roots from config
embedius sync --config /path/to/roots.yaml --all --db /tmp/vec.sqlite \
  --upstream-driver mysql --upstream-dsn 'user:pass@tcp(host:3306)/db'

# Sync with include/exclude filters
embedius sync --root abc --db /tmp/vec.sqlite \
  --include "**/*.go" --exclude "**/vendor/**" --max-size 10485760 \
  --upstream-driver mysql --upstream-dsn 'user:pass@tcp(host:3306)/db'

# Note: sync filters apply only to insert/update; deletes are always applied.

# --db overrides config store.dsn:
embedius sync --config /path/to/roots.yaml --all --db /tmp/vec.sqlite \
  --upstream-driver mysql --upstream-dsn 'user:pass@tcp(host:3306)/db'

# Admin tasks
embedius admin --db /tmp/vec.sqlite --root abc --action rebuild
embedius admin --db /tmp/vec.sqlite --root abc --action invalidate
embedius admin --db /tmp/vec.sqlite --root abc --action prune
embedius admin --db /tmp/vec.sqlite --root abc --action prune --scn 12345 --force
embedius admin --db /tmp/vec.sqlite --root abc --action check

# Admin across all roots from config
embedius admin --config /path/to/roots.yaml --all --action rebuild --db /tmp/vec.sqlite

# Build with upstream drivers
go build -tags mysql ./cmd/embedius
go build -tags postgres ./cmd/embedius
```

## MCP Server (serve + remote search)

Start the MCP server (metrics logging is off by default; enable with `--metrics-log`):

```bash
embedius serve --config /path/to/roots.yaml --db /tmp/vec.sqlite

# With metrics logging enabled
embedius serve --config /path/to/roots.yaml --db /tmp/vec.sqlite --metrics-log
```

Query the MCP server from the CLI:

```bash
time ./embedius search -all -query='supply constraints' -mcp-addr='127.0.0.1:6061'
```

## Config Reference

Embedius looks for `~/embedius/config.yaml` by default if `--config` is not provided.

Example:

```yaml
store:
  driver: sqlite
  dsn: /tmp/vec.sqlite
upstreamStore:
  driver: mysql
  dsn: user:pass@tcp(host:3306)/db
# Optional named upstreams for per-root overrides.
# upstreams:
#   - name: custom
#     driver: mysql
#     dsn: user:pass@tcp(host:3306)/otherdb
#     shadow: shadow_vec_docs
#     batch: 200
#     force: false
#     enabled: true
#     minIntervalSeconds: 60
roots:
  etl:
    path: /abs/path/to/etl
    include:
      - "**/*.go"
      - "**/*.sql"
      - "**/*.dql"
      - "**/*.mod"
    exclude:
      - "**/*_test.go"
    max_size_bytes: 1048576
    # Optional sync controls (default uses upstreamStore)
    # syncEnabled: true
    # upstreamRef: custom
    # minIntervalSeconds: 60
    # batch: 200
    # shadow: shadow_vec_docs
    # force: false
  newui:
    path: /abs/path/to/newui
    include:
      - "**/*.ts"
      - "**/*.html"
      - "**/*.scss"
      - "**/*.css"
      - "**/*.json"
    exclude:
      - "**/*.spec.ts"
      - "**/*.test.ts"
      - "**/node_modules/**"
      - "**/dist/**"
      - "**/build/**"
      - "**/.angular/**"
      - "**/.cache/**"
    max_size_bytes: 1048576
```

Notes:

- When `upstreamStore` is configured, `embedius serve` will start background sync loops for all roots unless `syncEnabled: false` is set.
- A root may set `upstreamRef` to use a named entry from `upstreams` instead of the default `upstreamStore`.

## Endly E2E

Endly E2E entry point is `e2e/run.yaml` (runs sequentially):

```bash
cd e2e
endly run.yaml
```

## Concepts

- Root (dataset): named logical source (e.g., `abc`) mapped to a local path.
- Assets: files under a root, tracked with `md5`, size, and `scn`.
- Documents: chunks derived from assets; stored in `_vec_emb_docs`.
- SCN: per-root sequence for change tracking and upstream sync.
- Shadow tables: `_vec_*` tables used by `sqlite-vec` for vector search.

## Service Package (Library Use)

Use `embedius/service` when you want to embed indexing/search/sync/admin flows directly in your program without invoking the CLI.

```go
package main

import (
	"context"
	"log"

	"github.com/viant/embedius/embeddings/openai"
	"github.com/viant/embedius/service"
)

func main() {
	ctx := context.Background()
	embedder := &openai.Embedder{C: openai.NewClient("KEY", "text-embedding-3-small")}

	svc, err := service.NewService(
		service.WithDSN("/tmp/embedius.sqlite"),
		service.WithEmbedder(embedder),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer svc.Close()

	roots, _, err := service.ResolveRoots(service.ResolveRootsRequest{
		Root:        "abc",
		RootPath:    "/abs/path/to/abc",
		RequirePath: true,
		Include:     []string{"**/*.go"},
	})
	if err != nil {
		log.Fatal(err)
	}

	if err := svc.Index(ctx, service.IndexRequest{
		Roots:     roots,
		Model:     "text-embedding-3-small",
		Logf:      log.Printf,
		Prune:     true,
		BatchSize: 64,
	}); err != nil {
		log.Fatal(err)
	}

	results, err := svc.Search(ctx, service.SearchRequest{
		Dataset: "abc",
		Query:   "how does auth work?",
		Limit:   5,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("matches=%d", len(results))
}
```

## License

The source code is made available under the terms of the Apache License, Version 2, as stated in the file `LICENSE`.

Individual files may be made available under their own specific license, all compatible with Apache License, Version 2. Please see individual files for details.

## Credits

Developed and maintained by [Viant](https://github.com/viant).