# SDP

**Repository Path**: yeahsj/smart-data-platform

## Basic Information

- **Project Name**: SDP
- **Description**: 智数平台 (SmartData Platform) is a simplified data platform in the style of Palantir Foundry
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2026-03-06
- **Last Updated**: 2026-03-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 智数平台 - AI Agent Documentation

## Project Overview

智数平台 (SmartData Platform) is a simplified data platform in the style of Palantir Foundry. It is built on a **four-layer horizontal architecture**:

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Ingestion   │ →  │   Storage    │ →  │   Service    │ →  │ Intelligence │
│    Layer     │    │    Layer     │    │    Layer     │    │    Layer     │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
```

**Data flows horizontally, and each layer has a single responsibility:**

- **Ingestion Layer**: multi-source collection → ETL cleansing → data asset curation
- **Storage Layer**: dual-track storage engines (OLAP + RDF) + data funnel scheduling
- **Service Layer**: fusion of ontology metadata with data → unified query (SQL + NL2SQL)
- **Intelligence Layer**: smart recommendation, anomaly detection, predictive analytics, digital twins

## System Architecture

### Data Flow

The platform's four layers are arranged horizontally, and data flows from left to right:

```
┌────────────────────┐            ┌────────────────────┐            ┌────────────────────┐            ┌────────────────────┐
│  Ingestion Layer   │            │   Storage Layer    │            │   Service Layer    │            │ Intelligence Layer │
├────────────────────┤            ├────────────────────┤            ├────────────────────┤            ├────────────────────┤
│ Multi-source intake│  schedule  │ Columnar (OLAP)    │    fuse    │ Ontology metadata  │   invoke   │ Recommendation     │
│ ETL cleansing      │ ─────────→ │         +          │ ─────────→ │         +          │ ─────────→ │ Anomaly detection  │
│ Asset curation     │            │ Graph DB (RDF)     │            │ Unified query      │            │ Prediction         │
│ Dataset Pool       │            │                    │            │ (SQL / NL2SQL)     │            │ Digital twin       │
└────────────────────┘            └────────────────────┘            └────────────────────┘            └────────────────────┘
```

### Architecture Layers

| Layer | Core Components | Main Responsibilities |
|-------|-----------------|-----------------------|
| **Ingestion Layer** | Source systems + data pipelines + datasets | Multi-source collection, cleansing and transformation, asset curation |
| **Storage Layer** | Columnar store + graph database | Dual-track storage serving both analytical and graph workloads |
| **Service Layer** | Ontology metadata + unified query | Fuses schema with data and exposes an intelligent query interface |
| **Intelligence Layer** | Recommendation / detection / prediction / digital twin | Intelligent applications for business scenarios |

### Storage Layer Comparison

| Storage Engine | Technology | Data Shape | Query Type | Sync Method |
|----------------|------------|------------|------------|-------------|
| **Columnar store** | MySQL Columnar | Wide table (flattened) | SQL analytical queries | Data funnel scheduling |
| **Graph database** | Apache Jena TDB2 | Triples (S-P-O) | SPARQL graph queries | Data funnel scheduling |

> The **Dataset Pool** (Ingestion Layer) holds the source data. Data funnels synchronize it into both storage engines above.

### Data Object Service Layer (Service Layer)

The data object service is the platform's core capability layer. It fuses **ontology metadata** with the **underlying data**.

**Core components:**

1. **Ontology Schema**
   - ObjectType: business object type definitions
   - Property: property definitions
   - LinkType: relationship definitions
   - Rule: business rules
2. **Unified Query**
   - **SQL interface**: standard structured queries
   - **NL2SQL**: translation from natural language into queries (the "ask your data" capability)
   - **Semantic query**: ontology-based entity navigation and relationship queries

**Benefits:**

- The application layer does not need to know which underlying store (OLAP or RDF) holds the data
- A single query interface accepts both SQL and natural language
- Ontology-based semantic understanding enables intelligent data access

## Technology Stack

### Current Stack (v2.0 - Spring Boot + Vue3)

| Layer | Technology | Version |
|-------|------------|---------|
| **Frontend** | Vue 3 + TypeScript | 3.4.0 |
| **Build Tool** | Vite | 5.4.0 |
| **UI Framework** | Element Plus + Tailwind CSS | 2.5.0 |
| **State Management** | Pinia | 2.1.0 |
| **Backend** | Spring Boot | 3.2.0 |
| **ORM** | Spring Data JPA / Hibernate | 6.4.0 |
| **Database** | MySQL 8.0 / SQLite (dev) | 8.0.35 |
| **RDF Store** | Apache Jena TDB2 | 4.10.0 |
| **Language** | Java | 17 |

### Legacy Stack (v1.x - FastAPI + React) - Deprecated

- Python 3.11 + FastAPI + SQLAlchemy + React 18
- Located in the `foundry-platform/` directory (kept as a backup only)

## Project Structure

```
foundry-platform-final/
├── foundry-platform-springboot/                # Spring Boot Backend
│   ├── backend/
│   │   ├── src/main/java/com/foundry/platform/
│   │   │   ├── config/                         # Configuration classes
│   │   │   ├── controller/                     # REST API Controllers
│   │   │   │   ├── DatasetController.java
│   │   │   │   ├── DatasetManagementController.java  # Standalone dataset management (V2)
│   │   │   │   ├── DatasetOLAPController.java
│   │   │   │   ├── OntologyController.java
│   │   │   │   ├── ObjectInstanceController.java
│   │   │   │   ├── DataFunnelController.java
│   │   │   │   ├── CsvImportController.java    # CSV import
│   │   │   │   ├── RDFController.java
│   │   │   │   └── DebugController.java
│   │   │   ├── entity/                         # JPA Entities
│   │   │   │   ├── Dataset.java                # Standalone dataset (objectTypeId is optional)
│   │   │   │   ├── DatasetField.java           # Dataset field definition
│   │   │   │   ├── ObjectType.java
│   │   │   │   ├── ObjectInstance.java
│   │   │   │   ├── LinkType.java
│   │   │   │   ├── OntologyRule.java
│   │   │   │   └── OntologyDomain.java
│   │   │   ├── repository/                     # Spring Data Repositories
│   │   │   ├── service/                        # Business Logic
│   │   │   │   ├── ECommerceDomainInitService.java
│   │   │   │   ├── ObjectInstanceOLAPService.java
│   │   │   │   ├── DataFunnelService.java
│   │   │   │   ├── DatasetManagementService.java  # Standalone dataset management
│   │   │   │   ├── CsvImportService.java       # CSV import service
│   │   │   │   └── rdf/                        # RDF Services
│   │   │   ├── olap/                           # OLAP Storage Layer
│   │   │   │   ├── common/AbstractOLAPStore.java
│   │   │   │   └── mysql/MySQLStore.java
│   │   │   └── rdf/                            # RDF Storage Layer
│   │   │       ├── RDFStore.java (interface)
│   │   │       └── jena/JenaRDFStore.java
│   │   └── src/main/resources/
│   │       └── application.yml                 # Multi-profile config
│   └── pom.xml
│
├── foundry-platform-vue3/                      # Vue3 Frontend
│   ├── src/
│   │   ├── views/                              # Page Views
│   │   │   ├── DashboardView.vue               # Home page: four-layer architecture diagram
│   │   │   ├── DatasetsView.vue                # Dataset management
│   │   │   ├── OntologyView.vue                # Ontology management
│   │   │   ├── FunnelManagementView.vue        # Data funnels
│   │   │   └── ...
│   │   ├── components/                         # Vue Components
│   │   ├── stores/                             # Pinia Stores
│   │   ├── api/                                # API Clients
│   │   └── router/                             # Vue Router
│   └── package.json
│
├── sample-data/                                # Sample data files
├── session_summary/                            # Development session logs
├── AGENTS.md                                   # Project documentation
└── DEPLOY.md                                   # Deployment documentation
```

## Core Domain Models

```
Dataset - standalone data pool that lives outside the ontology domains
├── id, name, description
├── sourceType: manual | file | api | database | stream
├── connectionConfig (JSON)
├── syncConfig (JSON)
├── recordCount (cached count)
├── status: active | paused | error
├── objectTypeId (optional, links to an ObjectType)
├── createdAt, updatedAt
└── DatasetField[] (field definitions) - the dataset carries its own schema
    ├── name, displayName, description
    ├── dataType: string | integer | float | boolean | datetime | date
    ├── isPrimaryKey, isRequired
    ├── orderIndex
    └── length, precision, scale

OntologyDomain (domain)
├── id, name, displayName, description, status
└── ObjectType[] (object types)
    ├── name, displayName, description, icon
    ├── domainId (ID of the owning domain)
    ├── Property[] (property definitions)
    │   ├── name, displayName, dataType
    │   ├── isRequired, isPrimaryKey
    │   └── validationRules (JSON)
    └── LinkType[] (relationship types)
        ├── sourceTypeId, targetTypeId
        ├── cardinality: one_to_one | one_to_many | many_to_many
        └── properties (JSON)

DataFunnel (data funnel)
├── name, description, sourceDatasetId
├── targetType: RDF | OLAP | API
├── targetConfig (JSON): graphName, tableName, endpoint
├── syncMode: FULL | INCREMENTAL | HYBRID
├── scheduleCron (scheduled trigger)
├── fieldMappings (JSON): field mappings
├── enabled: true | false
├── funnelStatus: ACTIVE | PAUSED | ERROR
├── lastSyncAt / lastSyncStatus / lastSyncCount
└── SyncJobHistory[] (sync history)
    ├── jobType: MANUAL | SCHEDULED | RETRY
    ├── status: RUNNING | SUCCESS | FAILED
    ├── processedCount / tripleCount
    ├── startTime / endTime / durationMs
    └── errorMessage / errorStack
```

## API Endpoints

### Ontology Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/ontology/object-types` | List all Object Types |
| POST | `/api/ontology/object-types` | Create Object Type |
| GET | `/api/ontology/object-types/{id}` | Get Object Type details |
| DELETE | `/api/ontology/object-types/{id}` | Delete Object Type |
| GET | `/api/ontology/link-types` | List Link Types |
| POST | `/api/ontology/link-types` | Create Link Type |
| GET | `/api/ontology/link-types/{id}` | Get Link Type details |
| DELETE | `/api/ontology/link-types/{id}` | Delete Link Type |
| GET | `/api/ontology/rules` | List all Rules |
| POST | `/api/ontology/rules` | Create Rule |
| PUT | `/api/ontology/rules/{id}` | Update Rule |
| DELETE | `/api/ontology/rules/{id}` | Delete Rule |

### Dataset Management (V2 - Independent)

Datasets are standalone data pools that carry their own schema. Binding a dataset to an ObjectType is optional.

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/datasets-v2` | List all Datasets |
| POST | `/api/datasets-v2` | Create Dataset with fields |
| GET | `/api/datasets-v2/{id}` | Get Dataset with field definitions |
| PUT | `/api/datasets-v2/{id}` | Update Dataset |
| DELETE | `/api/datasets-v2/{id}` | Delete Dataset |
| GET | `/api/datasets-v2/{id}/fields` | List dataset fields |
| POST | `/api/datasets-v2/{id}/fields` | Add field to dataset |
| DELETE | `/api/datasets-v2/{id}/fields/{fieldId}` | Remove field |
| GET | `/api/datasets-v2/{id}/data` | Query dataset data |
| POST | `/api/datasets-v2/{id}/refresh-count` | Refresh the record count of one dataset |
| POST | `/api/datasets-v2/refresh-all-counts` | Refresh the record counts of all datasets |

**Example request body for creating a dataset:**

```json
{
  "dataset": {
    "name": "product_data",
    "description": "Raw product data",
    "sourceType": "manual"
  },
  "fields": [
    {"name": "productId", "displayName": "Product ID", "dataType": "string", "isPrimaryKey": true},
    {"name": "name", "displayName": "Product Name", "dataType": "string"},
    {"name": "price", "displayName": "Price", "dataType": "float"}
  ]
}
```

### CSV Import (New)

Imports data in bulk from CSV files and automatically creates the ObjectType, its Properties, and the Dataset.

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/csv-import/preview` | Upload a CSV file and preview it (returns column names, inferred data types, and the first 10 rows) |
| POST | `/api/csv-import/create-object-type` | Create an ObjectType and its Properties from the CSV columns |
| POST | `/api/csv-import/create-dataset` | Create a Dataset (linked to an ObjectType) |
| POST | `/api/csv-import/import/{datasetId}` | Import CSV data into the specified Dataset |
| POST | `/api/csv-import/full-import` | Full import (create ObjectType + Dataset + import data) |

**Workflow:**

1. Call `/api/csv-import/preview` to upload the CSV and get the column information and inferred data types
2. Configure the column mappings (CSV column name → property name, display name, data type, primary-key flag)
3. Call `/api/csv-import/full-import` to complete the import in one step

**Example request (full-import):**

```bash
curl -X POST http://localhost:8080/api/csv-import/full-import \
  -F "file=@products.csv" \
  -F "objectTypeName=Product" \
  -F "objectTypeDisplayName=Product" \
  -F "datasetName=product_data" \
  -F "description=Product data" \
  -F "primaryKeyColumn=productId" \
  -F 'mappings=[{"csvColumnName":"productId","propertyName":"productId","propertyDisplayName":"Product ID","dataType":"string","primaryKey":true}]'
```

### OLAP Operations

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/olap/{datasetId}/import` | Batch import to OLAP |
| GET | `/api/olap/{datasetId}/query` | Query OLAP data |
| GET | `/api/olap/{datasetId}/stats` | Get table statistics |
| POST | `/api/olap/{datasetId}/aggregate` | Aggregation query |

### Business Actions

Business actions are defined in the ontology layer, and each one is associated with an ObjectType.

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/actions/object-types/{id}/actions` | List actions for object type |
| POST | `/api/actions/object-types/{id}/actions` | Create action type |
| PUT | `/api/actions/{actionId}` | Update action type |
| DELETE | `/api/actions/{actionId}` | Delete action type |
| POST | `/api/actions/execute` | Execute action on target |
| POST | `/api/actions/execute-batch` | Batch execute action |
| GET | `/api/actions/executions` | List execution history |
| GET | `/api/actions/executions/target/{targetId}` | Get target's execution history |

### Data Funnel

Data funnels are managed as standalone objects. They configure and run dataset synchronization.

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/funnels` | Create data funnel |
| GET | `/api/funnels` | List all funnels (with pagination) |
| GET | `/api/funnels/{id}` | Get funnel details |
| PUT | `/api/funnels/{id}` | Update funnel |
| DELETE | `/api/funnels/{id}` | Delete funnel |
| GET | `/api/funnels/dataset/{datasetId}` | Get funnels by dataset |
| POST | `/api/funnels/{id}/toggle` | Enable/disable funnel |
| POST | `/api/funnels/{id}/sync` | Manually trigger sync |
| GET | `/api/funnels/{id}/preview` | Preview sync data |
| GET | `/api/funnels/{id}/history` | Get sync history |
| GET | `/api/funnels/{id}/statistics` | Get funnel statistics |

### RDF Test & Query

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/rdf-test/health` | RDF store health check |
| GET | `/api/rdf-test/stats` | RDF store statistics |
| GET | `/api/rdf-test/clear` | Clear all RDF data |
| POST | `/api/rdf-test/query` | Execute SPARQL query |

### Debug Utilities

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/debug/table-status/{datasetName}` | Check table status |
| POST | `/api/debug/force-init/{datasetName}` | Force reinitialize table |
| GET | `/api/debug/query/{datasetName}` | Query table data |

> **Note**: Record-count refresh has moved to `/api/datasets-v2/{id}/refresh-count` and `/api/datasets-v2/refresh-all-counts`.

## Build and Development

### Backend (Spring Boot)

```bash
cd foundry-platform-springboot/backend

# Compile
mvn clean compile

# Run with MySQL (default)
mvn spring-boot:run

# Run with SQLite (development)
mvn spring-boot:run -Dspring-boot.run.profiles=sqlite
```

### Frontend (Vue3)

```bash
cd foundry-platform-vue3

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build
```
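The frontend keeps its REST clients under `src/api/`. The following is a minimal TypeScript sketch of how such a client might build the `{ dataset, fields }` body from the dataset-creation example and send it to `POST /api/datasets-v2`. It is not the project's actual client code. The type names, the `buildCreateDatasetRequest` and `createDataset` helpers, the injected `FetchLike` transport, and the duplicate-name check are all hypothetical. The default base URL `http://localhost:8080` is taken from the curl example.

```typescript
// Hypothetical sketch of a src/api/ client for POST /api/datasets-v2.
// Type and function names are illustrative, not taken from the project.

// Field data types documented for DatasetField.
type DatasetFieldType = "string" | "integer" | "float" | "boolean" | "datetime" | "date";
// Source types documented for Dataset.
type DatasetSourceType = "manual" | "file" | "api" | "database" | "stream";

interface DatasetFieldInput {
  name: string;
  displayName?: string;
  dataType: DatasetFieldType;
  isPrimaryKey?: boolean;
  isRequired?: boolean;
}

// Mirrors the { dataset, fields } request body from the dataset-creation example.
interface CreateDatasetRequest {
  dataset: { name: string; description?: string; sourceType: DatasetSourceType };
  fields: DatasetFieldInput[];
}

// Minimal fetch-compatible signature so the transport can be injected
// (pass the global fetch in the browser or Node 18+, or a stub in tests).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request body. Rejecting empty or duplicate field names is an
// assumed client-side check, not a documented server rule.
function buildCreateDatasetRequest(
  name: string,
  sourceType: DatasetSourceType,
  fields: DatasetFieldInput[],
  description?: string,
): CreateDatasetRequest {
  const seen = new Set<string>();
  for (const field of fields) {
    if (field.name.trim() === "") {
      throw new Error("Field name must not be empty");
    }
    if (seen.has(field.name)) {
      throw new Error(`Duplicate field name: ${field.name}`);
    }
    seen.add(field.name);
  }
  // JSON.stringify drops `description` when it is undefined.
  return { dataset: { name, description, sourceType }, fields };
}

// POSTs the body as JSON and returns the parsed response.
// The default base URL is the one used in the curl example (an assumption).
async function createDataset(
  body: CreateDatasetRequest,
  fetchImpl: FetchLike,
  baseUrl = "http://localhost:8080",
): Promise<unknown> {
  const response = await fetchImpl(`${baseUrl}/api/datasets-v2`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Create dataset failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `createDataset(buildCreateDatasetRequest("product_data", "manual", fields, "Raw product data"), fetch)` would send the same payload as the JSON example. Passing the transport in, rather than calling the global `fetch` directly, keeps the helper testable without a running backend.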