# duckdb-analysis

**Repository Path**: ourteam/duckdb-analysis

## Basic Information

- **Project Name**: duckdb-analysis
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-16
- **Last Updated**: 2025-10-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## DuckDB tests

- [PrepareData](src/main/java/cc/sofast/practice/duckdb/analysis/PrepareData.java)
  - prepare the data
  - query the data

## Doris Arrow Flight SQL tests

- [DorisArrowFlightSQLQuery](src/main/java/cc/sofast/practice/duckdb/analysis/DorisArrowFlightSQLQuery.java)
  - query data over the Arrow Flight protocol
- Notes
  - Configure the Arrow Flight SQL port on both BE and FE:

    ```
    ➜  ~ cat apache-doris/be/conf/be.conf | grep arrow_flight_sql_port
    arrow_flight_sql_port = 8050
    ➜  ~ cat apache-doris/fe/conf/fe.conf | grep arrow_flight_sql_port
    arrow_flight_sql_port = 8070
    ```

  - `priority_networks` is usually set to a concrete IP:

    ```shell
    ➜  ~ cat apache-doris/fe/conf/fe.conf | grep priority_networks
    priority_networks = 172.16.4.183
    ➜  ~ cat apache-doris/be/conf/be.conf | grep priority_networks
    # priority_networks = 10.10.10.0/24;192.168.0.0/16
    priority_networks = 172.16.4.183
    ```

  - If you get `127.0.0.1:8090: Connection refused`, the BE IP is likely misconfigured.
    - Remove the stale backend with `ALTER SYSTEM DROPP BACKEND "127.0.0.1:9050";`, then register the correct one with `ALTER SYSTEM ADD BACKEND "172.16.4.178:9050";`
    - `SHOW BACKENDS;` lists the registered backends
- Table DDL

```mysql
CREATE TABLE `data_metrics2` (
    `ts`  datetime(3)    NOT NULL COMMENT "Event time (millisecond precision)",
    `c1`  decimal(16, 2) NULL,
    `c2`  decimal(16, 2) NULL,
    `c3`  decimal(16, 2) NULL,
    `c4`  decimal(16, 2) NULL,
    `c5`  decimal(16, 2) NULL,
    `c6`  decimal(16, 2) NULL,
    `c7`  decimal(16, 2) NULL,
    `c8`  decimal(16, 2) NULL,
    `c9`  decimal(16, 2) NULL,
    `c10` decimal(16, 2) NULL,
    `c11` decimal(16, 2) NULL,
    `c12` decimal(16, 2) NULL,
    `c13` decimal(16, 2) NULL,
    `c14` decimal(16, 2) NULL,
    `c15` decimal(16, 2) NULL,
    `c16` decimal(16, 2) NULL,
    `c17` decimal(16, 2) NULL,
    `c18` decimal(16, 2) NULL,
    `c19` decimal(16, 2) NULL,
    `c20` decimal(16, 2) NULL,
    `c21` decimal(16, 2) NULL,
    `c22` decimal(16, 2) NULL,
    `c23` decimal(16, 2) NULL,
    `c24` decimal(16, 2) NULL,
    `c25` decimal(16, 2) NULL,
    `c26` decimal(16, 2) NULL,
    `c27` decimal(16, 2) NULL,
    `c28` decimal(16, 2) NULL,
    `c29` decimal(16, 2) NULL,
    `c30` decimal(16, 2) NULL
) ENGINE = OLAP
DUPLICATE KEY(`ts`)
AUTO PARTITION BY RANGE (date_trunc(`ts`, 'HOUR'))()
DISTRIBUTED BY HASH(`ts`) BUCKETS 12
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "min_load_replica_num" = "-1",
    "is_being_synced" = "false",
    "storage_medium" = "hdd",
    "storage_format" = "V2",
    "inverted_index_storage_format" = "V2",
    "light_schema_change" = "true",
    "disable_auto_compaction" = "false",
    "enable_single_replica_compaction" = "false",
    "group_commit_interval_ms" = "10000",
    "group_commit_data_bytes" = "134217728"
);
```

- Generate test data

```python
import csv
import random
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

output_file = "data_metrics_180w.csv"
start_time = datetime(2025, 10, 15, 18, 0, 0)
num_rows = 2000000
num_columns = 30

with open(output_file, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for i in range(num_rows):
        # Each row advances by 1 second; milliseconds stay at .000
        ts = start_time + timedelta(seconds=i)
        ts_str = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # millisecond precision
        row = [ts_str]
        # Generate 30 random decimal(16, 2) columns
        for _ in range(num_columns):
            val = Decimal(random.random() * 10000).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
            row.append(val)
        writer.writerow(row)
```

- Import via Stream Load

```shell
curl --location-trusted -u root:root \
    -H "format: csv" \
    -H "column_separator: ," \
    -T data_metrics_180w.csv \
    http://172.16.4.183:8030/api/test_db/data_metrics2/_stream_load
```
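As a quick sanity check before producing the full 2,000,000-row file, the sketch below (a hypothetical `make_row` helper, not part of the repository) rebuilds a single row exactly the way the generator script formats it, so you can confirm the timestamp truncation and the decimal(16, 2) layout match what the table expects.

```python
import random
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

def make_row(start_time, i, num_columns=30):
    """Build one CSV row the same way the generator script does.

    Hypothetical helper for illustration; mirrors the script's logic:
    ts advances by i seconds, values are quantized to two decimal places.
    """
    ts = start_time + timedelta(seconds=i)
    ts_str = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # truncate to milliseconds
    row = [ts_str]
    for _ in range(num_columns):
        val = Decimal(random.random() * 10000).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        row.append(str(val))
    return row

row = make_row(datetime(2025, 10, 15, 18, 0, 0), 0)
print(row[0])    # 2025-10-15 18:00:00.000
print(len(row))  # 31 fields: ts + 30 metric columns
```

If the first field prints with a `.000` millisecond suffix and every metric field carries exactly two decimal digits, the CSV will line up with the `datetime(3)` and `decimal(16, 2)` columns of `data_metrics2`.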