# duckdb-analysis

**Repository Path**: ourteam/duckdb-analysis

## Basic Information

- **Project Name**: duckdb-analysis
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-16
- **Last Updated**: 2025-10-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## DuckDB tests

- [PrepareData](src/main/java/cc/sofast/practice/duckdb/analysis/PrepareData.java)
  - prepare the data
  - query the data

## Doris Arrow Flight SQL tests

- [DorisArrowFlightSQLQuery](src/main/java/cc/sofast/practice/duckdb/analysis/DorisArrowFlightSQLQuery.java)
  - query data over the Arrow Flight protocol
- Notes
  - Configure the Arrow Flight SQL port on both BE and FE:

    ```
    ➜  ~ cat apache-doris/be/conf/be.conf | grep arrow_flight_sql_port
    arrow_flight_sql_port = 8050
    ➜  ~ cat apache-doris/fe/conf/fe.conf | grep arrow_flight_sql_port
    arrow_flight_sql_port = 8070
    ```

  - `priority_networks` is usually set to a concrete IP:

    ```shell
    ➜  ~ cat apache-doris/fe/conf/fe.conf | grep priority_networks
    priority_networks = 172.16.4.183
    ➜  ~ cat apache-doris/be/conf/be.conf | grep priority_networks
    # priority_networks = 10.10.10.0/24;192.168.0.0/16
    priority_networks = 172.16.4.183
    ```

  - If you get `127.0.0.1:8090: Connection refused`, the BE IP is likely misconfigured.
    - Remove the stale backend with `ALTER SYSTEM DROPP BACKEND "127.0.0.1:9050";`, then register the correct one with `ALTER SYSTEM ADD BACKEND "172.16.4.178:9050";`
    - `SHOW BACKENDS;` lists the registered backends
- Table DDL

```mysql
CREATE TABLE `data_metrics2` (
    `ts`  datetime(3)    NOT NULL COMMENT "Event time (millisecond precision)",
    `c1`  decimal(16, 2) NULL,
    `c2`  decimal(16, 2) NULL,
    `c3`  decimal(16, 2) NULL,
    `c4`  decimal(16, 2) NULL,
    `c5`  decimal(16, 2) NULL,
    `c6`  decimal(16, 2) NULL,
    `c7`  decimal(16, 2) NULL,
    `c8`  decimal(16, 2) NULL,
    `c9`  decimal(16, 2) NULL,
    `c10` decimal(16, 2) NULL,
    `c11` decimal(16, 2) NULL,
    `c12` decimal(16, 2) NULL,
    `c13` decimal(16, 2) NULL,
    `c14` decimal(16, 2) NULL,
    `c15` decimal(16, 2) NULL,
    `c16` decimal(16, 2) NULL,
    `c17` decimal(16, 2) NULL,
    `c18` decimal(16, 2) NULL,
    `c19` decimal(16, 2) NULL,
    `c20` decimal(16, 2) NULL,
    `c21` decimal(16, 2) NULL,
    `c22` decimal(16, 2) NULL,
    `c23` decimal(16, 2) NULL,
    `c24` decimal(16, 2) NULL,
    `c25` decimal(16, 2) NULL,
    `c26` decimal(16, 2) NULL,
    `c27` decimal(16, 2) NULL,
    `c28` decimal(16, 2) NULL,
    `c29` decimal(16, 2) NULL,
    `c30` decimal(16, 2) NULL
) ENGINE = OLAP
DUPLICATE KEY(`ts`)
AUTO PARTITION BY RANGE (date_trunc(`ts`, 'HOUR'))()
DISTRIBUTED BY HASH(`ts`) BUCKETS 12
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "min_load_replica_num" = "-1",
    "is_being_synced" = "false",
    "storage_medium" = "hdd",
    "storage_format" = "V2",
    "inverted_index_storage_format" = "V2",
    "light_schema_change" = "true",
    "disable_auto_compaction" = "false",
    "enable_single_replica_compaction" = "false",
    "group_commit_interval_ms" = "10000",
    "group_commit_data_bytes" = "134217728"
);
```

- Generate test data

```python
import csv
import random
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

output_file = "data_metrics_180w.csv"
start_time = datetime(2025, 10, 15, 18, 0, 0)
num_rows = 2000000
num_columns = 30

with open(output_file, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for i in range(num_rows):
        # Each row advances by 1 second; milliseconds stay at .000
        ts = start_time + timedelta(seconds=i)
        ts_str = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # millisecond precision
        row = [ts_str]
        # Generate 30 random decimal(16, 2) columns
        for _ in range(num_columns):
            val = Decimal(random.random() * 10000).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
            row.append(val)
        writer.writerow(row)
```

- Import via Stream Load

```shell
curl --location-trusted -u root:root \
    -H "format: csv" \
    -H "column_separator: ," \
    -T data_metrics_180w.csv \
    http://172.16.4.183:8030/api/test_db/data_metrics2/_stream_load
```
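As a quick sanity check before producing the full 2,000,000-row file, the sketch below (a hypothetical `make_row` helper, not part of the repository) rebuilds a single row exactly the way the generator script formats it, so you can confirm the timestamp truncation and the decimal(16, 2) layout match what the table expects.

```python
import random
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

def make_row(start_time, i, num_columns=30):
    """Build one CSV row the same way the generator script does.

    Hypothetical helper for illustration; mirrors the script's logic:
    ts advances by i seconds, values are quantized to two decimal places.
    """
    ts = start_time + timedelta(seconds=i)
    ts_str = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # truncate to milliseconds
    row = [ts_str]
    for _ in range(num_columns):
        val = Decimal(random.random() * 10000).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        row.append(str(val))
    return row

row = make_row(datetime(2025, 10, 15, 18, 0, 0), 0)
print(row[0])    # 2025-10-15 18:00:00.000
print(len(row))  # 31 fields: ts + 30 metric columns
```

If the first field prints with a `.000` millisecond suffix and every metric field carries exactly two decimal digits, the CSV will line up with the `datetime(3)` and `decimal(16, 2)` columns of `data_metrics2`.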