# cri-virtplus

**Repository Path**: iscas-system/cri-virtplus

## Basic Information

- **Project Name**: cri-virtplus
- **Description**: Management of dedicated AI processing units such as GPU, NPU/MLU, DCU, and FPGA. Contact Wu Heng at wuheng@otcaix.iscas.ac.cn
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: refactoring
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-04-26
- **Last Updated**: 2026-04-07

## README

# share

Extends containers to support GPU, MLU, and DCU compute.

## Prerequisites

glibc < 2.34

## Currently supported

|      | Test Environment | Driver & Platform | Hardware |
| ---- | ---------------- | ----------------- | -------- |
| GPU  | Ubuntu version <= 22.04<br>Kubernetes version 1.23.6<br>Docker version 20.04<br>x86 | driver 11.7<br>CUDA toolkit 11.7 / 11.2 | Tesla T4 |
| MLU  | Ubuntu 18.04<br>Kubernetes version 1.23.6<br>Docker version 20.04<br>x86 | driver v5.10.22<br>CNtoolkit 3.7.2 | MLU370-S4 |

## roadmap

- 1.0
  - [X] support local GPU/DCU
  - [X] support local MLU
- 2.0
  - [ ] support RDMA

## Deployment

- Prerequisites

  Kubernetes/k3s/cas, Docker/isulad

  GPU driver, CUDA toolkit

  MLU driver, CNtoolkit

  Installation reference: ./docs/env.md

- Build the interception library and copy it to the /etc/unishare/hooklib directory

  ``````
  cd ./core
  mkdir build
  cd build
  # for containers using MLU
  cmake -DUSE_MLU=ON ..
  # for containers using GPU
  cmake -DUSE_GPU=ON ..
  # for containers using both GPU and MLU
  cmake ..
  make
  cp libmylibrary.so /etc/unishare/hooklib
  ``````

- Copy the MLU and GPU library files to /usr/local/accelerator

  ``````
  cd ./k8sPlugins/deploy
  # edit the FILE variable in copy.sh to point to the absolute path of volume.conf
  chmod +x copy.sh
  ``````

- Edit the uni-share.yaml file

  Switch to the repository root and modify the url and token in ./k8sPlugins/deploy/uni-share.yaml.
  For details see https://github.com/kubesys/client-go

- Create the CRDs

  ``````
  cd ./k8sPlugins/deploy
  kubectl apply -f crd.yaml
  kubectl apply -f mluCrd.yaml
  ``````

- Build the image

  `docker build -t uni-share:v1.0 --load .`

- Start the DaemonSet

  ``````
  cd ./k8sPlugins/deploy
  kubectl apply -f uni-share.yaml
  ``````

## Pod template example

Specify the pod's resource demand via requests and limits under resources; requests and limits must be identical.

- core corresponds to GPU utilization (0-100)
- memory corresponds to device memory; 1 point represents 256 Mi (do not allocate more than one card's worth of memory)

``````
apiVersion: v1
kind: Pod
metadata:
  name: testpod-10.2
spec:
  restartPolicy: Never
  containers:
    - name: cuda102
      image: testgpuimg:cuda10.2
      imagePullPolicy: Never
      command: ["sleep"]
      args: ["infinity"]
      resources:
        requests:
          iscas.cn/gpu-core: 30
          #iscas.cn/mlu-core: 30
          iscas.cn/gpu-memory: 20
          #iscas.cn/mlu-memory: 20
        limits:
          iscas.cn/gpu-core: 30
          #iscas.cn/mlu-core: 30
          iscas.cn/gpu-memory: 20
          #iscas.cn/mlu-memory: 20
``````

Image references:

- https://hub.docker.com/r/pytorch/pytorch
- https://sdk.cambricon.com/static/PyTorch/MLU370_1.9_v1.17.0_X86_ubuntu18.04_python3.6_docker/

Task testing reference: ./docs/test.md