# cri-virtplus
**Repository Path**: iscas-system/cri-virtplus
## Basic Information
- **Project Name**: cri-virtplus
- **Description**: Management of dedicated AI processing units such as GPU, NPU/MLU, DCU, and FPGA. Contact: Heng Wu, email: wuheng@otcaix.iscas.ac.cn
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: refactoring
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-04-26
- **Last Updated**: 2026-04-07
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# share
Extends containers with support for GPU, MLU, and DCU compute.
## Prerequisites
glibc < 2.34
## Current Support
|      | Test Environment | Driver & Platform | Hardware |
| ---- | ---------------- | ----------------- | -------- |
| GPU | Ubuntu <= 22.04<br>Kubernetes 1.23.6<br>Docker 20.04<br>x86 | driver 11.7<br>CUDA Toolkit 11.7 / 11.2 | Tesla T4 |
| MLU | Ubuntu 18.04<br>Kubernetes 1.23.6<br>Docker 20.04<br>x86 | driver v5.10.22<br>CNToolkit 3.7.2 | MLU370-S4 |
## Roadmap
- 1.0
  - [X] Support local GPU/DCU
  - [X] Support local MLU
- 2.0
  - [ ] Support RDMA
## Deployment
- Prerequisites
  Kubernetes/k3s/cas, Docker/isulad
  GPU driver, CUDA Toolkit
  MLU driver, CNToolkit
  Installation reference: ./docs/env.md
- Build the interception (hook) library and copy it to the /etc/unishare/hooklib directory
```shell
cd ./core
mkdir build
cd build
# for containers using MLU only
cmake -DUSE_MLU=ON ..
# for containers using GPU only
cmake -DUSE_GPU=ON ..
# for containers using both GPU and MLU
cmake ..
make
cp libmylibrary.so /etc/unishare/hooklib
```
- Copy the MLU/GPU library files to /usr/local/accelerator
```shell
cd ./k8sPlugins/deploy
# edit the FILE variable in copy.sh to point to the absolute path of volume.conf
chmod +x copy.sh
./copy.sh
```
- Edit the uni-share.yaml file
  From the repository root, set the url and token in ./k8sPlugins/deploy/uni-share.yaml.
  See https://github.com/kubesys/client-go for details.
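As a rough illustration of what to fill in (the field names and placement below are illustrative assumptions, so match them against the actual uni-share.yaml and the kubesys/client-go README), the plugin needs the apiserver endpoint and a service-account token:

```yaml
# Illustrative fragment only; verify against the real uni-share.yaml.
# url:   the Kubernetes apiserver endpoint
# token: a service-account token with sufficient RBAC permissions
env:
- name: URL
  value: "https://<master-ip>:6443"
- name: TOKEN
  value: "<service-account-token>"
```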
- Create the CRDs
```shell
cd ./k8sPlugins/deploy
kubectl apply -f crd.yaml
kubectl apply -f mluCrd.yaml
```
- Build the image
  `docker build -t uni-share:v1.0 --load .`
- Start the DaemonSet
```shell
cd ./k8sPlugins/deploy
kubectl apply -f uni-share.yaml
```
## Pod Template Example
Declare the pod's resource needs via requests and limits under resources; requests and limits must be identical.
`gpu-core` (or `mlu-core`) is the device utilization share (0-100).
`gpu-memory` (or `mlu-memory`) is device memory in points; 1 point equals 256 Mi (do not request more than a single card's memory).
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod-10.2
spec:
  restartPolicy: Never
  containers:
  - name: cuda102
    image: testgpuimg:cuda10.2
    imagePullPolicy: Never
    command: ["sleep"]
    args: ["infinity"]
    resources:
      requests:
        iscas.cn/gpu-core: 30
        #iscas.cn/mlu-core: 30
        iscas.cn/gpu-memory: 20
        #iscas.cn/mlu-memory: 20
      limits:
        iscas.cn/gpu-core: 30
        #iscas.cn/mlu-core: 30
        iscas.cn/gpu-memory: 20
        #iscas.cn/mlu-memory: 20
```
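The memory-point arithmetic (1 point = 256 Mi) can be sketched as follows; the helper functions here are ours for illustration and are not part of uni-share:

```shell
# Convert between iscas.cn/gpu-memory points and MiB, where 1 point = 256 MiB.
points_to_mib() {
  echo $(( $1 * 256 ))
}
mib_to_points() {
  # round up so the request always covers the desired amount
  echo $(( ($1 + 255) / 256 ))
}

points_to_mib 20     # the sample pod above requests 20 points = 5120 MiB
mib_to_points 4096   # a 4 GiB workload needs 16 points
```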
Image references:
https://hub.docker.com/r/pytorch/pytorch
https://sdk.cambricon.com/static/PyTorch/MLU370_1.9_v1.17.0_X86_ubuntu18.04_python3.6_docker/
Workload test reference:
./docs/test.md