User requirements:
Deploy the network per the user's requirements; provide hardware drivers, a licensed operating system, and network configuration; deploy DeepSeek-671B, perform the basic model installation, and verify it by testing.
Procedure:
Reference
Huawei documentation: https://support.huawei.com/enterprise/zh/doc/EDOC1100494820/a045793a
Server deployment:
Install the operating system
Docker environment
Ubuntu 22.04
curl -fsSL https://repo.huaweicloud.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
echo "deb [arch=arm64] https://repo.huaweicloud.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker-ce.list
apt-get update
apt-get install -y docker-ce
CTyunOS 22.06、openEuler 22.03 LTS、UOS20 1050e、CULinux 3.0、Kylin V10 SP2、Kylin V10 SP3、BC-Linux 21.10
yum install -y docker
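Whichever distro applies, it is worth confirming that the docker CLI actually landed on the PATH before pulling images; `require_cmd` below is a hypothetical helper, not part of any package:

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 && echo "found $1" || echo "missing $1"
}

require_cmd docker
# Start the daemon and enable it on boot (ignored if systemd is unavailable).
systemctl enable --now docker 2>/dev/null || true
```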
Ascend NPU firmware and driver
Check that the NPU cards are online
lspci | grep d500
Confirm that the OS version and kernel version match the table below
uname -m && cat /etc/*release
uname -r
| Host OS version | Host OS architecture | Default host kernel version for the package | GCC compiler version |
|---|---|---|---|
| Ubuntu 22.04 | aarch64 | 5.15.0-25-generic | 11.3.0 |
| openEuler 22.03 LTS | aarch64 | 5.10.0-60.18.0.50.oe2203.aarch64 | 10.3.1 |
| CTyunOS 22.06 | aarch64 | 4.19.90-2102.2.0.0066.ctl2.aarch64 | 7.3.0 |
| UOS20 1050e | aarch64 | 4.19.90-2211.5.0.0178.22.uel20.aarch64 | 7.3.0 |
| Kylin V10 SP2 | aarch64 | 4.19.90-24.4.v2101.ky10.aarch64 | 7.3.0 |
| Kylin V10 SP3 | aarch64 | 4.19.90-52.22.v2207.ky10.aarch64 | 7.3.0 |
| CULinux 3.0 | aarch64 | 5.10.0-60.67.0.104.ule3.aarch64 | 10.3.1 |
| BC-Linux 21.10 | aarch64 | 4.19.90-2107.6.0.0098.oe1.bclinux.aarch64 | 7.3.0 |
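The table lookup can be scripted; a minimal sketch assuming the Ubuntu 22.04 row (substitute the expected value from your distro's row):

```shell
# Compare the running kernel release against the version the driver package
# expects; "5.15.0-25-generic" is the Ubuntu 22.04 value from the table above.
expected="5.15.0-25-generic"
check_kernel() {
  # $1: actual kernel release, $2: expected release
  if [ "$1" = "$2" ]; then echo "match"; else echo "mismatch"; fi
}
check_kernel "$(uname -r)" "$expected"
```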
Disable automatic kernel updates
yum install yum-plugin-versionlock
yum versionlock add kernel-$(uname -r)
Run the following commands to check:
make -v
rpm -qa | grep dkms
rpm -qa | grep gcc
rpm -qa | grep kernel-devel-$(uname -r)
If the package version information is displayed, it is already installed;
otherwise, run yum install -y make dkms gcc kernel-devel-$(uname -r) to install the dependencies.
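The four checks above can be rolled into one loop (a convenience sketch; assumes an rpm-based distro):

```shell
# Check each build dependency and report which are missing.
check_pkg() {
  rpm -q "$1" >/dev/null 2>&1 && echo "OK: $1" || echo "MISSING: $1"
}

for pkg in make dkms gcc "kernel-devel-$(uname -r)"; do
  check_pkg "$pkg"
done
```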
Create a user for installing the driver
groupadd HwHiAiUser
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
Add execute permission
chmod +x Ascend-hdk-xxx-npu-driver_<version>_linux-aarch64.run
chmod +x Ascend-hdk-xxx-npu-firmware_<version>.run
For a first-time installation, install the driver first, then the firmware
Install the driver
./Ascend-hdk-xxx-npu-driver_<version>_linux-aarch64.run --full --install-for-all
Install the firmware
./Ascend-hdk-xxx-npu-firmware_<version>.run --full
If the following key message is displayed, the firmware was installed successfully.
Firmware package installed successfully! Reboot now or after driver installation for the installation/upgrade to
Verify the status
npu-smi info
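Besides npu-smi info, the driver should have created one /dev/davinciN node per card, so a quick count is another sanity check (expects 8 on an 8-card server):

```shell
# Count the NPU device nodes created by the driver; prints 0 if the
# driver is not loaded.
ls /dev/ | grep -c '^davinci[0-9]' || true
```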
Download the model
Method 1: Python
pip install huggingface_hub -i https://pypi.tuna.tsinghua.edu.cn/simple
from huggingface_hub import snapshot_download
import os

# Optional: point at a domestic mirror if downloads from huggingface.co are slow
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    local_dir="/data/DeepSeek-R1",
    local_dir_use_symlinks=False,
    cache_dir="/data/cache"
)
Method 2: Git
yum install git-lfs -y
git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-R1 ./DeepSeek-R1-671B
# or via the mirror
git clone https://hf-mirror.com/deepseek-ai/DeepSeek-R1 ./DeepSeek-R1-671B
Method 3: domestic repository (ModelScope)
pip install modelscope -i https://pypi.tuna.tsinghua.edu.cn/simple
modelscope download --model deepseek-ai/DeepSeek-R1-0528 --local_dir ./DeepSeek-R1-671B
Or download via Python:
from modelscope import snapshot_download
model_dir = snapshot_download(
'deepseek-ai/DeepSeek-R1-0528',
local_dir="/data/DeepSeek-R1",
local_dir_use_symlinks=False,
cache_dir="/data/cache"
)
Or download via Git:
git lfs install
git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-0528.git
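Whichever method is used, it is worth confirming that the weight shards all arrived before moving on; the path and the *.safetensors layout are assumptions about this checkpoint:

```shell
# Count downloaded weight shards; an interrupted download typically leaves
# fewer files than the model's index references.
ls /data/DeepSeek-R1/*.safetensors 2>/dev/null | wc -l
```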
Container environment:
Download the image
Download the image mindie_1.0.RC2_aarch64.tar. It bundles CANN (Toolkit, kernels, NNAL), MindIE, and the ChatGLM3-6B model weights and inference scripts, and can be used for large-model inference.
链接:https://gpt-3.obs.cn-north-4.myhuaweicloud.com/ChatGLM3/mindie_1.0.RC2_aarch64.tar
Official sha256: 4e00c9eec37a5f266331358940409d1ba6e122cd80d1d4851ce3ac3b415be08b
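Before loading, verify the downloaded tarball against the published sha256:

```shell
# Compare the downloaded image against the official sha256 checksum.
expected="4e00c9eec37a5f266331358940409d1ba6e122cd80d1d4851ce3ac3b415be08b"
if [ -f mindie_1.0.RC2_aarch64.tar ]; then
  actual=$(sha256sum mindie_1.0.RC2_aarch64.tar | awk '{print $1}')
  [ "$actual" = "$expected" ] && echo "checksum OK" || echo "checksum MISMATCH"
fi
```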
Load the image
docker load < mindie_1.0.RC2_aarch64.tar
This image is for verification testing only; for production, choose an image matching your OS, NPU card, and driver versions
Steps
Reference repository: https://www.hiascend.com/developer/ascendhub/
Start the container
docker run -itd -u root --net=host --ipc=host --privileged --shm-size=500g --name=ds_r1 \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
-v /var/log/npu/slog/:/var/log/npu/slog \
-v /var/log/npu/profiling/:/var/log/npu/profiling \
-v /var/log/npu/dump/:/var/log/npu/dump \
-v /var/log/npu/:/usr/slog \
-v /data/:/data/ \
<MindIE image ID> \
/bin/bash
(example image tag: mindie:2.1.RC2-800I-A2-py311-openeuler24.03-lts)
| Parameter | Meaning |
|---|---|
| /dev/davinciX | NPU device nodes, one per card; enumerate them with ll /dev/ |
| /dev/hisi_hdc | HDC-related management device |
| /dev/davinci_manager | Davinci-related management device |
| /dev/devmm_svm | Memory-management-related device |
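The eight --device flags in the docker run command above can be generated instead of hand-written; a convenience sketch for an 8-card server:

```shell
# Build the --device flag list for /dev/davinci0 .. /dev/davinci7.
devices=""
for i in $(seq 0 7); do
  devices="$devices --device=/dev/davinci$i"
done
echo "$devices"
```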
Enter the container and set the environment variables
docker exec -it ds_r1 bash
source /usr/local/Ascend/mindie/set_env.sh
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
source /usr/local/Ascend/atb-models/set_env.sh
# Set RANK_TABLE_FILE, MIES_CONTAINER_IP, and MASTER_IP to match your own servers
export RANK_TABLE_FILE="/usr/local/Ascend/mindie/latest/mindie-service/ranktable.json"
export MIES_CONTAINER_IP='x.x.x.x'
export MASTER_IP='x.x.x.x'
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export HCCL_HOST_SOCKET_PORT_RANGE=60000-60050
export HCCL_CONNECT_TIMEOUT=7200
export MASTER_PORT=7008
export MINDIE_LOG_TO_STDOUT=1
export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
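RANK_TABLE_FILE above points at an HCCL rank table. A minimal sketch of its layout for two 2-card servers follows; all IPs, counts, and device IPs are placeholders, and the exact schema should be taken from the Huawei documentation linked earlier:

```json
{
  "version": "1.0",
  "status": "completed",
  "server_count": "2",
  "server_list": [
    {
      "server_id": "x.x.x.1",
      "container_ip": "x.x.x.1",
      "device": [
        { "device_id": "0", "device_ip": "29.1.0.1", "rank_id": "0" },
        { "device_id": "1", "device_ip": "29.1.0.2", "rank_id": "1" }
      ]
    },
    {
      "server_id": "x.x.x.2",
      "container_ip": "x.x.x.2",
      "device": [
        { "device_id": "0", "device_ip": "29.2.0.1", "rank_id": "2" },
        { "device_id": "1", "device_ip": "29.2.0.2", "rank_id": "3" }
      ]
    }
  ]
}
```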
vim /usr/local/Ascend/atb-models/atb_llm/layers/linear/linear.py +112
out = torch_npu.npu_quant_matmul(input_tensor_quant, self._weight.transpose(-2, -1), self._weight_scale, pertoken_scale=pertoken_scale, bias=None, output_dtype=torch.bfloat16)