Environment
Machine model: Atlas 800I A2
Architecture: aarch64
NPU memory: 910B4, 8 × 32 GB
Note: this guide uses the official release of Triton Ascend, not the dev build used by most other online tutorials
Environment setup
Driver and firmware installation
Disable automatic kernel updates
yum install yum-plugin-versionlock -y
yum versionlock add kernel-$(uname -r)
Run the following commands to check the build dependencies:
make -v
rpm -qa | grep dkms
rpm -qa | grep gcc
rpm -qa | grep kernel-devel-$(uname -r)
If version information is printed for each package, it is already installed;
otherwise install the dependencies with yum install -y make dkms gcc kernel-devel-$(uname -r).
Create the driver installation user
groupadd HwHiAiUser
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
Make the packages executable
chmod +x Ascend-hdk-xxx-npu-driver_<version>_linux-aarch64.run
chmod +x Ascend-hdk-xxx-npu-firmware_<version>.run
Install the driver
./Ascend-hdk-xxx-npu-driver_<version>_linux-aarch64.run --full --install-for-all
Install the firmware
./Ascend-hdk-xxx-npu-firmware_<version>.run --full
If the system prints the following key message, the firmware was installed successfully.
Firmware package installed successfully! Reboot now or after driver installation for the installation/upgrade to
Verify the installation
npu-smi info
echo -e "\nFirmware version"
for i in {0..7}; do npu-smi info -t board -i $i | grep Firmware; done
echo -e "\nDriver version"
for i in {0..7}; do npu-smi info -t board -i $i | grep Software; done
Docker installation from static binaries
Disable SELinux (the config change takes effect after a reboot; run setenforce 0 to disable it immediately)
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
Download and install the binaries
wget https://mirrors.huaweicloud.com/docker-ce/linux/static/stable/aarch64/docker-28.0.0.tgz
tar xzvf docker-28.0.0.tgz
sudo mv docker/* /usr/local/bin/
sudo chmod +x /usr/local/bin/docker*
mkdir -p /etc/docker/
Create the systemd service (note the quoted 'EOF': it prevents the shell from expanding $MAINPID inside the heredoc, and > rather than >> avoids duplicating the unit on re-runs)
cat > /etc/systemd/system/docker.service <<'EOF'
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/local/bin/dockerd --default-ulimit nofile=65535:65535
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
EOF
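Before reloading systemd, it can be worth sanity-checking the unit file. A minimal check, assuming your systemd ships the `systemd-analyze verify` subcommand (available on any recent release):

```shell
# Verify the unit file syntax; no output means no problems found
systemd-analyze verify /etc/systemd/system/docker.service
```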
Ascend Docker Runtime installation
wget https://gitcode.com/Ascend/mind-cluster/releases/download/v7.3.0/Ascend-docker-runtime_7.3.0_linux-aarch64.run
chmod u+x Ascend-docker-runtime_7.3.0_linux-aarch64.run
./Ascend-docker-runtime_7.3.0_linux-aarch64.run --install
Sample output like the following indicates a successful installation.
Uncompressing ascend-docker-runtime 100%
[INFO]: installing ascend-docker-runtime
...
[INFO] ascend-docker-runtime install success
Reload systemd, then start Docker and enable it at boot
systemctl daemon-reload
systemctl start docker
systemctl enable docker
Edit the Docker daemon configuration; the key settings are bip, the Docker data root, and the registry mirrors
# Create the configuration file; the values below are examples only, adjust them to your environment
vi /etc/docker/daemon.json
{
"bip":"172.31.255.1/24",
"data-root": "/data/docker",
"registry-mirrors": [
"https://registry.docker-cn.com",
"http://hub-mirror.c.163.com",
"https://dockerhub.azk8s.cn",
"https://mirror.ccs.tencentyun.com",
"https://registry.cn-hangzhou.aliyuncs.com",
"https://docker.mirrors.ustc.edu.cn",
"https://docker.m.daocloud.io"
]
}
systemctl daemon-reload
systemctl restart docker
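After the restart, you can confirm the configuration was picked up. A quick check, assuming python3 is available for JSON validation (the `docker info --format` Go-template fields below are standard):

```shell
# Validate the JSON syntax of the daemon config
python3 -m json.tool /etc/docker/daemon.json > /dev/null && echo "daemon.json OK"

# Confirm the new data root and registry mirrors are in effect
docker info --format '{{.DockerRootDir}}'          # expect /data/docker
docker info --format '{{.RegistryConfig.Mirrors}}'
```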
docker-compose installation
wget https://github.com/docker/compose/releases/download/v2.40.3/docker-compose-linux-aarch64
mv docker-compose-linux-aarch64 docker-compose
cp ./docker-compose /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
docker-compose -v
Model deployment
Model download is not covered here; see https://modelscope.cn/models/Qwen/Qwen3-Next-80B-A3B-Instruct
Model directory: /data/models
Working directory: /data/ai/qwen3-next
docker run
mkdir -p /data/ai/qwen3-next/file
cd /data/ai/qwen3-next
vi docker_run
docker run -d --name Qwen3-Next-80B-A3B-Instruct \
--shm-size=500g \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /data/models/:/data/models/ \
-v /root/.cache:/root/.cache \
-v /data/ai/qwen3-next/file:/workspace/file \
-p 8002:8002 \
-w /workspace \
quay.io/ascend/vllm-ascend:v0.14.0rc1 \
/bin/bash -c "cd /workspace/file && chmod +x run.sh && ./run.sh"
Startup script and compiler installation
cd /data/ai/qwen3-next/file
wget https://files.pythonhosted.org/packages/09/01/dafaeccae2cffddd7cf0c7abdeec650b430ecda8fb6ca349d29fea118a6f/triton_ascend-3.2.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/Ascend-BiSheng-toolkit_aarch64.run
vi run.sh
chmod +x /workspace/file/Ascend-BiSheng-toolkit_aarch64.run
/workspace/file/Ascend-BiSheng-toolkit_aarch64.run --install
pip install /workspace/file/triton_ascend-3.2.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
source /usr/local/Ascend/8.3.RC1/bisheng_toolkit/set_env.sh
export CPU_AFFINITY_CONF=1
export HCCL_OP_EXPANSION_MODE="AIV"
export ATB_LLM_HCCL_ENABLE=1
export ATB_OPERATION_EXECUTE_ASYNC=2
export OMP_NUM_THREADS=10
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export LD_PRELOAD=/usr/local/Ascend/cann-8.5.0/aarch64-linux/lib64/libjemalloc.so:$LD_PRELOAD
vllm serve /data/models/Qwen3-Next-80B-A3B-Instruct --port 8002 --tensor-parallel-size 8 \
--max-model-len 8192 \
--gpu-memory-utilization 0.85 \
--max-num-batched-tokens 8192 \
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
--served-model-name qwen3-next
Run the model
cd /data/ai/qwen3-next
source docker_run
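Once the container is up (loading the 80B model can take several minutes), you can follow the logs until vLLM reports it is listening, then smoke-test the OpenAI-compatible endpoint it exposes. The prompt and payload below are illustrative:

```shell
# Follow container logs until the server reports it is serving on port 8002
docker logs -f Qwen3-Next-80B-A3B-Instruct

# Smoke-test the OpenAI-compatible chat completions endpoint
curl -s http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-next",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```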
Questions and feedback welcome: admin@supome.cn
Reproduction without authorization is prohibited