Deploying the Qwen2.5-1.5B-Instruct Large Model with vLLM on Cloud Studio
P.S. I originally intended to deploy the QwQ-32B model, but even on the higher-tier HAI server it did not work (insufficient VRAM and memory, every attempt ended in failure). In the end I settled on deploying the Qwen2.5-1.5B-Instruct model for testing.
Here we use a Cloud Studio high-performance space, basic tier.
First, upgrade pip:
python -m pip install --upgrade pip
This may take a while, so please be patient.
Then install vLLM:
pip install vllm
After the installation succeeds, run the vllm command as a quick sanity check that everything works.
Next, install the remaining dependencies:
pip install modelscope
pip install openai
pip install tqdm
pip install transformers
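To confirm that everything installed cleanly, you can run a quick import check (a minimal sketch; the printed versions will differ on your machine):

import vllm
import transformers

# If both imports succeed, the core packages are in place
print("vllm:", vllm.__version__)
print("transformers:", transformers.__version__)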
The section between the dividing lines below can be skipped; there is no need to run it.
----------------- Dividing Line Start -----------------
1. Create a tmp folder in the current directory with mkdir tmp, or create it by hand.
Create model_download_32b.py with the following code:
from modelscope import snapshot_download
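# Download QwQ-32B from ModelScope into ./tmp; revision='master' tracks the default branch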
model_dir = snapshot_download('Qwen/QwQ-32B', cache_dir='./tmp', revision='master')
2. Run model_download_32b.py; it will download the QwQ-32B model. Since my machine is in the Singapore region, the download is relatively slow.
python model_download_32b.py
Need to wait a bit, U_U ~~
----------------- Dividing Line End -----------------
Because the machine sits in the Singapore data center, downloading models from the ModelScope community in mainland China is relatively slow.
You can use git lfs clone to fetch the model files from Hugging Face instead, which is faster.
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
apt-get install git-lfs
git lfs clone https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
This pulls the Qwen2.5-1.5B-Instruct model from Hugging Face.
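Alternatively, if you prefer to stay in Python, the huggingface_hub package offers an equivalent download (an extra assumption on my part, it is not among the packages installed above, so run pip install huggingface_hub first):

from huggingface_hub import snapshot_download

# Pull Qwen2.5-1.5B-Instruct from Hugging Face into ./Qwen2.5-1.5B-Instruct
model_dir = snapshot_download('Qwen/Qwen2.5-1.5B-Instruct', local_dir='./Qwen2.5-1.5B-Instruct')
print(model_dir)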
Due to VRAM and memory limits, every larger model I tried ended in failure, so I chose Qwen2.5-1.5B-Instruct for testing. This post was originally written against a machine deployed with HAI; unfortunately, the earlier models were not supported there, so that effort was wasted.
Okay, let's continue and wait for the pull to complete.
Next, create a server compatible with the OpenAI API interface.
Detailed vLLM usage can be found in the official documentation.
python -m vllm.entrypoints.openai.api_server \
--model ./Qwen2.5-1.5B-Instruct \
--served-model-name Qwen2.5-1.5B \
--max-model-len=2048 \
--dtype=half
If you see this startup output, the deployment succeeded.
Open https://ohaxxx.ap-singapore.cloudstudio.work/proxy/8000/version to check that the server can be reached.
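Besides the browser check, you can also query the OpenAI-compatible endpoint directly with the openai package installed earlier. A minimal sketch, assuming the server is listening locally on port 8000 and was started with --served-model-name Qwen2.5-1.5B:

from openai import OpenAI

# vLLM does not check the API key, so any placeholder string works
client = OpenAI(base_url='http://localhost:8000/v1', api_key='EMPTY')

response = client.chat.completions.create(
    model='Qwen2.5-1.5B',  # must match --served-model-name
    messages=[{'role': 'user', 'content': 'Hello, please introduce yourself.'}],
    max_tokens=128,
)
print(response.choices[0].message.content)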
Next, configure the client. This would normally be routine, but no matter how I configured it against the Cloud Studio proxy URL it refused to work, so the only option was to tunnel around it.
Fall back to the old trick: ssh srv.us -R 1:localhost:8000
If this errors out, create a key as the prompt suggests:
ssh-keygen -t ed25519
Press Enter to accept all the defaults, then rerun ssh srv.us -R 1:localhost:8000.
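Once the tunnel is up, you can verify it from any machine with a quick request before configuring the client (a sketch using the requests package, an extra dependency; replace the placeholder with the public URL that srv.us prints):

import requests

BASE_URL = 'https://your-tunnel.srv.us'  # hypothetical placeholder, use your own tunnel URL

# vLLM's OpenAI-compatible server lists the loaded models at /v1/models
resp = requests.get(f'{BASE_URL}/v1/models', timeout=10)
print(resp.status_code, resp.json())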
Configure the chatx client.
Make sure the base URL does not end with a slash.
Then send a test message in the chat dialog.
And with that, the vLLM deployment of the large model is complete. This can serve as a general deployment tutorial: as long as memory and VRAM are sufficient, it should in theory work for any large model on Hugging Face. The tutorial ends here.
Nearly 6 hours all told, almost 3 of them on HAI at about 3.5 per hour... The main thing is that it finally got written. In the end I did not even use the HAI custom machine, stepped into countless pitfalls, and finally finished.
U_U ~_~ D_D
Final shout, can you reimburse me for the costs incurred using hai!!!
Can you reimburse me for the costs incurred using hai!!!
Reimburse the costs!!!