---
title: Docker
description: Install Cortex using Docker.
---
> 🚧 Cortex.cpp is currently in development. The documentation describes the intended functionality, which may not yet be fully implemented.
# Setting Up Cortex with Docker
This guide walks you through setting up and running Cortex with Docker.
## Prerequisites
- Docker or Docker Desktop
- `nvidia-container-toolkit` (for GPU support)
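Before continuing, you may want to confirm that Docker (and, for GPU setups, the NVIDIA container runtime) is working. This is an optional sanity check; the CUDA base image tag below is only an example:

```bash
# Confirm the Docker daemon is reachable
docker --version
docker info > /dev/null && echo "Docker daemon is running"

# GPU setups only: confirm the NVIDIA runtime is available to Docker
# (any CUDA base image works; this tag is just an example)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```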
## Setup Instructions
1. **Clone the Cortex Repository**

   ```bash
   git clone https://github.com/janhq/cortex.cpp.git
   cd cortex.cpp
   git submodule update --init
   ```
2. **Build the Docker Image**

   - To use the latest versions of `cortex.cpp` and `cortex.llamacpp`:

     ```bash
     docker build -t cortex --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .
     ```

   - To specify versions:

     ```bash
     docker build --build-arg CORTEX_LLAMACPP_VERSION=0.1.34 --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -t cortex -f docker/Dockerfile .
     ```
3. **Run the Docker Container**

   - Create a Docker volume to store models and data:

     ```bash
     docker volume create cortex_data
     ```

   - Run in CPU mode:

     ```bash
     docker run -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```

   - Run in GPU mode (requires `nvidia-docker`):

     ```bash
     docker run --gpus all -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```

   To confirm the image built and the container is running, see the verification sketch after these steps.
4. **Check Logs (Optional)**

   ```bash
   docker logs cortex
   ```
5. **Access the Cortex Documentation API**

   Open http://localhost:39281 in your browser.
6. **Access the Container and Try Cortex CLI**

   ```bash
   docker exec -it cortex bash
   cortex --help
   ```
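If you want to double-check steps 2 and 3, the commands below are a minimal verification sketch using only standard Docker tooling:

```bash
# Confirm the image from step 2 exists locally
docker images cortex

# Confirm the container from step 3 is up and the port is published
docker ps --filter name=cortex

# Tail recent logs to see whether the API server started
docker logs --tail 20 cortex
```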
## Usage
With Docker running, you can use the following commands to interact with Cortex. Ensure the container is running and `curl` is installed on your machine.
### 1. List Available Engines

```bash
curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"
```
Example response:

```json
{
  "data": [
    {
      "description": "This extension enables chat completion API calls using the Onnx engine",
      "format": "ONNX",
      "name": "onnxruntime",
      "status": "Incompatible"
    },
    {
      "description": "This extension enables chat completion API calls using the LlamaCPP engine",
      "format": "GGUF",
      "name": "llama-cpp",
      "status": "Ready",
      "variant": "linux-amd64-avx2",
      "version": "0.1.37"
    }
  ],
  "object": "list",
  "result": "OK"
}
```
### 2. Pull Models from Hugging Face

- Open a terminal and run `websocat ws://localhost:39281/events` to capture download events. To install `websocat`, follow its installation instructions.
- In another terminal, pull models using the commands below.

  ```bash
  # Pull a model from Cortex's Hugging Face hub
  curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'

  # Pull a model directly from a URL
  curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/blob/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf"}'
  ```
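Once a download finishes, you can check that the model is registered with the server. The sketch below assumes an OpenAI-style model listing at `GET /v1/models`; adjust if your build exposes it differently:

```bash
# List models known to the server (endpoint assumed from the OpenAI-compatible API)
curl --request GET --url http://localhost:39281/v1/models --header "Content-Type: application/json"
```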
### 3. Start a Model and Send an Inference Request

- Start the model:

  ```bash
  curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```

- Send an inference request:

  ```bash
  curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
    "frequency_penalty": 0.2,
    "max_tokens": 4096,
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "presence_penalty": 0.6,
    "stop": ["End"],
    "stream": true,
    "temperature": 0.8,
    "top_p": 0.95
  }'
  ```
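Because `"stream": true` returns the reply as server-sent chunks, the raw output can be hard to read in a terminal. As a sketch, you can set `"stream": false` and pipe the single JSON response through `jq`; this assumes the response follows the usual OpenAI chat-completions shape:

```bash
# Non-streaming variant: one JSON response, parsed with jq
curl --silent --request POST --url http://localhost:39281/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "stream": false,
    "max_tokens": 256
  }' | jq -r '.choices[0].message.content'
```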
### 4. Stop a Model

To stop a running model, use:

```bash
curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
```
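When you are finished experimenting, you can also tear down the Docker resources created in this guide. Note that removing the `cortex_data` volume deletes any downloaded models:

```bash
# Stop and remove the container
docker stop cortex
docker rm cortex

# Optional: remove the data volume (this deletes downloaded models)
docker volume rm cortex_data
```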