Prerequisites
Feature Description
The llama app command is really exciting, and it's neat to be able to run it with different models.
It would be great to publish Dockerfiles for this as you all do for server, cli, full, etc.
Motivation
Dockerfiles are a great packaging tool for Kubernetes and provide reproducible environments.
docker run -p 8080:8080 -v ~/.cache/huggingface/hub:/root/.cache/huggingface/hub llama-app serve --host 0.0.0.0
Possible Implementation
I tested this out by adding the following stage to .devops/cpu.Dockerfile:
### App, unified binary with all subcommands
FROM base AS app
COPY --from=build /app/full/llama /app
WORKDIR /app
ENTRYPOINT [ "/app/llama" ]
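For anyone who wants to reproduce this, building the stage might look roughly like the sketch below. It assumes the command is run from the repository root and that the stage above has been added to .devops/cpu.Dockerfile; the llama-app tag is simply the image name used in the docker run command in this issue.

```shell
# Assumptions: run from the repository root, with the "app" stage
# added to .devops/cpu.Dockerfile as shown above.
IMAGE=llama-app                    # tag used by the docker run command in this issue
TARGET=app                         # stage name from the snippet above
DOCKERFILE=.devops/cpu.Dockerfile  # Dockerfile the stage was tested with

# Build only the "app" stage and tag the resulting image.
docker build --target "$TARGET" -t "$IMAGE" -f "$DOCKERFILE" .
```

Using --target keeps the build limited to the one stage, so the other published targets in the same Dockerfile are left untouched.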
I was then able to run docker run -p 8080:8080 -v ~/.cache/huggingface/hub:/root/.cache/huggingface/hub llama-app serve --host 0.0.0.0 and keep it running in a container while switching between models.
If this interests the community, I'm happy to add support for this to all our Dockerfiles and update the docker docs to mention this approach.