Deprecating the requirements.txt for dependencies management#116
Deprecating the requirements.txt for dependencies management#116anhuong merged 1 commit intofoundation-model-stack:mainfrom tedhtchang:deprecate-requirements
Conversation
fabianlim
left a comment
There was a problem hiding this comment.
LGTM just one small comment on the test workflow.
| RUN --mount=type=cache,target=/root/.cache/pip \ | ||
| python -m pip install --upgrade pip && \ | ||
| python -m pip install wheel && \ | ||
| python -m pip install "$(head bdist_name)[aim]" && \ | ||
| python -m pip install "$(head bdist_name)[flash-attn]" && \ | ||
| # Clean up the wheel module. It's only needed by flash-attn install | ||
| python -m pip uninstall wheel -y && \ | ||
| # Cleanup the bdist whl file | ||
| rm $(head bdist_name) /tmp/bdist_name |
There was a problem hiding this comment.
@anhuong Please review
I removed the [dev] optional dependencies.
The only package needed to install flash-attn is wheel.
The others packages in the [dev] are either installed(ie packaging, ninja) in build time or not needed(i.e sckit-learn).
This temp cache mount --mount=type=cache,target=/root/.cache/pip also reduced 3.1GB image size. 15.9G vs 12.8GB
There was a problem hiding this comment.
Changes generally look good to me but when building the image i got error:
Dockerfile:21
--------------------
20 | python -m pip install build
21 | >>> RUN --mount=source=.git,target=.git,type=bind \
22 | >>> --mount=type=bind,readwrite,source=tuning,target=tuning \
23 | >>> --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
24 | >>> python -m build --wheel --outdir /tmp && \
25 | >>> ls /tmp/*.whl >/tmp/bdist_name
26 |
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref ucblo3xzuieskt2hd3dy0vnwp::5bhdpz76l1ss8oq6chind70tf: "/tuning": not found
There was a problem hiding this comment.
I have not seen this before. Here is my output look like:
(fms-hf-tuning) root@cataract1:~/fms-hf-tuning# docker build -t fms-hf-tuning:dev . -f build/Dockerfile
[+] Building 212.9s (27/27) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 5.32kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for registry.access.redhat.com/ubi9/ubi:latest 0.4s
=> [internal] load metadata for registry.access.redhat.com/ubi9/python-311:latest 0.3s
=> [internal] load build context 0.2s
=> => transferring context: 10.28MB 0.2s
=> [release 1/18] FROM registry.access.redhat.com/ubi9/ubi@sha256:66233eebd72bb5baa25190d4f55e1dc3fff3a9b77186c1f91a0abdb274452072 0.0s
=> [build 1/3] FROM registry.access.redhat.com/ubi9/python-311@sha256:78aa5022cb9e28652b5923162fee2e6e6192df84166abb5c7438910f70f9d02e 0.0s
=> CACHED [build 2/3] RUN --mount=type=cache,target=/root/.cache/pip python -m pip install --upgrade pip && python -m pip install build 0.0s
=> [build 3/3] RUN --mount=source=.git,target=.git,type=bind --mount=type=bind,readwrite,source=tuning,target=tuning --mount=type=bind,source=pyproject.toml,target=pyproject.toml python -m build --whee 7.7s
=> CACHED [release 2/18] RUN dnf remove -y --disableplugin=subscription-manager subscription-manager python3.11-requests && dnf install -y make procps && dnf clean all 0.0s
=> CACHED [release 3/18] RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo && dnf install -y cuda-cudart-11-8-11.8.89-1 0.0s
=> CACHED [release 4/18] RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo && dnf install -y cuda-libraries-11-8-11.8.0-1 0.0s
=> CACHED [release 5/18] RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo && dnf install -y cuda-command-line-tools-11-8-1 0.0s
=> CACHED [release 6/18] RUN dnf install -y python3.11 git && ln -s /usr/bin/python3.11 /bin/python && python -m ensurepip --upgrade && dnf clean all 0.0s
=> CACHED [release 7/18] WORKDIR /tmp 0.0s
=> [release 8/18] COPY --from=build /tmp/*.whl /tmp/bdist_name /tmp/ 0.1s
=> [release 9/18] RUN --mount=type=cache,target=/root/.cache/pip python -m pip install --upgrade pip && python -m pip install wheel && python -m pip install "$(head bdist_name)[aim]" && python - 166.5s
=> [release 10/18] RUN mkdir -p /licenses 0.5s
=> [release 11/18] COPY LICENSE /licenses/ 0.1s
=> [release 12/18] RUN mkdir /app 0.3s
=> [release 13/18] COPY build/launch_training.py build/accelerate_launch.py fixtures/accelerate_fsdp_defaults.yaml /app/ 0.1s
=> [release 14/18] COPY build/utils.py /app/build/ 0.1s
=> [release 15/18] RUN chmod +x /app/launch_training.py /app/accelerate_launch.py 0.3s
=> [release 16/18] RUN touch /.aim_profile && chmod -R 777 /.aim_profile && mkdir /.cache && chmod -R 777 /.cache 0.5s
=> [release 17/18] RUN useradd -u 1000 tuning -m -g 0 --system && chown -R tuning:0 /app /tmp && chmod -R g+rwX /app /tmp 0.6s
=> [release 18/18] WORKDIR /app 0.1s
=> exporting to image 34.3s
=> => exporting layers 34.3s
=> => writing image sha256:fae12f2d869b0eead3e2bfa05860e84317525404e643291c214c580c1740b37d 0.0s
=> => naming to docker.io/library/fms-hf-tuning:dev 0.0s
(fms-hf-tuning) root@cataract1:~/fms-hf-tuning#
There was a problem hiding this comment.
interesting, the error line for me is --mount=type=bind,readwrite,source=tuning,target=tuning, will look into it.
Also my bad, I think the image should not be building the wheel but pulling the wheel from pypi
There was a problem hiding this comment.
the error line for me is --mount=type=bind,readwrite,source=tuning,target=tuning
In stead of mounting could you try
COPY tuning tuning
Also my bad, I think the image should not be building the wheel but pulling the wheel from pypi
I think the idea of building whl in the image is to detect any changes in the repo which may break the image build early on. Similar to this
|
Hi @tedhtchang Built the image: Push the image to the local OpenShift image repo: Then I run ted's test script pointing to the local image registry And it looks good: |
Signed-off-by: ted chang <htchang@us.ibm.com>
…odel-stack#116) Signed-off-by: ted chang <htchang@us.ibm.com> Signed-off-by: aaron.chew1 <aaron.chew1@ibm.com>
Description of the change
Remove the python requirement.txt files and use only pyproject.toml to manage dependencies. We modify dependencies and optional-dependencies in same location.
Add build wheel and build image tests, etc
Related issue number
Closes #56
Closes #756
How to verify the PR
git clone --single-branch --branch deprecate-requirements https://github.com/tedhtchang/fms-hf-tuning.git cd fms-hf-tuningOr
3. Run e2e to test the SFTtrainer image. This command create kind cluster, install KFTO, Kueue, build whl and image, and create a pytorchjob workload using the built sft-trainer image.
Was the PR tested