Support for extracting and seeding cache mounts. #6900

Description

@phr34k

I have a project composed of several microservices that uses a Dockerfile with multi-stage builds to simulate a limited production environment. The project is written in Rust, which is slow to build, and uses a package manager (cargo, similar to pip, npm and others), so a single added dependency invalidates the layer and triggers a time-consuming rebuild.

For this reason, I rely on cache mount support on local development machines to turn a 10-14 minute build per microservice into a 30-second one. See the example below:

RUN --mount=type=cache,id=sccache,target=/sccache,sharing=locked \
    --mount=type=cache,id=registry,target=/usr/local/cargo/registry,sharing=locked \
    --mount=type=cache,id=git,target=/usr/local/cargo/git,sharing=locked \

The problem I'm facing is that Docker doesn't offer a way to interact with these caches, in particular to extract them into, or seed them from, continuous integration caches. There is a third-party project, 'buildkit cache dance', that seems to cater to this, but in my experience it's a bit brittle and difficult to set up, so I would like to request official built-in support.

I know this feature has been requested in the past, and has mostly been ignored or denied because there is already support for --cache-from and --cache-to, or on the argument that resources would be fetched over the internet regardless, but I'd like to make some reasonable arguments for reconsideration.

  • cache-from and cache-to have limited usability in this context: a single manifest addition invalidates the layer in full and triggers a full 10+ minute compile cycle, including fetching external dependencies (git) and compilation/compatibility checks.
  • Cache mounts aren't included in the layers, so storing layers in and restoring them from the repository brings no benefit here. There's no way to reuse the packages that were already handled in a previous run except through cache mounts.
  • Software often has shared dependencies (at least in our case): services in the same language often share 70-80% of their dependencies and versions. Sharing cache mounts reduces compilation times for additional services, and turning 5 services at 10-20 minutes each (roughly 60 minutes in total) into approximately 3 minutes is pretty significant.
  • CI caches are integrated with the overall CI pipelines, and one important aspect is being able to clear caches, so I'd rather store to and restore from something that is well integrated with the rest of the system than have to deal with a workaround.
  • Network locality is important to consider. While CI caches can be considered on par with other network resources such as S3 buckets or external package managers, in practice they are faster and not subject to ingress/egress because they are colocated.
  • Just using an S3 bucket for sccache wouldn't reduce other network fetches, such as git repository checkouts or apt-get package fetches. Storing these in CI caches has significantly reduced our build times.

I would kindly like to ask for some consideration of this: maybe support for mounting caches directly into the file system, so they could simply be copied into and out of the CI caches, or maybe a docker command to export/import a folder as a cache from the file system.
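
For reference, the kind of workaround this request would replace can be sketched roughly as follows. It is a minimal sketch, not an official mechanism: the function names are hypothetical, the busybox base image is an arbitrary choice, and it assumes that a cache mount with the same id (and the same builder instance) resolves to the same cache as in the real Dockerfile. The extract step copies the cache mount into a scratch stage and exports it with a local output; the seed step bind-mounts a directory from the build context and copies it into the cache mount.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the copy-in/copy-out workaround.
# Assumption: the cache id passed in matches an id used in the real
# Dockerfile (e.g. sccache, registry, git) and the same builder is used.

# Print a throwaway Dockerfile that copies a cache mount into a scratch stage.
gen_extract_dockerfile() {
  id="$1"
  cat <<EOF
# syntax=docker/dockerfile:1
FROM busybox AS extract
RUN --mount=type=cache,id=${id},target=/cache,sharing=locked mkdir -p /out && cp -a /cache/. /out/
FROM scratch
COPY --from=extract /out /
EOF
}

# Print a throwaway Dockerfile that copies the build context into a cache mount.
gen_seed_dockerfile() {
  id="$1"
  cat <<EOF
# syntax=docker/dockerfile:1
FROM busybox AS seed
RUN --mount=type=cache,id=${id},target=/cache,sharing=locked --mount=type=bind,target=/seed cp -a /seed/. /cache/
EOF
}

# extract_cache ID DEST: export cache mount ID to the local directory DEST.
# --no-cache-filter forces the copy step to re-run instead of being cached.
extract_cache() {
  gen_extract_dockerfile "$1" | docker buildx build \
    --no-cache-filter extract \
    --output "type=local,dest=$2" \
    -f - "$(mktemp -d)"
}

# seed_cache ID SRC: copy the local directory SRC into cache mount ID.
seed_cache() {
  gen_seed_dockerfile "$1" | docker buildx build \
    --no-cache-filter seed \
    -f - "$2"
}
```

A CI job might call `seed_cache registry ./ci-cache/registry` before the real build and `extract_cache registry ./ci-cache/registry` after it, once per cache id. This is essentially the dance that built-in support would make unnecessary.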
