Skip to content
This repository was archived by the owner on Apr 17, 2026. It is now read-only.

Add Deep Learning VM (DLVM) images as a base image option for Caliban.#20

Open
sagravat wants to merge 14 commits into
google:mainfrom
sagravat:master
Open

Add Deep Learning VM (DLVM) images as a base image option for Caliban.#20
sagravat wants to merge 14 commits into
google:mainfrom
sagravat:master

Conversation

@sagravat

Copy link
Copy Markdown

This contribution enables Deep Learning VM (DLVM) images from the Google Cloud AI Platform to be used as base images for execution in local, cloud, shell, and notebook modes. The notebook mode includes a Jupyterlab extension widget that allows the user to directly submit the notebook for training on the AI Platform with configurable hardware accelerators like GPUs and TPUs.

@codecov

codecov Bot commented Jun 19, 2020

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 10.16949% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.97%. Comparing base (a912775) to head (3a417e9).
⚠️ Report is 72 commits behind head on main.

Files with missing lines Patch % Lines
caliban/docker.py 7.14% 39 Missing ⚠️
caliban/cloud/core.py 20.00% 4 Missing ⚠️
caliban/main.py 0.00% 4 Missing ⚠️
caliban/config.py 40.00% 3 Missing ⚠️
caliban/history/utils.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
- Coverage   45.67%   44.97%   -0.70%     
==========================================
  Files          17       17              
  Lines        2728     2777      +49     
==========================================
+ Hits         1246     1249       +3     
- Misses       1482     1528      +46     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sagravat sagravat changed the title Added Deep Learning VM (DLVM) images as a base image option for Caliban. Add Deep Learning VM (DLVM) images as a base image option for Caliban. Jun 20, 2020

@sritchie sritchie left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sagravat , this is great! I want to see if we can separate the logic here a bit more. Ideally, we would be able to reuse the existing run_interactive and run_notebook commands (I think that's the names), and make this PR into TWO changes.

The first is the --base_image support, and the second is the GCP notebook submitter.

The second is the GCP scheduler, which I think will work without the DLVM base images.

I'm having trouble parsing the new entrypoint that the second change requires. Is it really different than how we launch jupyterlab normally? If not, I wonder if we could actually enable this by making it easier for folks to install ANY jupyterlab extension.

Happy to chat more. Take a look and let me know if these notes make sense.

Comment thread caliban/cli.py
"--dlvm",
help="DLVM base image type. Must be one of "
"{}".format(dlvm_types) + ". If supplied, "
"Caliban will skip the build and push steps and use this image tag.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, I'm going to add comments as I read, so this may be clear later! Naively I would have assumed that the DLVM would be a base image, and that you could still install dependencies on top, right? If that is true, can we call this argument --base_image?

Right now, --image_id removes any requirement.txt installation; it's truly a flag to skip any build, including even getting your code into the image.

Really, the syntax should be caliban run e2a4af785bdb...

Comment thread caliban/cloud/core.py


def generate_image_tag(project_id, docker_args, dry_run: bool = False):
def generate_image_tag(project_id, docker_args, dlvm: str = None, dry_run: bool = False):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we call it --base_image we can add that key to the generate_docker_args function here: https://github.com/google/caliban/blob/master/caliban/cli.py#L536

Comment thread caliban/config.py
return _dlvm_config(job_mode).get(dlvm_arg)


def _dlvm_config(job_mode: JobMode) -> Dict[str, str]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this is good. I think I see now why you have --dlvm vs --base_image. What we COULD do is have a prefix for --base_image, something like --base_image dlvm:pytorch, that would force a lookup here. That would let this feature give us general base images too.

Comment thread caliban/docker.py
return """
if not dlvm:
return """
RUN pip install {}{}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, wait, good catch that we need up date this pip too! I can do that separately.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this --lab flag. I'm thinking that we should only do this special installation if --lab is specified. If not, even with a dlvm base image, we should just install normal jupyter. wdyt?

Comment thread caliban/docker.py
""".format(library, version_suffix)
else:
return """
RUN /opt/conda/bin/pip install \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this only work on the deep learning VMs? and does it work on ALL the deep learning VMs?

If it works without the dlvms, maybe it needs its own flag.

Comment thread caliban/docker.py Outdated
if base_image_fn is None:
base_image_fn = base_image_id

base_image = base_image_fn(job_mode)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead pass

base_image_fn = lambda job_mode: _dlvm_id(job_mode, dlvm)

Then we can keep to the API.

Comment thread caliban/docker.py


def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str:
def _notebook_entries(lab: bool = False, version: Optional[str] = None, dlvm: bool = False) -> str:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this flag called scheduler instead? I think it doesn't depend on dlvm and COULD be a separate flag.

Comment thread caliban/docker.py
if inject_notebook.value != 'none':
install_lab = inject_notebook == NotebookInstall.lab
if dlvm is None:
dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=False)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about remove the if/else and make the line

 dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=bool(dlvm))

Comment thread caliban/docker.py

dockerfile += """

USER {uid}:{gid}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! Now that we're on python 3.6, we can actually make this

dockerfile += f"""
USER {uid}:{gid}
"""

using f-strings.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants