Add AgentOps project type and vector search data preparation workflows #211
Open
veenaramesh wants to merge 10 commits into main
arpitjasa-db (Contributor) left a comment:
@veenaramesh can we update the tests to pass? https://github.com/databricks/mlops-stacks/actions/runs/19515122285/job/55864800288?pr=211
Let's also add some test coverage for this new flow.
Overview
This PR adds a new project type, "AgentOps", to the existing MLOps Stacks template. Users can now select between two project types when initializing a stack:
- MLOps (`mlops`, default)
- AgentOps (`agentops`)
This PR also adds the vector search data ingestion pipeline for AgentOps projects.
Features
AgentOps Template Updates
1. Project type selection
- Added the `input_project_type` parameter to `databricks_template_schema.json`: `mlops` (default) or `agentops`
- Updated the minimum Databricks CLI version to `v0.266.0` to support new features
- The default project name now reflects the selected project type: `my_{{ .input_project_type }}_project`
- Other changes:
  - `input_project_type` is prompted first (order 1)
  - `input_include_models_in_unity_catalog` is skipped for `agentops`
2. Updating project structure layout
- Added conditional logic to `update_layout.tmpl` to generate the appropriate project structure based on `input_project_type`
- Added conditional logic to certain files:
  - Separate code structure sections for MLOps vs. AgentOps, rendered conditionally based on `input_project_type`
  - `requirements.txt.tmpl`
  - `README.md.tmpl`
  - `databricks.yml.tmpl`
  - All CI/CD pipelines (more on this later)
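The project type selection and conditional layout can be illustrated with a small, hypothetical Python sketch. It mirrors behaviors described in this section: `mlops` is the default, the default project name follows `my_{{ .input_project_type }}_project`, `input_project_type` is asked first, and `input_include_models_in_unity_catalog` is skipped for `agentops`. The function names, the prompt list, and the assumption that the data preparation files are dropped from `mlops` projects are illustrative only; the real logic lives in `databricks_template_schema.json` and `update_layout.tmpl`.

```python
# Hypothetical sketch: how the project-type choice could drive prompts,
# the default project name, and which generated files are kept.
PROJECT_TYPES = ("mlops", "agentops")  # "mlops" is the default

# Illustrative prompt order; input_project_type is asked first (order 1).
PROMPTS = [
    "input_project_type",
    "input_project_name",
    "input_include_models_in_unity_catalog",
]

# Assumed AgentOps-only outputs (names from this PR, with .tmpl dropped after rendering).
AGENTOPS_ONLY = {
    "DataIngestion.py",
    "DataPreprocessing.py",
    "VectorSearch.py",
    "data-preparation-resource.yml",
}


def validate_project_type(project_type: str = "mlops") -> str:
    """Reject anything other than the two supported project types."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    return project_type


def default_project_name(project_type: str = "mlops") -> str:
    """Mirror the template default my_{{ .input_project_type }}_project."""
    return f"my_{validate_project_type(project_type)}_project"


def prompts_for(project_type: str) -> list[str]:
    """Return prompts in order, skipping the Unity Catalog prompt for agentops."""
    validate_project_type(project_type)
    skipped = {"input_include_models_in_unity_catalog"} if project_type == "agentops" else set()
    return [p for p in PROMPTS if p not in skipped]


def keep_file(path: str, project_type: str) -> bool:
    """Drop AgentOps-only files from mlops projects; keep everything else."""
    name = path.rsplit("/", 1)[-1]
    return project_type == "agentops" or name not in AGENTOPS_ONLY
```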
3. Updating CI/CD workflows
Extended CI/CD pipelines to handle AgentOps projects and test the correct workflows:
- GitHub Actions (`.github/workflows/{{.input_project_name}}-run-tests.yml.tmpl`)
- Azure DevOps (`.azure/devops-pipelines/{{.input_project_name}}-tests-ci.yml.tmpl`)
- GitLab (`.gitlab/pipelines/{{.input_project_name}}-bundle-ci.yml.tmpl`)
Data preparation with vector search for AgentOps
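At a high level, this flow runs the ingestion, preprocessing, and vector search notebooks as sequential tasks, chunks the ingested documents, and waits for the vector search endpoint to become ready. The Python below is a minimal sketch of those three ideas under stated assumptions, not the code in this PR: `chain_tasks`, `chunk_text`, and `wait_until_ready` are hypothetical names, chunking is assumed to be fixed-size character windows with overlap, the `"ONLINE"` ready state is an assumption, and the endpoint status lookup is passed in as a callable rather than tied to a specific SDK call. The task dictionaries use the Databricks Jobs `task_key`, `depends_on`, and `notebook_task` fields.

```python
import time
from typing import Callable, List, Tuple


def chain_tasks(notebooks: List[Tuple[str, str]]) -> List[dict]:
    """Build job tasks where each (task_key, notebook_path) depends on the previous one."""
    tasks = []
    for i, (task_key, path) in enumerate(notebooks):
        task = {"task_key": task_key, "notebook_task": {"notebook_path": path}}
        if i > 0:
            # Sequential execution: wait for the preceding task to finish.
            task["depends_on"] = [{"task_key": notebooks[i - 1][0]}]
        tasks.append(task)
    return tasks


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def wait_until_ready(
    get_state: Callable[[], str],
    ready_state: str = "ONLINE",  # assumed ready state
    timeout_s: float = 1200.0,
    poll_interval_s: float = 30.0,
) -> None:
    """Poll get_state() until it returns ready_state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == ready_state:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint not ready after {timeout_s}s (last state: {state})")
        time.sleep(poll_interval_s)
```

In the bundle itself the task chain is declared in `data-preparation-resource.yml.tmpl` rather than built in Python; `chain_tasks` only shows the dependency shape that sequential execution implies.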
1. Data preparation code
- `DataIngestion.py.tmpl`
  - `fetch_data.py.tmpl` for retrieval
- `DataPreprocessing.py.tmpl`
  - `create_chunk.py.tmpl` for chunking logic
  - `config.py.tmpl`
- `VectorSearch.py.tmpl`
  - `vector_search_utils.py.tmpl` for endpoint management and waiting for the endpoint to be ready
2. Workflow resource configuration
- `data-preparation-resource.yml.tmpl`, which includes each notebook as a separate task (sequential execution)
- `databricks.yml.tmpl`
3. Defined variables in `databricks.yml`
- `databricks_staging_workspace_host`
- `input_schema_name`
What I have tested:
- `mlops` and `agentops` project types