This guide covers tools and strategies for managing large repositories, including Git LFS, Git submodules, and monorepo build tools such as Nx and Turborepo.
A monorepo (monolithic repository) contains multiple projects in a single repository, allowing for:
- Shared code and dependencies
- Atomic changes across projects
- Simplified dependency management
- Consistent tooling and processes
Git LFS is ideal for:
- Large binary files (images, videos, datasets)
- Model files and ML artifacts
- Design assets and prototypes
- Build artifacts and releases
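
```shell
# Configure the Git LFS filters and hooks (the git-lfs binary must already be installed)
git lfs install

# Track file types
git lfs track "*.psd"
git lfs track "*.mov"
git lfs track "*.zip"

# Track specific files
git lfs track "models/*.pkl"
git lfs track "datasets/*.csv"

# git lfs track records these patterns in .gitattributes, so commit that file
git add .gitattributes
git commit -m "Track large files with Git LFS"
```

A typical `.gitattributes` file for LFS-tracked content:

```text
# Images
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
# Videos
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
# Archives
*.zip filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
# Data files (scoped to datasets/ so package.json, nx.json, and other config files stay in regular Git)
datasets/*.csv filter=lfs diff=lfs merge=lfs -text
datasets/*.json filter=lfs diff=lfs merge=lfs -text
```

LFS best practices:
- Selective Tracking: Track only large binary files, especially those that change often; small text files gain nothing from LFS
- Compression: Use compressed formats when possible
- Cleanup: Regularly clean up old LFS objects (`git lfs prune` deletes old local copies)
- Bandwidth: Monitor LFS storage and bandwidth usage, which hosting providers often meter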
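
Selective tracking is easier when you can see which large files no LFS pattern covers yet. The bash function below is a minimal sketch, not a Git LFS command: the name `find_lfs_candidates` and the 1 MiB default threshold are assumptions. It relies only on `git ls-files` and `git check-attr`, so it works even before `git lfs install` has run.

```shell
# find_lfs_candidates [MIN_BYTES]
# Hypothetical helper: print "<size><TAB><path>" for tracked or untracked
# (but not ignored) files of at least MIN_BYTES (default 1 MiB, an arbitrary
# choice) that no .gitattributes pattern routes through the LFS filter.
find_lfs_candidates() {
  local min_bytes="${1:-1048576}"
  local file size attr
  # -z keeps paths containing spaces or newlines intact
  while IFS= read -r -d '' file; do
    [ -f "$file" ] || continue              # skip deleted paths and submodules
    size=$(( $(wc -c < "$file") ))          # $(( )) trims the padding some wc versions add
    [ "$size" -ge "$min_bytes" ] || continue
    attr=$(git check-attr filter -- "$file")
    case "$attr" in
      *": filter: lfs") continue ;;          # already covered by an LFS pattern
    esac
    printf '%s\t%s\n' "$size" "$file"
  done < <(git ls-files -z --cached --others --exclude-standard)
}
```

For example, `find_lfs_candidates 5242880 | sort -rn` lists uncovered files of 5 MiB or more, largest first.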
```shell
# Add a submodule
git submodule add https://github.com/user/library.git libs/library

# Clone with submodules
git clone --recursive https://github.com/user/project.git

# Initialize and check out submodules in an existing clone
git submodule update --init --recursive

# Update all submodules to the latest commit on their main branch
# (then commit the updated submodule pointers in the parent repository)
git submodule foreach git pull origin main

# Check submodule status
git submodule status

# Remove a submodule
git submodule deinit -f libs/library
git rm libs/library
rm -rf .git/modules/libs/library
```

Pinned Versions:

```shell
# Pin the submodule to a specific tag (or commit)
cd libs/library
git fetch --tags
git checkout v1.2.3
cd ../..
git add libs/library
git commit -m "Pin library to v1.2.3"
```

Branch Tracking:

```shell
# Record the branch to follow in .gitmodules, then move to its latest commit
git submodule set-branch --branch develop libs/library
git submodule update --remote libs/library
```

Nx setup:
```shell
npx create-nx-workspace@latest myorg --preset=empty
```

Project Structure:

```text
apps/
├── api/
├── web/
└── mobile/
libs/
├── ui/
├── data-access/
└── utils/
```
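
`nx affected` depends on a project graph: a change inside a library affects that library and every project that depends on it, directly or transitively. Nx derives this graph from source imports and package manifests; the bash sketch below only illustrates the propagation step. It assumes the `apps/` and `libs/` layout above, changed paths on standard input, and a hypothetical dependency file with one `dependent dependency` pair per line. The function name `affected_projects` is also hypothetical.

```shell
# affected_projects DEPS_FILE < changed-paths
# Hypothetical helper. DEPS_FILE lists one "dependent dependency" pair per
# line, e.g. "web ui" means apps/web depends on libs/ui.
affected_projects() {
  local deps_file="$1"
  local affected next
  # A changed path such as apps/web/src/main.ts or libs/ui/index.ts names its project
  affected=$(awk -F/ '($1 == "apps" || $1 == "libs") && NF > 2 { print $2 }' | sort -u)
  # Add every project that depends on an affected one until the set stops growing
  while :; do
    next=$(
      {
        printf '%s\n' "$affected"
        awk 'NR == FNR { hit[$1] = 1; next } ($2 in hit) { print $1 }' \
          <(printf '%s\n' "$affected") "$deps_file"
      } | sed '/^$/d' | sort -u
    )
    [ "$next" = "$affected" ] && break
    affected=$next
  done
  if [ -n "$affected" ]; then
    printf '%s\n' "$affected"
  fi
}
```

For example, `git diff --name-only origin/main... | affected_projects deps.txt` would list the projects a branch touches, where `deps.txt` is an illustrative file name.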
Task Running:

```shell
# Run tests for projects affected by the current changes
nx affected --target=test

# Build all projects
nx run-many --target=build --all

# Open the project dependency graph
nx graph
```

Configuration (nx.json):
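```json
{
  "npmScope": "myorg",
  "affected": {
    "defaultBase": "main"
  },
  "tasksRunnerOptions": {
    "default": {
      "runner": "nx/tasks-runners/default",
      "options": {
        "cacheableOperations": ["build", "lint", "test", "e2e"]
      }
    }
  }
}
```

This is the older Nx configuration format: recent Nx releases deprecate `npmScope` and `cacheableOperations` and instead enable caching per target with `"cache": true` under `targetDefaults`.

Turborepo setup: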
```shell
npx create-turbo@latest my-project
```

Configuration (turbo.json):

```json
{
  "$schema": "https://turbo.build/schema.json",
  "globalDependencies": ["**/.env.*local"],
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**", ".next/**"]
    },
    "lint": {},
    "test": {},
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}
```

The `tasks` key is the Turborepo 2.x name; Turborepo 1.x configurations call it `pipeline`.

Task Configuration:
```json
{
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**"],
      "env": ["NODE_ENV"]
    },
    "test": {
      "dependsOn": ["build"],
      "inputs": ["src/**/*.tsx", "src/**/*.ts", "test/**/*.ts"],
      "outputs": ["coverage/**"]
    }
  }
}
```

`^build` means "build this package's dependencies first", while the plain `build` in `test.dependsOn` means "build this same package before testing it".

Example repository layout:

```text
monorepo/
├── apps/             # Applications
│   ├── web/
│   ├── api/
│   └── mobile/
├── packages/         # Shared packages
│   ├── ui/
│   ├── utils/
│   └── config/
├── tools/            # Build tools and scripts
├── docs/             # Documentation
└── .github/          # GitHub configuration
```
Internal Dependencies:
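In `packages/ui/package.json`:

```json
{
  "name": "@myorg/ui",
  "version": "1.0.0",
  "main": "dist/index.js",
  "dependencies": {
    "@myorg/utils": "workspace:*"
  }
}
```

The `workspace:` protocol is supported by pnpm, Yarn 2 and later, and Bun; npm workspaces do not understand it, so with npm declare the local package with a version range such as `"*"` instead.

External Dependencies: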
- Use workspace protocols for local packages
- Hoist common dependencies to root
- Use dependency constraints for consistency
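
Dependency constraints can start as a simple check that every package declares the same version range for a shared dependency. The bash sketch below, with the hypothetical name `check_dep_versions`, only illustrates that idea; dedicated constraint tools are more thorough. It assumes the `apps/` and `packages/` layout above and conventionally formatted `package.json` files with one `"name": "range"` entry per line (as npm writes them); a real JSON parser such as `jq` would be more robust.

```shell
# check_dep_versions DEP [ROOT]
# Hypothetical helper: print "<package.json path><TAB><range>" for every
# declaration of DEP under ROOT/apps and ROOT/packages, then return non-zero
# if more than one distinct range is declared.
check_dep_versions() {
  local dep="$1" root="${2:-.}"
  local key="\"$dep\":"
  local file line range report="" distinct
  for file in "$root"/apps/*/package.json "$root"/packages/*/package.json; do
    [ -f "$file" ] || continue              # an unmatched glob stays literal; skip it
    while IFS= read -r line; do
      range=${line#*"$key"}                 # drop everything through "DEP":
      range=${range#*\"}                    # drop the value's opening quote
      range=${range%%\"*}                   # drop the closing quote and the rest
      report+="$file"$'\t'"$range"$'\n'
    done < <(grep -F -- "$key" "$file")
  done
  printf '%s' "$report"
  distinct=$(( $(printf '%s' "$report" | cut -f2 | sort -u | wc -l) ))
  [ "$distinct" -le 1 ]
}
```

Because it matches the dependency name anywhere in the file, a package that lists DEP in both `dependencies` and `peerDependencies` contributes two lines; peer ranges may legitimately be wider.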

```shell
# Enable Git features for large repos
git config core.untrackedCache true
# Built-in filesystem monitor daemon (Git 2.37+ on macOS and Windows)
git config core.fsmonitor true
git config index.threads 8

# Use sparse checkout for CI: check out only the apps/api project
git sparse-checkout init --cone
git sparse-checkout set apps/api
```

Caching Strategies:
- Cache node_modules
- Cache build artifacts
- Use remote caching (Nx Cloud, Turborepo Remote Cache)
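
Build caches such as those in Nx and Turborepo key each task on a hash of its inputs, so unchanged projects can reuse earlier outputs; their real hashing also covers configuration, environment variables, and dependency outputs. The bash sketch below, with the hypothetical function `project_cache_key`, shows only the core idea using Git's own content hashes: the object ID of a committed directory changes only when something inside it changes. It sees committed content at `HEAD`, not uncommitted edits.

```shell
# project_cache_key PATH...
# Hypothetical helper: combine the committed object IDs of the given paths
# (a project directory plus shared inputs such as the lockfile) into one key.
project_cache_key() {
  local path id ids=""
  for path in "$@"; do
    # Tree or blob ID of PATH at HEAD; identical content yields an identical ID
    id=$(git rev-parse --verify --quiet "HEAD:$path") || {
      printf "project_cache_key: '%s' not found at HEAD\n" "$path" >&2
      return 1
    }
    ids+="$path $id"$'\n'
  done
  # Hash the combined list with Git's object hash (nothing is written to the repo)
  printf '%s' "$ids" | git hash-object --stdin
}
```

For example, `project_cache_key apps/api package-lock.json` stays the same across commits that touch only `apps/web`, so a CI job could look up saved `apps/api` build output under that key.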
Parallelization:
- Run tasks in parallel
- Distribute across multiple machines
- Use incremental builds
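
Distributing work across machines needs a deterministic split, so every CI machine agrees on which projects it owns without coordinating. The bash sketch below assumes a simple round-robin split over a sorted project list; `shard_projects` is a hypothetical name, and the shard index and total would typically come from the CI system (for example a job matrix). Smarter schemes balance shards by measured task duration instead.

```shell
# shard_projects INDEX TOTAL PROJECT...
# Hypothetical helper: print the projects that shard INDEX (0-based) of TOTAL
# should handle. Every machine sorts the same list the same way, so shards
# never overlap and together cover every project.
shard_projects() {
  local index="$1" total="$2"
  shift 2
  if (( total < 1 || index < 0 || index >= total )); then
    echo "shard_projects: need 0 <= INDEX < TOTAL" >&2
    return 1
  fi
  (( $# > 0 )) || return 0
  local i=0 project
  # LC_ALL=C gives a byte-wise order that does not depend on each machine's locale
  while IFS= read -r project; do
    if (( i % total == index )); then
      printf '%s\n' "$project"
    fi
    i=$(( i + 1 ))
  done < <(printf '%s\n' "$@" | LC_ALL=C sort)
}
```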
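
GitHub Actions workflow (saved, for example, as `.github/workflows/ci.yml`):

```yaml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        project: [api, web, mobile]
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Test ${{ matrix.project }}
        run: npm run test --workspace=apps/${{ matrix.project }}
```

To run tests only for changed projects (in a job without the project matrix), use steps like these:

```yaml
      # Consider pinning third-party actions to a full commit SHA
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v35
      - name: Run tests for changed projects
        if: steps.changed-files.outputs.any_changed == 'true'
        env:
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          # Map changed paths such as apps/web/src/main.ts to project directories (apps/web)
          projects=$(printf '%s\n' $CHANGED_FILES \
            | awk -F/ '($1 == "apps" || $1 == "packages") && NF > 2 { print $1 "/" $2 }' \
            | sort -u)
          for project in $projects; do
            npm run test --workspace="$project" --if-present
          done
```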

Migration process:

- Planning:
  - Identify shared code
  - Plan folder structure
  - Set up tooling
- Migration Steps:
  - Create a new monorepo
  - Migrate repositories one by one
  - Update CI/CD pipelines
  - Update documentation
- Post-Migration:
  - Update import paths
  - Consolidate dependencies
  - Set up monorepo tooling
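
To migrate repositories one by one without losing their history, one option is Git's subtree merge technique: merge the old repository's history with the `ours` strategy, then read its tree into a subdirectory. The bash sketch below wraps those steps in a hypothetical `import_repo` function. It assumes the monorepo already has at least one commit, a clean working tree, and no existing directory at the target prefix; the temporary remote name is also an assumption. Where the `git subtree` command is installed, `git subtree add --prefix=<dir> <repository> <ref>` gives a similar result.

```shell
# import_repo URL BRANCH PREFIX
# Hypothetical helper: merge the history of URL's BRANCH into the current
# repository and place its files under PREFIX (for example apps/api).
# Steps: fetch the old repository, record a merge with the "ours" strategy so
# no files land at the root, read its tree into PREFIX/, then commit.
import_repo() {
  local url="$1" branch="$2" prefix="${3%/}"
  local remote="import-${prefix//\//-}"   # temporary remote, e.g. import-apps-api
  local status=0
  git remote add "$remote" "$url" || return 1
  git fetch --quiet "$remote" &&
    git merge --quiet -s ours --no-commit --allow-unrelated-histories "$remote/$branch" &&
    git read-tree --prefix="$prefix/" -u "$remote/$branch" &&
    git commit --quiet -m "Import $url ($branch) into $prefix" ||
    status=$?
  git remote remove "$remote"             # imported commits stay reachable via the merge
  if [ "$status" -ne 0 ]; then
    echo "import_repo: import into $prefix failed (see 'git merge --abort')" >&2
  fi
  return "$status"
}
```

After `import_repo https://github.com/user/library.git main packages/library`, `git log -- packages/library` shows only the merge, because the imported commits recorded their paths at the old repository's root; `git log` without a path still lists them.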
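
From Lerna to Nx:

```shell
# Add Nx to the existing Lerna workspace
npx nx@latest init

# Later Nx upgrades: update package versions and write migrations.json, install, then apply it
npx nx migrate latest
npm install
npx nx migrate --run-migrations
```

Best practices:

- Clear Boundaries: Define clear ownership boundaries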
- Shared Code: Extract common code into shared packages
- Documentation: Document project relationships and dependencies
- Atomic Commits: Make changes that affect multiple projects atomic
- Testing: Test changes across dependent projects
- Code Reviews: Review changes that affect shared code carefully
- Automation: Automate as much as possible
- Caching: Use caching to improve performance
- Monitoring: Monitor build times and resource usage
- Communication: Keep teams informed of changes
- Documentation: Document processes and conventions
- Training: Train team members on monorepo workflows

Merge conflicts:
- Use smaller, focused PRs
- Communicate changes across teams
- Use automated conflict resolution tools

Build performance:
- Implement incremental builds
- Use distributed caching
- Optimize CI/CD pipelines

Dependency management:
- Use workspace dependencies
- Implement dependency constraints
- Update dependencies regularly

Repository size:
- Use Git LFS for large files
- Use shallow clones in CI
- Archive old branches and tags
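
For CI, a shallow clone can be combined with a blobless partial clone and sparse checkout, so a job downloads only the latest commit and only the files of the projects it needs. A minimal sketch, assuming Git 2.25 or newer (for `git clone --sparse`) and a server that supports partial clone filters (GitHub and GitLab do; a server without support ignores the filter with a warning). The function name `ci_sparse_clone` is hypothetical.

```shell
# ci_sparse_clone URL DIR PROJECT_DIR...
# Hypothetical helper for CI jobs: fetch only the latest commit, defer file
# contents until they are needed, and check out just the given directories.
# Example: ci_sparse_clone https://github.com/user/project.git project apps/api
ci_sparse_clone() {
  local url="$1" dir="$2"
  shift 2
  # --depth 1: latest commit only (implies --single-branch)
  # --filter=blob:none: partial clone; file contents are fetched on demand
  # --sparse: start with only the files at the repository root checked out
  git clone --quiet --depth 1 --filter=blob:none --sparse "$url" "$dir" || return 1
  # Add the requested project directories to the sparse checkout
  git -C "$dir" sparse-checkout set "$@"
}
```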
| Tool | Best For | Setup Complexity | Performance | Learning Curve |
|---|---|---|---|---|
| Git LFS | Large files | Low | Good | Low |
| Git Submodules | External dependencies | Medium | Fair | Medium |
| Nx | Task orchestration | High | Excellent | Medium |
| Turborepo | Build optimization | Medium | Excellent | Low |
| Lerna | Package management | Medium | Good | Medium |

Repository health:
- Monitor repository size
- Track build times
- Monitor dependency updates
- Run regular cleanup tasks

Team health:
- Measure development velocity
- Track cross-team dependencies
- Monitor code quality metrics
- Hold regular retrospectives
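
Monitoring repository size is easier when you know which files account for it. The bash sketch below is built on the well-known `git rev-list --objects` and `git cat-file --batch-check` pipeline and lists the largest blobs anywhere in history, which are typically the candidates for Git LFS (for example via `git lfs migrate`) or for cleanup. The function name `largest_blobs` and its default of 10 results are assumptions.

```shell
# largest_blobs [COUNT]
# Hypothetical helper: print "<bytes> <path>" for the COUNT largest blobs
# reachable from any ref (including files deleted from the current tree),
# largest first. Sizes are uncompressed; %(objectsize:disk) would report
# on-disk size instead.
largest_blobs() {
  local count="${1:-10}"
  # rev-list --objects prints "<id> <path>" once per reachable object;
  # cat-file adds each object's type and size and keeps the path as %(rest)
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k1,1nr |
    head -n "$count"
}
```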
Monorepos offer significant benefits for large-scale development but require careful planning and tooling. Choose the right combination of tools for your team's needs, and invest time in setting up proper processes and automation. Regular evaluation and adjustment will help maintain an efficient and productive monorepo environment.