repo/
├── data/ # Raw and processed data
│ ├── raw/ # Original, unmodified data
│ ├── processed/ # Cleaned, intermediate datasets
│ └── metadata/ # Sample info, metadata, mappings
├── results/ # Analysis outputs, statistics, tables
├── figures/ # Generated plots and visualizations
├── src/ # Source code and scripts
│ ├── data_processing/ # Data cleaning and preparation
│ ├── analysis/ # Statistical analysis
│ └── utils/ # Helper functions
├── paper/ # Manuscripts and writeups
└── README.md # Project overview
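The layout above can be created in one step with a small script. The following is a minimal sketch in Python, not part of the original project: the function name `scaffold` and the `LAYOUT` list are illustrative assumptions, and the directory names are copied directly from the tree.

```python
from pathlib import Path

# Directory names copied from the tree above (assumed to be relative to the repo root).
LAYOUT = [
    "data/raw",
    "data/processed",
    "data/metadata",
    "results",
    "figures",
    "src/data_processing",
    "src/analysis",
    "src/utils",
    "paper",
]


def scaffold(root):
    """Create the project skeleton under `root` and return the directory paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run on an existing repo
        created.append(path)
    # touch() creates an empty README.md if missing and leaves existing content alone.
    (root / "README.md").touch(exist_ok=True)
    return created
```

Calling `scaffold("repo")` on an existing checkout only fills in missing directories, so it can double as a quick check that the structure is complete.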
- Use snake_case for all filenames
- Include date for versioned analyses: `analysis_2025-03-10.R`
- Separate logical units: `combined_dose_data_processed.csv`
- Use meaningful descriptors: no `data1.csv`, `final_final.csv`
- Data formats: `.csv` (text), `.fst` (fast storage), `.rds` (R objects), `.tsv` (tab-delimited)
Example good names:
- `combined_dose_data_LFQ_only.fst`
- `samples_model_data.tsv`
- `kinome_profiling_novartis_combined.fst`
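The naming rules could be checked automatically, for example in a pre-commit hook. The sketch below is one possible interpretation, not an official checker: `check_data_filename`, `NAME_PATTERN`, and `VAGUE_PATTERN` are hypothetical names, and the list of "vague" words is an assumption. Upper-case letters and ISO dates are accepted because the examples above contain `LFQ` and `2025-03-10`.

```python
import re
from pathlib import PurePath

# Data formats listed in the conventions above.
ALLOWED_DATA_EXTENSIONS = {".csv", ".fst", ".rds", ".tsv"}

# One "word" is either alphanumeric or an ISO date (YYYY-MM-DD);
# words are joined by single underscores.
_WORD = r"(?:[A-Za-z0-9]+|\d{4}-\d{2}-\d{2})"
NAME_PATTERN = re.compile(rf"^{_WORD}(?:_{_WORD})*$")

# Assumed heuristic for uninformative names such as data1 or final_final.
VAGUE_PATTERN = re.compile(
    r"^(?:data|file|test|tmp|new|final)\d*(?:_(?:final|new|v?\d+))*$",
    re.IGNORECASE,
)


def check_data_filename(filename):
    """Return a list of problems with a data filename; an empty list means it passes."""
    path = PurePath(filename)
    stem, ext = path.stem, path.suffix
    problems = []
    if ext not in ALLOWED_DATA_EXTENSIONS:
        problems.append(f"unexpected extension {ext!r}")
    if not NAME_PATTERN.match(stem):
        problems.append("stem is not underscore-separated words")
    if VAGUE_PATTERN.match(stem):
        problems.append("name is not descriptive")
    return problems
```

Under these assumptions the three example names above produce no problems, while `data1.csv` and `final_final.csv` are flagged as not descriptive.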
- Raw data stored in `data/raw/` (never modified)
- Processed data in `data/processed/` (scripts that create them are tracked in version control)
- Metadata/mappings in `data/metadata/` or as explicit `*_mapping.csv` files
- Create `*_mapping.csv` or `*_metadata.tsv` for sample IDs, library IDs, etc.
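One way to verify that raw data really is never modified is to record checksums and compare them later. This is a hedged sketch rather than an established part of the workflow: the helper names `sha256_of`, `snapshot`, and `changed_files` are assumptions introduced for illustration, and SHA-256 is simply a convenient choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large raw files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(raw_dir):
    """Map each file under raw_dir (relative POSIX path) to its SHA-256 digest."""
    raw_dir = Path(raw_dir)
    return {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Return paths that were added, removed, or modified between two snapshots."""
    return sorted(
        key for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    )
```

A snapshot of `data/raw/` could be saved once (for instance as JSON under `data/metadata/`) and compared against a fresh snapshot before each analysis run; an empty result from `changed_files` means the raw files are unchanged.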