Parallelisation of Data Writing and Feature Subsetting #25
Merged
Conversation
… working due to event logging in pre-processing
…+ removed print statement
…y dataset being written during parallel write
…el worker_chunksize
…correctly loading into memory
mayur-iipl reviewed on Jan 6, 2025
…e_subset_training
anand-infocusp previously approved these changes on Jan 13, 2025
anand-infocusp (Collaborator) left a comment:
Some nit comments, else lgtm
comments addressed Co-authored-by: anand-infocusp <anand@infocusp.com>
691d811
Data writing to disk can now be done with multiple workers. Feature subset training, used to extract the top features, has also been parallelised.
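The implementation itself is not shown in this conversation, so the following is a minimal sketch of how the parallel write might be structured, assuming a multiprocessing pool and the `worker_chunksize` parameter mentioned in the commit history (all other names, such as `write_shard` and `parallel_write`, are illustrative, not the repo's API):

```python
# Hypothetical sketch of parallel data writing; only worker_chunksize is taken
# from the PR's commits, everything else is an assumed illustration.
import os
from multiprocessing import Pool

import numpy as np


def write_shard(args):
    """Write one chunk of samples to its own file on disk."""
    shard_idx, samples, out_dir = args
    path = os.path.join(out_dir, f"shard_{shard_idx:05d}.npy")
    np.save(path, samples)
    return path


def parallel_write(samples, out_dir, num_workers=4, worker_chunksize=1024):
    """Split `samples` into chunks and write each chunk with a pool of workers."""
    os.makedirs(out_dir, exist_ok=True)
    chunks = [
        (i, samples[start:start + worker_chunksize], out_dir)
        for i, start in enumerate(range(0, len(samples), worker_chunksize))
    ]
    with Pool(processes=num_workers) as pool:
        return pool.map(write_shard, chunks)
```

Similarly, one plausible shape for the parallel feature subset training is to score candidate subsets concurrently and keep the best-performing ones; the sketch below uses joblib and scikit-learn as stand-ins, and the scoring model and subset generation are assumptions rather than the PR's actual code:

```python
# Hypothetical sketch of parallel feature subset evaluation to extract top features.
from itertools import combinations

from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def score_subset(X, y, subset):
    """Fit a model on one feature subset and return its mean CV score."""
    model = LogisticRegression(max_iter=1000)
    return subset, cross_val_score(model, X[:, subset], y, cv=3).mean()


def rank_feature_subsets(X, y, subset_size=2, n_jobs=4, top_k=5):
    """Score every subset of `subset_size` features in parallel and keep the best."""
    subsets = [list(c) for c in combinations(range(X.shape[1]), subset_size)]
    scores = Parallel(n_jobs=n_jobs)(
        delayed(score_subset)(X, y, s) for s in subsets
    )
    return sorted(scores, key=lambda r: r[1], reverse=True)[:top_k]
```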