Some datasets have a large number of features (> 30,000), which makes some processing steps difficult. Reducing the feature count during dataset processing would help with scalability. Some options include (see the sketch after this list):
- Selecting the top X highly variable genes (HVGs), where X is large (say 20,000)
- Selecting the top X genes with highest mean expression
- Selecting the top X genes with lowest percentage zeros
- Computing a high-dimensional (say 500 components) PCA/SVD and using the result as input
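
A minimal sketch of what these options could look like, assuming an AnnData object and the scanpy API; the function name `reduce_features`, the method labels, and the default of 20,000 features are illustrative assumptions, not a settled interface:

```python
import numpy as np
import scanpy as sc

def reduce_features(adata, n_features=20_000, method="hvg"):
    """Sketch of the candidate feature-reduction strategies (names are assumptions)."""
    if method == "hvg":
        # Top n_features highly variable genes
        sc.pp.highly_variable_genes(adata, n_top_genes=n_features)
        keep = adata.var["highly_variable"].to_numpy()
    elif method == "mean":
        # Top n_features genes by mean expression
        means = np.asarray(adata.X.mean(axis=0)).ravel()
        keep = np.zeros(adata.n_vars, dtype=bool)
        keep[np.argsort(means)[-n_features:]] = True
    elif method == "zeros":
        # Top n_features genes with the lowest percentage of zeros
        n_nonzero = np.asarray((adata.X != 0).sum(axis=0)).ravel()
        keep = np.zeros(adata.n_vars, dtype=bool)
        keep[np.argsort(-n_nonzero)[:n_features]] = True
    elif method == "pca":
        # High-dimensional PCA; stored in .obsm rather than subsetting genes
        sc.pp.pca(adata, n_comps=500)
        return adata
    else:
        raise ValueError(f"Unknown method: {method}")
    return adata[:, keep].copy()
```

The first three options return a gene-subset copy of the dataset, while the PCA option keeps all genes and stores the embedding in `.obsm`, so downstream code would need to handle those two cases differently.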
The goal would be to limit dimensionality enough that processing, methods, and metrics can run, while removing as little information from the dataset as possible. Selection could be used only during preprocessing, or applied to the stored dataset itself so that all methods and metrics see a consistent set of features.
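
The two usage modes could look roughly like the following, reusing the hypothetical `reduce_features` helper from the sketch above; this is an illustration of the trade-off, not a proposed implementation:

```python
# Reduce once, then choose where the reduction applies.
reduced = reduce_features(adata, n_features=20_000, method="hvg")

# Option A: preprocessing only — run expensive steps on the reduced view,
# then copy results back onto the full object, keeping all genes available.
sc.pp.pca(reduced, n_comps=50)
adata.obsm["X_pca"] = reduced.obsm["X_pca"]

# Option B: apply the selection to the dataset itself for consistency,
# so every downstream method and metric operates on the same gene set.
adata = reduced
```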