
Reduce number of features during dataset processing #15

@lazappi

Description


Some datasets have large numbers of features (> 30,000), which makes some processing steps difficult. Reducing the feature count during dataset processing would improve scalability. Some options include:

  • Selecting the top X HVGs (where X is large, say 20,000)
  • Selecting the top X genes with highest mean expression
  • Selecting the top X genes with lowest percentage zeros
  • Computing a high-dimensional PCA/SVD (say, 500 components) and using that as input

The goal would be to reduce dimensionality enough that processing, methods, and metrics are able to run, but without removing much information from the dataset. The selection could be applied only during preprocessing, or to the stored dataset itself for consistency.
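As a minimal sketch of the mean-expression and percentage-zeros options above (the function name and a dense cells × genes NumPy matrix are assumptions for illustration; a real dataset would likely be a sparse AnnData matrix):

```python
import numpy as np

def select_features(X, n_keep=20_000, method="mean"):
    """Return sorted column indices of the n_keep genes to retain.

    X: cells x genes matrix (dense ndarray in this sketch).
    method: "mean" keeps genes with the highest mean expression;
            "nonzero" keeps genes with the lowest percentage of zeros
            (i.e. the highest fraction of non-zero cells).
    """
    n_keep = min(n_keep, X.shape[1])
    if method == "mean":
        score = X.mean(axis=0)
    elif method == "nonzero":
        score = (X != 0).mean(axis=0)  # fraction of cells with non-zero counts
    else:
        raise ValueError(f"unknown method: {method}")
    # take the n_keep top-scoring columns, then sort to keep gene order stable
    return np.sort(np.argsort(score)[-n_keep:])

# toy example: 5 cells x 6 genes, keep the top 3
X = np.array([
    [0, 1, 5, 0, 2, 0],
    [0, 2, 4, 0, 3, 0],
    [1, 0, 6, 0, 1, 0],
    [0, 1, 3, 0, 2, 1],
    [0, 0, 7, 0, 4, 0],
])
idx = select_features(X, n_keep=3, method="mean")
X_small = X[:, idx]  # reduced matrix, same cells, fewer genes
```

Applying the same index to the stored dataset (rather than only inside preprocessing) would give the consistency mentioned above; the HVG option would follow the same pattern with a variance-based score.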

Metadata

Assignees: none
Labels: enhancement (New feature or request)
Milestone: none
Development: no branches or pull requests