
feat: finalize_characters() #11

@jonthegeek

Summary

As a data wrangler, in order to store categorical data in the most appropriate type, I would like character columns with a small number of unique values to be converted to factors automatically.

Proposed signature

finalize_characters(.dataset, max_unique = 30L)

Arguments

  • .dataset (data.frame) — The dataset to process.
  • max_unique (integer(1)) — Maximum number of distinct values a character column may have to be eligible for conversion. Defaults to 30L.

Returns .dataset with each qualifying character column converted to a factor.

Behavior

  • Iterates over all columns where is.character(col) is TRUE.
  • For each such column, counts the number of distinct non-NA values.
  • If that count is <= max_unique, converts the column to a factor (equivalent to factor(col); see Details for the casting helper to use).
  • Columns that are not character, or that exceed max_unique distinct values, are left unchanged.
  • NA values in a character column are preserved as NA in the resulting factor.
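A minimal sketch of the behavior above, in base R. It uses factor() rather than stbl::to_fct() so that it runs without dependencies; per Details, the real implementation should cast with stbl::to_fct(). The sketch also omits the dataset check and assumes .dataset is a data frame or list whose columns can be replaced with [[<-.

```r
# Sketch only: base factor() stands in for stbl::to_fct(), and no
# .dataset validation is performed.
finalize_characters <- function(.dataset, max_unique = 30L) {
  # Only character columns are candidates for conversion.
  is_chr <- vapply(.dataset, is.character, logical(1))
  for (i in which(is_chr)) {
    col <- .dataset[[i]]
    # Count distinct values, ignoring NA.
    n_unique <- length(unique(col[!is.na(col)]))
    if (n_unique <= max_unique) {
      # factor() leaves NA out of the levels but keeps NA entries as NA.
      .dataset[[i]] <- factor(col)
    }
  }
  # Non-character columns and high-cardinality columns are untouched.
  .dataset
}

# Usage: `grade` has 2 distinct non-NA values, `note` has 4.
df <- data.frame(
  id = 1:4,
  grade = c("a", "b", NA, "a"),
  note = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
out <- finalize_characters(df, max_unique = 3L)
str(out)
```

With max_unique = 3L, grade becomes a factor with levels "a" and "b" (its NA preserved), note stays character because it has 4 distinct values, and the integer id column is left as-is.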

Details

We already import {stbl}, so use stbl::to_fct() for the factor casting.

This function should mirror finalize_integers(). Abstract any code the two functions share into a helper, and update finalize_integers() to use it. For example, the dataset check at the top of finalize_integers() is also needed here, so it should move into a .check_dataset_is_listish() helper defined in aaa-conditions.R. That helper should accept a call argument (default rlang::caller_env()) and pass it on to .pkg_abort().
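One possible shape for the shared helper, sketched in base R. The name .check_dataset_is_listish() comes from this issue, but its body is an assumption: it treats anything for which is.list() is TRUE (which includes data frames) as list-ish. The .pkg_abort() below is a hypothetical stand-in, because the package's real signature is not given here; it simply stores the caller environment in a call_env field rather than turning it into a call the way rlang::abort() does. Using parent.frame() as a default argument yields the caller's environment, the base R counterpart of rlang::caller_env().

```r
# Hypothetical stand-in for the package's internal .pkg_abort(); the real
# signature is not shown in the issue. It signals a classed error and keeps
# the caller environment in a `call_env` field.
.pkg_abort <- function(message, subclass, call = parent.frame()) {
  cnd <- structure(
    list(message = message, call = NULL, call_env = call),
    class = c(subclass, "error", "condition")
  )
  stop(cnd)
}

# Hypothetical body for the helper named in the issue. Assumes "listish"
# means is.list() is TRUE. `call` defaults to the caller's environment and
# is forwarded to .pkg_abort().
.check_dataset_is_listish <- function(.dataset, call = parent.frame()) {
  if (!is.list(.dataset)) {
    .pkg_abort(
      "`.dataset` must be a list or data frame.",
      subclass = "not_listish",
      call = call
    )
  }
  invisible(.dataset)
}

# Usage: a user-facing function opens with the shared check.
f <- function(.dataset) {
  .check_dataset_is_listish(.dataset)
  "ok"
}
ok <- f(data.frame(a = 1))
err <- tryCatch(f(1:3), error = function(e) e)
conditionMessage(err)
```

Both finalize_integers() and finalize_characters() would then start with .check_dataset_is_listish(.dataset). Because the caller environment is forwarded, an rlang-based .pkg_abort() could report the user-facing function rather than the helper as the source of the error.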
