Migrate plyr->dplyr #268

MichaelChirico · 2025-05-31T06:46:31Z

{plyr} is long-superseded. The package itself only uses plyr::count(); a vignette also uses round_any().
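The vignette's use of round_any() is the easy part to drop. plyr's numeric method is essentially f(x / accuracy) * accuracy, with f = round by default, so a base-R stand-in can be a one-liner. This is a minimal sketch; the helper name round_any_base is hypothetical and not part of any package:

```r
# Hypothetical base-R stand-in for plyr::round_any() on numeric input:
# scale by `accuracy`, apply the rounding function `f`, then scale back.
round_any_base <- function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}

round_any_base(135, 10)               # 140
round_any_base(135, 100)              # 100
round_any_base(135, 25, f = floor)    # 125
round_any_base(135, 25, f = ceiling)  # 150
```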

I first tried dropping {plyr} entirely and re-implementing count() in base R, but gave up for three main reasons:

table(<data.frame>) can fail when there are very few duplicates and every column has high cardinality: table() materializes the full cross-product of levels, so most of its cells are 0 entries that must be computed and then dropped (plyr::count() never creates them).
We could imitate plyr::count() with something like interaction(..., drop=TRUE) + tapply(), but it's hard to generically reconstruct the un-interacted levels needed to build an equivalent data.frame: for full generality, we'd need a sep=<str> where <str> does not appear in any unique value of any column of x, so that strsplit(<level>, <sep>) maps back uniquely.
Something like vapply(split(x, x), nrow, integer(1L)) is also appealingly simple, but split() always drops missing (NA) groups (https://bugs.r-project.org/show_bug.cgi?id=18899), so we'd need an onerous/ugly loop over the columns to replace missing observations with a unique NA-equivalent, end-sorting sentinel.
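The three pitfalls above can be reproduced in base R on toy data. This is an illustrative sketch only: the data frame and the column values are made up to trigger each problem, and nothing here is taken from the package's code.

```r
# Toy data: one duplicated row, and one row with NA in column `a`.
x <- data.frame(
  a = c(1, 1, 2, NA),
  b = c("p", "p", "q", "r")
)

# 1. table() builds the full cross-product of levels (2 x 3 = 6 cells),
#    most of them zero, and silently ignores the row containing NA.
tab <- table(x)
dim(tab)        # 2 3
sum(tab == 0)   # 4 zero cells that plyr::count() would never create
sum(tab)        # 3, not 4: the NA row is gone

# 2. interaction() pastes levels together with sep = "." by default, so
#    distinct rows can collapse onto the same label, and strsplit()
#    cannot recover the original column values.
ia <- interaction(c("a.b", "a"), c("c", "b.c"), drop = TRUE)
as.character(ia)                          # "a.b.c" "a.b.c"
strsplit("a.b.c", ".", fixed = TRUE)[[1L]] # "a" "b" "c"

# 3. split() keeps a group for every level combination (including empty
#    ones) but drops rows whose grouping value is NA.
counts <- vapply(split(x, x), nrow, integer(1L))
length(counts)  # 6 groups
sum(counts)     # 3: the (NA, "r") row is not counted anywhere
```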

Hence the move to {dplyr}, despite it being a heavier dependency.

I also applied some code quality fixes to nearby lines:

T/F --> TRUE/FALSE.
1:<n> loops replaced by seq_len()/seq_along(), as appropriate.
Loops like x <- c(); for (i in seq_along(y)) x[i] <- foo(y[i]) now pre-initialize x to length(y).
Move some lines around to avoid creating variables just prior to a possible early return().
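The loop-related fixes in the list above can be illustrated with a small before/after sketch. Here y and foo() are hypothetical stand-ins for whatever the real loops iterate over and compute:

```r
y <- c(3, 1, 4)
foo <- function(v) v * 2  # hypothetical stand-in for the per-element work

# Before: x grows by one element per iteration, and 1:length(y)
# would iterate over c(1, 0) if y were empty.
x_old <- c()
for (i in 1:length(y)) x_old[i] <- foo(y[i])

# After: pre-allocate x to length(y) and iterate with seq_along(),
# which yields integer(0) (no iterations at all) for an empty y.
x_new <- numeric(length(y))
for (i in seq_along(y)) x_new[i] <- foo(y[i])

identical(x_old, x_new)  # TRUE

empty <- numeric(0)
1:length(empty)          # 1 0: two spurious iterations
seq_along(empty)         # integer(0)
```

Both loops give the same result for non-empty input; the difference only shows up for empty input (where 1:length() misbehaves) and in avoiding repeated reallocation as x grows.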

MichaelChirico added 3 commits May 30, 2025 23:43

Migrate plyr->dplyr (eec4e77)
move very long comment to GitHub (fa80b8e)
also respect row ordering (51ce1b5)

MichaelChirico mentioned this pull request May 31, 2025

Refactor plyr::count to use dplyr ggobi/ggally#520 (Merged)
