Skip to content

Conversation

mdkompella
Copy link
Collaborator

I added the function get_park_names() which takes a dataframe with a unit code column and adds a unit name column.

Copy link
Member

@RobLBaker RobLBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a single parameter, e.g. df$unit_column?

You could then eliminate the first line of code.

#' get_park_names(exampleDF, "parkCode")
#' }

get_park_names <- function(df, unit_column) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a single parameter, e.g. "df$unit_column"? Then you could remove the first line of code in the function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept them separate because I use the data frame provided by the user to return that same data frame with the new column appended. Would you rather the function return the dataframe or the list of park names?

unit_codes_na <- NULL

#copied from Rob's function
for (i in 1:length(unit_code)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can cause some problems with edge cases where the length is zero. It's typically better to use seq_along() (usually I use it in combination with length()).

#else if ref_data corresponds to more than one park name, assign park_name to NA and add unit code to na list
#else assign park name to the full unit (park) name
if (length(ref_data) == 0) {
park_name <- NA_character_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this "NA_character_" variable is undefined. Should it be user-supplied in the function call?

park_name <- NA_character_
unit_codes_na <- append(unit_codes_na, unit_code[i])
} else if (length(ref_data[[1]]) != 1) {
park_name <- NA_character_
Copy link
Member

@RobLBaker RobLBaker Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to using NA_character_ as opposed to just NA?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the column is all character class, I thought an NA_character_ would keep missing values consistent to the column type. Let me know if you think a regular NA would work better.

# create new dataframe with unit name column
df2 <- df %>%
mutate(parkName = unit_names) %>%
relocate(parkName, .after = unit_column)
Copy link
Member

@RobLBaker RobLBaker Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add the package designation for relocate (eg. dplyr::relocate(parkName... ).

relocate also appears to be calling select, which has been deprecated in tidyselect 1.1.0. Is there an updated version of relocate that could be used (or base R?):

Warning message: Using an external vector in selections was deprecated in tidyselect 1.1.0. ℹ Please use all_of()orany_of()` instead.

Was:

data %>% select(unit_column)

Now:

data %>% select(all_of(unit_column))

See
https://tidyselect.r-lib.org/reference/faq-external-vector.html.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this
warning was generated.

<warning/lifecycle_warning_deprecated>
Warning:
Using an external vector in selections was deprecated in
tidyselect 1.1.0.
ℹ Please use all_of() or any_of() instead.

Was:

data %>% select(unit_column)

Now:

data %>% select(all_of(unit_column))

See
https://tidyselect.r-lib.org/reference/faq-external-vector.html.

Backtrace:

  1. ├─global get_park_names(df, "park_units")
  2. │ └─df %>% mutate(parkName = unit_names) %>% ...
  3. ├─dplyr::relocate(., parkName, .after = unit_column)
  4. └─dplyr:::relocate.data.frame(., parkName, .after = unit_column)
  5. └─dplyr:::eval_relocate(...)
  6. └─tidyselect::eval_select(after, data, env = env, error_call = error_call)
    
  7.   └─tidyselect:::eval_select_impl(...)
    
  8.     ├─tidyselect:::with_subscript_errors(...)
    
  9.     │ └─base::withCallingHandlers(...)
    
  10.     └─tidyselect:::vars_select_eval(...)
    
  11.       └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
    
  12.         └─tidyselect:::eval_sym(expr, data_mask, context_mask)
    


# create new dataframe with unit name column
df2 <- df %>%
mutate(parkName = unit_names) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add the package designation for "mutate" (e.g. dplyr::mutate)

# if ref_data list is empty (no corresponding park name) assign park_name to NA and add unit code to na list
# else if ref_data corresponds to more than one park name, assign park_name to NA and add unit code to na list
# else assign park name to the full unit (park) name
if (length(ref_data) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is doing what you want it to do. For most(?) responces (e.g. ROMO), the length of ref_data is 13. I think perhaps you want nrow(ref_data), which would be 1 for ROMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This first statement checks if the unit code resolved to no park names at all, meaning ref_data is not populated. I could also use: if (is.null(nrow(ref_data)), since ref_data remains null if there is no corresponding park name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. I think you need to take care of the functions that need to be attributed to dplyr, the functions that call deprecated functions, and better handle non-resolveable/ambiguous codes. For instance, if I supply the unit code "G", there are something like 72 matches. But I get an NA (reasonable) and a warning that "The following unit codes were not found: G" when the problem is not so much that they weren't found as that they weren't resolveable to a single unique park name.

@RobLBaker
Copy link
Member

Overall, this looks pretty close! It would be nice to add some basic unit tests for the function.

…ames, many_names for codes with more than one corresponding park name

added another example with missing value parameters
added missing value parameters to function definition
…any park names

populate those lists with unit codes based on size of ref_data variable
added package designation for "mutate", use any_of() to get around deprecated select() function
created two print statements: one prints all unique codes with no corresponding park names, one prints all unique codes with many corresponding park names
…ames, many_names for codes with more than one corresponding park name
…responding park names, one prints all unique codes with many corresponding park names
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants