If we have an unclean data frame with country names, we can use names_to_code() to match these to ISO3 codes. The function matches the names vector against all possible names found in the countries data frame. ISO3 codes are always returned for exact matches, while non-exact matches can be handled in one of three ways: fuzzy matched (the default), always set to NA, or held for user input to confirm each fuzzy matching result. Every fuzzy match produces a message telling the user which name was matched. More details are available through ?names_to_code.
library(whoville)
names_to_code(c("Venezuela", "Arentina", "afghanist"))
#> 'arentina' has no exact match. Closest name found was 'argentina'.
#> 'afghanist' has no exact match. Closest name found was 'afghanistan'.
#> [1] "VEN" "ARG" "AFG"
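The matching behaviour described above (exact match first, then closest name by edit distance, or NA when fuzzy matching is off) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the `lookup` table and the `match_names()` helper are hypothetical, and the sketch uses `utils::adist()` for the distance, which may differ from what whoville does internally.

```r
# Toy lookup table: a hypothetical stand-in for the package's countries data frame.
lookup <- data.frame(
  name = c("venezuela", "argentina", "afghanistan"),
  iso3 = c("VEN", "ARG", "AFG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of whoville): exact match, then optional fuzzy match.
match_names <- function(names, fuzzy = TRUE) {
  cleaned <- tolower(trimws(names))
  idx <- match(cleaned, lookup$name)  # exact matches; NA where none is found
  if (fuzzy) {
    for (i in which(is.na(idx))) {
      # Edit distance from the unmatched name to every known name
      d <- utils::adist(cleaned[i], lookup$name)
      idx[i] <- which.min(d)
      message("'", cleaned[i], "' has no exact match. Closest name found was '",
              lookup$name[idx[i]], "'.")
    }
  }
  lookup$iso3[idx]  # unmatched names stay NA when fuzzy = FALSE
}

fuzzy_result <- match_names(c("Venezuela", "Arentina", "afghanist"))
strict_result <- match_names(c("Venezuela", "Arentina", "afghanist"), fuzzy = FALSE)
```

With this toy table, `fuzzy_result` holds a code for every input, while `strict_result` keeps the code only for the exact match and is NA elsewhere. A real implementation would also want a distance cutoff so that very poor matches are rejected rather than silently accepted.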
Since these functions are vectorized, we can easily use them in a normal workflow, especially if we’re using the tidyverse. Below, we clean up our untidy names and get the correct UN region and World Bank income group for each country, as well as its name in Chinese:
library(dplyr)
df <- data.frame(c_names = c("Venezuela", "Arentina", "afghanist"))
df %>%
  mutate(iso3 = names_to_code(c_names),
         un_region = iso3_to_regions(iso3, region = "un_region"),
         wb_ig = iso3_to_regions(iso3, region = "wb_ig"),
         name_zh = iso3_to_names(iso3, org = "un", language = "zh"))
#>     c_names iso3 un_region wb_ig                name_zh
#> 1 Venezuela  VEN        19  <NA> 委内瑞拉玻利瓦尔共和国
#> 2  Arentina  ARG        19   UMC                 阿根廷
#> 3 afghanist  AFG       142   LIC                 阿富汗
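To see why vectorization matters here, the following is a minimal base R sketch of a lookup that returns one value per input element, which is what lets such a function be used on a whole column at once. The `regions` table and `code_to_region()` are hypothetical names for illustration only; the region values are copied from the output above.

```r
# Toy region table: hypothetical, with values taken from the example output.
regions <- data.frame(
  iso3 = c("VEN", "ARG", "AFG"),
  un_region = c(19, 19, 142),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized lookup: one output per input, NA for unknown codes.
code_to_region <- function(iso3) {
  regions$un_region[match(iso3, regions$iso3)]
}

df <- data.frame(iso3 = c("AFG", "VEN", "XXX", "ARG"), stringsAsFactors = FALSE)
df$un_region <- code_to_region(df$iso3)  # works on the whole column in one call
```

Because the result has the same length and order as the input, the same call works unchanged inside `mutate()` or in a plain column assignment, and unknown codes surface as NA rather than dropping rows.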