To begin, we can use sdg_overview() to explore all the data available in the SDG database.
library(goalie)
sdg_overview()
#> # A tibble: 16,732 × 12
#> goal goal_title goal_description target_description target_title target
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 2 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 3 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 4 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 5 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 6 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 7 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 8 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 9 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 10 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> # … with 16,722 more rows, and 6 more variables: indicator_description <chr>,
#> # indicator_tier <chr>, indicator <chr>, series_description <chr>,
#> # series_release <chr>, series <chr>
If we want the data for SI_POV_DAY1, we can retrieve it directly as a data frame using sdg_data().
sdg_data("SI_POV_DAY1")
#> # A tibble: 3,046 × 23
#> Goal Target Indicator SeriesCode SeriesDescription GeoAreaCode GeoAreaName
#> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 2 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 3 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 4 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 5 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 6 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 7 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 8 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 9 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 10 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> # … with 3,036 more rows, and 16 more variables: TimePeriod <dbl>, Value <dbl>,
#> # Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>, LowerBound <lgl>,
#> # BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>, FootNote <chr>,
#> # Age <lgl>, Location <lgl>, Nature <chr>, Reporting_Type <chr>, Sex <lgl>,
#> # Units <chr>
From here, standard methods of data manipulation (e.g. base R, the tidyverse) can be used to select variables, filter rows, and explore the data. However, we can also continue to explore other aspects of the SDG database. For instance, to see the dimensions and attributes of SI_POV_DAY1, we can use sdg_dimensions() and sdg_attributes().
sdg_dimensions(series = "SI_POV_DAY1")
#> # A tibble: 142 × 4
#> id code description sdmx
#> <chr> <chr> <chr> <chr>
#> 1 Age <1M under 1 month old M0
#> 2 Age <1Y under 1 year old Y0
#> 3 Age <5Y under 5 years old Y0T4
#> 4 Age <15Y under 15 years old Y0T14
#> 5 Age <18Y under 18 years old Y0T17
#> 6 Age ALLAGE All age ranges or no breaks by age _T
#> 7 Age 1-14 1 to 14 years old Y1T14
#> 8 Age 1-17 1 to 17 years old Y1T17
#> 9 Age 5-14 5 to 14 years old Y5T14
#> 10 Age 5-17 5 to 17 years old Y5T17
#> # … with 132 more rows
sdg_attributes(series = "SI_POV_DAY1")
#> # A tibble: 8 × 4
#> id code description sdmx
#> <chr> <chr> <chr> <chr>
#> 1 Nature C Country data C
#> 2 Nature CA Country adjusted data CA
#> 3 Nature E Estimated data E
#> 4 Nature G Global monitoring data G
#> 5 Nature M Modeled data M
#> 6 Nature N Non-relevant N
#> 7 Nature NA Data nature not available _X
#> 8 Units PERCENT Percentage PT
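The attributes table can serve as a lookup for the coded columns in the data, such as Nature. The following is a minimal sketch of that idea, assuming dplyr is installed. It does not call the API: `attrs` is a hand-copied subset of the output above, and `toy_data`, `nature_lookup`, and `decoded` are hypothetical names with made-up rows standing in for a real sdg_data() result.

```r
library(dplyr)

# A few rows copied by hand from the sdg_attributes() output above
attrs <- tibble(
  id          = c("Nature", "Nature", "Nature", "Units"),
  code        = c("C", "E", "G", "PERCENT"),
  description = c("Country data", "Estimated data",
                  "Global monitoring data", "Percentage"),
  sdmx        = c("C", "E", "G", "PT")
)

# Hypothetical stand-in for rows returned by sdg_data(); the rows are made up
toy_data <- tibble(
  GeoAreaName = c("World", "World", "Angola"),
  Nature      = c("G", "E", "C")
)

# Keep only the Nature codes and rename columns so they can be joined
nature_lookup <- attrs %>%
  filter(id == "Nature") %>%
  select(Nature = code, nature_description = description)

# Attach the human-readable description to each row
decoded <- left_join(toy_data, nature_lookup, by = "Nature")
decoded
```

The same pattern should carry over to other coded columns (for example Units) by changing the `id` used in the filter.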
Suppose we want data for a specific country. We can look up its M49 code in the table of geographic areas available through the API.
sdg_geoareas()
#> # A tibble: 390 × 2
#> geoAreaCode geoAreaName
#> <chr> <chr>
#> 1 4 Afghanistan
#> 2 248 Åland Islands
#> 3 8 Albania
#> 4 12 Algeria
#> 5 16 American Samoa
#> 6 20 Andorra
#> 7 24 Angola
#> 8 660 Anguilla
#> 9 10 Antarctica
#> 10 28 Antigua and Barbuda
#> # … with 380 more rows
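Rather than scanning the table by eye, the code can be looked up by name. This is a minimal base R sketch under stated assumptions: `geoareas` is a hand-copied subset of the output above rather than a live API result, and `lookup_m49()` is a hypothetical helper, not a function provided by goalie.

```r
# A few rows copied by hand from the sdg_geoareas() output above
geoareas <- data.frame(
  geoAreaCode = c("4", "8", "12", "24"),
  geoAreaName = c("Afghanistan", "Albania", "Algeria", "Angola"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive exact match on the area name
lookup_m49 <- function(areas, name) {
  hit <- areas$geoAreaCode[tolower(areas$geoAreaName) == tolower(name)]
  if (length(hit) != 1) {
    stop("Expected exactly one match for '", name, "', found ", length(hit))
  }
  as.integer(hit)  # the table stores the codes as character
}

angola_code <- lookup_m49(geoareas, "angola")
angola_code
```

Stopping on zero or multiple matches is a deliberate choice here, so that a misspelled name does not silently produce an empty query.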
We can then check what data is available for a specific country, say Angola (M49 code 24).
sdg_geoarea_data(24)
#> # A tibble: 478 × 12
#> goal goal_title goal_description target_description target_title target
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 2 1 End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1
#> 3 1 End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2
#> 4 1 End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2
#> 5 1 End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2
#> 6 1 End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2
#> 7 1 End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3
#> 8 1 End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3
#> 9 1 End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3
#> 10 1 End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3
#> # … with 468 more rows, and 6 more variables: indicator_description <chr>,
#> # indicator_tier <chr>, indicator <chr>, series_description <chr>,
#> # series_release <chr>, series <chr>
We can also get data from the SDG database for multiple series in one call, with the results already combined into a single data frame.
sdg_data(c("SI_POV_DAY1", "SI_POV_EMP1", "SI_POV_NAHC"))
#> # A tibble: 27,296 × 23
#> Goal Target Indicator SeriesCode SeriesDescription GeoAreaCode GeoAreaName
#> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 2 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 3 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 4 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 5 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 6 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 7 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 8 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 9 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> 10 1 1.1 1.1.1 SI_POV_DAY1 Proportion of pop… 1 World
#> # … with 27,286 more rows, and 16 more variables: TimePeriod <dbl>,
#> # Value <dbl>, Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>,
#> # LowerBound <lgl>, BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>,
#> # FootNote <chr>, Age <chr>, Location <chr>, Nature <chr>,
#> # Reporting_Type <chr>, Sex <chr>, Units <chr>
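Because the combined result keeps a SeriesCode column, it is straightforward to check how much each series contributed or to separate the series again. A minimal sketch, assuming dplyr is installed; `combined` is a hypothetical stand-in with made-up values, not real SDG data, and uses only a few of the column names shown above.

```r
library(dplyr)

# Hypothetical stand-in for a combined sdg_data() result; values are made up
combined <- tibble(
  SeriesCode  = c("SI_POV_DAY1", "SI_POV_DAY1", "SI_POV_EMP1", "SI_POV_NAHC"),
  GeoAreaName = c("World", "Angola", "Angola", "Angola"),
  TimePeriod  = c(2000, 2000, 2000, 2000),
  Value       = c(1, 2, 3, 4)
)

# How many rows did each series contribute?
rows_per_series <- count(combined, SeriesCode)
rows_per_series

# Split back into one data frame per series when that is more convenient
by_series <- split(combined, combined$SeriesCode)
names(by_series)
```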
Of course, in practice it is likely easier to work directly in R than within the OData filtering framework, so here is a final, more complex example that uses dplyr and stringr alongside goalie to automatically download all series for Angola with the word “poverty” in the series description (case insensitive), for the years 1990 to 2005.
library(dplyr)
library(stringr)
sdg_geoarea_data(24) %>%
  filter(str_detect(str_to_lower(series_description), "poverty")) %>%
  pull(series) %>%
  sdg_data(area_codes = 24, 1990, 2005)
#> # A tibble: 61 × 23
#> Goal Target Indicator SeriesCode SeriesDescription GeoAreaCode GeoAreaName
#> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 2 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 3 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 4 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 5 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 6 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 7 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 8 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 9 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> 10 1 1.1 1.1.1 SI_POV_EMP1 Employed populati… 24 Angola
#> # … with 51 more rows, and 16 more variables: TimePeriod <dbl>, Value <dbl>,
#> # Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>, LowerBound <lgl>,
#> # BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>, FootNote <chr>,
#> # Age <chr>, Location <lgl>, Nature <chr>, Reporting_Type <chr>, Sex <chr>,
#> # Units <chr>
Once we have that data, we can filter, explore, and analyze it with our standard R workflow, or export it to Excel or other analytical tools for further use.
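As a rough illustration of that last step, the sketch below filters a data frame and writes it to a CSV file, which Excel and most other tools can open. It assumes dplyr is installed; `downloaded` is a hypothetical stand-in with placeholder values rather than a real download, and the file name is arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for downloaded data; values are placeholders
downloaded <- tibble(
  SeriesCode  = "SI_POV_EMP1",
  GeoAreaName = "Angola",
  TimePeriod  = c(2000, 2001, 2002),
  Value       = c(10, 20, 30)
)

# Keep the years of interest and only the columns needed downstream
subset_data <- downloaded %>%
  filter(TimePeriod >= 2001) %>%
  select(SeriesCode, GeoAreaName, TimePeriod, Value)

# Write to a temporary CSV file; replace the path with a real location as needed
out_file <- file.path(tempdir(), "sdg_angola_poverty.csv")
write.csv(subset_data, out_file, row.names = FALSE)

# Read it back to confirm the export worked
exported <- read.csv(out_file)
nrow(exported)
```

CSV is used here only because it needs no extra packages; a native Excel file would require an additional package.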