
To begin, we can use sdg_overview() to explore all data available in the SDG database.

library(goalie)

sdg_overview()
#> # A tibble: 16,732 × 12
#>    goal  goal_title  goal_description   target_description target_title   target
#>    <chr> <chr>       <chr>              <chr>              <chr>          <chr> 
#>  1 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  2 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  3 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  4 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  5 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  6 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  7 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  8 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  9 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#> 10 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#> # … with 16,722 more rows, and 6 more variables: indicator_description <chr>,
#> #   indicator_tier <chr>, indicator <chr>, series_description <chr>,
#> #   series_release <chr>, series <chr>
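Because sdg_overview() returns an ordinary tibble, we can summarise it with dplyr. The sketch below counts how many distinct series are listed under each goal, using only the goal and series columns shown above. The name series_per_goal is illustrative, and the use of distinct() assumes that the overview may list the same series on more than one row (for example, once per indicator or release).

```r
library(goalie)
library(dplyr)

# Count the distinct series listed under each goal.
# distinct() first collapses any repeated goal/series rows.
series_per_goal <- sdg_overview() %>%
  distinct(goal, series) %>%
  count(goal, name = "n_series") %>%
  # goal is stored as character, so sort numerically rather than "1", "10", "11", ...
  arrange(as.integer(goal))

series_per_goal
```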

If we want the data for the series SI_POV_DAY1, we can retrieve it as a data frame with sdg_data().

sdg_data("SI_POV_DAY1")
#> # A tibble: 3,046 × 23
#>     Goal Target Indicator SeriesCode  SeriesDescription  GeoAreaCode GeoAreaName
#>    <dbl>  <dbl> <chr>     <chr>       <chr>                    <dbl> <chr>      
#>  1     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  2     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  3     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  4     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  5     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  6     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  7     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  8     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  9     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#> 10     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#> # … with 3,036 more rows, and 16 more variables: TimePeriod <dbl>, Value <dbl>,
#> #   Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>, LowerBound <lgl>,
#> #   BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>, FootNote <chr>,
#> #   Age <lgl>, Location <lgl>, Nature <chr>, Reporting_Type <chr>, Sex <lgl>,
#> #   Units <chr>
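As one illustration of working with this output, the following sketch keeps only the global aggregate (rows where GeoAreaName is "World") and the columns needed for a simple time series. The column names come from the output above; the variable name world_pov is illustrative.

```r
library(goalie)
library(dplyr)

# Keep the "World" aggregate and the year, value, and unit columns,
# ordered by year
world_pov <- sdg_data("SI_POV_DAY1") %>%
  filter(GeoAreaName == "World") %>%
  select(TimePeriod, Value, Units) %>%
  arrange(TimePeriod)

world_pov
```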

From here, standard methods of data manipulation (e.g. base R or the tidyverse) can be used to select variables, filter rows, and explore the data. We can also continue to explore other aspects of the SDG database. For instance, the dimensions and attributes of SI_POV_DAY1 are available through sdg_dimensions() and sdg_attributes().

sdg_dimensions(series = "SI_POV_DAY1")
#> # A tibble: 142 × 4
#>    id    code   description                        sdmx 
#>    <chr> <chr>  <chr>                              <chr>
#>  1 Age   <1M    under 1 month old                  M0   
#>  2 Age   <1Y    under 1 year old                   Y0   
#>  3 Age   <5Y    under 5 years old                  Y0T4 
#>  4 Age   <15Y   under 15 years old                 Y0T14
#>  5 Age   <18Y   under 18 years old                 Y0T17
#>  6 Age   ALLAGE All age ranges or no breaks by age _T   
#>  7 Age   1-14   1 to 14 years old                  Y1T14
#>  8 Age   1-17   1 to 17 years old                  Y1T17
#>  9 Age   5-14   5 to 14 years old                  Y5T14
#> 10 Age   5-17   5 to 17 years old                  Y5T17
#> # … with 132 more rows
sdg_attributes(series = "SI_POV_DAY1")
#> # A tibble: 8 × 4
#>   id     code    description               sdmx 
#>   <chr>  <chr>   <chr>                     <chr>
#> 1 Nature C       Country data              C    
#> 2 Nature CA      Country adjusted data     CA   
#> 3 Nature E       Estimated data            E    
#> 4 Nature G       Global monitoring data    G    
#> 5 Nature M       Modeled data              M    
#> 6 Nature N       Non-relevant              N    
#> 7 Nature NA      Data nature not available _X   
#> 8 Units  PERCENT Percentage                PT
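The attribute table can serve as a lookup for the coded columns in the series data. The sketch below tallies the Nature codes in the SI_POV_DAY1 data and attaches their descriptions. It assumes, without confirmation from the package documentation, that the Nature column returned by sdg_data() uses the same codes as the code column of sdg_attributes(); any codes that do not match are left with an NA description. The names pov, nature_codes, and nature_counts are illustrative.

```r
library(goalie)
library(dplyr)

pov <- sdg_data("SI_POV_DAY1")

# Lookup table of Nature codes and their descriptions
nature_codes <- sdg_attributes(series = "SI_POV_DAY1") %>%
  filter(id == "Nature") %>%
  select(code, description)

# Assumption: pov$Nature holds the same codes as nature_codes$code.
# Rows whose code has no match get NA in the description column.
nature_counts <- pov %>%
  count(Nature) %>%
  left_join(nature_codes, by = c("Nature" = "code"))

nature_counts
```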

If we want data for a specific country, we can look up its M49 code in the geographic area table available through the API.

sdg_geoareas()
#> # A tibble: 390 × 2
#>    geoAreaCode geoAreaName        
#>    <chr>       <chr>              
#>  1 4           Afghanistan        
#>  2 248         Åland Islands      
#>  3 8           Albania            
#>  4 12          Algeria            
#>  5 16          American Samoa     
#>  6 20          Andorra            
#>  7 24          Angola             
#>  8 660         Anguilla           
#>  9 10          Antarctica         
#> 10 28          Antigua and Barbuda
#> # … with 380 more rows
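Rather than scanning the table by eye, we can filter it by name. The sketch below looks up Angola exactly and then shows a partial match with grepl(), which also picks up Anguilla. Note that geoAreaCode is returned as character, as the `<chr>` column type above shows. The variable names are illustrative.

```r
library(goalie)
library(dplyr)

areas <- sdg_geoareas()

# Exact match on the area name
angola <- areas %>%
  filter(geoAreaName == "Angola")

angola$geoAreaCode

# Partial match: useful when the official name is long or unfamiliar;
# "Ang" matches both Angola and Anguilla
areas %>%
  filter(grepl("Ang", geoAreaName))
```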

We can then check what data are available for a specific country, such as Angola (M49 code 24).

sdg_geoarea_data(24)
#> # A tibble: 478 × 12
#>    goal  goal_title  goal_description   target_description target_title   target
#>    <chr> <chr>       <chr>              <chr>              <chr>          <chr> 
#>  1 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  2 1     End povert… Goal 1 calls for … By 2030, eradicat… By 2030, erad… 1.1   
#>  3 1     End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2   
#>  4 1     End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2   
#>  5 1     End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2   
#>  6 1     End povert… Goal 1 calls for … By 2030, reduce a… By 2030, redu… 1.2   
#>  7 1     End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3   
#>  8 1     End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3   
#>  9 1     End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3   
#> 10 1     End povert… Goal 1 calls for … Implement nationa… Implement nat… 1.3   
#> # … with 468 more rows, and 6 more variables: indicator_description <chr>,
#> #   indicator_tier <chr>, indicator <chr>, series_description <chr>,
#> #   series_release <chr>, series <chr>
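Because several rows can refer to the same series, a quick way to see which series exist for Angola is to keep one row per series. This sketch uses the goal, series, and series_description columns from the output above; the name angola_series is illustrative.

```r
library(goalie)
library(dplyr)

# One row per series available for Angola, with its goal and description
angola_series <- sdg_geoarea_data(24) %>%
  distinct(goal, series, series_description)

angola_series
```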

We can also retrieve data for multiple series in a single call, with the results already combined into one data frame.

sdg_data(c("SI_POV_DAY1", "SI_POV_EMP1", "SI_POV_NAHC"))
#> # A tibble: 27,296 × 23
#>     Goal Target Indicator SeriesCode  SeriesDescription  GeoAreaCode GeoAreaName
#>    <dbl>  <dbl> <chr>     <chr>       <chr>                    <dbl> <chr>      
#>  1     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  2     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  3     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  4     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  5     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  6     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  7     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  8     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#>  9     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#> 10     1    1.1 1.1.1     SI_POV_DAY1 Proportion of pop…           1 World      
#> # … with 27,286 more rows, and 16 more variables: TimePeriod <dbl>,
#> #   Value <dbl>, Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>,
#> #   LowerBound <lgl>, BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>,
#> #   FootNote <chr>, Age <chr>, Location <chr>, Nature <chr>,
#> #   Reporting_Type <chr>, Sex <chr>, Units <chr>
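The combined data frame keeps a SeriesCode column, so the rows from each series can still be told apart. As a small check, the sketch below counts the rows returned for each requested series; the names series_codes and rows_per_series are illustrative.

```r
library(goalie)
library(dplyr)

series_codes <- c("SI_POV_DAY1", "SI_POV_EMP1", "SI_POV_NAHC")

# Count how many rows of the combined result come from each series
rows_per_series <- sdg_data(series_codes) %>%
  count(SeriesCode)

rows_per_series
```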

In practice, it is often easier to filter directly in R than through the OData filtering framework. Here is a final, more complex example that uses dplyr and stringr alongside goalie to download all series for Angola whose series description contains the word “poverty” (case-insensitive), for the years 1990 to 2005.

library(dplyr)
library(stringr)

sdg_geoarea_data(24) %>%
  filter(str_detect(str_to_lower(series_description), "poverty")) %>%
  pull(series) %>%
  sdg_data(area_codes = 24, 1990, 2005)
#> # A tibble: 61 × 23
#>     Goal Target Indicator SeriesCode  SeriesDescription  GeoAreaCode GeoAreaName
#>    <dbl> <chr>  <chr>     <chr>       <chr>                    <dbl> <chr>      
#>  1     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  2     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  3     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  4     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  5     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  6     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  7     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  8     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#>  9     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#> 10     1 1.1    1.1.1     SI_POV_EMP1 Employed populati…          24 Angola     
#> # … with 51 more rows, and 16 more variables: TimePeriod <dbl>, Value <dbl>,
#> #   Time_Detail <dbl>, TimeCoverage <lgl>, UpperBound <lgl>, LowerBound <lgl>,
#> #   BasePeriod <lgl>, Source <chr>, GeoInfoUrl <lgl>, FootNote <chr>,
#> #   Age <chr>, Location <lgl>, Nature <chr>, Reporting_Type <chr>, Sex <chr>,
#> #   Units <chr>

Once we have the data, we can filter, explore, and analyze it with a standard R workflow, or export it to Excel or other analytical tools for further use.
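A minimal export sketch: the downloaded data frame can be written to CSV with base R's write.csv(), which Excel and most analytical tools can open. A temporary path is used here so the example leaves no files behind; replace it with your own file name. The series and area code are taken from the examples above, and the names angola_pov and out_path are illustrative. The commented writexl line is one option for a native .xlsx file and requires that package to be installed.

```r
library(goalie)

# Download one series for Angola (M49 code 24)
angola_pov <- sdg_data("SI_POV_EMP1", area_codes = 24)

# Write to CSV; swap tempdir() for a real directory to keep the file
out_path <- file.path(tempdir(), "angola_si_pov_emp1.csv")
write.csv(angola_pov, out_path, row.names = FALSE)

# For a native Excel file, the writexl package is one option:
# writexl::write_xlsx(angola_pov, "angola_si_pov_emp1.xlsx")
```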