To begin, we can use gho_indicators() to explore all data available in the GHO.
library(ghost)
gho_indicators()
#> # A tibble: 2,242 × 3
#> IndicatorCode IndicatorName Language
#> <chr> <chr> <chr>
#> 1 AIR_10 Ambient air pollution attributable DALYs per 100'000… EN
#> 2 AIR_11 Household air pollution attributable deaths EN
#> 3 AIR_12 Household air pollution attributable deaths in childr… EN
#> 4 AIR_13 Household air pollution attributable deaths per 100'0… EN
#> 5 AIR_14 Household air pollution attributable deaths per 100'… EN
#> 6 AIR_15 Household air pollution attributable DALYs EN
#> 7 AIR_16 Household air pollution attributable DALYs in childre… EN
#> 8 AIR_17 Household air pollution attributable DALYs (per 100 0… EN
#> 9 AIR_18 Household air pollution attributable DALYs per 100'0… EN
#> 10 AIR_39 Household air pollution attributable DALYs (per 100 0… EN
#> # … with 2,232 more rows
If we want the data for AIR_10, we can now quickly access it as a data frame using gho_data().
gho_data("AIR_10")
#> # A tibble: 173 × 23
#> Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#> <int> <chr> <chr> <chr> <chr> <int> <lgl>
#> 1 6452 AIR_10 COUNTRY AFG YEAR 2004 NA
#> 2 6453 AIR_10 COUNTRY ALB YEAR 2004 NA
#> 3 6454 AIR_10 COUNTRY DZA YEAR 2004 NA
#> 4 6455 AIR_10 COUNTRY AND YEAR 2004 NA
#> 5 6456 AIR_10 COUNTRY AGO YEAR 2004 NA
#> 6 6457 AIR_10 COUNTRY ATG YEAR 2004 NA
#> 7 6458 AIR_10 COUNTRY ARG YEAR 2004 NA
#> 8 6459 AIR_10 COUNTRY ARM YEAR 2004 NA
#> 9 6460 AIR_10 COUNTRY AUS YEAR 2004 NA
#> 10 6461 AIR_10 COUNTRY AUT YEAR 2004 NA
#> # … with 163 more rows, and 16 more variables: Dim1 <lgl>, Dim2Type <lgl>,
#> # Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> # DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <lgl>,
#> # High <lgl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> # TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>
From here, standard methods of data manipulation (e.g. base R, the tidyverse) can be used to select variables, filter rows, and explore the data. However, we can also supply OData queries to gho_data(), filtering on different dimensions of the data. Let’s first have a quick look at the available dimensions.
gho_dimensions()
#> # A tibble: 92 × 2
#> Code Title
#> <chr> <chr>
#> 1 ADVERTISINGTYPE SUBSTANCE_ABUSE_ADVERTISING_TYPES
#> 2 AGEGROUP Age Group
#> 3 ALCOHOLTYPE Beverage Types
#> 4 AMRGLASSCATEGORY AMR GLASS Category
#> 5 ARCHIVE Archive date
#> 6 AWARENESSACTIVITYTYPE SUBSTANCE_ABUSE_AWARENESS_ACTIVITY_TYPES
#> 7 BACGROUP SUBSTANCE_ABUSE_BAC_GROUPS
#> 8 BEVERAGETYPE SUBSTANCE_ABUSE_BEVERAGE_TYPES
#> 9 CAREPATIENT Patient type
#> 10 CARESECTOR Care sector
#> # … with 82 more rows
Let’s say we want to filter by COUNTRY. We can then explore the possible values that the COUNTRY dimension (stored in the SpatialDim column) can take.
gho_dimension_values("COUNTRY")
#> # A tibble: 245 × 6
#> Code Title ParentDimension Dimension ParentCode ParentTitle
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ABW Aruba REGION COUNTRY AMR Americas
#> 2 AFG Afghanistan REGION COUNTRY EMR Eastern Med…
#> 3 AGO Angola REGION COUNTRY AFR Africa
#> 4 AIA Anguilla REGION COUNTRY AMR Americas
#> 5 ALB Albania REGION COUNTRY EUR Europe
#> 6 AND Andorra REGION COUNTRY EUR Europe
#> 7 ANT530 SPATIAL_SYNONYM REGION COUNTRY AMR Americas
#> 8 ANT532 SPATIAL_SYNONYM REGION COUNTRY AMR Americas
#> 9 ARE United Arab Emirates REGION COUNTRY EMR Eastern Med…
#> 10 ARG Argentina REGION COUNTRY AMR Americas
#> # … with 235 more rows
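Because the printed tibble shows only the first ten rows, the code for a particular country may not be visible. One way to find it is to look the country up by its Title. The sketch below runs on a small hand-built stand-in with the same Code and Title columns as the output above, so it does not call the GHO; the `countries` object and its three rows are assumptions for illustration, and in practice the same subsetting would be applied to the result of gho_dimension_values("COUNTRY").

```r
# Hand-built stand-in for gho_dimension_values("COUNTRY"); only the
# Code and Title columns are reproduced, and only three rows.
countries <- data.frame(
  Code  = c("BDI", "UGA", "ZAF"),
  Title = c("Burundi", "Uganda", "South Africa"),
  stringsAsFactors = FALSE
)

# Look up the code for a country by its title.
burundi_code <- countries$Code[countries$Title == "Burundi"]
burundi_code
```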
If we want to extract only the AIR_10 data on Burundi from the GHO, we can now write an OData query using the country code (BDI) from the COUNTRY dimension values. ghost doesn’t run complex checks on your OData queries, but it does let you type them with spaces, and it checks that each query begins with the required "$filter=...".
gho_data("AIR_10", "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 1 × 23
#> Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#> <int> <chr> <chr> <chr> <chr> <int> <lgl>
#> 1 6479 AIR_10 COUNTRY BDI YEAR 2004 NA
#> # … with 16 more variables: Dim1 <lgl>, Dim2Type <lgl>, Dim2 <lgl>,
#> # Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>, DataSourceDim <lgl>,
#> # Value <chr>, NumericValue <dbl>, Low <lgl>, High <lgl>, Comments <lgl>,
#> # Date <chr>, TimeDimensionValue <chr>, TimeDimensionBegin <dbl>,
#> # TimeDimensionEnd <dbl>
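A filter string like the one used above can also be assembled programmatically, which helps when the value comes from a variable. The following is a minimal sketch in base R: `odata_string()` and `odata_filter()` are hypothetical helpers written for this example, not part of ghost. Combining conditions with `and` is standard OData syntax, but whether the GHO API accepts any particular combination is an assumption that is not verified here.

```r
# Hypothetical helper (not part of ghost): quote a string value for OData,
# doubling any embedded single quotes.
odata_string <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# Hypothetical helper (not part of ghost): join one or more conditions
# with "and" and prepend the "$filter=" prefix that ghost checks for.
odata_filter <- function(...) {
  conditions <- c(...)
  paste0("$filter=", paste(conditions, collapse = " and "))
}

# Same filter as in the example above.
country_filter <- odata_filter(paste("SpatialDim eq", odata_string("BDI")))

# Two conditions combined (assumed to be accepted by the API).
country_year_filter <- odata_filter(
  paste("SpatialDim eq", odata_string("BDI")),
  "TimeDim eq 2004"
)
```

The resulting string would then be passed as the second argument, as in gho_data("AIR_10", country_filter).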
We can also get data from the GHO on multiple indicators in one call, with the output data frames already merged together.
gho_data(c("AIR_10", "AIR_11", "AIR_12"), "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 22 × 23
#> Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#> <int> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 6479 AIR_10 COUNTRY BDI YEAR 2004 NA
#> 2 19580064 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 3 19580065 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 4 19580066 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 5 19580067 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 6 19580068 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 7 19580069 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 8 19580070 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 9 19580071 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> 10 19580072 AIR_11 COUNTRY BDI YEAR 2016 SEX
#> # … with 12 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> # Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> # DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> # High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> # TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>
We can even provide a different filter for each indicator, such as Burundi for AIR_10, Uganda for AIR_11, and South Africa for AIR_12.
gho_data(c("AIR_10", "AIR_11", "AIR_12"),
c("$filter=SpatialDim eq 'BDI'", "$filter=SpatialDim eq 'UGA'", "$filter=SpatialDim eq 'ZAF'"))
#> # A tibble: 22 × 23
#> Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#> <int> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 6479 AIR_10 COUNTRY BDI YEAR 2004 NA
#> 2 19582872 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 3 19582873 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 4 19582874 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 5 19582875 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 6 19582876 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 7 19582877 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 8 19582878 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 9 19582879 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> 10 19582880 AIR_11 COUNTRY UGA YEAR 2016 SEX
#> # … with 12 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> # Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> # DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> # High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> # TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>
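When there are many indicators, typing one filter per indicator by hand becomes error-prone. A sketch of building the two vectors from a single named vector follows; `country_by_indicator` is a name assumed for this example, and the code only constructs the arguments without calling the GHO.

```r
# Map each indicator code to the country code it should be filtered on.
country_by_indicator <- c(AIR_10 = "BDI", AIR_11 = "UGA", AIR_12 = "ZAF")

# sprintf() is vectorised, so this yields one filter string per indicator,
# in the same order as the indicator codes.
indicators <- names(country_by_indicator)
filters <- sprintf("$filter=SpatialDim eq '%s'", unname(country_by_indicator))

filters
```

The two vectors could then be supplied together, as in gho_data(indicators, filters).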
Of course, in practice it’s likely easier to work outside the OData filtering framework and directly in R. Here’s a final, more complex example that uses dplyr and stringr alongside ghost to automatically download all indicators with the word “drug” in the indicator name (case insensitive).
library(dplyr)
library(stringr)
gho_indicators() %>%
filter(str_detect(str_to_lower(IndicatorName), "drug")) %>%
pull(IndicatorCode) %>%
gho_data()
#> # A tibble: 25,302 × 23
#> Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#> <int> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 273692 MALARIA_30539 COUNTRY MWI YEAR 2004 RESIDENCE…
#> 2 273693 MALARIA_30539 COUNTRY MWI YEAR 2004 RESIDENCE…
#> 3 273694 MALARIA_30539 COUNTRY MWI YEAR 2004 NA
#> 4 273695 MALARIA_30539 COUNTRY TZA YEAR 2004 RESIDENCE…
#> 5 273714 MALARIA_30539 COUNTRY BDI YEAR 2005 RESIDENCE…
#> 6 273715 MALARIA_30539 COUNTRY COG YEAR 2005 RESIDENCE…
#> 7 273716 MALARIA_30539 COUNTRY COG YEAR 2005 RESIDENCE…
#> 8 273717 MALARIA_30539 COUNTRY COG YEAR 2005 NA
#> 9 273718 MALARIA_30539 COUNTRY GIN YEAR 2005 RESIDENCE…
#> 10 273719 MALARIA_30539 COUNTRY GIN YEAR 2005 RESIDENCE…
#> # … with 25,292 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <lgl>,
#> # Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <chr>,
#> # DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <lgl>,
#> # High <lgl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> # TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>
And once we have that data, we can then filter, explore, and analyze the data with our standard R workflow, or even export the downloaded data to Excel or other analytical tools for further use.
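As a minimal sketch of that last step, the base R code below filters a data frame and writes it to a CSV file that Excel can open. It runs on a toy stand-in rather than a real download: `gho_df` and its NumericValue entries are invented placeholders, not GHO figures, and only a few of the 23 columns are reproduced.

```r
# Toy stand-in for a gho_data() result; the values are invented
# placeholders, not real GHO figures.
gho_df <- data.frame(
  IndicatorCode = c("AIR_10", "AIR_10", "AIR_11", "AIR_11"),
  SpatialDim    = c("BDI", "UGA", "BDI", "UGA"),
  TimeDim       = c(2004L, 2004L, 2016L, 2016L),
  NumericValue  = c(1.5, 2.5, 10, 20),
  stringsAsFactors = FALSE
)

# Keep only the Burundi rows and the columns of interest.
bdi <- gho_df[gho_df$SpatialDim == "BDI",
              c("IndicatorCode", "TimeDim", "NumericValue")]

# Export to CSV, which Excel and most analytical tools can read.
out_file <- tempfile(fileext = ".csv")
write.csv(bdi, out_file, row.names = FALSE)

# Read the file back to confirm the round trip.
check <- read.csv(out_file, stringsAsFactors = FALSE)
```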