Usage of ghost

To begin, we can use gho_indicators() to explore all the data available in the GHO.

library(ghost)

gho_indicators()
#> # A tibble: 2,242 × 3
#>    IndicatorCode IndicatorName                                          Language
#>    <chr>         <chr>                                                  <chr>   
#>  1 AIR_10        Ambient air pollution  attributable DALYs per 100'000… EN      
#>  2 AIR_11        Household air pollution attributable deaths            EN      
#>  3 AIR_12        Household air pollution attributable deaths in childr… EN      
#>  4 AIR_13        Household air pollution attributable deaths per 100'0… EN      
#>  5 AIR_14        Household air pollution  attributable deaths per 100'… EN      
#>  6 AIR_15        Household air pollution attributable DALYs             EN      
#>  7 AIR_16        Household air pollution attributable DALYs in childre… EN      
#>  8 AIR_17        Household air pollution attributable DALYs (per 100 0… EN      
#>  9 AIR_18        Household air pollution  attributable DALYs per 100'0… EN      
#> 10 AIR_39        Household air pollution attributable DALYs (per 100 0… EN      
#> # … with 2,232 more rows
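
With more than two thousand indicators, it can help to narrow the listing by name before picking a code. The following is a minimal sketch in base R. It assumes a hand-built data frame, `indicators`, that copies two untruncated rows from the output above instead of calling the API, so it runs offline. The same subsetting should apply to the real result of gho_indicators().

```r
# Stand-in for gho_indicators(): two rows copied from the output above,
# so the example runs without a network connection.
indicators <- data.frame(
  IndicatorCode = c("AIR_11", "AIR_15"),
  IndicatorName = c("Household air pollution attributable deaths",
                    "Household air pollution attributable DALYs"),
  Language = c("EN", "EN"),
  stringsAsFactors = FALSE
)

# Keep indicators whose name mentions "dalys", ignoring case.
is_match <- grepl("dalys", indicators$IndicatorName, ignore.case = TRUE)
matches <- indicators[is_match, ]

matches$IndicatorCode
```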

If we want the data for AIR_10, we can retrieve it as a data frame using gho_data().

gho_data("AIR_10")
#> # A tibble: 173 × 23
#>       Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>    <int> <chr>         <chr>          <chr>      <chr>         <int> <lgl>   
#>  1  6452 AIR_10        COUNTRY        AFG        YEAR           2004 NA      
#>  2  6453 AIR_10        COUNTRY        ALB        YEAR           2004 NA      
#>  3  6454 AIR_10        COUNTRY        DZA        YEAR           2004 NA      
#>  4  6455 AIR_10        COUNTRY        AND        YEAR           2004 NA      
#>  5  6456 AIR_10        COUNTRY        AGO        YEAR           2004 NA      
#>  6  6457 AIR_10        COUNTRY        ATG        YEAR           2004 NA      
#>  7  6458 AIR_10        COUNTRY        ARG        YEAR           2004 NA      
#>  8  6459 AIR_10        COUNTRY        ARM        YEAR           2004 NA      
#>  9  6460 AIR_10        COUNTRY        AUS        YEAR           2004 NA      
#> 10  6461 AIR_10        COUNTRY        AUT        YEAR           2004 NA      
#> # … with 163 more rows, and 16 more variables: Dim1 <lgl>, Dim2Type <lgl>,
#> #   Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <lgl>,
#> #   High <lgl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

From here, standard methods of data manipulation (e.g. base R, the tidyverse) can be used to select variables, filter rows, and explore the data. However, we can also provide OData queries that filter on different dimensions of the data. Let’s first have a quick look at the available dimensions.

gho_dimensions()
#> # A tibble: 92 × 2
#>    Code                  Title                                   
#>    <chr>                 <chr>                                   
#>  1 ADVERTISINGTYPE       SUBSTANCE_ABUSE_ADVERTISING_TYPES       
#>  2 AGEGROUP              Age Group                               
#>  3 ALCOHOLTYPE           Beverage Types                          
#>  4 AMRGLASSCATEGORY      AMR GLASS Category                      
#>  5 ARCHIVE               Archive date                            
#>  6 AWARENESSACTIVITYTYPE SUBSTANCE_ABUSE_AWARENESS_ACTIVITY_TYPES
#>  7 BACGROUP              SUBSTANCE_ABUSE_BAC_GROUPS              
#>  8 BEVERAGETYPE          SUBSTANCE_ABUSE_BEVERAGE_TYPES          
#>  9 CAREPATIENT           Patient type                            
#> 10 CARESECTOR            Care sector                             
#> # … with 82 more rows

Let’s say we want to filter by COUNTRY. We can then explore the possible values the SpatialDim COUNTRY dimension can take.

gho_dimension_values("COUNTRY")
#> # A tibble: 245 × 6
#>    Code   Title                ParentDimension Dimension ParentCode ParentTitle 
#>    <chr>  <chr>                <chr>           <chr>     <chr>      <chr>       
#>  1 ABW    Aruba                REGION          COUNTRY   AMR        Americas    
#>  2 AFG    Afghanistan          REGION          COUNTRY   EMR        Eastern Med…
#>  3 AGO    Angola               REGION          COUNTRY   AFR        Africa      
#>  4 AIA    Anguilla             REGION          COUNTRY   AMR        Americas    
#>  5 ALB    Albania              REGION          COUNTRY   EUR        Europe      
#>  6 AND    Andorra              REGION          COUNTRY   EUR        Europe      
#>  7 ANT530 SPATIAL_SYNONYM      REGION          COUNTRY   AMR        Americas    
#>  8 ANT532 SPATIAL_SYNONYM      REGION          COUNTRY   AMR        Americas    
#>  9 ARE    United Arab Emirates REGION          COUNTRY   EMR        Eastern Med…
#> 10 ARG    Argentina            REGION          COUNTRY   AMR        Americas    
#> # … with 235 more rows
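
Since filters use the Code column and not the country name, a small lookup can save scrolling through the table. The sketch below assumes a hand-built data frame, `countries`, copied from four rows of the output above. `code_for()` is a hypothetical helper written for this example and is not part of ghost.

```r
# Stand-in for gho_dimension_values("COUNTRY"): four rows copied from the
# output above, so the example runs without a network connection.
countries <- data.frame(
  Code = c("ABW", "AFG", "AGO", "ALB"),
  Title = c("Aruba", "Afghanistan", "Angola", "Albania"),
  ParentCode = c("AMR", "EMR", "AFR", "EUR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ghost): return the code for a country
# title, or NA if the title is not in the lookup table.
code_for <- function(title, lookup = countries) {
  lookup$Code[match(title, lookup$Title)]
}

code_for("Angola")
```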

If we want to extract only the AIR_10 data on Burundi from the GHO, we can now write an OData query using the code we’ve identified above. ghost doesn’t run detailed checks on OData queries because of their complexity, but it lets you type them with spaces and checks that each query begins with the required "$filter=...".

gho_data("AIR_10", "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 1 × 23
#>      Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>   <int> <chr>         <chr>          <chr>      <chr>         <int> <lgl>   
#> 1  6479 AIR_10        COUNTRY        BDI        YEAR           2004 NA      
#> # … with 16 more variables: Dim1 <lgl>, Dim2Type <lgl>, Dim2 <lgl>,
#> #   Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>, DataSourceDim <lgl>,
#> #   Value <chr>, NumericValue <dbl>, Low <lgl>, High <lgl>, Comments <lgl>,
#> #   Date <chr>, TimeDimensionValue <chr>, TimeDimensionBegin <dbl>,
#> #   TimeDimensionEnd <dbl>
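
Typing filter strings by hand gets error-prone once quoting is involved, so it may be convenient to assemble them programmatically. The sketch below uses two hypothetical helpers, `odata_eq()` and `odata_filter()`, which are not part of ghost. They quote character values, leave numbers bare, and join conditions with the standard OData `and` operator. Whether the GHO service accepts a particular combined expression is an assumption that has not been verified here.

```r
# Hypothetical helpers (not part of ghost) for assembling OData filters.

# Build one "field eq value" condition; character values are single-quoted,
# numeric values are left bare.
odata_eq <- function(field, value) {
  if (is.numeric(value)) {
    paste0(field, " eq ", as.character(value))
  } else {
    paste0(field, " eq '", value, "'")
  }
}

# Join one or more conditions with "and" and add the required prefix.
odata_filter <- function(...) {
  conditions <- c(...)
  paste0("$filter=", paste(conditions, collapse = " and "))
}

single <- odata_filter(odata_eq("SpatialDim", "BDI"))
combined <- odata_filter(odata_eq("SpatialDim", "BDI"),
                         odata_eq("TimeDim", 2004))
```

Here `single` reproduces the filter string used in the call above, so it could be passed as the second argument to gho_data().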

We can also get data from the GHO on multiple indicators in one call, with the output data frames already merged together.

gho_data(c("AIR_10", "AIR_11", "AIR_12"), "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 22 × 23
#>          Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>       <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>   
#>  1     6479 AIR_10        COUNTRY        BDI        YEAR           2004 NA      
#>  2 19580064 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  3 19580065 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  4 19580066 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  5 19580067 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  6 19580068 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  7 19580069 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  8 19580070 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#>  9 19580071 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#> 10 19580072 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#> # … with 12 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> #   Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> #   High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

We can even provide different filters for each indicator separately, such as Burundi for AIR_10, Uganda for AIR_11, and South Africa for AIR_12.

gho_data(c("AIR_10", "AIR_11", "AIR_12"), 
         c("$filter=SpatialDim eq 'BDI'", "$filter=SpatialDim eq 'UGA'", "$filter=SpatialDim eq 'ZAF'"))
#> # A tibble: 22 × 23
#>          Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>       <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>   
#>  1     6479 AIR_10        COUNTRY        BDI        YEAR           2004 NA      
#>  2 19582872 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  3 19582873 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  4 19582874 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  5 19582875 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  6 19582876 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  7 19582877 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  8 19582878 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#>  9 19582879 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#> 10 19582880 AIR_11        COUNTRY        UGA        YEAR           2016 SEX     
#> # … with 12 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> #   Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> #   High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>
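
When each indicator needs its own filter, the filter vector can be generated from a vector of country codes instead of being typed out. This is a minimal sketch using base R's vectorized paste0(). The variable names are chosen for this example, and the final gho_data() call is left commented out because it needs a network connection.

```r
# One country code per indicator, in the same order as the indicators.
indicators <- c("AIR_10", "AIR_11", "AIR_12")
country_codes <- c("BDI", "UGA", "ZAF")

# paste0() is vectorized, so this yields one filter string per country.
filters <- paste0("$filter=SpatialDim eq '", country_codes, "'")

# Guard against mismatched lengths before making the request.
stopifnot(length(indicators) == length(filters))

# gho_data(indicators, filters)  # same call as above; requires network access
```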

Of course, it’s often easier to work outside the OData filtering framework and directly in R. Here’s a final, more complex example that uses dplyr and stringr alongside ghost to download all indicators with the word “drug” in the indicator name (case insensitive).

library(dplyr)
library(stringr)

gho_indicators() %>%
  filter(str_detect(str_to_lower(IndicatorName), "drug")) %>%
  pull(IndicatorCode) %>%
  gho_data()
#> # A tibble: 25,302 × 23
#>        Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type  
#>     <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>     
#>  1 273692 MALARIA_30539 COUNTRY        MWI        YEAR           2004 RESIDENCE…
#>  2 273693 MALARIA_30539 COUNTRY        MWI        YEAR           2004 RESIDENCE…
#>  3 273694 MALARIA_30539 COUNTRY        MWI        YEAR           2004 NA        
#>  4 273695 MALARIA_30539 COUNTRY        TZA        YEAR           2004 RESIDENCE…
#>  5 273714 MALARIA_30539 COUNTRY        BDI        YEAR           2005 RESIDENCE…
#>  6 273715 MALARIA_30539 COUNTRY        COG        YEAR           2005 RESIDENCE…
#>  7 273716 MALARIA_30539 COUNTRY        COG        YEAR           2005 RESIDENCE…
#>  8 273717 MALARIA_30539 COUNTRY        COG        YEAR           2005 NA        
#>  9 273718 MALARIA_30539 COUNTRY        GIN        YEAR           2005 RESIDENCE…
#> 10 273719 MALARIA_30539 COUNTRY        GIN        YEAR           2005 RESIDENCE…
#> # … with 25,292 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <lgl>,
#> #   Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <chr>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <lgl>,
#> #   High <lgl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

Once we have that data, we can filter, explore, and analyze it with our standard R workflow, or export it to Excel or other analytical tools for further use.
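
As a rough illustration of that last step, the sketch below filters a small data frame and writes it to a CSV file, which Excel and most other tools can open. It assumes a hand-built stand-in, `air10`, copied from three rows of the AIR_10 output shown earlier. Only base R functions are used, and the file goes to a temporary path.

```r
# Stand-in for a gho_data() result: three rows copied from the AIR_10
# output shown earlier, limited to a few of its columns.
air10 <- data.frame(
  Id = c(6452L, 6453L, 6454L),
  IndicatorCode = "AIR_10",
  SpatialDim = c("AFG", "ALB", "DZA"),
  TimeDim = 2004L,
  stringsAsFactors = FALSE
)

# Keep two countries and only the columns of interest.
picked <- subset(air10,
                 SpatialDim %in% c("AFG", "DZA"),
                 select = c(IndicatorCode, SpatialDim, TimeDim))

# Write to a temporary CSV file without row names.
out_file <- tempfile(fileext = ".csv")
write.csv(picked, out_file, row.names = FALSE)

# Read the file back to confirm that the export worked.
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
```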