-
Notifications
You must be signed in to change notification settings - Fork 0
/
covid_institutional_performance.Rmd
682 lines (513 loc) · 32.1 KB
/
covid_institutional_performance.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
---
title: "Covid Effects on Institutional Performance"
author: "Rodrigo"
date: "2023-04-28"
output:
pdf_document: default
html_document: default
---
# Summary
This project is a condensed public version of my analysis for the Ministry of Education of Peru (Minedu) in 2021, which was conducted while my team and I designed the 2022 edition of a Results Based Financing (RBF) Program called "Compromisos de Desempeño" (CdD or Performance Commitments). The aim of the analysis was to investigate how the Covid-19 pandemic affected institutional performance, specifically in the education sector in Peru.
The public education sector in Peru is under the governance of the Ministry of Education (Minedu) at the national level. However, 26 Regional Offices and 222 Local Education Management Units (UGELs or Unidades de Gestión Educativa Local) are responsible for various functions and responsabilities, such as enrolling students, hiring teachers, distributing books, addressing bullying and violence cases, among others.
For more information on the public education sector in Peru, please refer to the appendix.
## Data Used
* How to measure Institutional Performance? *
The historical data of CdD results contain the average achievement value or "Valor Logrado" for each evaluated indicator in every UGEL or jurisdiction. These results can be analyzed by year and phase. For example, during phase 1 of the CdD 2020 on January 31st, 2020, Minedu assessed eight indicators among 223 institutions. You can find these results on the CdD Program's website (http://www.minedu.gob.pe/cdd/).
* How to measure the severity of the pandemic in each jurisdiction? *
I chose to utilize the number of Covid-19 related deaths per 1,000 people in each jurisdiction for every quarter between 2020 and 2021. This decision was made due to the belief that this data provides a more accurate representation of the impact of Covid-19 on the country, as Covid-19 infections may have been underreported due to limited testing availability, particularly during the initial stages of the pandemic. It is worth noting that Peru has been transparent in releasing pandemic data, which can be accessed through the "Datos Abiertos" (Open Data) webpage (https://www.datosabiertos.gob.pe/dataset/fallecidos-por-covid-19-ministerio-de-salud-minsa).
* Other data used *
In order to associate districts to UGEL jurisdictions and to better characterize each UGEL, we required more data about the UGELs. For this purpose, I used the education census of 2020, which can be found at https://escale.minedu.gob.pe/uee/-/document_library_display/GMv7/view/6226837.
Additionally, we needed to obtain estimates of Peru's population for 2021, as this was the only available dataset in the "Datos Abiertos" (Open Data) web page, which can be accessed at https://www.datosabiertos.gob.pe/dataset/población-peru.
## Results
Based on the results, there was a significant impact of Covid-19 on UGELs performance in Peru in the first wave of 2020, particularly in areas that were hit harder by the pandemic. However, our analysis of the second wave in 2021 suggests that institutions may have adapted to the pandemic, and that there may have been underlying factors contributing to performance reduction.
# Data processing
## Preparation
```{r message=FALSE, warning=TRUE, include=FALSE}
#Packages
library(readxl)
library(openxlsx)
library(purrr)
library(dplyr)
library(tidyr)
library(writexl)
library(data.table)
library(foreign)
library(tibble)
library(ggplot2)
library(lubridate)
library(ggthemes)
library(stringr)
library(sf)
library(mapsPERU)
library(RColorBrewer)
library(scales)
library(knitr)
library(kableExtra)
# Working directory
f_base <- "F:/Rodrigo/Antiguo Escritorio/Rodrigo/Data Projects/Covid-19 Effects on Institutions"
setwd(f_base)
#Theme for graphs
rColors <- c("#9fafa4","#516875","#f3c09c","#94918f","#9adbae","#3d4246")
```
## Minedu data about schools in Peru
Education Census of 2020.
https://escale.minedu.gob.pe/uee/-/document_library_display/GMv7/view/6226837
```{r echo=FALSE, message=FALSE, warning=FALSE}
#Importing the data from the link as an Excel because there were some character barriers using the DBF format
setwd(f_base)
peru_schools_full_df <- read_excel("padron.xlsx") %>%
as.data.frame()
#Data cleaning
peru_schools <- peru_schools_full_df %>%
mutate(codmod = as.numeric(COD_MOD),
codlocal = as.numeric(CODLOCAL),
ubigeo = as.numeric(CODGEO),
codooii = as.numeric(CODOOII),
level_of_education = case_when(
NIV_MOD %in% c("A1", "A2", "A3","A4") ~ "Preschool",
NIV_MOD == "B0" ~ "Primary school",
NIV_MOD == "F0" ~ "Secondary school",
NIV_MOD %in% c("D1", "D2") ~ "Alternative school",
NIV_MOD %in% c("E0","E1", "E2") ~ "Special needs school",
NIV_MOD %in% c("K0","L0", "M0","T0") ~ "Higher education - not universities"),
n_schools = 1,
region = str_to_title(DPTO),
region = ifelse(region == "Madre De Dios", "Madre de Dios", region),
region = ifelse(region == "San Martin", "San Martín", region),
region = ifelse(region == "Junin", "Junín", region),
region = ifelse(region == "Huanuco", "Huánuco", region),
region = ifelse(region == "Apurimac", "Apurímac", region),
region = ifelse(region == "Ancash", "Áncash", region)) %>%
filter(ANEXO ==0,
D_FORMA == "Escolarizada",
level_of_education != "Higher education - not universities") %>%
rename(school_name = CEN_EDU,
management = D_GESTION,
urban_context = DAREAMED,
geography = REGION_NAT,
latitud = NLAT_IE,
longitud = NLONG_IE,
elevation_msnm = ALTITUD,
ugel = DRE_UGEL) %>%
select(region,ugel,codmod,codlocal,ubigeo,codooii,level_of_education,school_name,management,urban_context,geography,latitud,longitud,elevation_msnm,n_schools)
#Number of districts in Peru
paste0("Number of districts in Peru: ",length(unique(peru_schools$ubigeo)))
#Number of UGEL in Peru
paste0("Number of UGELs in Peru: ",length(unique(peru_schools$codooii)))
#Ubigeo - Codooii relation
ubigeo_codooii <- peru_schools %>%
group_by(codooii,ubigeo) %>%
summarise(n_schools = n(),
ugel = first(ugel))
write_xlsx(ubigeo_codooii,"ubigeo_codooii.xlsx")
ubigeo_codooii <- ubigeo_codooii %>%
group_by(ubigeo) %>%
arrange(desc(n_schools)) %>%
slice(1) %>%
ungroup()
#Map of UGEL
ugel_map_data <- map_DIST %>%
mutate(ubigeo = as.numeric(COD_DISTRITO),
region = DEPARTAMENTO) %>%
select(c(ubigeo,region,coords_x,coords_y,geometry))
ugel_map_data <- merge(ugel_map_data,ubigeo_codooii) %>%
mutate(ugel = str_replace(ugel, "\\d+", ""),
ugel = gsub(" ", " ",ugel))
myPalette <- colorRampPalette(brewer.pal(8, "Set3"))(223)
ugel_map <- ggplot(ugel_map_data, aes(geometry=geometry,fill = as.character(ugel)),linewidth = 0.01) +
geom_sf() +
scale_fill_manual(values = myPalette, guide = FALSE) +
theme_map() +
labs(title = "Map of the UGELs' Jurisdictions")
ugel_map
ggsave("ugel_map.png", plot = ugel_map, width = 3, height = 5, dpi = 300)
#Number of schools in urban and rural areas, by region of Peru
schools_chart <- ggplot(data = peru_schools, aes(x = n_schools, y = reorder(region, n_schools), fill = urban_context)) +
geom_bar(stat = "identity") +
theme_fivethirtyeight() +
scale_fill_manual(values = rColors) +
labs(title = "Number of Schools by Region in Peru",
x = "Region",
y = "Number of Schools",
fill = "Urban/Rural Area ")
schools_chart
ggsave("schools_chart.png", plot = schools_chart, width = 8, height = 5, dpi = 300)
```
We will use the additional module of the Censo Educativo to collect the number of students enrolled in each school district.
Additionally, I used the Education Census to construct more variables to better characterize each UGEL.
```{r}
#Number of students
students <- read.dbf("Matricula_01.dbf", as.is = TRUE) %>%
as.data.frame() %>%
mutate(codmod = as.numeric(COD_MOD),
codooii = as.numeric(CODOOII),
n_students = rowSums(select(., starts_with("D")))) %>%
group_by(codmod, codooii) %>%
summarise(n_students = sum(n_students))
# Additional data from the census
ugel_vars <- merge(peru_schools,students,by=c("codmod","codooii"), all.x = T) %>%
mutate(rural_strudents = ifelse(urban_context =="Rural",n_students,0)) %>%
group_by(codooii) %>%
summarise(region = first(region),
n_students = sum(n_students),
rural_strudents = sum(rural_strudents,na.rm = T),
rurality = sum(rural_strudents)/sum(n_students),
mean_elevation_msnm = mean(elevation_msnm),
n_schools = n(),
prcnt_andes = sum(geography=="SIERRA")/n(),
prcnt_selva = sum(geography=="SELVA")/n())
#Visualizacion
geom_peru <- map_DEP %>%
rename(region = DEPARTAMENTO)
map_data <- ugel_vars %>%
group_by(region) %>%
summarise(n_students = sum(n_students),
rurality = sum(rural_strudents)/sum(n_students))
map_data <- merge(geom_peru, map_data, by = "region")
peru_map <- ggplot(map_data, aes(geometry=geometry)) +
geom_sf(aes()) +
theme_map() +
geom_point(data = map_data, aes(x = coords_x, y = coords_y, size = n_students, color = rurality), alpha = 0.8) +
scale_color_gradient(low = "#516875", high = "#9adbae") +
scale_size(range = c(2, 12)) +
labs(title = "Number of Students in each Region of Peru",
subtitle = "Peru is Divided in 25 Regions and 1,874 Districts",
size = "Number of Students",
color = "Rurality") +
theme(legend.position = c(1, 0.1), legend.key.width = unit(0.2, "cm"),
legend.key.height = unit(.5, "cm"), legend.text = element_text(size = 8),
legend.title = element_text(size = 8))
peru_map
ggsave("peru_map.png", plot = peru_map, width = 8, height = 5, dpi = 300)
```
## Compromisos de Desempeño (CdD) results
The CdD program is evaluated in phases, with each phase having a specific evaluation date. We will only consider results from UGELs based on the "type of entity" column, excluding the Regional Office of Callao (DRE CALLAO) as it functions as another UGEL. The CdD indicators are scored on a scale of 0% to 100%, assessing key processes within the education sector such as book distribution to schools, coverage of teacher and principal positions, retention of students, completion rates for teacher programs and courses, and infrastructure diagnosis by UGELs.
```{r}
#The source of the phases dates comes from the legal documents and directives on the web page
phases_cdd <- read_excel("cdd_phase_date.xlsx") %>%
as.data.frame()
#Results as available on the web page (validation made in-house by the team)
cdd_results <- read.csv("resultados_hist_cdd.csv") %>%
as.data.frame() %>%
filter(tipo_entidad == "UGEL EJECUTORA" | tipo_entidad =="UGEL OPERATIVA" | iged =="DRE CALLAO")
cdd_results <- merge(cdd_results,phases_cdd,by = c("periodo","tramo")) %>%
rename(year = periodo,
phase = tramo,
ugel = iged) %>%
mutate(region = str_to_title(region),
region = ifelse(region == "Madre De Dios", "Madre de Dios", region),
region = ifelse(region == "San Martin", "San Martín", region),
region = ifelse(region == "Junin", "Junín", region),
region = ifelse(region == "Huanuco", "Huánuco", region),
region = ifelse(region == "Apurimac", "Apurímac", region),
region = ifelse(region == "Ancash", "Áncash", region),
region = ifelse(region == "Lima Metropolitana", "City of Lima", region),
region = ifelse(region == "Lima Provincias", "Region of Lima", region))
#Best Perforing Regions Historically
cdd_historical_chart <- cdd_results %>%
group_by(region) %>%
summarise(cdd_result_value = mean(valor_logrado_ufd,na.rm = T)) %>%
ggplot(aes(x = cdd_result_value, y = reorder(region, cdd_result_value), fill = cdd_result_value)) +
geom_bar(stat = "identity", fill = "#9fafa4") +
theme_fivethirtyeight() +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
labs(title = "Best Performing Regions of Peru in the CdD",
subtitle = "Historical data from 2014 to 2022",
x = "Mean Achievement Value",
y = "Regions") +
geom_text(aes(label = paste0(round(cdd_result_value*100, 1), "%")),
position = position_stack(vjust = 1.1),
color = "black", size = 4)
cdd_historical_chart
ggsave("cdd_historical_chart.png", plot = cdd_historical_chart, width = 8, height = 8, dpi = 300)
```
## Ministerio de Salud data about Covid-19 related deaths in Peru
https://www.datosabiertos.gob.pe/dataset/fallecidos-por-covid-19-ministerio-de-salud-minsa
I'm going to group covid deaths by quarter and year for simplicity in the analysis.
```{r}
covid_deaths <- read.csv("fallecidos_covid.csv",sep = ";") %>%
as.data.frame() %>%
mutate(year = as.integer(substr(FECHA_FALLECIMIENTO, 1, 4)),
month = as.integer(substr(FECHA_FALLECIMIENTO, 5, 6)),
ubigeo = as.numeric(UBIGEO),
age = EDAD_DECLARADA) %>%
filter(!is.na(ubigeo))
# Visualization
timeline_covid_data <- covid_deaths %>%
mutate(region = str_to_title(DEPARTAMENTO),
region = ifelse(region == "Madre De Dios", "Madre de Dios", region),
region = ifelse(region == "San Martin", "San Martín", region),
region = ifelse(region == "Junin", "Junín", region),
region = ifelse(region == "Huanuco", "Huánuco", region),
region = ifelse(region == "Apurimac", "Apurímac", region),
region = ifelse(region == "Ancash", "Áncash", region)) %>%
group_by(year,month,region) %>%
summarise(covid_deaths = n())
top_5_hit_regions <- timeline_covid_data %>%
group_by(region) %>%
summarise(total_covid_deaths = sum(covid_deaths)) %>%
arrange(desc(total_covid_deaths)) %>%
slice_head(n = 5) %>%
pull(region)
timeline_covid_data$region_group <- ifelse(timeline_covid_data$region %in% top_5_hit_regions,
timeline_covid_data$region, "Others")
covid_timeline <- timeline_covid_data %>%
mutate(date = ymd(paste(year, month, "01", sep = "-")),
covid_deaths = ifelse(is.na(covid_deaths),0,covid_deaths),
region_group = reorder(region_group, -covid_deaths)) %>%
ggplot(aes(x = date, y = covid_deaths, fill = region_group)) +
geom_area(position = "stack") +
scale_fill_brewer(palette = "Set3") +
theme_fivethirtyeight() +
labs(title = "Regions of Peru with the Highest Covid-19 Impact",
subtitle = "Number of Covid-19 Related Deaths",
x = "Time",
y = "Number of Deaths",
fill = "Region")
covid_timeline
ggsave("covid_timeline.png", plot = covid_timeline, width = 12, height = 8, dpi = 300)
#Grouping covid deaths by quarter for simplicity
covid_deaths_ubigeo <- covid_deaths %>%
group_by(ubigeo) %>%
summarise(covid_deaths_2020_1 = sum(year==2020 & month <=3),
covid_deaths_2020_2 = sum(year==2020 & month >3 & month<=6),
covid_deaths_2020_3 = sum(year==2020 & month >6 & month<=9),
covid_deaths_2020_4 = sum(year==2020 & month >9),
covid_deaths_2021_1 = sum(year==2021 & month <=3),
covid_deaths_2021_2 = sum(year==2021 & month >3 & month<=6),
covid_deaths_2021_3 = sum(year==2021 & month >6 & month<=9),
covid_deaths_2021_4 = sum(year==2021 & month >9),
covid_deaths = n())
total_deaths <- as.numeric(sum(covid_deaths$covid_deaths))
print(total_deaths)
```
Timeline of Covid and UGELs performance
```{r}
timeline_cdd <- cdd_results %>%
filter(year > 2019) %>%
mutate(month = mes_corte) %>%
group_by(year,month) %>%
summarise(cdd_result_value = mean(valor_logrado_ufd,na.rm = T))
timeline_covid <- covid_deaths %>%
group_by(year,month) %>%
summarise(covid_deaths = n())
#Timeline
timeline <- merge(timeline_covid, timeline_cdd, by = c("year","month"),all = T) %>%
add_row(year = 2020, month = 2) %>%
mutate(date = ymd(paste(year, month, "01", sep = "-")),
cdd_result_value = ifelse(is.na(cdd_result_value),0,cdd_result_value),
covid_deaths = ifelse(is.na(covid_deaths),0,covid_deaths))
timelinechart <- ggplot(timeline, aes(x = date)) +
geom_bar(aes(y = cdd_result_value * 20000), stat = "identity", fill = "#9fafa4") +
geom_line(aes(y = covid_deaths),color = "#3d4246",alpha = .8,size = 1.2) +
theme_fivethirtyeight() +
labs(title = "CdD Results and Covid-19 Impact",
subtitle = "Mean Achievement Value of UGELs & Covid-19 Deaths",
x = "Phases of the CdD",
y = "Number of Covid-19 Deaths",
fill = "Achievment Value",
color = "Number of Covid-19 Deaths") +
geom_text(data = subset(timeline, cdd_result_value > 0),
aes(y = (cdd_result_value + .035) * 20000, label = paste0(scales::percent(cdd_result_value, accuracy = 0.1))),
hjust = 0.5, size = 3, color = "#3d4246")
timelinechart
ggsave("timelinechart.png", plot = timelinechart, width = 14, height = 8, dpi = 300)
```
As we can see, the two waves of the pandemic in Peru coincide with reductions in the performance of the UGELs in the CdD, meaning a performance dip in both cases. The first Covid-19 case in Peru was reported in March 2020, by this time, Minedu had already evaluated 8 indicators in the phase 1 of the CdD 2020. This event, gives us the possibility to see how the UGELs would have performed on normal conditions.
It's important to remember that the indicators evaluated in each phase fo the CdD are different, depending on the design of the program, so comparison is not 100% accurate. Nevertheless, it is valuable to see if there is any correlation between the mean achievement value and Covid-19 related deaths in the jurisdiction, for each UGEL.
# Analysis
I will analyze how these tough quarters for Peru, in regards of Covid-19 impact, affected the performance of the UGELs, measured as the mean achievement value of each institution on the indicators evaluated in each phase. I will perform some simple correlation analysis, as well as some linear regression analysis.
## Correlation analysis
* Correlation of the mean achievement value (phases 2 and 3 of 2020) vs Covid-19 related deaths (quarters 2 and 3 of 2020).
* Correlation of the mean achievement value (phases 1 and 2 of 2021) Covid-19 related deaths (quarters 1 and 2 of 2021).
* Covid hit: Correlation and linear regression of (phase 1 2020 - mean vachievement value of phases 2 and 3) vs Covid-19 related deaths (quarters 2 and 3 of 2020).
### Final Data Preparations
I used data from INEI (Equivalent to a Census Bureau in Peru) to collect estimated data of the population. This way, I can express the Covid-19 related deaths relative to each jurisdiction population.
```{r}
#Population in Peru
peru_population <- read.csv("TB_POBLACION_INEI.csv",sep = ",") %>%
as.data.frame() %>%
mutate(ubigeo = ubigeo_inei) %>%
group_by(ubigeo) %>%
summarise(population = sum(Cantidad))
sum(peru_population$population)
df_list <- list(covid_deaths_ubigeo, peru_population, ubigeo_codooii)
covid_deaths_pops <- df_list %>% reduce(full_join, by="ubigeo")
#Now we need to calculate how many Covid-19 related death occurred in every UGEL jurisdiction, as well as the population of each jurisdiction
# Also we want to analyze covid deaths relative to jurisdiction population. I chose covid deaths per 1k inhabitants.
covid_deaths_ugel <- covid_deaths_pops %>%
group_by(codooii) %>%
summarise(covid_deaths_2020_1 = 1000*sum(covid_deaths_2020_1,na.rm = T)/sum(population),
covid_deaths_2020_2 = 1000*sum(covid_deaths_2020_2,na.rm = T)/sum(population),
covid_deaths_2020_3 = 1000*sum(covid_deaths_2020_3,na.rm = T)/sum(population),
covid_deaths_2020_4 = 1000*sum(covid_deaths_2020_4,na.rm = T)/sum(population),
covid_deaths_2021_1 = 1000*sum(covid_deaths_2021_1,na.rm = T)/sum(population),
covid_deaths_2021_2 = 1000*sum(covid_deaths_2021_2,na.rm = T)/sum(population),
covid_deaths_2021_3 = 1000*sum(covid_deaths_2021_3,na.rm = T)/sum(population),
covid_deaths_2021_4 = 1000*sum(covid_deaths_2021_4,na.rm = T)/sum(population),
covid_deaths = sum(covid_deaths,na.rm = T),
population = sum(population))
# Mean achievement value by year and phase from 2020 to 2021, we will call this variables vl_*
cdd_results_ugel <- cdd_results %>%
filter(year == 2020 | year == 2021) %>%
group_by(codooii) %>%
summarise(ugel = first(ugel),
vl_2020_1 = mean(valor_logrado_ufd[year==2020 & phase==1],na.rm = T),
vl_2020_2 = mean(valor_logrado_ufd[year==2020 & phase==2],na.rm = T),
vl_2020_2_3 = mean(valor_logrado_ufd[year==2020 & (phase==2 | phase==3)],na.rm = T),
vl_2020_4 = mean(valor_logrado_ufd[year==2020 & phase==4],na.rm = T),
vl_2021_1_2 = mean(valor_logrado_ufd[year==2021 & (phase==1 | phase==2)],na.rm = T),
vl_2021_3 = mean(valor_logrado_ufd[year==2021 & phase==3],na.rm = T))
#We need to merge these three dataframes by codooii
df_list2 <- list(cdd_results_ugel, covid_deaths_ugel, ugel_vars)
analysis_df <- df_list2 %>% reduce(full_join, by="codooii")
head(analysis_df)
#Regions with the higher population in Peru:
analysis_df %>%
group_by(region) %>%
summarise(population = sum(population),
rurality = sum(rural_strudents)/sum(n_students),
covid_deaths = sum(covid_deaths)) %>%
arrange(desc(population)) %>%
slice_head(n = 10)
```
### First Wave of Covid-19 in Peru (2nd and 3rd quarters of 2020)
Based on the results, it is apparent that during the first wave of Covid-19 in Peru, there was a strong correlation between the number of Covid-19 related deaths relative to the population of each jurisdiction, and the performance of the UGELs on the CdD Program of Minedu. This means that in jurisdictions that were hit harder by the pandemic, the performance of the UGELs was significantly lower. This could be due to a variety of factors, such as increased pressure on UGELs to respond to the pandemic, including adapting to remote learning models and ensuring the safety of students and staff, as well as dealing with the social and economic consequences of the pandemic.
It's important to note that this correlation may not necessarily imply causation. Other factors, such as limited human capital in rural areas of the country and geographic challenges, may also have contributed to the variation in UGEL performance. Nonetheless, the correlation does suggest that the pandemic had a significant impact on the performance of UGELs in Peru.
```{r}
#2020 subset
df_2020 <- analysis_df %>%
mutate(covid_deaths_2020_2_3 = covid_deaths_2020_2 + covid_deaths_2020_3) %>%
filter(!is.na(vl_2020_2_3)) %>%
select(vl_2020_2_3,covid_deaths_2020_2_3)
#Correlation test
cor(df_2020)
cor_2020 <- cor.test(df_2020$covid_deaths_2020_2_3, df_2020$vl_2020_2_3)
cor_2020$estimate # correlation coefficient
cor_2020$p.value # p-value
#Visualization
corr_2020_chart <- df_2020 %>%
ggplot(aes(x = covid_deaths_2020_2_3,y=vl_2020_2_3)) +
geom_point(color = "#516875") +
geom_smooth(method = "lm", se = FALSE,color = "#9fafa4",size=1.2) +
theme_fivethirtyeight() +
labs(title = "Correlation: UGEL Performance & Covid Deaths", subtitle = "First Wave - 2020") +
annotate("text", x = Inf, y = Inf, label = paste0("Correlation Coefficient: ",
paste0(scales::percent(cor_2020$estimate, accuracy = 0.1))),
hjust = 1, vjust = 1) +
annotate("text", x = Inf, y = Inf, label = paste0("P-Value: ",
paste0(scales::percent(cor_2020$p.value, accuracy = 0.1))),
hjust = 1, vjust = 2.5) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(axis.title = element_text()) + ylab("CdD Results") + xlab("Covid Deaths per 1k People")
corr_2020_chart
ggsave("corr_2020_chart.png", plot = corr_2020_chart, width = 8, height = 5, dpi = 300)
```
### Second Wave of Covid-19 in Peru (1st and 2nd quarters of 2021)
Based on the data analyzed, it seems that the second wave of Covid-19 related deaths in Peru may not be significantly correlated with UGEL performance, as the p-value is higher than .25. This could be an indication that institutions have become better adapted to the pandemic after a year of experience, and other factors may be contributing to the performance hit.
```{r}
#2021 subset
df_2021 <- analysis_df %>%
mutate(covid_deaths_2021_1_2 = covid_deaths_2021_1 + covid_deaths_2021_2) %>%
filter(!is.na(vl_2021_1_2)) %>%
select(vl_2021_1_2,covid_deaths_2021_1_2)
#Correlation test
cor(df_2021)
cor_2021 <- cor.test(df_2021$covid_deaths_2021_1_2, df_2021$vl_2021_1_2)
cor_2021$estimate # correlation coefficient
cor_2021$p.value # p-value
#Visualization
corr_2021_chart <- df_2021 %>%
ggplot(aes(x = covid_deaths_2021_1_2,y=vl_2021_1_2)) +
geom_point(color = "#516875") +
geom_smooth(method = "lm", se = FALSE,color = "#9fafa4",size=1.2) +
theme_fivethirtyeight() +
labs(title = "Correlation: UGEL Performance & Covid Deaths", subtitle = "Second Wave - 2021") +
annotate("text", x = Inf, y = Inf, label = paste0("Correlation Coefficient: ",
paste0(scales::percent(cor_2021$estimate, accuracy = 0.1))),
hjust = 1, vjust = 1) +
annotate("text", x = Inf, y = Inf, label = paste0("P-Value: ",
paste0(scales::percent(cor_2021$p.value, accuracy = 0.1))),
hjust = 1, vjust = 2.5) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(axis.title = element_text()) + ylab("CdD Results") + xlab("Covid Deaths per 1k People")
corr_2021_chart
ggsave("corr_2021_chart.png", plot = corr_2021_chart, width = 8, height = 5, dpi = 300)
```
### Covid-19 Hit on Performance During the 1st Wave
Based on the analysis conducted, it appears that there was a correlation between the impact of Covid-19 on each jurisdiction and the reduced performance of the UGELs in those jurisdictions during the first wave of the pandemic. Although, this correlation seems to be less evident in this case, and there might be other factors in play.
```{r}
#2021 subset
df_2020_hit <- analysis_df %>%
mutate(covid_deaths_2020_2_3 = covid_deaths_2020_2 + covid_deaths_2020_3,
vl_2020_dif = vl_2020_2_3 - vl_2020_1) %>%
filter(!is.na(vl_2020_dif)) %>%
select(vl_2020_dif,covid_deaths_2020_2_3)
#Correlation test
cor(df_2020_hit)
cor_2020_hit <- cor.test(df_2020_hit$covid_deaths_2020_2_3, df_2020_hit$vl_2020_dif)
cor_2020_hit$estimate # correlation coefficient
cor_2020_hit$p.value # p-value
#Visualization
corr_2020hit_chart <- df_2020_hit %>%
ggplot(aes(x = covid_deaths_2020_2_3,y=vl_2020_dif))+
geom_point(color = "#516875") +
geom_smooth(method = "lm", se = FALSE,color = "#9fafa4",size=1.2) +
theme_fivethirtyeight() +
labs(title = "UGEL Performance Reduction & Covid Deaths", subtitle = "First Wave - 2020") +
annotate("text", x = Inf, y = Inf, label = paste0("Correlation Coefficient: ",
paste0(scales::percent(cor_2020_hit$estimate, accuracy = 0.1))),
hjust = 1, vjust = 1) +
annotate("text", x = Inf, y = Inf, label = paste0("P-Value: ",
paste0(scales::percent(cor_2020_hit$p.value, accuracy = 0.1))),
hjust = 1, vjust = 2.5) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(axis.title = element_text()) + ylab("CdD Results Reduction") + xlab("Covid Deaths per 1k People")
corr_2020hit_chart
ggsave("corr_2020hit_chart.png", plot = corr_2020hit_chart, width = 8, height = 5, dpi = 300)
```
## Linear regression analysis.
Linear regressions are a powerful statistical tool that can help analyze the impact of other factors, other than Covid-19, that affected UGELs performance on the CdD program. By using regression models, we can estimate the relationships between different variables in different jurisdictions. This approach can provide a more comprehensive analysis of the underlying factors that influenced institutional performance during the pandemic.
In order to estimate the performance of UGELs using linear regression models, we included several variables in the analysis. Besides Covid-19 related deaths, which were measured relative to the population in each quarter, we also considered the number of students in each jurisdiction, the degree of rurality, the number of schools, the mean elevation, and the percentage of schools located in the Andes or in the rain forest. By analyzing the impact of these different factors, we can better understand how they have influenced the performance of UGELs during the pandemic. However, only few factors had an effect on UGELs performance, here's a brief overview of the analysis.
In the first model, we can observe that Covid-19 related deaths had a significant impact on UGELs performance during the first wave in 2020, but the effect was reduced during the third quarter of that year. Additionally, the number of students had a significant but small impact on CdD results.
In the second model, we found that the impact of Covid-19 on UGELs performance during the second wave in Peru was not as significant as during the first wave in 2020. However, the model revealed some interesting and somewhat contradictory results, as it showed that Covid-19 related deaths in the 2nd quarter and rurality had a positive impact on UGELs performance. One possible explanation could be that urban areas had multiple lockdowns mandates enforced by the government relative and that could have hinder the results of the UGELs. However it is important to conduct more research to fully understand the underlying factors contributing to these results.
Finally, the third model may indicate that, although this analysis may be headed on the right path, further analysis may be needed to finally understand how Covid-19 effected UGELs performance. None of the variables selected explain the reduction in achievement value between phase 1 of the CdD 2020 and phases 2 and 3 of the same year.
```{r}
head(analysis_df)
analysis_df_scaled <- analysis_df %>%
select(-c(ugel,region)) %>%
as.data.frame(scale())
#First pandemic wave: 2020
model_2020 <- lm(vl_2020_2_3 ~ covid_deaths_2020_2 + covid_deaths_2020_3 + n_students + rurality,
data = analysis_df_scaled)
#Second pandemic wave: 2021
model_2021 <- lm(vl_2021_1_2 ~ covid_deaths_2021_1 + covid_deaths_2021_2 + n_students + rurality,
data = analysis_df_scaled)
#First pandemic hit on performance
df_2020_hit_full <- analysis_df %>%
mutate(vl_2020_dif = vl_2020_2_3 - vl_2020_1) %>%
select(-c(ugel,region)) %>%
as.data.frame(scale())
model_2020_hit <- lm(vl_2020_dif ~ covid_deaths_2020_2 + covid_deaths_2020_3 + n_students + rurality,
data = df_2020_hit_full)
#Models summary
summary(model_2020)
summary(model_2021)
summary(model_2020_hit)
```
# Consclusions and recommendations
In conclusion, our research highlights the significant impact that the first wave of Covid-19 had on UGELs performance in Peru in 2020, particularly in areas that were hit harder by the pandemic. However, our analysis of the second wave in 2021 suggests that institutions may have adapted to the pandemic, and that there may have been underlying factors contributing to performance reduction.
While our findings provide some insights into the effects of Covid-19 on UGELs performance, more research is needed to fully understand the underlying factors contributing to these changes. Further research could include exploring individual indicators evaluated during 2019, 2020, and 2021 to identify trends related to Covid-19 impact on UGELs performance, and incorporating other variables into the linear regression models, such as health access and income in the jurisdiction.
# Exports
Excel tables
```{r}
export_sheets <- list("Timeline" = timeline, "Results dataframe"= analysis_df, "Covid data - UGEL" = covid_deaths_ugel)
openxlsx::write.xlsx(export_sheets, file = "Analysis Results.xlsx")
```
Summary of the lineal regression models
```{r}
library(stargazer)
stargazer(model_2020, type = "html", out = "model_2020.html")
stargazer(model_2021, type = "html", out = "model_2021.html")
stargazer(model_2020_hit, type = "html", out = "model_2020_hit.html")
```