Homework 4 – Spatial Analysis of Peru’s Protected Areas
1. Learning goals
- Practice querying vector data from OpenStreetMap (OSM) via the Overpass API.
- Download and work with MODIS Vegetation Continuous Fields (VCF, product MOD44B v061) for the year 2024.
- Perform buffer-based spatial joins to link raster pixels with protected-area boundaries.
- Compute and interpret summary statistics of tree-cover variables at multiple buffer widths.
- (Extra credit) Build a regression-discontinuity (RD) plot to explore causal impacts of legal protection on forest cover.
2. Required software & libraries
- Python ≥ 3.9
- Typical spatial stack
- Python:
osmnx, pyrosm, requests, geopandas, rasterio, rioxarray, xarray, numpy, pandas, shapely, matplotlib, statsmodels
Tip: Generate a new environment in conda for this project such that all the libraries are in the correct version
3. Tasks (detailed)
3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) info
- Formulate an Overpass query that returns all polygons tagged with any of the following (logical OR):
boundary=protected_area
leisure=nature_reserve
protect_class=*
Limit to objects inside Peru (ISO 3166-1 alpha-2 code PE).
- Fetch the data programmatically (e.g., with
requests or osmnx.features_from_polygon).
- Filter to keep national / regional parks, wildlife reserves, and forest reserves.
Keep your filtering rule explicit in code (e.g., accept protect_class ∈ {“2”, “4”, …}).
- Save the final boundaries to GeoPackage
peru_protected_areas.gpkg.
3.2 Download VCF (MOD44B v061) for 2024 (25 pts) info
- Register for an Earthdata Login (if you haven’t already).
- Using
requests or wget, programmatically pull all tiles covering Peru for collection 6.1, subset “Percent Tree Cover” (science dataset Percent_Tree_Cover), year = 2024.
- Mosaic & reproject to EPSG:4326; name the raster
vcf_2024_peru.tif.
Hint: rio merge, gdalwarp -t_srs EPSG:4326, or rioxarray.combine_by_coords.
- It has many layers. We want to work with Percent_Tree_Cover.
3.3 Generate buffer zones & extract raster data (50 pts)
For each protected-area polygon P and each bandwidth b ∈ {5, 10, 20, 25 km}:
-
Create two mutually exclusive geometries:
- Inner buffer:
b km inside the boundary.
- Outer buffer:
b km out of the polygon since the boundary.
- Use the boundary to generate these rings. We want to inspect forest around protected areas.
-
Use Zonal stats to generate the statistics (mean, median, sd, n pixels) forest cover for the inner buffer and .
-
Store results in a tidy CSV:
| park_id |
buffer_km |
in_out |
mean_tc |
median_tc |
sd_tc |
Percent_Tree_Cover |
-
Generate a simple binary regression with mean of Percent_Tree_Cover as outcome and D (in boundary==1) treatment variable. Weight the observations using the number of pixels.
3.5 (Extra credit, 100 pts) RD plot at pixel level
- Change the aggregation level of your data at pixel level. Keep all the pixels inside the inner and outer buffer areas using pixel centroids. We want this analysis only for b = 25 km.
- Find the minimum distance from the pixel centroid to boundary.
- Generate rd plots using distance as the score and forest value (Percent_Tree_Cover value of the raster). The score has positive values if the centroid is in the inner buffer, and negative otherwise.
- Discuss the results
4. 🌳 Paper Analysis with LLMs
Download the following paper titled Tree Cover and Carbon Dioxide Capture in Urban Parks: The Case of Northern Lima.
Objetive
Develop a reproducible workflow that allows you to:
- Upload and process documents (PDF).
- Extract information (text).
- Apply prompt engineering techniques to obtain insights using language models (LLMs), such as park efficiency rankings, optimal species recommendations, and district-level diagnostics.
- Assess their use for the development of public policies.
Required Tools
- Google Colab
- Model GPT: 3.5 (Important)
- Libraries:
5. Helpful references
Deadline June 1st, 23:59.
Homework 4 – Spatial Analysis of Peru’s Protected Areas
1. Learning goals
2. Required software & libraries
osmnx,pyrosm,requests,geopandas,rasterio,rioxarray,xarray,numpy,pandas,shapely,matplotlib,statsmodels3. Tasks (detailed)
3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) info
boundary=protected_arealeisure=nature_reserveprotect_class=*Limit to objects inside Peru (ISO 3166-1 alpha-2 code
PE).requestsorosmnx.features_from_polygon).Keep your filtering rule explicit in code (e.g., accept
protect_class∈ {“2”, “4”, …}).peru_protected_areas.gpkg.3.2 Download VCF (MOD44B v061) for 2024 (25 pts) info
requestsorwget, programmatically pull all tiles covering Peru for collection 6.1, subset “Percent Tree Cover” (science datasetPercent_Tree_Cover), year = 2024.vcf_2024_peru.tif.Hint:
rio merge,gdalwarp -t_srs EPSG:4326, orrioxarray.combine_by_coords.3.3 Generate buffer zones & extract raster data (50 pts)
For each protected-area polygon P and each bandwidth b ∈ {5, 10, 20, 25 km}:
Create two mutually exclusive geometries:
bkm inside the boundary.bkm out of the polygon since the boundary.Use Zonal stats to generate the statistics (mean, median, sd, n pixels) forest cover for the inner buffer and .
Store results in a tidy CSV:
Generate a simple binary regression with mean of Percent_Tree_Cover as outcome and D (in boundary==1) treatment variable. Weight the observations using the number of pixels.
3.5 (Extra credit, 100 pts) RD plot at pixel level
4. 🌳 Paper Analysis with LLMs
Download the following paper titled Tree Cover and Carbon Dioxide Capture in Urban Parks: The Case of Northern Lima.
Objetive
Develop a reproducible workflow that allows you to:
Required Tools
PyMuPDF(fitz)API KEY GPT: sk-proj-I4aPpyL-B4srUDYvz6UNVs3eq8tWzZsCowFUDri8BhLPhX6z_vyuqqR7C_aTyKi-eMXeemCESMT3BlbkFJyX_znbwbYs17CjAe3gU1tQ05O_J-akrulnBc9UtQEE_GCrqeLYh5ljB-UsLh_6NfvONgqFTQUAopenia5. Helpful references
Overpass API & OSM tags
osmnx): https://osmnx.readthedocs.io/en/stable/osmnx.html#osmnx.geometries.geometries_from_polygonMODIS VCF (MOD44B v061, 250 m)
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD44B/2024/Raster & vector handling
RD visualization
statsmodels.nonparametric.kernel_regression.KernelReg(Python)rdrobustpackage (R): https://rdpackages.github.io/Deadline June 1st, 23:59.