homework_4

# Homework 4 – Spatial Analysis of Peru’s Protected Areas  

---

## 1. Learning goals
1. Practice querying vector data from **OpenStreetMap (OSM)** via the Overpass API.  
2. Download and work with **MODIS Vegetation Continuous Fields (VCF, product MOD44B v061)** for the year 2024.  
3. Perform buffer-based spatial joins to link raster pixels with protected-area boundaries.  
4. Compute and interpret summary statistics of tree-cover variables at multiple buffer widths.  
5. *(Extra credit)* Build a **regression-discontinuity (RD) plot** to explore causal impacts of legal protection on forest cover.

---

## 2. Required software & libraries
* **Python ≥ 3.9**
* Typical spatial stack  
  * **Python**: `osmnx`, `pyrosm`, `requests`, `geopandas`, `rasterio`, `rioxarray`, `xarray`, `numpy`, `pandas`, `shapely`, `matplotlib`, `statsmodels`  

> **Tip:** Generate a new environment in conda for this project such that all the libraries are in the correct version
---

## 3. Tasks (detailed)

### 3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) [info](https://wiki.openstreetmap.org/wiki/API)
1. **Formulate an Overpass query** that returns all polygons tagged with any of the following (logical **OR**):
   * `boundary=protected_area`
   * `leisure=nature_reserve`
   * `protect_class=*`  
   Limit to objects **inside Peru** (ISO 3166-1 alpha-2 code `PE`).
2. Fetch the data programmatically (e.g., with `requests` or `osmnx.features_from_polygon`).  
3. **Filter** to keep national / regional parks, wildlife reserves, and forest reserves.  
   *Keep your filtering rule explicit in code (e.g., accept `protect_class` ∈ {“2”, “4”, …}).*  
4. **Save** the final boundaries to GeoPackage `peru_protected_areas.gpkg`.

### 3.2 Download VCF (MOD44B v061) for 2024 (25 pts) [info](https://lpdaac.usgs.gov/products/mod44bv061/)
1. Register for an **Earthdata Login** (if you haven’t already).  
2. Using `requests` or `wget`, programmatically pull **all tiles covering Peru** for **collection 6.1**, subset “Percent Tree Cover” (science dataset `Percent_Tree_Cover`), year = 2024.  
3. **Mosaic & reproject** to **EPSG:4326**; name the raster `vcf_2024_peru.tif`.  
   *Hint: `rio merge`, `gdalwarp -t_srs EPSG:4326`, or `rioxarray.combine_by_coords`.*
4. It has many layers. We want to work with **Percent_Tree_Cover**.

### 3.3 Generate buffer zones & extract raster data (50 pts)
For each protected-area polygon **P** and each bandwidth **b ∈ {5, 10, 20, 25 km}**:

1. Create two mutually exclusive geometries:  
   * **Inner buffer**: `b` km inside the boundary.  
   * **Outer buffer**: `b` km out of the polygon since the boundary.
    * Use the boundary to generate these rings. We want to inspect forest around protected areas. 
2. Use Zonal stats to generate the statistics (mean, median, sd, n pixels) forest cover for the inner buffer and .  
3. Store results in a tidy CSV:  

   | park_id | buffer_km | in_out | mean_tc | median_tc | sd_tc | Percent_Tree_Cover |
   |---------|-----------|--------|---------|-----------|-------|-----------------|

4. Generate a simple binary regression with mean of Percent_Tree_Cover as outcome and D (in boundary==1) treatment variable. Weight the observations using the number of pixels. 

### 3.5 *(Extra credit, 100 pts)* RD plot at pixel level
1. Change the aggregation level of  your data at pixel level. Keep all the pixels inside the inner and outer buffer areas using pixel centroids. We want this analysis only for b = 25 km.
2. Find the minimum distance from the pixel centroid to boundary.
3. Generate rd plots using distance as the score and forest value (Percent_Tree_Cover value of the raster). The score has positive values if the centroid is in the inner buffer, and negative otherwise.
4. Discuss the results

## 4. 🌳 Paper Analysis with LLMs
Download the following [paper](https://revistasinvestigacion.unmsm.edu.pe/index.php/iigeo/article/view/20638) titled Tree Cover and Carbon Dioxide Capture in Urban Parks: The Case of Northern Lima.

##  Objetive
Develop a reproducible workflow that allows you to:
1. Upload and process documents (PDF).
2. Extract information (text).
3. Apply prompt engineering techniques to obtain insights using language models (LLMs), such as park efficiency rankings, optimal species recommendations, and district-level diagnostics.
4. Assess their use for the development of public policies.

---

##  Required Tools
- Google Colab
- Model GPT: 3.5 (Important)
- Libraries:
  - `PyMuPDF` (`fitz`)
  
  - `API KEY GPT: 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`
  - `openia`

---
---


## 5. Helpful references
* **Overpass API & OSM tags**  
  * *Beginner guide:* <https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_turbo/Quick_guide>  
  * Example Python usage (`osmnx`): <https://osmnx.readthedocs.io/en/stable/osmnx.html#osmnx.geometries.geometries_from_polygon>

* **MODIS VCF (MOD44B v061, 250 m)**  
  * Product page: <https://lpdaac.usgs.gov/products/mod44bv061/>  
  * Direct HTTPS directory (LAADS DAAC): `https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD44B/2024/`  

* **Raster & vector handling**  
  * *Rasterio* tutorial: <https://rasterio.readthedocs.io/en/latest/quickstart.html>  
  * *GeoPandas* buffering: <https://geopandas.org/en/stable/docs/user_guide/geometric_manipulations.html#buffer>  
  * *rioxarray* for merging tiles: <https://corteva.github.io/rioxarray/stable/examples/merge_tutorial.html>

* **RD visualization**  
  * `statsmodels.nonparametric.kernel_regression.KernelReg` (Python)  
  * `rdrobust` package (R): <https://rdpackages.github.io/>  

Deadline June 1st, 23:59. 
---


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

homework_4 #100

Homework 4 – Spatial Analysis of Peru’s Protected Areas

1. Learning goals

2. Required software & libraries

3. Tasks (detailed)

3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) info

3.2 Download VCF (MOD44B v061) for 2024 (25 pts) info

3.3 Generate buffer zones & extract raster data (50 pts)

3.5 (Extra credit, 100 pts) RD plot at pixel level

4. 🌳 Paper Analysis with LLMs

Objetive

Required Tools

5. Helpful references

Deadline June 1st, 23:59.

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

homework_4 #100

Description

Homework 4 – Spatial Analysis of Peru’s Protected Areas

1. Learning goals

2. Required software & libraries

3. Tasks (detailed)

3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) info

3.2 Download VCF (MOD44B v061) for 2024 (25 pts) info

3.3 Generate buffer zones & extract raster data (50 pts)

3.5 (Extra credit, 100 pts) RD plot at pixel level

4. 🌳 Paper Analysis with LLMs

Objetive

Required Tools

5. Helpful references

Deadline June 1st, 23:59.

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions