Measuring Geographic Distributions with GeoPandas: Weighted Mean Center

Introduction

The unweighted mean center is mainly used for events that occur at a place and time, such as burglaries. The weighted mean center, however, is predominantly used for stationary features such as retail outlets, or for delineated areas such as Census tracts. The Weighted Mean Center does not take into account the distance between features in the dataset.

The weight needs to be a numerical attribute. The greater the value, the higher the weight for that feature.
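Attribute tables do not always store numbers as numbers, so it can be worth checking the weight field before using it. The snippet below is a minimal sketch using pandas; the table, the `pupils` column, and the `clean_weights` helper are hypothetical and only illustrate one way to coerce a weight field to a numeric type.

```python
import pandas as pd

# Hypothetical attribute table: "pupils" stands in for the weight field and
# is stored as text here to mimic a badly typed column.
df = pd.DataFrame({"school": ["A", "B", "C"], "pupils": ["120", "45", "n/a"]})


def clean_weights(frame, weight_fld):
    """Return a copy with a numeric weight column, dropping unusable rows."""
    out = frame.copy()
    # errors="coerce" turns any value that cannot be parsed into NaN
    out[weight_fld] = pd.to_numeric(out[weight_fld], errors="coerce")
    # rows without a usable weight cannot contribute to the weighted center
    return out.dropna(subset=[weight_fld])


cleaned = clean_weights(df, "pupils")
print(cleaned)
```

Dropping rows with a missing weight is a design choice made for this sketch; depending on the data, it may be more appropriate to investigate or fill those values instead.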

Sources:
The Esri Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics.
An Introduction to Statistical Problem Solving in Geography

The Formula

The Weighted Mean Center is calculated by multiplying the x and y coordinates of each feature by that feature's weight, summing these products separately for x and for y, and then dividing each sum by the sum of all the weights.

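$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{Y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$

where $x_i$ and $y_i$ are the coordinates of feature $i$, $w_i$ is its weight, and $n$ is the number of features.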
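To make the arithmetic concrete, here is a small worked example in plain Python. The three features and their weights are made up for illustration, and the function names are not part of any library.

```python
# Three hypothetical features, each given as (x, y, weight)
features = [
    (0.0, 0.0, 1.0),
    (10.0, 0.0, 1.0),
    (10.0, 10.0, 8.0),
]


def weighted_mean_center(records):
    """Sum of coordinate * weight, divided by the sum of weights, for x and y."""
    sum_of_wgts = sum(w for _, _, w in records)
    if sum_of_wgts == 0:
        raise ValueError("The weights sum to zero, so the center is undefined")
    x = sum(x * w for x, _, w in records) / sum_of_wgts
    y = sum(y * w for _, y, w in records) / sum_of_wgts
    return x, y


def mean_center(records):
    """Unweighted mean center: the same formula with every weight set to 1."""
    return weighted_mean_center([(x, y, 1.0) for x, y, _ in records])


print(weighted_mean_center(features))  # (9.0, 8.0)
print(mean_center(features))
```

The heavily weighted third feature pulls the weighted center to (9.0, 8.0), well away from the unweighted center, which sits at roughly (6.67, 3.33).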

For Point features, the X and Y coordinates of each feature are used; for Polygons, the centroid of each feature supplies the X and Y coordinates; and for Linear features, the midpoint of each line is used.
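The rule for each geometry type can be sketched with Shapely alone. The `representative_xy` helper and the sample geometries below are hypothetical and exist only to illustrate which coordinate pair each feature type contributes.

```python
from shapely.geometry import LineString, Point, Polygon


def representative_xy(geom):
    """Return the (x, y) pair a feature contributes to the mean center."""
    if geom.geom_type == "Point":
        pt = geom
    elif geom.geom_type == "LineString":
        # the point halfway along the line, measured by length
        pt = geom.interpolate(0.5, normalized=True)
    elif geom.geom_type == "Polygon":
        pt = geom.centroid
    else:
        raise ValueError(f"Unsupported geometry type: {geom.geom_type}")
    return pt.x, pt.y


# An L-shaped line of total length 8; halfway along it is the corner at (4, 0)
line = LineString([(0, 0), (4, 0), (4, 4)])
# A 2 x 2 square whose centroid is (1, 1)
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

print(representative_xy(Point(3, 5)))
print(representative_xy(line))
print(representative_xy(square))
```

Note that the midpoint of a line is measured along the line itself, so for the L-shaped example it falls on the corner rather than halfway between the two end points.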

Using GeoPandas to Calculate the Weighted Mean Center

The code below uses GeoPandas and Shapely to find the weighted mean center for a dataset based on the formula above and create an output file. In our example we will use a Shapefile, but you can use any input and output filetypes that you have available with your GeoPandas setup. 

The code is heavily commented to make the workflow easy to follow. For Points, we take the x and y values of each feature directly. For Polylines, we first get the midpoint of each line and use the x and y values of the midpoints. For Polygons, we get the centroid of each polygon and use the x and y values of the centroids. In every case, the x and y values are then weighted and averaged using the formula above.

				
import geopandas as gpd
from shapely.geometry import Point

## input shapefile path
in_shp = r"path\to\input\shapefile\input.shp"

## the output shapefile path for the weighted mean center point
out_shp = r"path\to\output\shapefile\output.shp"

## the field that contains the numerical weight
weight_fld = "FIELD_NAME"

## read in the shapefile to a GeoDataFrame
gdf = gpd.read_file(in_shp)

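## get the geometry type from the first record
## (assumes every feature in the file shares this geometry type)
geom_type = gdf.geom_type.iloc[0]

## get the coordinate reference system (CRS) of the input data
crs = gdf.crs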

## for Point geometry
if geom_type == "Point":
    ## get all x and y values
    gdf["x"] = gdf.geometry.x
    gdf["y"] = gdf.geometry.y

## for LineString geometry
elif geom_type == "LineString":
    ## get all x and y values for the midpoints
    gdf["midpoint"] = gdf.geometry.interpolate(0.5, normalized=True)
    gdf["x"] = gdf["midpoint"].x
    gdf["y"] = gdf["midpoint"].y

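## for Polygon geometry
elif geom_type == "Polygon":
    ## get all x and y values for the centroids
    gdf["centroid"] = gdf.geometry.centroid
    gdf["x"] = gdf["centroid"].x
    gdf["y"] = gdf["centroid"].y

## any other geometry type (e.g. MultiPolygon) is not handled by this script
else:
    raise ValueError(f"Unsupported geometry type: {geom_type}")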

## get the sum of the x and y values multiplied by the weight for each feature
sum_of_x_wgts = (gdf["x"] * gdf[weight_fld]).sum()
sum_of_y_wgts = (gdf["y"] * gdf[weight_fld]).sum()

## get the total sum of all weights
sum_of_wgts = gdf[weight_fld].sum()

## divide the sum of x and y weights by the sum of the weights
weighted_x = sum_of_x_wgts / sum_of_wgts
weighted_y = sum_of_y_wgts / sum_of_wgts

## create a point geometry representing the weighted mean center
weighted_mean_center = Point(weighted_x, weighted_y)

## create a GeoDataFrame with the point geometry
gdf_weighted_mean_center = gpd.GeoDataFrame(geometry=[weighted_mean_center], crs=crs)

## add fields for the XCoord and YCoord
gdf_weighted_mean_center["XCoord"] = weighted_x
gdf_weighted_mean_center["YCoord"] = weighted_y

## write the weighted mean center point to the output shapefile
gdf_weighted_mean_center.to_file(out_shp, driver="ESRI Shapefile")
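
As a quick sanity check on the script above, the same calculation can be run on a small in-memory GeoDataFrame where the answer is easy to work out by hand. This is a sketch for point data only; the `weighted_mean_center` function, the `pupils` column, and the coordinates are made up, and `numpy.average` is swapped in for the explicit sums used in the script.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point


def weighted_mean_center(gdf, weight_fld):
    """Weighted mean center of a point GeoDataFrame, returned as a Point."""
    weights = gdf[weight_fld].to_numpy(dtype=float)
    # np.average computes sum(value * weight) / sum(weight)
    x = np.average(gdf.geometry.x, weights=weights)
    y = np.average(gdf.geometry.y, weights=weights)
    return Point(x, y)


# Two hypothetical schools; the coordinates are not real locations
schools = gpd.GeoDataFrame(
    {"pupils": [100, 300]},
    geometry=[Point(0, 0), Point(4, 8)],
    crs="EPSG:2157",
)

center = weighted_mean_center(schools, "pupils")
print(center.x, center.y)  # 3.0 6.0
```

With weights of 100 and 300, the center sits three quarters of the way from the first school to the second, which matches the formula: x = (0 × 100 + 4 × 300) / 400 and y = (0 × 100 + 8 × 300) / 400.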

				
			

Weighted Mean Center in Action

Primary school location data was downloaded from the Department of Education (Ireland) and processed to contain only primary schools in County Kildare, in a projected coordinate system: Irish Transverse Mercator (EPSG:2157). You can download the Shapefile containing the data used below here.
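If your own data is stored in longitude and latitude rather than a projected coordinate system, you may want to reproject it before averaging coordinates. The sketch below shows one way to check and reproject with GeoPandas; the `ensure_projected` helper and the sample points are hypothetical, and EPSG:2157 is used as the default only because it is the system used in this example.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points in longitude/latitude (EPSG:4326), roughly within
# County Kildare; they are not taken from the schools dataset.
gdf = gpd.GeoDataFrame(
    {"pupils": [200, 150]},
    geometry=[Point(-6.66, 53.22), Point(-6.91, 53.16)],
    crs="EPSG:4326",
)


def ensure_projected(frame, epsg=2157):
    """Return the frame as-is if its CRS is projected, otherwise reproject it."""
    if frame.crs is None:
        raise ValueError("The GeoDataFrame has no CRS defined")
    if frame.crs.is_projected:
        return frame
    return frame.to_crs(epsg=epsg)


gdf_itm = ensure_projected(gdf)
print(gdf_itm.crs.to_epsg())
```

GeoPandas also emits a warning when centroids are computed in a geographic CRS, so this check is particularly relevant for polygon inputs.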

[Figure: Weighted Mean Center with GeoPandas]

Running the script produces a Shapefile that contains the Weighted Mean Center based on schools in Kildare using the total number of pupils as the weight factor.

[Figure: Weighted Mean Center with GeoPandas for measuring geographic distributions]

Below is a comparison between our GeoPandas tool and the (Weighted) Mean Center tool output from ArcGIS Pro. Spot on!

[Figure: Weighted Mean Center output from the GeoPandas script]
[Figure: Weighted Mean Center output from ArcGIS Pro]
