Measuring Geographic Distributions with GeoPandas: Median Center

Table of Contents

Introduction

Also known as the Center of Minimum Distance, the Median Center is a location that is the shortest total distance to all features in the study area (not to be confused with the Central Feature, which is the feature that is the shortest distance to all others). It can be used to find a suitable location for something that needs to be centrally located. The Median Center will gravitate towards an area with the most features.

The Median Center is good for finding the most accessible location.

Sources:
The Esri Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics.
An Introduction to Statistical Problem Solving in Geography

This course is designed to instill the basics of Python Programming by incrementally increasing your knowledge session-upon-session. In each section you will be given new material for a workbook to fill out and by the end of this course you will have your very own Python reference handbook. So how does this course have a GIS focus? Simple, most elements of the course have GIS and geospatial data in mind. Instead of using non-descript variables and values, we will use terms such as population, city, x_coord, y_coord, and so on. This will aid participants with pinpointing how they can relate geospatial data to Python. 

The Formula

The is no single formula or equation for calculating an exact Median Center, according to Andy Mitchell it is an iterative process involving calculating the Mean Center, summing the distances from it to each feature, offsetting the center slightly and summing the distances again until it eventually zones in on the optimum location that has the lowest sum.

The code below implements the Yehuda Vardi and Cun-Hui Zhang algorithm or the Weiszfeld algorithm take from sources on the internet and also from asking ChatGPT to implement both algorithms.

Vardi and Cun-Hui Zhang from orlp on StackOverflow.
Weizfeld from endolith on GitHub.

For Point features the X and Y coordinates of each feature is used, for Polygons the centroid of each feature represents the X and Y coordinate to use, and for Linear features the mid-point of each line is used for the X and Y coordinates.

Using GeoPandas to Calculate the Median Center

The code below uses GeoPandas and a number of other libraries required to perform the mathematical computations to find the median center for a dataset and create an output file. In our example we will use a Shapefile, but you can use any input and output filetypes that you have available with your GeoPandas setup. 

I won’t even pretend to know the maths behind the algorithms, they are taken from those that have blazed a path before me. Links to sourced code are available earlier in the post and from the code below. 

Outside of the madness of the algrithms, we use NumPy for the efficiencies brought by their arrays to supply the list of all coordinates into each algorithm. Once the algorithm works its magic, we get an array/list returned containing the x and y coordinates for the median center. We the create a GeoPandas GeoDateFrame based off these coordinates and add them as XCoord and YCoord fields too to match the output from a similar tool in ArcGIS Pro.

				
					import math
import geopandas as gpd
import numpy as np
from shapely.geometry import Point
from scipy.spatial.distance import cdist

## input shapefile path
in_shp = r"path\to\input\shapefile\input.shp"

## the output shapefile path for the median center for
out_shp = r"path\to\output\shapefile\output.shp"

## choose an algorithm
## W - Weiszfield
## WGPT - Weiszfield from ChatGPT
## YC - Yehuda Vardi and Cun-Hui Zhang
## YCGPT - Yehuda Vardi and Cun-Hui Zhang from ChatGPT
algorithm = "YCGPT"

################################################################################
## Weiszfield algorithm ########################################################

## https://gist.github.com/endolith/2837160
def numersum(median_center,dataPoint):
    ## Provides the denominator of the weiszfeld algorithm depending on whether
    ## you are adjusting the candidate x or y
    return 1/math.sqrt((median_center[0]-dataPoint[0])**2 + (median_center[1]-dataPoint[1])**2)

def denomsum(median_center, xy_arr):
    ## Provides the denominator of the weiszfeld algorithm
    temp = 0.0
    for i in range(0,len(xy_arr)):
        temp += 1/math.sqrt((median_center[0] - xy_arr[i][0])**2 + (median_center[1] - xy_arr[i][1])**2)
    return temp

################################################################################
## Weiszfield algorithm ChatGPT ################################################

def weiszfeld_chatgpt(coords, tol=1e-6, max_iter=1000):
    """Weiszfeld algorithm to compute geometric median."""
    median = coords.mean(axis=0)
    for iteration in range(max_iter):
        # Calculate distances between the current median and all points
        distances = np.linalg.norm(coords - median, axis=1)

        # Avoid division by zero by setting very small distances to a small value
        distances[distances == 0] = 1e-10

        # Update the median based on the Weiszfeld formula
        weights = 1 / distances
        weighted_coords = (coords.T * weights).T
        new_median = weighted_coords.sum(axis=0) / weights.sum()

        # Check convergence (if the difference between iterations is less than tolerance)
        if np.linalg.norm(new_median - median) < tol:
            print(f"Converged after {iteration+1} iterations")
            return new_median

        median = new_median

    print(f"Reached maximum iterations ({max_iter}) without full convergence")
    return median

################################################################################
## Yehuda Vardi and Cun-Hui Zhang ##############################################

## http://stackoverflow.com/questions/30299267/geometric-median-of-multidimensional-points
## user: orlp
def vardi_zhang(X, eps=1e-5):
    y = np.mean(X, 0)

    while True:
        D = cdist(X, [y])
        nonzeros = (D != 0)[:, 0]
        Dinv = 1 / D[nonzeros]
        Dinvs = np.sum(Dinv)
        W = Dinv / Dinvs
        T = np.sum(W * X[nonzeros], 0)
        num_zeros = len(X) - np.sum(nonzeros)
        if num_zeros == 0:
            y1 = T
        elif num_zeros == len(X):
            return y
        else:
            R = (T - y) * Dinvs
            r = np.linalg.norm(R)
            rinv = 0 if r == 0 else num_zeros/r
            y1 = max(0, 1-rinv)*T + min(1, rinv)*y
        if np.linalg.norm(y - y1) < eps:
            return y1
        y = y1

################################################################################
## Yehuda Vardi and Cun-Hui Zhang ChatGPT ######################################

def vardi_zhang_chatgpt(coords, tol=1e-6, max_iter=1000):
    """Vardi-Zhang algorithm to compute geometric median using GeoPandas and NumPy."""
    median = coords.mean(axis=0)  # Start with the centroid
    for iteration in range(max_iter):
        # Compute distances between current median and all points
        distances = np.linalg.norm(coords - median, axis=1)

        # Find points very close to the current median to avoid division by zero
        close_to_median = (distances < tol)

        if close_to_median.any():
            # If any points are very close to the median, exclude them from the update
            non_zero_indices = ~close_to_median
            non_zero_coords = coords[non_zero_indices]
            non_zero_distances = distances[non_zero_indices]

            # Update the median only with points not close to the current median
            weights = 1 / non_zero_distances
            weighted_coords = (non_zero_coords.T * weights).T
            new_median = weighted_coords.sum(axis=0) / weights.sum()
        else:
            # Regular update step if no points are too close to the median
            weights = 1 / distances
            weighted_coords = (coords.T * weights).T
            new_median = weighted_coords.sum(axis=0) / weights.sum()

        # Check convergence
        if np.linalg.norm(new_median - median) < tol:
            print(f"Converged after {iteration+1} iterations")
            return new_median

        # Update the median for the next iteration
        median = new_median

    print(f"Reached maximum iterations ({max_iter}) without full convergence")
    return median

################################################################################

## read in the shapefile to a GeoDataFrame
gdf = gpd.read_file(in_shp)

## get the geometry type from the first record
geom_type = gdf.geom_type[0]

## get the EPSG code
crs = gdf.crs

## for Point and Polygon geometry get the centroid
if geom_type in ("Point", "Polygon"):
    ## get the centroid of each feature as a Point geometry
    gdf["point"] = gdf.geometry.centroid

## for LineString geometry get the midpoint
elif geom_type == "LineString":
    ## get the midpoint of each line as a Point geometry
    gdf["point"] = gdf.geometry.interpolate(0.5, normalized=True)

## convert coordinates to NumPy array of coordinates
xy_arr = np.array([[geom.x, geom.y] for geom in gdf["point"]])

## if using the Weiszfield algorithm
if algorithm == "W":
    ## https://gist.github.com/endolith/2837160
    mean_x, mean_y = np.mean(xy_arr, axis=0)
    median_center = [mean_x, mean_y]
    numIter = 50

    ## minimise the objective function
    for x in range(0,numIter):
        denom = denomsum(median_center,xy_arr)
        nextx = 0.0
        nexty = 0.0

        for y in range(0,len(xy_arr)):
            nextx += (xy_arr[y][0] * numersum(median_center,xy_arr[y]))/denom
            nexty += (xy_arr[y][1] * numersum(median_center,xy_arr[y]))/denom

        median_center = [nextx,nexty]

## if using the Weiszfield algorithm from ChatGPT
elif algorithm == "WGPT":
    median_center = weiszfeld_chatgpt(xy_arr)

## if using Yehuda Vardi and Cun-Hui Zhang algorithm
elif algorithm == "YC":
    median_center = vardi_zhang(xy_arr)

## if using Yehuda Vardi and Cun-Hui Zhang from ChatGPT
elif algorithm == "YCGPT":
    median_center = vardi_zhang_chatgpt(xy_arr)

## create a point geometry representing the median center
median_center = Point(median_center[0], median_center[1])

## create a GeoDataFrame with the point geometry
gdf_median_center = gpd.GeoDataFrame(geometry=[median_center], crs=crs)

## add fields for the XCoord and YCoord
gdf_median_center["XCoord"] = median_center.x
gdf_median_center["YCoord"] = median_center.y

## write the median center point to the output shapefile
gdf_median_center.to_file(out_shp, driver="ESRI Shapefile")

				
			

Median Center in Action

Data for Primary School location was downloaded from the Department of Education (Ireland) and processed to contain Primary Schools in County Kildare in a projected coordinate system – Irish Transverse Mercator (EPSG:2157). You can download the Shapefile containing the data used below here.

Primary Schools in Kildare for Median Center in ArcGIS Pro

Running the script produces a Shapefile that contains a Point representing the Median Center of the original Shapefile dataset.

Median Center for Schools in Kildare using GeoPandas and displayed in ArcGIS Pro

Below is a comparison between our GeoPandas tool and the Central Feature tool output from ArcGIS Pro. Very close!

Median Center Geopandas Weiszfield algorithm
Median Center GeoPandas Weiszfield algorithm from ChatGPT
Median Center GeoPandas Yehuda Vardi and Cun-Hui Zhang algorithm
Median Center GeoPandas Yehuda Vardi and Cun-Hui Zhang algorithm from ChatGPT
Median Center from ArcGIS Pro

At Final Draft Mapping we provide comprehensive courses for automating tasks within ArcGIS Pro and ArcGIS Online with ArcPy and the ArcGIS API for Python. Courses range from beginner to advanced workflows and all paid courses provide extra support where you can ask questions. Automation within ArcGIS is a highly sought after skill, by adding these skills to your arsenal you are placing yourself at the forefront of that demand. 

We appreciate our blog readers, you can get 25% off any (non-sale) course at any time with the code FDMBLOG25

Also in this series...

Leave a Comment

Your email address will not be published. Required fields are marked *