Delete Records in ArcGIS Pro with Identical Geometries based on OID with ArcPy

Table of Contents

Introduction

Every week I put time towards answering questions on three main platforms; GIS StackExchangeEsri Communities, and Reddit. I go under the alias Clubdebambos on each of them. This week I helped g3odood on Reddit who was looking to remove duplicate records based on geometry, but needed to keep the records from the duplicate groups that had the lowest OID, while maintaining attributes from the original table. There is currently no out-of-box tool to achieve this. In the end, it was a mixture of ArcPy tools and functions mixed with some basic Python to create the custom tool that enables a user to keep the duplicate based on the highest (max) or lowest (min) OID within the duplicates. 

ArcPy for Data Management and Geoprocessing with ArcGIS Pro

Leverage ArcPy for geospatial data management workflows within ArcGIS Pro. Learn the fundamentals of utilising ArcGIS Pro geoprocessing tools with ArcPy for data management, conversions, and analysis. This course is packed with amazing content that will help you discover the power of automating tasks within the ArcGIS Pro environment. Take your ArcPy skills from beginner to snake charmer. A little code goes a long way, a little ArcPy provides you with an in-demand skill. Sign up now for our highly rated course.

Custom Tool Syntax

The syntax for the Delete Identical Geometry by OID tool is as follows…

				
					deleteIdenticalGeomByOID(
    in_features,
    out_features,
    keep_based_on
)
				
			

We require three user inputs; in_features (Type: Feature Layer), out_features (Type: Feature Class), and keep_based_on (Type: String)

Import the ArcPy module

ArcPy is a Python site package that enables automation, analysis, and management of geographic data within the ArcGIS software environment. 

				
					import arcpy
				
			

Define the parameters

				
					## input feature class
in_features = arcpy.GetParameterAsText(0)

## output feature class
out_features = arcpy.GetParameterAsText(1)

## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)
				
			

Required Python object for the tool

We’ll set some required objects to aid with the workflow. The code is commented, read through and get familiar with each statement.

				
					## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]

## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}
				
			

ArcPy Find Identical tool

We utilise the Find Indentical tool from the Data Management toolbox in ArcGIS Pro to create a table in our memory workspace (this prevents the need to write the table to disk). By supplying “SHAPE” as the fields parameter, we are looking for records that have duplicate geometry only.

				
					## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
    in_dataset = in_features,
    out_dataset = "memory\\out_fc",
    fields="SHAPE"
)
				
			

The Main Event Part 1: Calculating Polygon Overlap Percentage

Time to work some magic and add a field to store the percentage overlap values and then populate the attributes. Instead of using ArcPy to join tables together, we will use a Python dictionary like a look up table.

				
					## add a field to contain our percentage of overlap
arcpy.management.AddField(
    in_table = memory_fc,
    field_name = "overlap_percent",
    field_type = "DOUBLE"
)
 
## key (fid_fld_name_1, fid_fld_name_2), value Shape_Area
fid_dict = {(row[0],row[1]):row[2] for row in arcpy.da.SearchCursor(memory_fc, [fid_fld_name_1, fid_fld_name_2, "SHAPE@AREA"])}
 
## SQL expression for where clause when getting necessary features from the
## first of the in_features
sql_exp = "{0} IN ({1})".format(join_fld, ",".join(str(x[1]) for x in fid_dict.keys()))
 
## iterate through each feature from the sql_exp for the 2nd in_features
with arcpy.da.SearchCursor(join_tbl, search_flds, sql_exp) as cursor:
    ## for each row (polygon) in the feature class
    for row in cursor:
        ## iterate over the memory_fc records where the fid_fld_name_2 is equal
        ## to the FID in the current search cursor.
        ## in the memory fc
        with arcpy.da.UpdateCursor(memory_fc, update_flds, "{0} = {1}".format(fid_fld_name_2, row[0])) as u_cursor:
            ## for each record
            for u_row in u_cursor:
                ## set the percentage_overlap to be the overlap area / the original
                ## area for the overlapped polygoon from the 2nd fc from in_features
                u_row[2] = fid_dict[(u_row[0], u_row[1])]/row[1]
                u_cursor.updateRow(u_row)
				
			

The above is indicative of the table returned from the Find Identical tool. The FEAT_SEQ field groups any duplicates found by a matching number and the IN_FID field is the original OID of the feature.

The Main Event Part 2: Deciding what to keep and what to delete

We use a Search Cursor to get a list of tuples containing IN_FID and FEAT_SEQ. Based on the example above thisn would be; [(1,1), (2,2), (3,3), (4,3), (5,4), (6,4), (7,5), (8,6), (9,7), (10,7)]

We then use some Python Magic to update our keep_oids dictionary, where the key is the unique FEAT_SEQ and the value is either the min or max OID relating to that FEAT_SEQ (duplicate group).

From our dictionary values, which represent or OIDs to keep, we create a list of strings – needed for using the Python string join function.

Lastly, we create a where clause SQL expression from our list of OIDs. This where clause will be used in the Export Features tool.

				
					## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]

## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
    for oid, group in data:
        if group not in keep_oids or oid < keep_oids[group]:
            keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
    for oid, group in data:
        if group not in keep_oids or oid > keep_oids[group]:
            keep_oids[group] = oid

## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]

## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))
				
			

Export to disk

Our job is done and we can now export from the original in_feature using the Export Features tool from the Conversion toolbox and utilise the optional where clause parameter to subset the data based on whet we want to keep.

				
					## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
    in_features = in_features,
    out_features = out_features,
    where_clause = where_clause
)
				
			

Clean-up memory workspace

Don’t forget your sweeping-brush and clean-up the memory workspace.

				
					arcpy.management.Delete(identical)
				
			

ArcPy for Data Management and Geoprocessing with ArcGIS Pro

Leverage ArcPy for geospatial data management workflows within ArcGIS Pro. Learn the fundamentals of utilising ArcGIS Pro geoprocessing tools with ArcPy for data management, conversions, and analysis. This course is packed with amazing content that will help you discover the power of automating tasks within the ArcGIS Pro environment. Take your ArcPy skills from beginner to snake charmer. A little code goes a long way, a little ArcPy provides you with an in-demand skill. Sign up now for our highly rated course.

Create the tool in ArcGIS Pro

Save the script an open up ArcGIS Pro. Right-click on your toolbox/toolset of choice and select New > Script. The New Script window will appear. In the General tab set Name to deleteIdenticalGeomByOIDLabel to Delete Identical Geometry by OID, and the Description as per below or any description you feel is apt.

In the Parameters tab set as below. Set the Data Type for the Input Features to Feature Layer, and for the Output Feature Class to Feature Class (and the Direction to Output). Set the Data Type for Keep Based On to String and create a Filter Value List that contains Minimum OID and Maximum OID, you can also set the Default to Minimum OID.

In the Execution tab, click the folder icon in the top-right corner and add your saved Python script. 

Click OK and you are ready to take your new tool for a spin.

You can download the tool and other custom tools over on this page. This tool is in the Custom Tools on a Basic License with ArcPy section.

All the code in one place

You can find the entire code workflow below with links to important components in the documentation that were used.

				
					import arcpy

################################################################################
## Esri Documentation
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/getparameterastext.htm
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/listfields.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/find-identical.htm
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/data-access/searchcursor-class.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/conversion/export-features.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/delete.htm
##
## ArcGIS Pro Version: 3.2.0
##
################################################################################

################################################################################
## USER INPUTS #################################################################

## input feature class
in_features = arcpy.GetParameterAsText(0)

## out put feature class
out_features = arcpy.GetParameterAsText(1)

## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)

################################################################################
## REQUIRED OBJECTS ############################################################

## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]

## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}

################################################################################
## CREATE IDENTICAL TABLE BASED ON GEOMETRY ####################################

## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
    in_dataset = in_features,
    out_dataset = "memory\\out_fc",
    fields="SHAPE"
)

################################################################################
## GET OIDS TO KEEP ############################################################

## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]

## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
    for oid, group in data:
        if group not in keep_oids or oid < keep_oids[group]:
            keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
    for oid, group in data:
        if group not in keep_oids or oid > keep_oids[group]:
            keep_oids[group] = oid

## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]

## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))

################################################################################
## EXPORT DATA TO DISK #########################################################

## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
    in_features = in_features,
    out_features = out_features,
    where_clause = where_clause
)

################################################################################
## CLEAN-UP MEMORY WORKSPACE ##################################################

arcpy.management.Delete(identical)

################################################################################
				
			

Leave a Comment

Your email address will not be published. Required fields are marked *