Table of Contents
Introduction
Every week I put time towards answering questions on three main platforms; GIS StackExchange, Esri Communities, and Reddit. I go under the alias Clubdebambos on each of them. This week I helped g3odood on Reddit who was looking to remove duplicate records based on geometry, but needed to keep the records from the duplicate groups that had the lowest OID, while maintaining attributes from the original table. There is currently no out-of-box tool to achieve this. In the end, it was a mixture of ArcPy tools and functions mixed with some basic Python to create the custom tool that enables a user to keep the duplicate based on the highest (max) or lowest (min) OID within the duplicates.
Leverage ArcPy for geospatial data management workflows within ArcGIS Pro. Learn the fundamentals of utilising ArcGIS Pro geoprocessing tools with ArcPy for data management, conversions, and analysis. This course is packed with amazing content that will help you discover the power of automating tasks within the ArcGIS Pro environment. Take your ArcPy skills from beginner to snake charmer. A little code goes a long way, a little ArcPy provides you with an in-demand skill. Sign up now for our highly rated course.
Custom Tool Syntax
The syntax for the Delete Identical Geometry by OID tool is as follows…
deleteIdenticalGeomByOID(
in_features,
out_features,
keep_based_on
)
We require three user inputs; in_features (Type: Feature Layer), out_features (Type: Feature Class), and keep_based_on (Type: String)
Import the ArcPy module
ArcPy is a Python site package that enables automation, analysis, and management of geographic data within the ArcGIS software environment.
import arcpy
Define the parameters
## input feature class
in_features = arcpy.GetParameterAsText(0)
## output feature class
out_features = arcpy.GetParameterAsText(1)
## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)
Required Python object for the tool
We’ll set some required objects to aid with the workflow. The code is commented, read through and get familiar with each statement.
## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]
## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}
ArcPy Find Identical tool
We utilise the Find Indentical tool from the Data Management toolbox in ArcGIS Pro to create a table in our memory workspace (this prevents the need to write the table to disk). By supplying “SHAPE” as the fields parameter, we are looking for records that have duplicate geometry only.
## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
in_dataset = in_features,
out_dataset = "memory\\out_fc",
fields="SHAPE"
)
The above is indicative of the table returned from the Find Identical tool. The FEAT_SEQ field groups any duplicates found by a matching number and the IN_FID field is the original OID of the feature.
The Main Event Part: Deciding what to keep and what to delete
We use a Search Cursor to get a list of tuples containing IN_FID and FEAT_SEQ. Based on the example above thisn would be; [(1,1), (2,2), (3,3), (4,3), (5,4), (6,4), (7,5), (8,6), (9,7), (10,7)]
We then use some Python Magic to update our keep_oids dictionary, where the key is the unique FEAT_SEQ and the value is either the min or max OID relating to that FEAT_SEQ (duplicate group).
From our dictionary values, which represent our OIDs to keep, we create a list of strings – needed for using the Python string join function.
Lastly, we create a where clause SQL expression from our list of OIDs. This where clause will be used in the Export Features tool.
## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]
## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
for oid, group in data:
if group not in keep_oids or oid < keep_oids[group]:
keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
for oid, group in data:
if group not in keep_oids or oid > keep_oids[group]:
keep_oids[group] = oid
## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]
## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))
Export to disk
Our job is done and we can now export from the original in_feature using the Export Features tool from the Conversion toolbox and utilise the optional where clause parameter to subset the data based on whet we want to keep.
## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
in_features = in_features,
out_features = out_features,
where_clause = where_clause
)
Clean-up memory workspace
Don’t forget your sweeping-brush and clean-up the memory workspace.
arcpy.management.Delete(identical)
Mastering ArcGIS Pro ArcPy Search, Insert, & Update Cursors
Unlock the full potential of ArcPy Cursors with our intensive and in-depth course. Designed for GIS professionals, analysts, and enthusiasts, this course delves into the intricacies of Search, Insert, and Update Cursors in the ArcPy library, empowering you to manipulate and manage spatial data with precision and efficiency.
ArcPy Cursors enable you to streamline your GIS workflows, automate repetitive tasks, and witness the full potential of spatial data manipulation in ArcGIS Pro.
Create the tool in ArcGIS Pro
Save the script an open up ArcGIS Pro. Right-click on your toolbox/toolset of choice and select New > Script. The New Script window will appear. In the General tab set Name to deleteIdenticalGeomByOID, Label to Delete Identical Geometry by OID, and the Description as per below or any description you feel is apt.
In the Parameters tab set as below. Set the Data Type for the Input Features to Feature Layer, and for the Output Feature Class to Feature Class (and the Direction to Output). Set the Data Type for Keep Based On to String and create a Filter Value List that contains Minimum OID and Maximum OID, you can also set the Default to Minimum OID.
In the Execution tab, click the folder icon in the top-right corner and add your saved Python script.
Click OK and you are ready to take your new tool for a spin.
You can download the tool and other custom tools over on this page. This tool is in the Custom Tools on a Basic License with ArcPy section.
All the code in one place
You can find the entire code workflow below with links to important components in the documentation that were used.
import arcpy
################################################################################
## Esri Documentation
## https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/getparameterastext.htm
## https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/listfields.htm
## https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/find-identical.htm
## https://pro.arcgis.com/en/pro-app/3.2/arcpy/data-access/searchcursor-class.htm
## https://pro.arcgis.com/en/pro-app/3.2/tool-reference/conversion/export-features.htm
## https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/delete.htm
##
## ArcGIS Pro Version: 3.2.0
##
################################################################################
################################################################################
## USER INPUTS #################################################################
## input feature class
in_features = arcpy.GetParameterAsText(0)
## out put feature class
out_features = arcpy.GetParameterAsText(1)
## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)
################################################################################
## REQUIRED OBJECTS ############################################################
## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]
## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}
################################################################################
## CREATE IDENTICAL TABLE BASED ON GEOMETRY ####################################
## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
in_dataset = in_features,
out_dataset = "memory\\out_fc",
fields="SHAPE"
)
################################################################################
## GET OIDS TO KEEP ############################################################
## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]
## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
for oid, group in data:
if group not in keep_oids or oid < keep_oids[group]:
keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
for oid, group in data:
if group not in keep_oids or oid > keep_oids[group]:
keep_oids[group] = oid
## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]
## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))
################################################################################
## EXPORT DATA TO DISK #########################################################
## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
in_features = in_features,
out_features = out_features,
where_clause = where_clause
)
################################################################################
## CLEAN-UP MEMORY WORKSPACE ##################################################
arcpy.management.Delete(identical)
################################################################################