Python Toolbox 4: Clone Digger

Looking for duplicate code or opportunities to refactor? Let me introduce you to a great Python tool called Clone Digger. As the project's page says:

Clone Digger aimed to detect similar code in Python and Java programs. The synonyms for the term "similar code" are "clone" and "duplicate code".

Once installed, you call the clonedigger.py file with the -o option giving the path for the output HTML file, followed by the path to a folder or code file to analyze. If you call it with the -h parameter, it prints the available command-line options. To show the power of Clone Digger, I used the following extract from actual code.

import arcgisscripting

gp = arcgisscripting.create()

def createPoint(x, y):
    p = gp.createobject('Point')
    p.x = x
    p.y = y
    return p

def createPolygon(xMin, xMax, yMin, yMax):
    polygon = gp.createobject('array')
    ##Add the first point
    newPoint = createPoint(xMin, yMin)
    polygon.Add(newPoint)
    ##Add the second point
    newPoint = createPoint(xMin, yMax)
    polygon.Add(newPoint)
    ##Add the third point
    newPoint = createPoint(xMax, yMax)
    polygon.Add(newPoint)
    ##Add the fourth point
    newPoint = createPoint(xMax, yMin)
    polygon.Add(newPoint)
    ##Close the polygon
    newPoint = createPoint(xMin, yMin)
    #polygon.Add(newPoint)
    return polygon

To run Clone Digger on this file, all you have to do is issue the command below. Make sure that your shell can find the file clonedigger.py, either by adding its folder to your path variables or by navigating to that folder.

python clonedigger.py -o D:\output.html D:\CodeToTest.py
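If you want to run the same command from a script, for example as part of a nightly check, a minimal sketch could look like the one below. The helper names build_clonedigger_command and run_clonedigger are my own and not part of Clone Digger. The sketch only uses the -o option shown above, and it assumes that the interpreter running the script is also able to run clonedigger.py.

```python
import subprocess
import sys


def build_clonedigger_command(script_path, output_html, source_path):
    """Build the argument list for the command line shown above.

    Hypothetical helper: it only assembles the -o call used in this post.
    """
    return [sys.executable, script_path, "-o", output_html, source_path]


def run_clonedigger(script_path, output_html, source_path):
    """Run Clone Digger and return the exit code of the process."""
    command = build_clonedigger_command(script_path, output_html, source_path)
    return subprocess.call(command)


# Show the command that would be executed, without running it.
print(" ".join(build_clonedigger_command(
    "clonedigger.py", r"D:\output.html", r"D:\CodeToTest.py")))
```

Passing the arguments as a list rather than as one string avoids quoting problems when a path contains spaces.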

The output first shows some summary values from the code analysis. Then it shows the code blocks where duplicate or similar code was found. This is what the output looks like for my short Python code.

Source files: 1

Clones detected: 2

9 of 17 lines are duplicates (52.94%)

Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False

Time elapsed
Construction of AST : 0.00 seconds
Building statement hash : 0.00 seconds
Building patterns : 0.00 seconds
Marking similar statements : 0.02 seconds
Finding similar sequences of statements : 0.00 seconds
Refining candidates : 0.02 seconds
Total time: 0.03
Started at: Wed Jun 24 21:05:15 2009
Finished at: Wed Jun 24 21:05:15 2009

Clone # 1
Distance between two fragments = 4
Clone size = 7

Source file "D:\CodeToTest.py"
The first line is 16
Source file "D:\CodeToTest.py"
The first line is 13
newPoint = createPoint(xMin, yMax)    newPoint = createPoint(xMin, yMin)
polygon.Add(newPoint)                 polygon.Add(newPoint)
newPoint = createPoint(xMax, yMax)    newPoint = createPoint(xMin, yMax)
polygon.Add(newPoint)                 polygon.Add(newPoint)
newPoint = createPoint(xMax, yMin)    newPoint = createPoint(xMax, yMax)
polygon.Add(newPoint)                 polygon.Add(newPoint)
newPoint = createPoint(xMin, yMin)    newPoint = createPoint(xMax, yMin)



Clone # 2
Distance between two fragments = 4
Clone size = 5

Source file "D:\CodeToTest.py"
The first line is 19
Source file "D:\CodeToTest.py"
The first line is 13
newPoint = createPoint(xMax, yMax)    newPoint = createPoint(xMin, yMin)
polygon.Add(newPoint)                 polygon.Add(newPoint)
newPoint = createPoint(xMax, yMin)    newPoint = createPoint(xMin, yMax)
polygon.Add(newPoint)                 polygon.Add(newPoint)
newPoint = createPoint(xMin, yMin)    newPoint = createPoint(xMax, yMax)



Clone Digger aims to find software clones in Python and Java programs. It is provided under the GPL and can be downloaded from http://clonedigger.sourceforge.net

I would not try to remove all the duplicates or similarities found by Clone Digger, but I think it's a great tool for finding code that can be improved. How far you refactor depends greatly on the goal of your code and your time constraints. Clone Digger is also useful when working with multiple people on a project or when you have to improve some legacy code.
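As an illustration, here is one way the repeated createPoint / polygon.Add pairs reported for my sample could be folded into a loop. This is only a sketch: the _Fake classes are hypothetical stand-ins I added so the snippet runs without ArcGIS installed, and in real code gp would come from arcgisscripting.create() as in the extract. Like the original, it adds four corner points and leaves the closing point out.

```python
class _FakeArray(list):
    """Hypothetical stand-in for the geoprocessor 'array' object."""

    def Add(self, item):
        self.append(item)


class _FakePoint(object):
    """Hypothetical stand-in for the geoprocessor 'Point' object."""
    x = None
    y = None


class _FakeGeoprocessor(object):
    """Hypothetical stand-in for the object returned by arcgisscripting.create()."""

    def createobject(self, kind):
        return _FakeArray() if kind.lower() == 'array' else _FakePoint()


gp = _FakeGeoprocessor()  # assumption: replace with arcgisscripting.create()


def createPoint(x, y):
    p = gp.createobject('Point')
    p.x = x
    p.y = y
    return p


def createPolygon(xMin, xMax, yMin, yMax):
    polygon = gp.createobject('array')
    # The four corners, in the same order as the original code.
    corners = [(xMin, yMin), (xMin, yMax), (xMax, yMax), (xMax, yMin)]
    for x, y in corners:
        polygon.Add(createPoint(x, y))
    return polygon
```

The list of corners now carries the only part that actually differed between the cloned fragments, so adding or reordering points means editing one line instead of a pair.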
Have any comments? Any tools you can't live without? Any suggestions for me? Feel free to let me know.


Comments:

Anonymous said...

How do you install clone digger?

Samuel Bosch said...

All info about the installation can be found here : http://clonedigger.sourceforge.net/download.html