June 2009

Drag Drop from ArcCatalog

As you probably already noticed you can drag drop items within ArcGIS. For example you can drag a feature class from ArcCatalog to ArcMap or from ArcMap or ArcCatalog to a geoprocessing form. In this post I'm going to show you what has to be done to enable drag drop behavior from ArcCatalog and Windows Explorer to a textbox.

To enable drag drop on a textbox I added event handlers for DragEnter, DragOver and DragDrop for my textbox called textBoxPath. In the DragEnter and DragOver handlers I check whether the dragged object is valid. If this is the case the drag drop effect is set to All. This is a combination of the Copy, Move, and Scroll effect. But its in the DragDrop event handler and more precisely in the helper function GetPaths that the most import stuff happens. As you can see the Data property of the DragEventArgs is processed by the GetPaths method and if any paths to feature classes or tables are found the first path is shown. After that I added a small hack to put the cursor at the end of the text in the textbox.

private void TextBoxPath_DragEnter(object sender, DragEventArgs e)
{
 e.Effect = EsriDragDrop.IsValid(e.Data) ? DragDropEffects.All : DragDropEffects.None;
}

private void TextBoxPath_DragOver(object sender, DragEventArgs e)
{
 e.Effect = EsriDragDrop.IsValid(e.Data) ? DragDropEffects.All : DragDropEffects.None;
}

private void TextBoxPath_DragDrop(object sender, DragEventArgs e)
{
 List<string> paths = EsriDragDrop.GetPaths(e.Data);
 TextBox txtBoxPath = (TextBox)sender;
 if (paths.Count > 0)
 {
  // set value of textbox to the first found path
  txtBoxPath.Text = esriDragDrop.Paths[0];
  // place cursor at the end of the textbox
  txtBoxPath.SelectionStart = txtBoxPath.TextLength;
 }
}

In the below EsriDragDrop class I placed the IsValid and GetPaths methods. The IsValid method checks whether the IDataObject coming from the drag event contains any valid objects. The GetPaths method retrieves those valid objects and returns the paths to the found feature classes and tables. It uses the IDataObjectHelper interface and its GetNames and GetFiles methods to access the objects in the IDataObject. Note that only feature classes and tables will be returned by my code but this constraint can easily be removed by not checking the Type of the datasetName. I didn't add any functionality to check whether the file path dragged from a Windows Explorer to the textbox was valid but you can implement this by using the Geoprocessor object and its Exists and Describe methods or by trying to open the table or feature class.

using System.Collections.Generic;
using System.Windows.Forms;
using ESRI.ArcGIS.esriSystem;
using ESRI.ArcGIS.Geodatabase;
using ESRI.ArcGIS.SystemUI;

namespace GisSolved.DragDrop
{
 public class EsriDragDrop
 {
  const string DATAOBJECT_ESRINAMES = "ESRI Names";
  public static bool IsValid(IDataObject dataObject)
  {
   return dataObject.GetDataPresent(DATAOBJECT_ESRINAMES) ||
    dataObject.GetDataPresent(System.Windows.Forms.DataFormats.FileDrop);
  }
  public static List<string> GetPaths(IDataObject dataObject)
  {
   List<string> foundPaths = new List<string>();
   IDataObjectHelper dataObjectHelper = new DataObjectHelperClass();
   dataObjectHelper.InternalObject = (object)dataObject;
   
   if (dataObjectHelper.CanGetNames())
   {
    IEnumName enumNames = dataObjectHelper.GetNames();
    IName name;
    while ((name = enumNames.Next()) != null)
    {
     if (name is IDatasetName)
     {
      IDatasetName datasetName = (IDatasetName)name;
      // only accept feature classes and tables
      if (datasetName.Type == esriDatasetType.esriDTFeatureClass ||
       datasetName.Type == esriDatasetType.esriDTTable)
      {
       string path = System.IO.Path.Combine(datasetName.WorkspaceName.PathName, datasetName.Name);
       foundPaths.Add(path);
      }
     }
    }
   }
   else if (dataObjectHelper.CanGetFiles())
   {
    string[] paths = (string[])dataObjectHelper.GetFiles();
    foreach (string path in paths)
    {
     // TODO : Add code here to check if the file path is a valid path
     foundPaths.Add(path);
    }
   }
   return foundPaths;
  }
 }
}

This is all you need to implement drag drop behavior from ArcCatalog or Windows Explorer to your textbox. If you want to implement drag drop from ArcMap to your form I suggest you to read this and this. Any comments or suggestions ? Let me know.

Python Toolbox 4 : Clone Digger

Looking for duplicate code or opportunities to refactor, let me introduce you to a great Python tool called Clone Digger. As the projects page says

Clone Digger aimed to detect similar code in Python and Java programs. The synonyms for the term "similar code" are "clone" and "duplicate code".

Once installed you call the clonedigger.py file with as arguments the path for the output html and the path to a folder or code file to analyze. If you call it with the parameter -h it outputs the different commandline options. To show the power of Clone Digger I used the following extract from actual code.

import arcgisscripting

gp = arcgisscripting.create()

def create_point(x, y):
    p = gp.createobject('Point')
    p.x = x
    p.y = y
    return p

def createPolygon(xMin, xMax, yMin, yMax):
    polygon = gp.createobject('array')
    ##Add the first point
    newPoint = createPoint(xMin, yMin)
    polygon.Add(newPoint)
    ##Add the second point
    newPoint = createPoint(xMin, yMax)
    polygon.Add(newPoint)
    ##Add the third point
    newPoint = createPoint(xMax, yMax)
    polygon.Add(newPoint)
    ##Add the fourth point
    newPoint = createPoint(xMax, yMin)
    polygon.Add(newPoint)
    ##Close the polygon
    newPoint = createPoint(xMin, yMin)
    #polygon.Add(newPoint)
    return polygon

To run Clone Digger on this file all you have to is issue the below command. Make sure that your shell finds the file clonedigger.py by adding to your path variables or by navigating to its folder.

python clonedigger.py -o D:\output.html D:\CodeTest.py

The output first shows some summary values from the code analysis. Then it shows the different code blocks where duplicate or similar code where found. This is how the output looks like for my short Python code.

Source files: 1

Clones detected: 2

9 of 17 lines are duplicates (52.94%)

Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False

Time elapsed
Construction of AST : 0.00 seconds
Building statement hash : 0.00 seconds
Building patterns : 0.00 seconds
Marking similar statements : 0.02 seconds
Finding similar sequences of statements : 0.00 seconds
Refining candidates : 0.02 seconds
Total time: 0.03
Started at: Wed Jun 24 21:05:15 2009
Finished at: Wed Jun 24 21:05:15 2009

Clone # 1
Distance between two fragments = 4
Clone size = 7

Source file "D:\CodeToTest.py" The first line is 16		Source file "D:\CodeToTest.py" The first line is 13
newPoint = createPoint(xMin, yMax)		newPoint = createPoint(xMin, yMin)
polygon.Add(newPoint)		polygon.Add(newPoint)
newPoint = createPoint(xMax, yMax)		newPoint = createPoint(xMin, yMax)
polygon.Add(newPoint)		polygon.Add(newPoint)
newPoint = createPoint(xMax, yMin)		newPoint = createPoint(xMax, yMax)
polygon.Add(newPoint)		polygon.Add(newPoint)
newPoint = createPoint(xMin, yMin)		newPoint = createPoint(xMax, yMin)

Clone # 2
Distance between two fragments = 4
Clone size = 5

Source file "D:\CodeToTest.py" The first line is 19		Source file "D:\CodeToTest.py" The first line is 13
newPoint = createPoint(xMax, yMax)		newPoint = createPoint(xMin, yMin)
polygon.Add(newPoint)		polygon.Add(newPoint)
newPoint = createPoint(xMax, yMin)		newPoint = createPoint(xMin, yMax)
polygon.Add(newPoint)		polygon.Add(newPoint)
newPoint = createPoint(xMin, yMin)		newPoint = createPoint(xMax, yMax)

Clone Digger is aimed to find software clones in Python and Java programs. It is provided under the GPL license and can be downloaded from the site http://clonedigger.sourceforge.net

I would not try remove all the duplicates or similarities found by Clone Digger. But I think it's a great tool to find code that can be improved. The degree to which you refactor depends greatly on the goal of your code and your time restrictions. Clone Digger is also be useful when working with multiple people on a project or when having to improve some legacy code.
Have any comments ? Any tools you can't live without ? Any suggestions for me ? Feel free to let me know.

PostGIS : Loading and querying data

Today I'm going to load some POIs (points of interest) in a PostGIS database and query them back. To do this you'll first need an installed version of PostgreSQL with PostGIS. I suggest to first download and install PostgreSQL and then install PostGIS with the latest version. Then you should create a spatially enabled database. I called mine pois_db.

Now we are ready to import the POIs. I first extracted 500.000 POIs from a file geodatabase into a shapefile with the ArcGIS Select tool. Then I used the PostGIS shp2pgsql tool which is located in your bin folder of your PostgreSQL installation to create a text file. This file can then be used to import the data in your database with the PostgreSQL psql tool. My commandline looked like below.

rem -s : srid (spatial reference id)
rem -c : creates a new table and populates it
rem -D : use PostgreSQL dump format
rem -i create a GiST index on the geometry column
"C:\Program Files\PostgreSQL\8.3\bin\shp2pgsql.exe" -s 4326 -c -D -i poi_500000.shp pois_db > pois.sql

More information about the usage of shp2pgsql can be found here. Note that I set the srid to 4326 which stands for WGS84. On spatialreference.org you can find a list of spatial reference ids or you can take a look at the spatial_ref_sys table in your database.
Next thing we are going to is load the pois.sql in the pois_db with the following command line.

"C:\Program Files\PostgreSQL\8.3\bin\psql" -d pois_db -f pois.sql

To query the POIs we issue the following sql command. This query checks whether the bounding box of the point intersects with the bounding box of a created rectangle polygon.

SELECT *
FROM pois
WHERE the_geom && GeomFromText('POLYGON((4.5 50.5, 5.0 50.5, 5.0 51.0, 4.5 51.0, 4.5 50.5))',4326);

This query takes in pgAdmin III with a gist index on the geometry column only 515 ms for more then 4000 found POIs. If I only ask the ids the query is down to 67 ms which is slightly slower then with Rtree but faster then with MongoDb.

I've come to the end of this post. I hope you liked it and leave a comment if you want to.

Related Posts
Pythonnet
MongoDB
Rtree

Python Toolbox 3 : Pythonnet

After logging and Pygments I want to uncover another useful Python tool called pythonnet. Pythonnet provides a way to call .NET assemblies from within Python. Its usage is very similar to that of IronPython and also comparable to that of ctypes which I used in my post about using the ArcGIS projection engine dll (pe.dll) to project coordinates.

To get working binaries for this project you need to make your hands a little bit dirty but with some small effort you're good to go. And although the last release is rather old I've read about people using it on Python 2.6 with .NET 3.5. Before anything I suggest you to read the readme because this will explain and clarify a lot of things. What I did is download the project from sourceforge and open the visual studio solution. Then I applied the patch described here. For more information about this patch and sample code to test whether the patch was applied successfully I suggest you to read this. Finally you have to build the solution and copiy the clr.pyd and Python.Runtime.dll to the root of your Python installation. In my case this is C:\Python24.
To test if everything went fine you can take a look at the demo directory in the pythonnet folder. I had to add import clr before the other import statements to make the samples work. This is not a bug but merely due to the fact that I used the standard Python executable.

One of the possible uses of pythonnet in a GIS context is when you're creating some arcgisscripting code but want to do things that can't be done with the Python bindings of ArcGIS. One of those things is a spatial filter although you can create a workaround with an extra query layer and the SelectLayerByLocation tool. I first created a static method called Hello to test if pythonnet worked with my dll. Then I created the CreateLayerFromSpatialIntersect method. The goal of this method is to do a bounding box query and export the queried features. This method takes input the path to a shapefile, a destination folder and name and the minimum and maximum values of x and y. The found features are exported to a new shapefile. I also added a helper method to get an IFeatureClass object from the shapefile. I also used the LicenseInitializer class. This class is part of the ArcGIS Console application template and has the sole purpose of getting and releasing an ArcGIS license.

using System;
using ESRI.ArcGIS.Geodatabase;
using ESRI.ArcGIS.Geometry;
using ESRI.ArcGIS.GeoDatabaseUI;
using ESRI.ArcGIS.esriSystem;
using ESRI.ArcGIS.DataSourcesFile;
namespace GisSolved.SpatialQueryTool
{
    public class SpatialQueryTool
    {

        public static string Hello()
        {
            return "Hello !!!";
        }
        public static void CreateLayerFromSpatialIntersect(string shapeSourcePath, string destinationFolder, string destinationName, double xMin, double xMax, double yMin,double yMax)
        {
            LicenseInitializer licenseInitializer = new LicenseInitializer();
            licenseInitializer.InitializeApplication(
    new esriLicenseProductCode[] { esriLicenseProductCode.esriLicenseProductCodeArcView },
    new esriLicenseExtensionCode[] { });

            IFeatureClass sourceFeatureClass = GetShapeFile(shapeSourcePath);

            // create  the spatial filter
            ISpatialFilter spatialFilter = new SpatialFilterClass();
            spatialFilter.SpatialRel = esriSpatialRelEnum.esriSpatialRelIntersects;

            IEnvelope boundingBox = new EnvelopeClass();
            boundingBox.XMax = xMax;
            boundingBox.XMin = xMin;
            boundingBox.YMax = yMax;
            boundingBox.YMin = yMin;

            spatialFilter.Geometry = (IGeometry)boundingBox;

            // create other parameters for the export operation
            IDataset sourceDataset = (IDataset)sourceFeatureClass;
            IDatasetName sourceDatasetName = (IDatasetName)sourceDataset.FullName;

            IFeatureClassName destinationClassName = new FeatureClassNameClass();
            IDatasetName destinationDatasetName = (IDatasetName)destinationClassName;

            destinationDatasetName.Name = destinationName;

            IWorkspaceName destinationWorkspaceName = new WorkspaceNameClass();
            destinationWorkspaceName.PathName = destinationFolder;
            destinationWorkspaceName.WorkspaceFactoryProgID = "esriDataSourcesFile.ShapefileWorkspaceFactory";
            destinationDatasetName.WorkspaceName = destinationWorkspaceName;

            // do the actual export
            IExportOperation exportOp = new ExportOperationClass();
            exportOp.ExportFeatureClass(sourceDatasetName, (IQueryFilter)spatialFilter, null, null, destinationClassName, 0);

            licenseInitializer.ShutdownApplication();
        }

        private static IFeatureClass GetShapeFile(string path)
        {
            string workspacePath = System.IO.Path.GetDirectoryName(path);
            string name = System.IO.Path.GetFileName(path);
            IWorkspaceFactory shapefileWorkspaceFactory = new ShapefileWorkspaceFactoryClass();
            IFeatureWorkspace workspace = (IFeatureWorkspace)shapefileWorkspaceFactory.OpenFromFile(workspacePath, 0);
            return workspace.OpenFeatureClass(name);
        }
    }
}

It's easy to use this code from pythhonnet. The first step is to build the .NET project and copy the resulting dll to the same directory as your Python code. Then you need to import clr and add a reference to the dll. Next thing to do is import the desired class from your assembly. Now you can use your class like you would do in IronPython. To test my code I exported 500.000 POIs to a shapefile which I then queried with the same bounding box as in my two previous posts about MongoDb and Rtree.

import clr
clr.AddReference('SpatialQueryTool')
from GisSolved.SpatialQueryTool import SpatialQueryTool

print SpatialQueryTool.Hello()

source= r"D:\source\poi_500000.shp";
destinationFolder = r"D:\destination";
destinationName = "poi_45_50_505_510_2";
SpatialQueryTool.CreateLayerFromSpatialIntersect(source, destinationFolder, destinationName, 4.5, 5.0, 50.5, 51.0)

I hope that this post was useful to you and as always I welcome any comments.

Spatial indexing a MongoDb with Rtree

Update : recent versions of MongoDB support native geospatial indexing.

In my previous post about populating a MongoDb I mentioned the existence of the Rtree package as a solution to the lack of a spatial index mechanism in MongoDb. In this post 'm going to try this out. Rtree is a Python package with the sole purpose of creating spatial indexes. To install Rtree I downloaded the Windows installer and ran the installer.

To test Rtree I decided to create a spatial index for my previously created POI database in MongoDb. To do this I first connect to my running MongoDb instance. Then I create an Rtree disk index in the same directory as my script called poi_index. This creates two files (poi_index.dat and poi_index.idx) who will contain the index. Finally I loop over all the POIs in the database and add the x and y values to the index with as reference the id of the POI.

from pymongo.connection import Connection
from rtree import Rtree

mongo_conn = Connection('localhost', 27017)
poidb = mongo_conn.poidb
pois = poidb.pois

poi_index = Rtree("poi_index", overwrite=1)

for poi in pois.find():
    poi_index.add(poi["ID"], (poi["x"], poi["y"]))

Now we have created an index for the 500.002 POIs in my database. The file size of this index is 350 mb which is more than I expected it to be when compared my Mongo POI database which has a total file size of 460 mb. Querying this index is very easy and very fast. To time my query I added a handy wrapper function. Essentially to query the index the only thing you need to do is create a bounding box and call the intersection method of the index. This returns the ids of the intersecting points.

import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.0f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper

minx, maxx = 4.5, 5.0
miny, maxy = 50.5, 51.0
bbox = ( minx, miny, maxx, maxy )

@print_timing
def intersect(index, bbox):
    print len(index.intersection(bbox)), "hits"
intersect(poi_index, bbox)

On my machine this bounding box query took 265 ms on the MongoDb and only 15 ms with Rtree. This is a great improvement for my query. But the disadvantage is that I still need to query my POIs out of the MongoDb with the ids I got from Rtree. Sadly enough this doesn't go very fast so the total improvement is very small and with lots of points even negative. Strange thing is that when I do an IN query on MongoDb the result is always slower than querying the pois one by one by ID. My temporary conclusion is that Rtree is really fast but you only get the ids. The best solution would be that MongoDb could natively use the Rtree index or another spatial index. A small downside of Rtree is that the id can only be an integer value. So you can't store strings as ids. We have come to the end of my small exploration of Rtree as a mean to add a spatial index to MongoDb. I hope you liked it and I still welcome any comments/suggestions.

Samuel Bosch

Pages

Drag Drop from ArcCatalog

Python Toolbox 4 : Clone Digger

PostGIS : Loading and querying data

Python Toolbox 3 : Pythonnet

Spatial indexing a MongoDb with Rtree