Showing posts with label PostgreSQL. Show all posts
Showing posts with label PostgreSQL. Show all posts

Selecting comma separated data as multiple rows with SQLite

A while back I needed to split data stored in one column as a comma separated string into multiple rows in a SQL query from a SQLite database.

My table looked like this:

CREATE TABLE "predictor_sets" 
  (`id` INTEGER, `nvar` INTEGER, `predictors` TEXT, 
    `experiment` TEXT, PRIMARY KEY(`id`));

Insert some sample data:

INSERT INTO predictor_sets VALUES
  (1659, 5, 'BO_bathymax,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry'),
  (1660, 5, 'BO_bathymin,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry'),
  (1661, 5, 'BO_bathymean,BO_calcite,BO_parmax,BO_damean,BO_sstrange', 'bathymetry');

Splitting up the different predictors in seperate rows can be done with the following recursive common table expression:


WITH RECURSIVE split(predictorset_id, predictor_name, rest) AS (
  SELECT id, '', predictors || ',' FROM predictor_sets WHERE id
   UNION ALL
  SELECT predictorset_id, 
         substr(rest, 0, instr(rest, ',')),
         substr(rest, instr(rest, ',')+1)
    FROM split
   WHERE rest <> '')
SELECT predictorset_id, predictor_name 
  FROM split 
 WHERE predictor_name <> ''
 ORDER BY predictorset_id, predictor_name;

Check out the documentation for more info on writing your own common table expressions in SQLite, PostgreSQL or your favorite database.

What I learned: Part 2 PostgreSQL and PostGIS

In my previous post in the "What I learned" series on Linux commands. Today its time to share some things that I learned while working with PostgreSQL and PostGIS.

Problem: Where are the PostgreSQL logs in ubuntu?

On Ubuntu you can find them here:
cd /var/log/postgresql
With the tail command you can keep an eye on the logs while it is being written to. On my system the exact command for doing this is:
tail -f /var/log/postgresql/postgresql-9.1-main.log

tail postgresql log

Problem: Copying a subset of a table

I needed to develop some SQL scripts for some very big tables. To be able to quickly test my work I decided to copy the tables to a new schema but with a limited set of records.
CREATE TABLE  AS (SELECT * FROM  LIMIT );
The PostgreSQL documentation contains more information on this topic and some extra samples.

If you also want to copy the original column constraints then its better to create your table with the following two commands:
CREATE TABLE tbl_new (LIKE tbl_org INCLUDING CONSTRAINTS);

INSERT INTO tbl_new 
SELECT col_a, col_b, array[col] aS col_c
FROM tbl_org;

Note that primary and foreign keys and indexes aren't created so if you want them you'll have to create them for your new table.

Problem: Typing the same query over and over again in pgAdmin

Fortunatly you can add macros to pgAdmin with following steps:
  1. Open a query window in pgAdmin
  2. Click Macros -> Manage Macros
  3. Click the key combination that you'd like and type a name and the actual query and hit save.
  4. Repeat for all other macro queries that you'd like
pgAdmin macros screenshot

Some sample macros:
-- inspect the first 100 records of a table. 
-- Select the table and hit the keyboard shortcut you've picked.
SELECT * FROM $SELECTION$ LIMIT 100
-- how many records does a statement return. Also useful for timing queries and subqueries without having to load all the results in your Data Output window.
SELECT COUNT(*) FROM ($SELECTION$) counttable
Some extra information can be found in the official docs.

Problem: Getting the centroid of multiple points in PostGIS

SELECT ST_CENTROID(ST_COLLECT(geom)) FROM points
Note that ST_Collect is many times faster than ST_Union. The difference is that ST_Collect puts the geometries into one multi geometry and ST_Union combines geometries whenever they overlap.

Problem: Creating an array from a query in PostgreSQL

SELECT array(SELECT id FROM table)
Or
SELECT array_agg(id) FROM table

Problem: Repair polygons that are not closed in PostGIS 1.5.4

Solution add the first point at the end of the exterior ring of the polygon when the first point is not the same as the last point. Note that polygons lose their inner rings if there were any. See also: http://lists.osgeo.org/pipermail/postgis-users/2011-December/031615.html
WITH fixed_geom As (
  SELECT gid, ST_Collect(ST_BuildArea(ST_AddPoint(ST_EXTERIORRING(geom), ST_StartPoint(ST_EXTERIORRING(geom)), -1 /* append the point */))) geom
    FROM (SELECT gid,(ST_DUMP(geom)).geom geom FROM qc.global_shorelines) l
   WHERE NOT ST_StartPoint(ST_EXTERIORRING(geom)) = ST_EndPoint(ST_EXTERIORRING(geom))
   GROUP BY gid
)
UPDATE qc.global_shorelines gs SET geom = f.geom
  FROM fixed_geom f
 WHERE gs.gid  = f.gid

Problem : Executing a SQL script within psql

If you want to execute a SQL script within psql then start psql and type:
\i "path .. to .. file"

Problem: Cancelling a command with psql 

To cancel a command hit Ctrl+c and to quit Ctrl+d. An alternative for quitting is typing \q.

Problem: Writing output of psql to a file

If you want to redirect psql output to a file instead of to the prompt then type:
\o filename.here
Type \o again to restore to the standard output.

Problem: Get execution time from query in psql

\timing

Problem: Listing the open connections and active sql queries

SELECT datname,usename,procpid,client_addr,waiting,query_start,current_query FROM pg_stat_activity;

Problem: killing all open connections to a database

SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'YOUR_DATABASE_NAME'
  AND procpid <> pg_backend_pid()

Problem: Copying data from one server to another with psql

\c dbname username source_db_server
COPY (SELECT * FROM your_table WHERE some_filter) TO '/data/table_copy';
\c dbname username destination_db_server
COPY '/data/table_copy' TO your_table

Problem: On the fly compression of the copy output

To solve this problem in PostgreSQL 9.3 an option was added to the COPY command to directly pipe the output through an external program.
This can be used to pipe the copy output to gzip in the following way:
COPY test TO PROGRAM 'gzip > /data/test.gz';
To achieve the same behavior on older systems you can copy to STDOUT with psql and then pipe this to gzip.
psql -c "COPY test TO STDOUT WITH CSV" | gzip > /data/test.gz

Problem: Tired of typing your password

After filling in my password for the n-th time while using tools like pg_dump and pg_restore I searched for a solution:
export PGPASSWORD=

Conclusions

I hope some of the tips on PostgreSQL and PostGIS might come in handy some day. If you want to learn more on PostgreSQL I would suggest you to check the excellent documentation. For all things related to performance there is the excellent book: PostgreSQL 9.0 High Performance.
The best book on PostGIS is definitely PostGIS in Action. The second edition is coming out in may 28, 2014.

Related Posts

Loading and querying spatial data with PostGIS
What I learned: tmux, top, ls and pscp

PostGIS : Loading and querying data

Today I'm going to load some POIs (points of interest) in a PostGIS database and query them back. To do this you'll first need an installed version of PostgreSQL with PostGIS. I suggest to first download and install PostgreSQL and then install PostGIS with the latest version. Then you should create a spatially enabled database. I called mine pois_db.

Now we are ready to import the POIs. I first extracted 500.000 POIs from a file geodatabase into a shapefile with the ArcGIS Select tool. Then I used the PostGIS shp2pgsql tool which is located in your bin folder of your PostgreSQL installation to create a text file. This file can then be used to import the data in your database with the PostgreSQL psql tool. My commandline looked like below.

rem -s : srid (spatial reference id)
rem -c : creates a new table and populates it
rem -D : use PostgreSQL dump format
rem -i create a GiST index on the geometry column
"C:\Program Files\PostgreSQL\8.3\bin\shp2pgsql.exe" -s 4326 -c -D -i poi_500000.shp pois_db > pois.sql

More information about the usage of shp2pgsql can be found here. Note that I set the srid to 4326 which stands for WGS84. On spatialreference.org you can find a list of spatial reference ids or you can take a look at the spatial_ref_sys table in your database.
Next thing we are going to is load the pois.sql in the pois_db with the following command line.

"C:\Program Files\PostgreSQL\8.3\bin\psql" -d pois_db -f pois.sql

To query the POIs we issue the following sql command. This query checks whether the bounding box of the point intersects with the bounding box of a created rectangle polygon.

SELECT *
FROM pois
WHERE the_geom && GeomFromText('POLYGON((4.5 50.5, 5.0 50.5, 5.0 51.0, 4.5 51.0, 4.5 50.5))',4326);

This query takes in pgAdmin III with a gist index on the geometry column only 515 ms for more then 4000 found POIs. If I only ask the ids the query is down to 67 ms which is slightly slower then with Rtree but faster then with MongoDb.

I've come to the end of this post. I hope you liked it and leave a comment if you want to.

Related Posts
Pythonnet
MongoDB
Rtree