Tuesday, 8 September 2015

Reading GPX files for later processing

Whenever I go hiking, running or climbing in the mountains, I usually record my tracks with a GPS, so I end up with hundreds of GPS files on my computer. GPX is the de facto standard format for storing GPS track data, and it is defined as an XML schema. The definition of the GPX format is available on the TopoGrafix website.

In order to extract interesting data from all the tracks I have recorded over the years, I have written several snippets of code in different languages to convert GPX files to other formats better suited to further processing. Once we have the track data in a suitable format, we can look deeper into the data of our outdoor activities, and even apply some machine learning to it to get interesting information, as we will see in future posts.

When dealing with GPX files, the first option should always be to have a look at the great open-source GPSBabel. GPSBabel is a command-line application able to translate GPS data between a plethora of formats. It is usually launched directly from a terminal window but, as a first step in dealing with GPX data, we can also call it from another language such as R or Python.

In R, we can use the system() function to launch the gpsbabel command with the appropriate arguments, so that it reads a .gpx file and converts it to text. We can apply the filters provided by gpsbabel to include or exclude waypoints, routes or other information in the GPX file that we are not interested in. Called with intern = TRUE, system() returns the command's output as a character vector, which can be wrapped in a connection with textConnection(). Passing that connection to read.table() then gives us a data.frame with the points of the GPS track. Further processing will be needed to extract the columns we are interested in.
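The same idea carries over to Python. The following is a minimal sketch, not a tested recipe: it assumes gpsbabel is installed and on the PATH, that its unicsv output format (CSV with a header row) is acceptable, and that "-" as the output file name sends the result to standard output. The function names gpsbabel_command, parse_csv and read_track are hypothetical helpers introduced only for this illustration.

```python
import csv
import io
import subprocess


def gpsbabel_command(gpx_path, executable="gpsbabel"):
    """Build a gpsbabel call that writes the tracks of a GPX file as CSV to stdout."""
    return [
        executable,
        "-t",            # process track data
        "-i", "gpx",     # input format
        "-f", gpx_path,  # input file
        "-o", "unicsv",  # output format: CSV with a header row
        "-F", "-",       # assumption: "-" means standard output
    ]


def parse_csv(text):
    """Turn CSV text with a header row into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_track(gpx_path):
    """Run gpsbabel and return its track points as a list of dicts."""
    result = subprocess.run(
        gpsbabel_command(gpx_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError if gpsbabel fails
    )
    return parse_csv(result.stdout)
```

Calling read_track("track.gpx") (with a hypothetical file name) would then return one dict per track point, keyed by whatever column names gpsbabel writes in the header row. As in the R version, further processing is needed to pick out and convert the columns of interest, since every value arrives as a string.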

A better and simpler option is to read the GPX file directly from R, using the XML library. The XML library allows us to parse an XML file and run XPath queries on the data. With four simple XPath queries, we can read the latitude, longitude, elevation and timestamp of each point in the GPS track. If the file contains more information (Suunto watches, for example, also store heart rate in bpm), we can easily modify the code to include it as well. The following code shows how to apply this method in R. The code is part of a bigger project in R which I will release as an R package:

Sometimes I have needed to do some processing directly from the command line. For that purpose, I wrote this bash script using xmllint and sed, which extracts the information from the GPX file and stores it in a text output file.

Using an XML library able to evaluate XPath expressions, it is easy to extend this method to other languages such as Python or C++. I will add the code to this post at some point in the future if I ever need to write such functions in other programming languages.
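For Python, a minimal sketch could look like the following. It uses xml.etree.ElementTree from the standard library, which supports only a subset of XPath (it cannot return attribute values such as //trkpt/@lat directly), so instead of four separate queries it selects every trkpt element and reads the fields from each one. The sketch assumes GPX 1.1 (GPX 1.0 files use a different namespace URI) and assumes that heart rate, when present, is stored in an element whose local name is hr, as in Garmin's TrackPointExtension; other devices may use a different element. The sample track and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of GPX 1.1; GPX 1.0 files use ".../GPX/1/0" instead.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up two-point track; the second point carries a heart-rate extension.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><trkseg>
    <trkpt lat="42.6500" lon="0.0300">
      <ele>2210.5</ele><time>2015-09-05T08:00:00Z</time>
    </trkpt>
    <trkpt lat="42.6510" lon="0.0312">
      <ele>2215.0</ele><time>2015-09-05T08:00:10Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:hr>132</gpxtpx:hr>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
  </trkseg></trk>
</gpx>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def read_gpx_points(root):
    """Return one dict per track point with lat, lon, ele, time and hr."""
    points = []
    for pt in root.findall(".//gpx:trkpt", GPX_NS):
        ele = pt.findtext("gpx:ele", default=None, namespaces=GPX_NS)
        time = pt.findtext("gpx:time", default=None, namespaces=GPX_NS)
        # Assumption: heart rate lives in any descendant named "hr".
        hr = next((el.text for el in pt.iter() if local_name(el.tag) == "hr"), None)
        points.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "ele": float(ele) if ele is not None else None,
            "time": time,  # kept as an ISO 8601 string
            "hr": int(hr) if hr is not None else None,
        })
    return points


points = read_gpx_points(ET.fromstring(SAMPLE_GPX))
for p in points:
    print(p)
```

To read a real file, ET.parse("track.gpx").getroot() (again with a hypothetical file name) can be passed to read_gpx_points in place of the parsed sample string.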

1 comment:

  1. hello, could you please give the xpath query for gte:gps speed?