Geo-spatial in-memory caching using Rtree

Having to save a lot of geo-spatial data is a task rarely one passes without being scared for one’s life. Whether you are using a DB like postGIS, an in-memory data store like geoRedis or some other tool, handling geo-spatial data usually mean using RTree. Using RTree directly can ave some very positive effects on your system performance, so we thought we’ll let you in on this industry secret.

What is RTree, you ask? It is a N-dimensional tree data structure for storing and indexing geometrical spaces. The documentation (which seems to be available only for v0.7) gives example for typical commands. Just to have a more visual example, here’s what Wikipedia has to demonstrate this useful data structure:

R-tree.svg
By Skinkie, w:en:Radim BacaOwn work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=9938400

Today we will use it’s Python variant, just for the kicks. This Python package (v0.8.2 at the time of writing) is wrapping libspatialindex (written in C), providing a simple-to-use-not-so-simple-to-optimize RTree. Not perfect bud good enough for most needs – we handled several millions of requests per second on a single machine with this, so you know it’s just OK.

Yeah Yeah, now how would I use it?

Suppose you have a big geo-spatial DB and you’re worried about request latency. Sure, there are some fine solutions out there, with caching and parallelization, but what if you want some small custom query, done on a small subset of items? well – look no further than python RTree.

Here are some typical examples for integrating RTree, possibly along-side with your fancy geo-spatial DB thingy (postGIS anyone?):

  • Match-making: if you want to match two (or more) items based on location, buyers to sellers,  questions to answers, boys to girls – a set of two (or more) RTrees fits like a glove. Each type of request compared against the other’s RTree, and if a match is not found – put into its own RTree for future matching.
  • spatial-based load-balancing: if you had several servers, each handling a different area, you could use RTree to decide which server to defer the request to. Furthermore, you could even add servers and change the area partitioning in flight, by changing the RTree (thread-safety caution is advised).
  • Manual caching: if you want to cache items not by time of insert. You can have one or more RTree(s) by different criteria, to check against upon every incoming request (even in parallel!) before giving up to a sluggish DB with a complex query.

how fast is it?

Well, pretty darn fast, thanks for asking! We ran an incoming queue (using kombu) and were able to handle millions of request per second – RTree intersection() call per-request on a tree with approx. 10K items. Keep in mind that the real performance comes from the C library, and the wrapper just makes it easy for Python coders to use.

Pitfalls & CAVEATS?

Well, a nasty (and of course, undocumented) bug in tree initialization with Index.bounds, for starters. It’s supposed to give you coordinates capturing all the items in the tree, but for some reason when the tree is almost empty (single node, I assume) the coordinates have X and Y swapped.

Another one, documented this time, is the objects="raw" keyword for enumerating functions (intersection(), nearest()) which saves some time on python-wrapping actions.

The GEOREDIS DISCUSSION

As we mentioned in the beginning of this post, RTree is used be many other tools we use daily, including Redis. Moving from an in-code RTree may save you a lot of headache but due to the BETA nature of the geo-spatial part of Redis, and the fact that you will be unable to tweak performance by hand, using RTree in an end product may be inadvisable. Just make sure you do your math.

oh, You’ll need this

When using RTree it’s very helpful to know how to convert from polar to Cartesian coordinates in Python? (in plain English: I have (X,Y) and I want a rectangle of Z meters around it)

from geographiclib import geodesic
def polar2cartezian(latitude, longitude, radius):
    if radius is None:
        return (latitude, longitude, latitude, longitude)
    lat1 = geodesic.Geodesic.WGS84.Direct(latitude, longitude, 0, radius)["lat2"]
    lon1 = geodesic.Geodesic.WGS84.Direct(latitude, longitude, 90, radius)["lon2"]
    lat2 = geodesic.Geodesic.WGS84.Direct(latitude, longitude, 180, radius)["lat2"]
    lon2 = geodesic.Geodesic.WGS84.Direct(latitude, longitude, 270, radius)["lon2"]
    return (min(lat1,lat2), min(lon1,lon2), max(lat1,lat2), max(lon1,lon2))

Adam Lev-Libfeld

A long distance runner, a software architect, an HPC nerd (order may change).

Latest posts by Adam Lev-Libfeld (see all)

Geo-spatial in-memory caching using Rtree

3 thoughts on “Geo-spatial in-memory caching using Rtree

  1. Simplification of polar2cartesian:

    def polar2cartesian(latitude: float, longitude: float, radius: float) -> typing.Tuple[float, float, float, float]:
    “””
    Returns rectangle of `radius` meters around `latitude`, `longitude`
    :param latitude:
    :param longitude:
    :param radius: distance in meters
    :return: (lon, lat, lon, lat)
    “””
    # having (lat, lon), returns a rectangle of Z meters around it
    # from: http://www.nullterminator.org/geo-spatial-in-memory-caching-using-rtree/
    if radius is None:
    return latitude, longitude, latitude, longitude

    max_lat = Geodesic.WGS84.Direct(latitude, longitude, 0, radius)[“lat2”]
    min_lat = 2 * latitude – max_lat

    max_lon = Geodesic.WGS84.Direct(latitude, longitude, 90, radius)[“lon2”]
    min_lon = 2 * longitude – max_lon

    return min_lon, min_lat, max_lon, max_lat

  2. Alexander, as you noted, this is far from the fastest way to implement the polar2cartesian function. That said, we are really not that concerned with the performance of this specific code segment, as the data store search is WAAAY more costly.
    As for the 8th decimal place for large distances, HPC is the epitome of the battle performance vs. accuracy, and this is a very good example.

Leave a Reply

Your email address will not be published. Required fields are marked *