Fast way to grid and sum coordinates?


Hi everyone. Say you have a bunch of x and y coordinates ranging between 0 and 100 (on both axes).

I need to count how many points are found in variously-sized grids covering this space. For example, increments of 10:

I can calculate this with two nested loops, returning (for this example) a vector of length 100 with counts of points in each grid square. However, this gets really slow for large datasets, very small grid sizes, and when including a third dimension.

Anyone know a really fast way to do this kind of thing? (would be especially good if it uses lapply, because you can ship this out to multiple processors using mclapply)

Here’s the code to generate the data and plot:

x <- rnorm(500, 50, 10)
y <- rnorm(500, 50, 10)
plot(x, y, xlim=c(0, 100), ylim=c(0, 100))
abline(h=seq(0, 100, 10), col="grey")
abline(v=seq(0, 100, 10), col="grey")

Thanks. Josh


Hi Josh, did you actually try the for-loop version? If so, can you paste the code for comparison? A solution via lapply could of course be coded (any loop can be recast as an lapply call), but it may not necessarily be any faster, as you'd still have n^2 function calls for an n x n grid.


And what exactly do you need as the return value?

  1. The number of points in each grid cell (return value is an n x n matrix)
  2. The total number of cells occupied (return value is a single scalar)
  3. Other?


Hi Dan. I need the total number of cells occupied. The points are actually in 3D space, hence my reluctance to paste the code. I overlay progressively smaller cube arrays onto the surface, and count the number of cubes that contain at least 1 xyz point. I’ll paste the code below.

The data is called temp and has columns for x, y and z coordinates:

Here’s the code (with three nested loops):

# Define the corner of a 10m bounding box
x1 <- min(temp[,1]) + (max(temp[,1]) - min(temp[,1]))/2 - 5
y1 <- min(temp[,2]) + (max(temp[,2]) - min(temp[,2]))/2 - 5
z1 <- min(temp[,3])

# Vector of cube dimensions, 10, 5, 2.5, etc.
i_vec <- 10 / (2^(0:7))

store <- c()

for (i in i_vec) {

  steps <- 10/i
  count <- 0

  for (x in seq(x1, by=i, length=steps)) {
    for (y in seq(y1, by=i, length=steps)) {

      z_seq <- seq(z1, by=i, length=steps)
      # the z (height) dimension is small, so truncate it to save time
      # counting in empty space above the surface
      for (z in z_seq[z_seq < max(temp[,3])]) {

        temp2 <- temp[temp[,1] > x & temp[,1] <= (x+i) &
                      temp[,2] > y & temp[,2] <= (y+i) &
                      temp[,3] > z & temp[,3] <= (z+i), ]

        if (length(temp2) > 0) {
          count <- count + 1
        }
      }
    }
  }
  store <- c(store, count)
}


Thanks, that helps me see what you're aiming for, and pasting a minimal working example was a great idea. I think this algorithm should scale pretty well to 3D. The example below shows how to do it for a single grid size; then just wrap it in a function taking n as an argument:

# start with example code from above
x <- rnorm(500, 50, 10)
y <- rnorm(500, 50, 10)
plot(x, y, xlim=c(0, 100), ylim=c(0, 100))
abline(h=seq(0, 100, 10), col="grey")
abline(v=seq(0, 100, 10), col="grey")

# now define the number of bins we want
n <- 10
output <- matrix(NA, n, n)

# now define a function that takes original data and rescales it onto axis 0...n 
rescale <- function(x, x0, xm, n) {
  (x - x0)/(xm - x0)*n
}

xi <- rescale(x, 0, 100, 10)
yi <- rescale(y, 0, 100, 10)

plot(xi, yi, xlim=c(0, 10), ylim=c(0, 10), col="grey", pch=16)
points(xi, yi)
abline(h=seq(0, 10, 1), col="grey")
abline(v=seq(0, 10, 1), col="grey")

# now we can do something similar, but this time taking the index part as the exact bin

xii <- ceiling(rescale(x, 0, 100, 10))
yii <- ceiling(rescale(y, 0, 100, 10))
points(xii-0.5, yii-0.5, col="red", pch=16, cex=0.5)

# finally, use table to count the number in occupied cells
(ret <- table(xii, yii))

# Total number occupied cells
sum(ret > 0)

# And just to be sure let's plot our results 
x2 <- as.numeric(rownames(ret))[row(ret)]
y2 <- as.numeric(colnames(ret))[col(ret)]
text(x2-0.4, y2-0.4, as.vector(ret), col="red")

A benefit of this approach is that the last step only counts filled cells, as you can see in the image.

So your eventual solution might be:

rescale <- function(x, x0, xm, n) {
  (x - x0)/(xm - x0)*n
}

count_filled_cells <- function(x, y, xmin, xmax, ymin, ymax, ngridx, ngridy) {
  ret <- table(ceiling(rescale(x, xmin, xmax, ngridx)),
               ceiling(rescale(y, ymin, ymax, ngridy)))
  sum(ret > 0)
}

count_filled_cells(x, y, 0, 100, 0, 100, 10, 10)

# now apply across range of grid sizes

sizes <- c(2, 5, 10, 20)
answer <- sapply(sizes, function(n) count_filled_cells(x, y, 0, 100, 0, 100, n, n))
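Since Josh asked about shipping this out to multiple processors: the same loop over grid sizes can be run through mclapply from the parallel package. Here's a self-contained sketch (the definitions are repeated so it runs on its own; mc.cores = 2 is an assumption, and mclapply falls back to one core on Windows):

```r
library(parallel)

rescale <- function(x, x0, xm, n) (x - x0)/(xm - x0)*n

count_filled_cells <- function(x, y, xmin, xmax, ymin, ymax, ngridx, ngridy) {
  ret <- table(ceiling(rescale(x, xmin, xmax, ngridx)),
               ceiling(rescale(y, ymin, ymax, ngridy)))
  sum(ret > 0)
}

set.seed(1)
x <- rnorm(500, 50, 10)
y <- rnorm(500, 50, 10)
sizes <- c(2, 5, 10, 20)

# mclapply forks worker processes; mc.cores > 1 is not supported on Windows,
# so fall back to serial there
cores <- if (.Platform$OS.type == "windows") 1 else 2
answer <- unlist(mclapply(sizes,
  function(n) count_filled_cells(x, y, 0, 100, 0, 100, n, n),
  mc.cores = cores))
```

mclapply returns a list, hence the unlist; the result is one occupied-cell count per grid size.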


Hi Josh,

I have used code similar to the following for years for 2-D work:

# Establish grid parameters
xMin <- 0
xMax <- 100
xStep <- 10
nXbins <- trunc((xMax - xMin)/xStep)
yMin <- 0
yMax <- 100
yStep <- 10
nYbins <- trunc((yMax - yMin)/yStep)

# Ubiquitous test data...
x <- rnorm(500,50,10)
y <- rnorm(500,50,10)

# Compute row and col indices for the points
xBins <- trunc((x - xMin)/xStep) + 1
yBins <- trunc((y - yMin)/yStep) + 1

# Convert row-col pairs to a linear cell index
cellInd <- xBins + (yBins - 1)*nXbins

# Bin them
zz <- rle(sort(cellInd))

# Convert back to row and col values
xOut <- (zz$values - 1) %% nXbins + 1
yOut <- (zz$values - 1) %/% nXbins + 1
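From the run-length encoding, the occupancy counts fall out directly: each run is one occupied cell and the run lengths are the per-cell point counts. A self-contained sketch of that last step (repeating the binning for a 10 x 10 grid so it runs on its own):

```r
set.seed(1)
x <- rnorm(500, 50, 10)
y <- rnorm(500, 50, 10)

nXbins <- 10
xBins <- trunc((x - 0)/10) + 1
yBins <- trunc((y - 0)/10) + 1

# linear cell index, then run-length encode the sorted indices
cellInd <- xBins + (yBins - 1)*nXbins
zz <- rle(sort(cellInd))

nOccupied <- length(zz$values)  # number of occupied cells
perCell   <- zz$lengths         # points in each occupied cell
```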

Dan’s approach is very interesting. It turns out that his idea has been used for 3-D counts in cubes. See the post that comes up if you Google “R count points in cubes”.

(Sorry, I had to be obtuse about the stackoverflow post because I cannot include a link in this post - I have been attempting to work around this since yesterday evening and have resorted to this ugly hack. The error message I’m given is:

“Sorry, new users can only put 2 links in a post.”

But I only included 1 link!!??)


Peter Wilson


Thanks, Dan and Peter! I just implemented Dan’s code and added the third dimension like this:

count_filled_cells <- function(x, y, z, xmin, xmax, ymin, ymax,
                               zmin, zmax, ngridx, ngridy, ngridz) {
  ret <- table(ceiling(rescale(x, xmin, xmax, ngridx)),
               ceiling(rescale(y, ymin, ymax, ngridy)),
               ceiling(rescale(z, zmin, zmax, ngridz)))
  sum(ret > 0)
}
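As a quick standalone check of the 3-D version (definitions repeated so the snippet runs on its own; the uniform test data is made up for illustration):

```r
rescale <- function(x, x0, xm, n) (x - x0)/(xm - x0)*n

count_filled_cells <- function(x, y, z, xmin, xmax, ymin, ymax,
                               zmin, zmax, ngridx, ngridy, ngridz) {
  ret <- table(ceiling(rescale(x, xmin, xmax, ngridx)),
               ceiling(rescale(y, ymin, ymax, ngridy)),
               ceiling(rescale(z, zmin, zmax, ngridz)))
  sum(ret > 0)
}

set.seed(1)
pts <- matrix(runif(3000, 0, 100), ncol = 3)  # 1000 made-up xyz points

# count occupied cells in a 10 x 10 x 10 cube array
n_occ <- count_filled_cells(pts[,1], pts[,2], pts[,3],
                            0, 100, 0, 100, 0, 100, 10, 10, 10)
```

With 1000 points scattered over 1000 cubes, n_occ should come out well below 1000, since some cubes catch more than one point.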

The clever use of rescaling and ceiling sped my code up by roughly 1000-fold! My approach took 3 hours yesterday… Dan’s approach took 10 seconds. I then used mclapply (instead of sapply) and got it down to 5 seconds (I guess my computer has two processors!).

Peter, it looks like your code is a similar approach to Dan’s. I did do a bunch of searches yesterday, but must have been using the wrong search terms; kept getting GIS rasterize-like solutions.

Anyway, problem is definitely solved!