Fast(er) Downloads

Downloading lots of small images using the Earth Engine task system isn’t always the best fit. The approach here splits the work between two functions:
  • getRequests() figures out which items to download, and
  • getResult() does the downloading and save-to-file for one item.
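
As a rough sketch of how these two functions might fit together (this is not the original script), they can be wired to a worker pool as shown here. The point list, the file naming, and the choice of a thread pool are assumptions made for illustration. The body of getResult() is a stand-in that only logs what it would do; a real version would call getDownloadURL() or getThumbURL() there and write the response to a file.

```python
import logging
from multiprocessing.pool import ThreadPool

# Turn logging on so that failures inside the workers are visible.
logging.basicConfig(level=logging.INFO)


def getRequests():
    """Decide what to download and return a list of cheap work units."""
    # Assumption: a real script would query Earth Engine once here
    # (for example, to collect point geometries or image IDs).
    # These coordinates are made up for the example.
    return [(-122.1, 37.4), (-122.2, 37.5), (-122.3, 37.6)]


def getResult(index, point):
    """Handle one work unit. This body is a stand-in, not a real EE call."""
    # Assumption: a real version would build the image for `point`,
    # request a download URL, fetch it, and save the bytes to `filename`.
    # It would typically also be wrapped in a retry decorator.
    filename = "tile_%05d.tif" % index
    logging.info("Would download %s for point %s", filename, point)
    return filename


def run(workers=4):
    """Fan the work units out to a pool and collect the results in order."""
    items = getRequests()
    with ThreadPool(workers) as pool:
        # starmap unpacks each (index, point) pair into getResult's arguments.
        return pool.starmap(getResult, enumerate(items))


if __name__ == "__main__":
    print(run())
```

The worker count is the knob to turn down if requests start failing steadily after startup.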

Caveats

  • If you’ve never used Earth Engine from Python, you’ll need to install the earthengine-api package (I use the pip installation method). That will also install the earthengine command line tool. You’ll also need to install the retry package (also from pip).
  • Make sure you’ve got logging turned on or retry will end up retrying (and hiding) any other errors your script might have.
  • The final results of getDownloadURL() and getThumbURL() are limited to 32MB of data per request. For more than that, you should use the batch system.
  • This script is likely to generate some errors on startup that you can probably ignore (the very first request from each worker might fail), but if you’re getting more than a few errors per second after that, try using a smaller number of pool workers.
  • The division of labor between getRequests() and getResult() is tricky to get right, and it can be tough to decide what to use as a work unit. For instance, you might try to generate a bunch of points with randomPoints() or stratifiedSample(), but if you then refer to one of those points in getResult() by its position or its system:index instead of by its geometry, then each of the download requests would end up having to regenerate the entire collection of random points (from scratch), just to get to the one you’ve specified. The good news is that getting this balance wrong isn’t catastrophic; it’ll just cause the workers to take longer.
  • If you’re just trying to download a bunch of images clipped to an ROI (maybe every Sentinel-2 image over a city), then your getRequests() function might just extract the system:index of every image in the original collection that meets your requirements. Then your getResult() function can filter the original collection down to a single ID with collection.filter(ee.Filter.eq("system:index", id)).first(). Unlike the previous caveat, where this kind of lookup was potentially expensive for a programmatically generated FeatureCollection, it is usually quite cheap for filtered/clipped/transformed image collections. (Note that this is probably not true for an image collection generated by mapping over a list, like a collection of temporal composites. In that case, the best work unit might be a date range.)
  • If you’re just trying to sample bands from one image (or composite), it can be a lot faster to chop your ROI into a grid and use the grid cells as work units, because a single request can then handle all the points that are near each other (using, for instance, reduceRegions()). This is potentially faster because each input tile might only need to be fetched once for multiple points. (Of course, this only helps if you have more than one point per 256x256 input tile.)
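
The grid idea in the last caveat can be sketched in plain Python. This is only an illustration under stated assumptions: the function names grid_cells() and group_points(), the use of degrees, and the cell size are all hypothetical, and the Earth Engine side (building a rectangle geometry per cell and calling reduceRegions() on its points) is left out.

```python
import math
from collections import defaultdict


def grid_cells(west, south, east, north, cell_size):
    """Yield (west, south, east, north) boxes that cover the ROI."""
    cols = math.ceil((east - west) / cell_size)
    rows = math.ceil((north - south) / cell_size)
    for row in range(rows):
        for col in range(cols):
            x0 = west + col * cell_size
            y0 = south + row * cell_size
            # Clamp the last row/column so cells never extend past the ROI.
            yield (x0, y0, min(x0 + cell_size, east), min(y0 + cell_size, north))


def group_points(points, west, south, cell_size):
    """Group (lon, lat) points by the (col, row) of the grid cell they fall in."""
    groups = defaultdict(list)
    for lon, lat in points:
        col = math.floor((lon - west) / cell_size)
        row = math.floor((lat - south) / cell_size)
        groups[(col, row)].append((lon, lat))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical 2x1 degree ROI split into 1-degree cells.
    print(list(grid_cells(0, 0, 2, 1, 1)))
    print(group_points([(0.2, 0.3), (0.4, 0.1), (1.5, 0.5)], 0, 0, 1))
```

Each group of points (or each cell) would then be one work unit returned by getRequests(), so that getResult() samples all the nearby points in a single request.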

Noel Gorelick

I’m a software engineer at Google and one of the founders of Google Earth Engine.