(Proposed) GAIA Node Selection Process


Firstly, the following assumptions are made about the datasets and nodes.

  • Not every node will hold every dataset (because of space, manpower or politics).
  • Nodes will be out of date (it is not possible to synchronise new images to all participating nodes simultaneously).
  • Some nodes will be unavailable due to network reasons.
  • Some data providers will not let us copy their datasets, so a non-GAIA web server must be used to access those images. (An equivalent situation arises when a new contributor wishes to contribute far more images than the existing nodes can handle.)
  • Image locations will be consistent across all nodes which host a given dataset.

Proposed Node Selection Process

  • User chooses an appropriate node. HTML documents, REST web services etc. are always served by that node.
  • Client program checks data availability for the given data channel on the chosen day. If it is known that no data exists, display a blank image and stop; otherwise assume the data exists somewhere.
  • Client program requests image from the node closest to the selected node (with luck it will be the selected node).
  • If the image cannot be fetched (web server unavailable, image missing, etc.) then it is requested from the next closest node which hosts that dataset. Repeat until no more nodes can be tried.
  • If the image (also) resides on a backup server, request it from that location.
  • If the image is still missing then redirect to a missing data image (a red cross like IE uses?).

Note: While it may be preferable to try loading from a non-GAIA server when it is closer than a GAIA node, doing so adds extra complications to the node selection procedure.
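
The proposed order of attempts can be sketched as follows. This is only an illustration of the steps listed above, not the GAIA client code: the function name, the can_fetch callback, the example.org style hostnames and the placeholder image paths are all assumptions made for the example, and in practice the switching would happen in the browser.

```python
from typing import Callable, Optional, Sequence

# Hypothetical placeholder paths; the real locations are not specified here.
BLANK_IMAGE_URL = "/images/blank.png"
MISSING_IMAGE_URL = "/images/missing.png"


def choose_image_url(
    image_path: str,
    ranked_hosts: Sequence[str],
    can_fetch: Callable[[str], bool],
    data_may_exist: bool = True,
    backup_url: Optional[str] = None,
) -> str:
    """Return the first usable URL, following the proposed order of attempts."""
    # If it is known that no data exists, show a blank image and stop.
    if not data_may_exist:
        return BLANK_IMAGE_URL
    # Try each node hosting the dataset, closest to the selected node first.
    # The same path is used on every node (image locations are consistent).
    for host in ranked_hosts:
        url = f"http://{host}{image_path}"
        if can_fetch(url):
            return url
    # Then try the backup (non-GAIA) server, if the image resides there too.
    if backup_url is not None and can_fetch(backup_url):
        return backup_url
    # Nothing worked: fall back to the missing data image.
    return MISSING_IMAGE_URL
```

Passing can_fetch in as a parameter keeps the ordering logic separate from however availability is actually detected.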


From the assumptions and proposed node selection process described above, the following requirements are derived:

  • Switching to alternative image locations must be performed in the browser (nowhere else is it possible to see any network problems the user is experiencing).
  • Document links must not contain hostnames (so that HTML documents are fetched from the selected node).
  • Data image locations must contain hostnames for node selection to work correctly.
  • Latitude/longitude values for all nodes must be known so that the nodes can be ranked by distance.
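
To illustrate the difference between the two kinds of link, the sketch below resolves a host-less document link and a fully qualified image link against the page that contains them. The hostnames and paths are invented for the example; only the standard urllib.parse functions are real.

```python
from urllib.parse import urljoin, urlsplit

# Page served by the node the user selected (hypothetical hostname and path).
page_url = "http://node-a.example.org/gaia/browse.html"

# A document link without a hostname, and an image link with one.
doc_link = "/gaia/help.html"
image_link = "http://node-b.example.org/data/2007/01/21/summary.png"


def has_hostname(link: str) -> bool:
    """True if the link names a host explicitly."""
    return bool(urlsplit(link).netloc)


# The host-less link resolves against the selected node ...
resolved_doc = urljoin(page_url, doc_link)
# ... while the image link keeps pointing at the node it names.
resolved_image = urljoin(page_url, image_link)

print(resolved_doc)
print(resolved_image)
```

The document link always follows the selected node, whereas the image link can name whichever node the selection process has chosen.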


  • The contributed PostgreSQL earthdistance function will be useful to rank nodes by distance from the selected node.
  • By only measuring distances between the selected node and other nodes it is not necessary to know the user's location.
  • It will be helpful to know the latitude and longitude of non-node data sources so that if appropriate they can be used before nodes further away are tried. However, if not known then it can be arranged that they are tried last.
  • PHP has a GeoIP function which we could use for identifying the node closest to the user. However, I (SteveMarple) suggest we don't try to be too clever, and instead allow the user to select the node closest to him/herself.
  • The GeoIP function may be useful to redirect from (say) www.gaia-vxo.org to canada.vxo.org when the user loads the first page. However, if the user can't connect to canada.vxo.org then there's trouble! Perhaps stick to using a round-robin DNS scheme for www.gaia-vxo.org.
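
The ranking idea in the notes above can be sketched in a few lines. This is a stand-in for illustration, not the PostgreSQL earthdistance function: it uses the haversine great-circle formula on a spherical Earth, and the source names and coordinates are made up. Sources whose location is unknown are placed last.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def great_circle_km(a: Coord, b: Coord) -> float:
    """Haversine great-circle distance between two points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def rank_sources(selected: Coord, sources: Dict[str, Optional[Coord]]) -> List[str]:
    """Order sources by distance from the selected node; unknown locations last."""
    def sort_key(name: str):
        coord = sources[name]
        if coord is None:
            return (1, 0.0, name)  # location unknown: try after everything else
        return (0, great_circle_km(selected, coord), name)
    return sorted(sources, key=sort_key)


# Hypothetical sources; only distances from the selected node are needed,
# so the user's own location never enters the calculation.
selected_node = (54.0, -2.8)
sources = {
    "canada": (51.0, -114.0),
    "external-archive": None,
    "norway": (69.6, 19.2),
    "uk": (54.0, -2.8),
}
print(rank_sources(selected_node, sources))
```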

Node selection process as implemented (2007-01-21)

  • The node closest to the selected node is contacted first. If the data cannot be loaded (not accessible, missing, or the server is offline, as detected by a 10 s timeout) then the next node is tried. Nodes which appear to be offline are marked as such, and so are not used when loading other images.
  • The process continues until the image is successfully loaded (cancel the timeout!) or no more nodes can be tried.
  • TODO: Try the fallback server if it exists.
  • TODO: Redirect to a missing image icon (pass width and height as query parameters, or use path_info, so that the most appropriate image can be returned).
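
The offline-marking behaviour described above might look roughly like the sketch below. It is not the implemented browser code: the class and exception names are invented, and the fetch callable is an assumed interface that returns the image bytes, returns None when the node answers but the image is missing or inaccessible, and raises NodeUnreachable when the node does not answer within the timeout.

```python
from typing import Callable, Optional, Sequence, Set

TIMEOUT_SECONDS = 10.0  # the 10 s timeout mentioned above


class NodeUnreachable(Exception):
    """Raised by the fetch callable when a node does not answer in time."""


class ImageLoader:
    """Tries nodes in ranked order and remembers which ones appear offline."""

    def __init__(
        self,
        ranked_hosts: Sequence[str],
        fetch: Callable[[str, str, float], Optional[bytes]],
    ) -> None:
        self.ranked_hosts = list(ranked_hosts)
        self.fetch = fetch
        self.offline: Set[str] = set()

    def load(self, image_path: str) -> Optional[bytes]:
        for host in self.ranked_hosts:
            # Nodes already marked offline are not used for other images.
            if host in self.offline:
                continue
            try:
                data = self.fetch(host, image_path, TIMEOUT_SECONDS)
            except NodeUnreachable:
                # The node timed out: mark it offline and move on.
                self.offline.add(host)
                continue
            if data is not None:
                return data  # loaded successfully; stop trying nodes
            # Node is up but the image is missing: try the next node.
        return None  # no more nodes can be tried
```

A node that merely lacks one image is not marked offline, since it may still hold other images.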

Tip: A REST web service could be used to notify GAIA if nodes appear to be offline. To avoid denial-of-service attacks it should be informational only, rather than causing a server to be marked offline in the database.

Topic revision: r7 - 2007-03-16 - 22:57:51 - SteveMarple