THEMIS Data
THEMIS high resolution image data consist of grayscale frames with 256x256 16-bit pixels. They are typically acquired at a 3-second cadence, with all frames from each minute (usually 20) placed in a single PGM file. These PGM files are compressed using gzip to reduce storage requirements and to provide integrity checking. The central PostgreSQL database in Calgary provides a catalog of file and frame metadata.
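As a rough illustration of the file layout described above, the following Python sketch reads every frame from a gzip-compressed file that holds several binary PGM images back to back. It assumes the files are standard "P5" PGMs with big-endian 16-bit samples and that header comments begin with "#"; the function names are hypothetical and are not part of any THEMIS software.

```python
import gzip
import struct


def _next_token(data, pos):
    """Return (token, new_pos), skipping whitespace and '#' comment lines."""
    n = len(data)
    while pos < n:
        c = data[pos:pos + 1]
        if c.isspace():
            pos += 1
        elif c == b"#":
            end = data.find(b"\n", pos)
            pos = n if end < 0 else end + 1
        else:
            break
    start = pos
    while pos < n and not data[pos:pos + 1].isspace():
        pos += 1
    return data[start:pos], pos


def read_pgm_frames(data):
    """Parse concatenated binary PGM images into (width, height, pixels) tuples."""
    frames = []
    pos = 0
    while True:
        magic, pos = _next_token(data, pos)
        if not magic:
            break  # end of data
        if magic != b"P5":
            raise ValueError("not a binary PGM image")
        width, pos = _next_token(data, pos)
        height, pos = _next_token(data, pos)
        maxval, pos = _next_token(data, pos)
        width, height, maxval = int(width), int(height), int(maxval)
        pos += 1  # exactly one whitespace byte separates header and raster
        count = width * height
        code = "H" if maxval > 255 else "B"  # 16-bit samples are big-endian
        size = count * struct.calcsize(code)
        raster = data[pos:pos + size]
        if len(raster) != size:
            raise ValueError("truncated raster")
        frames.append((width, height, struct.unpack(">%d%s" % (count, code), raster)))
        pos += size
    return frames


def load_frames(path):
    """Decompress a .pgm.gz file and return all of its frames."""
    with gzip.open(path, "rb") as f:
        return read_pgm_frames(f.read())
```

For a typical one-minute file, load_frames would be expected to return about 20 tuples, each with width and height 256.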
A set of RESTful web-services will be implemented to provide access to the high resolution ("stream0") image data. These will be used as the basis for a THEMIS data browser but are also intended to meet the needs of GAIA. This page is intended to formalize and document the interface details.
An earlier implementation of RESTish data access is given at the bottom of this page.
Questions
- What is the "best" way of selecting sub-frames within a file? The existing interface uses a dot, eg. id/99.3 to select the 4th frame (index 3, zero-based) from the file with database ID 99. The potential for conflict with file name extensions is obvious. Other possibilities are
- "glob" eg. 99[3] -same as ImageMagick
- number eg. 99#3 -like URL fragment
- cgi eg. 99?frame

The #name URL fragment would normally be appended after the URL. However, a web browser may not be able to distinguish that http://example.com/image.png#1 is different from http://example.com/image.png#2, since in normal HTML terms they would be the same resource. Using a question mark would probably stop webcaches from caching the image. Perhaps the image number can be appended as /n and retrieved by the script using path info. -- SteveMarple - 16 Feb 2007
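To make the /n suggestion concrete, here is a minimal Python sketch of how a script might split path info such as /dbid/99/3 into an identifier type, an identifier, and an optional frame number. The function names and the exact path layout are assumptions for illustration only. The second helper shows why the fragment form is awkward: a client does not send the #fragment part of a URL to the server, so both example.com URLs above produce the same request.

```python
from urllib.parse import urlsplit


def split_path_info(path_info):
    """Split '/dbid/99/3' into ('dbid', '99', 3); the frame part is optional."""
    parts = [p for p in path_info.split("/") if p]
    if len(parts) not in (2, 3):
        raise ValueError("expected /<type>/<identifier>[/<frame>]")
    kind, identifier = parts[0], parts[1]
    frame = None
    if len(parts) == 3:
        if not (parts[2].isascii() and parts[2].isdigit()):
            raise ValueError("frame must be a non-negative integer")
        frame = int(parts[2])
    return kind, identifier, frame


def request_target(url):
    """Return the part of a URL that is actually sent to the server."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")
```

Because the frame number is its own path segment, file names containing dots (such as .pgm.gz) do not conflict with it.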
http-img (AKA data)
This "channel" returns a single image using HTTP. The image is determined by using one of the available unique identifiers (see below) to select a data file. Sub-frame selection is optional. No search or wild-card matching will be provided. Image format defaults to JPEG to minimize network traffic, but can be overridden. Additional processing and color mapping capabilities may be available.
dbid
Each file is assigned an identifier (ID) that is guaranteed to be unique within a particular instantiation of the database.
eg. <img src="http://themis-data/rest/stream0/http-img/dbid/99#3.png">

How many images will be stored? Will you need to use bigserial?
md5
An MD5 checksum of each uncompressed file is stored in the database. This is primarily intended as a compact record to confirm file integrity. However, it can also be used as a universal identifier that is statistically unique. The primary drawback is that the 128-bit checksum must(?) be stored in PostgreSQL as a char(32) and searching is slower than using an int4.
eg. <img src="http://themis-data/rest/stream0/http-img/md5/bdd3cd1f940d7dbbc3b0dc402f6c3df9#3.png">
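For reference, a checksum of this kind can be computed as in the following Python sketch, which hashes the uncompressed contents of a .gz file in chunks and returns the 32-character hexadecimal form. The function name and chunk size are arbitrary choices, not taken from the existing scripts.

```python
import gzip
import hashlib


def md5_of_uncompressed(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the uncompressed contents of a gzip file."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as f:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is always 32 lowercase hexadecimal characters, which matches the md5sum column constraint in the SQL excerpts on this page.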
file
This is primarily intended for debugging, as it requires an intimate knowledge of file naming conventions.
eg. <img src="http://themis-data/rest/stream0/http-img/file/20050221_0232_ekat_themis01_full_1000ms.pgm.gz#3.png">
uuid
The identifier joins site UID, device UID, and ISO time with underscores. Very slow on the database side but nicer for humans.
eg. <img src="http://themis-data/rest/stream0/http-img/uuid/ekat_themis01_20050221T0232#3.png">
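The relationship between the file and uuid examples can be sketched in Python as below. This assumes that file names always begin with date, time, site UID, and device UID separated by underscores, as in the example 20050221_0232_ekat_themis01_full_1000ms.pgm.gz, and that the two examples refer to the same file; the function name is hypothetical.

```python
def uuid_from_filename(name):
    """Build '<site>_<device>_<date>T<time>' from a THEMIS-style file name."""
    parts = name.split("_")
    if len(parts) < 5:
        raise ValueError("unexpected file name: %r" % name)
    date, time, site, device = parts[:4]
    if not (date.isdigit() and time.isdigit()):
        raise ValueError("unexpected date/time fields in %r" % name)
    return "%s_%s_%sT%s" % (site, device, date, time)
```

Going the other way (from uuid to file) is not possible from the string alone, because the mode and exposure parts of the file name are not encoded in the uuid. A database lookup is needed for that direction.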
img-meta (AKA info)
Same as http-img, but return image metadata in one of the following formats
- flat text (ASCII)
- HTML (eg. unordered list)
- XML
- JSON
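The four output formats could be produced from a single metadata dictionary along the lines of this Python sketch. The field names, the "image" root element, and the "key=value" flat text layout are all assumptions made for illustration, since the actual metadata fields are not defined on this page. Keys are assumed to be valid XML element names.

```python
import json
from html import escape
from xml.etree import ElementTree as ET


def render_metadata(meta, fmt="text"):
    """Render a metadata dict as flat text, HTML, XML, or JSON."""
    if fmt == "text":
        return "\n".join("%s=%s" % (k, v) for k, v in meta.items())
    if fmt == "html":
        items = "".join(
            "<li>%s: %s</li>" % (escape(str(k)), escape(str(v)))
            for k, v in meta.items()
        )
        return "<ul>%s</ul>" % items
    if fmt == "xml":
        root = ET.Element("image")  # hypothetical root element name
        for k, v in meta.items():
            ET.SubElement(root, k).text = str(v)
        return ET.tostring(root, encoding="unicode")
    if fmt == "json":
        return json.dumps(meta)
    raise ValueError("unknown format: %r" % fmt)
```

For example, render_metadata({"width": 256, "height": 256}, "html") yields an unordered list with one item per field.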
sql-list (AKA list)
Allow users to query the database for images matching time and site (location) constraints. Return zero or more matches as URLs to either http-img or img-meta in one of the following formats

The PHP_Element class GAIA is using supports all but flat text. Also support Matlab output. -- SteveMarple - 16 Feb 2007
SQL excerpts
themis0-files
CREATE SEQUENCE file_seq MINVALUE -2147483647 INCREMENT 1 START 1 CYCLE;
CREATE TABLE files (
-- core values required for complete "registration"
id int4 PRIMARY KEY DEFAULT nextval('file_seq'),
path varchar(40) NOT NULL CONSTRAINT valid_path CHECK
(path ~* '^[0-9]{4}/[0-9]{2}/[0-9]{2}/.+_.+/ut[0-9]{2}$'),
name varchar(64) NOT NULL CONSTRAINT valid_name CHECK
(name ~* '^[0-9]{8}_[0-9]{2,6}_.+_.+_.+\.(pgm|pnm)(\.gz)?$') UNIQUE,
mtime int4, --y2038 bug
--timestamp without time zone NOT NULL, --last modified
-- remaining values may be NULL which means "not done yet"
nbytes int4 CHECK (nbytes>=0), --number of bytes in uncompressed file
nbytes_packed int4 CHECK (nbytes_packed>=0), --number of bytes after compression (null if not compressed)
md5sum char(32) CONSTRAINT valid_md5sum CHECK
(md5sum ~* '^[0-9a-f]{32}$') --should be UNIQUE but don't enforce here
-- nframes int2 CHECK (nframes>=0), --number of image frames in the file
-- mode_id int2 REFERENCES modes(id)
) WITHOUT OIDS;
themis0-frames
CREATE TABLE frames (
file_id int4 REFERENCES files(id), --4 bytes
mode_id int2 REFERENCES modes(id), --2 bytes
timestamp timestamp without time zone NOT NULL, --8 bytes date/time with 0.01s resolution
"offset" int2[], --2 bytes * nframes (quoted because OFFSET is a reserved word)
duration int2[] CHECK (0 <= ALL (duration)) --2 bytes * nframes
-- imager_id int2 REFERENCES imagers(id), --2 bytes, could get from mode_id
-- site_id int2 REFERENCES sites(id), --2 bytes, could get from imager_id
-- UNIQUE(file_id,number) --how much does this slow things down?
) WITHOUT OIDS;
Proof of concept
My first attempt at providing web-based access to high-resolution THEMIS data was implemented last spring.
Time requirements (on themis-data) are roughly
- 15ms to initialize the CGI script and access the database
- 30ms to unzip the data file, extract a frame, convert to JPEG, and write to stdout
Examples:
http://themis-data.phys.ucalgary.ca:8080/db/data/id/99999
http://themis-data.phys.ucalgary.ca:8080/db/data/file/20050221_0232_ekat_themis01_full_1000ms.pgm.gz
http://themis-data.phys.ucalgary.ca:8080/db/data/id/99999.4
http://themis-data.phys.ucalgary.ca:8080/db/data/id/99999?rotate=clockwise&transform=normalize&colormap=red-green&format=png
-- BrianJackel - 16 Feb 2007
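The last example URL above passes processing options in the query string. A script could separate the path from those options roughly as in this Python sketch; the function name is hypothetical and the option names are simply the ones that appear in the example.

```python
from urllib.parse import parse_qs, urlsplit


def parse_request(url):
    """Return (path, options) where options maps each query key to its last value."""
    parts = urlsplit(url)
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return parts.path, options
```

Applied to the example, this gives the path /db/data/id/99999 together with the rotate, transform, colormap, and format settings as a dictionary.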