This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Susan,

> I got a note from Dave Johnson that Serafin has
> okayed the satellite receive station. Do you have
> any knowledge of the Terascan format they use...
> isn't it some variant of netCDF?

It's TDF, the TeraScan Data Format. It was developed by Joe Fahle, one of the early contributors to the netCDF interface design. TDF and netCDF have diverged quite a bit since then, with TDF not being constrained by the necessity for a Fortran interface or the use of XDR for storing data portably. TDF is a more advanced interface in many ways, implementing

 - a way to subset data without copying it and to reference data in other files ("assemblies");
 - a way to import ASCII or binary data from files in other formats without copying or converting it ("instant import");
 - built-in handling of time;
 - built-in support for georeferencing using rectangular, mercator, utm, polyconic, oblique stereographic, and polar stereographic projections;
 - support for "relations" (ordered lists of variables) besides dimensions, variables, and attributes;
 - support for a "string" data type;
 - built-in (required) attributes (e.g. units, badval, scale, offset, ...).

Here are some comments from Joe in recent correspondence with him:

> We have an hdf-to-tdf converter (and vice-versa). Dave Wilensky has
> done a lot of netcdf <-> tdf conversions at LSU.
>
> Because of the XDR issue, we probably cannot do an 'uninstantiated'
> import of a netcdf dataset, which would be the best solution.
>
> The converter approach pretty much is the way to go.
>
> Joe
>
> PS. Not having the XDR stuff has kept Terascan from running on PC's.
> I have programmed up a solution that takes care of this without using
> XDR's, but no questions, XDR's are the way to go for data portability
>
> As far as a C++ interface, that wouldn't take more than a day to whip
> that up. But I'll never go back to Fortran.

I've appended a man-page overview of TDF.
--Russ

DATASETS(7)                 TeraScan Overview                 DATASETS(7)

NAME
     datasets - TeraScan common data format (TDF)

SYNOPSIS
     lib/libcdf.a

DESCRIPTION
  Introduction
     Each TeraScan dataset is a separate UNIX file organized in the TeraScan common data format (TDF). The TDF is an extremely versatile file format that is capable of assimilating a wide variety of data types, shapes and sizes. For example, a single dataset could contain satellite image data, random in-situ data, and 3-D model data.

     The TDF was developed during the same period that NASA developed the Common Data Format (CDF) [Treinish and Gough, 1987], and served as a basis for the UNIDATA Network Common Data Format (netCDF) [Rew, 1988]. The TDF has been substantially upgraded since then.

     Dimensions, variables, relations and attributes are the basic dataset components. Variables are simply arrays of data; dimensions define the sizes of these arrays. Relations are ordered lists of variables. Attributes hold information about the dataset as a whole, or about individual variables, dimensions or relations. Only datasets, variables, and relations can currently have application-defined attributes.

     The following datatypes may be used to define variables and attributes: byte, short, long, float, double, and string. Codes and ranges for these datatypes are defined in include/gp.h. String is a variable-width datatype, i.e., the number of bytes required to store one element is application defined. Applications can implement a complex-valued variable by adding an extra dimension of length 2 to the variable.

     Normally all dataset definitions and data are stored in a single UNIX file. However, a dataset can reference variables from several files using links. Links allow rapid import of non-TDF data, and support lightweight dataset subsets and assemblies.

  Programming Interface
     TDF access routines are independent of any other TeraScan software component except lib/utils.a. See dirfile(3), misc(3), and terrno(3).
     Therefore, TDF applications can be written without using the TeraScan user interface or earth transform facilities. TDF calls can be embedded in existing non-TeraScan applications as desired. TDF datatypes, constants, and error status codes are defined in include/gp.h.

TeraScan                    Last change: 1/13/93

  Object Pointers
     The basic TDF objects are sets, dimensions, variables, and relations. Application-defined attributes are not considered objects, even though they can be treated as such. Files are secondary objects, and are of only passing concern to applications.

     Pointers to objects (actually object data structures) are returned by search or definition functions. These pointers are used as arguments to other functions. All data structures have magic numbers and alignment criteria, which help to identify bogus pointers. A pointer to an object's data structure is "pinned" (i.e., can never change) until the object is no longer available (i.e., the containing dataset is closed).

     All application accessible data structures exist in memory that is allocated using UNIX malloc(). malloc is used sparingly and in an unfragmented manner, so as not to impact applications which also use (and possibly abuse) malloc.

     Applications cannot be prevented from modifying data structures, even for datasets opened as readonly. Given this, it was decided to let applications perform all operations except variable I/O for readonly datasets, including defining new variables, relations, and attributes.

     One obvious disadvantage of having application accessible data structures is that applications will undoubtedly trash them more easily than if they were hidden. All data structure components should be considered readonly, unless otherwise specified.

     Applications can loop through a list of similar objects (e.g. all dimensions belonging to a dataset) using

          while (pointer != NULL)
               pointer = pointer->next;

  Attributes
     Attributes refine the definitions of datasets and their components. There are two kinds of attributes:

     - Built-in attributes, i.e., fields in application accessible data structures (see include/gp.h)
     - Application-defined attributes, created using the define or copy attribute functions

     Dimensions and files do not have application-defined attributes. The only file attribute of any interest to applications is file->path, which is built-in. Application-defined dimension attributes may be added in the future.

     Note that applications are free to change names directly and potentially generate name conflicts within a dataset. This is the least harmful of all the ways applications can damage datasets.

     Different objects can have attributes with the same name, but with different datatypes or lengths. This new flexibility should be used cautiously; two attributes with different meanings should never have the same name.

     The following built-in attributes are intended for use by applications; only those marked (*) can be set directly by applications.
     * dim->name       - dimension name
       dim->unlimited  - non-zero if dimension can grow
       dim->size       - current size
     * dim->coord      - dimension coordinate
     * dim->scale      - orig index = index * scale + offset
     * dim->offset
     * var->name       - variable name
     * var->units      - units
       var->type       - datatype
     * var->badval     - missing value as stored on disk
     * var->usemin     - minimum valid stored value
     * var->usemax     - maximum valid stored value
     * var->scale      - true value = scale * stored value + offset
     * var->offset
     * rel->name       - relation name
     * rel->kind       - relation kind (analogous to variable units)
     * att->name       - attribute name
     * att->units      - attribute units
       att->type       - datatype
       att->size       - number of elements in attribute
       file->path      - file path name

     Application-defined attributes are normally not accessed like objects. Their values are set and retrieved by name, rather than by pointer. Pointers to attribute definitions are available for getting attribute datatypes, lengths, and units, as well as for looping through lists of attributes.

  Application Defined Relationships
     The new abstraction, "relation", has been added to datasets. A relation consists of an ordered list of variables all belonging to the same dataset. Relations have built-in attributes "name" and "kind", where relation kind is analogous to variable units. Relations also can have application-defined attributes. The number and order of the variables associated by a relation, as well as its application-defined attributes, are determined by its kind.

     The following is an example of how relations can be used:

     Given a variable "date" that contains an ordered list of dates, a variable "year" that contains an ordered list of years, and a variable "year_index" that is defined as follows:

          index[i] = j if k > j => date[k] >= year[i]

     define the relation "year_index" of kind "sparse_index", consisting of the ordered tuple (date, year, year_index).
     (Obviously, "date" and "year" must have the same units for this to work.)

  Builtin Relationships
     The following relationships are built-in to application accessible data structures; only those marked (*) can be changed directly by applications:

       var->dim[], var->ndims           - variable has dimensions
       rel->var[], rel->nvars           - relation relates variables
     * dim->var                         - a dimension can get its values from a variable, i.e., the value corresponding to dim=i is var[i]
       var->file                        - a variable's data is stored in a file
       set->natts, set->att, att->next  - dataset has attributes
       var->natts, var->att, att->next  - variable has attributes
       rel->natts, rel->att, att->next  - relation has attributes
       set->ndims, set->dim, dim->next  - a dataset has dimensions
       set->nvars, set->var, var->next  - a dataset has variables
       set->nrels, set->rel, rel->next  - a dataset has relations
       firstset, set->next              - a program has a list of datasets
       dim->owner                       - a dimension belongs to a dataset
       var->owner                       - a variable belongs to a dataset
       rel->owner                       - a relation belongs to a dataset
       att->owner                       - an attribute belongs to a dataset, variable, or relation

     Pointers are used to represent all built-in relationships. Linked lists are used for all "has" relationships except two: var->dim[] and rel->var[]. In both cases, these associations are "many-to-many". Linked lists are impractical due to multi-threading. Instead, variable dimensions and relation variables are stored in arrays. The number of variable dimensions is limited (e.g., GP_VAR_DIMS = 5). There is no limit on the number of relation variables.

     Some built-in relationships are circular; e.g. var->dim[] and dim->var, or set->var and var->owner. Due to the hierarchical nature of declarations in C, some of these pointers have to be declared of type "char", which is unfortunate.
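The linked-list "has" relationships listed above (set->dim, dim->next, and so on) can be walked with the same loop shown under Object Pointers. The following is only a minimal sketch: the struct and function names (demo_dim, demo_set, demo_find_dim) are hypothetical stand-ins that model just the fields named in the overview, not the real definitions in include/gp.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the structures in include/gp.h;
 * only fields named in the overview are modeled here. */
struct demo_dim {
    const char *name;        /* models dim->name */
    long size;               /* models dim->size */
    struct demo_dim *next;   /* models dim->next */
};

struct demo_set {
    int ndims;               /* models set->ndims */
    struct demo_dim *dim;    /* models set->dim, head of the list */
};

/* Walk set->dim, dim->next, ... and return the dimension with the
 * given name, or NULL if the dataset has no such dimension. */
struct demo_dim *demo_find_dim(const struct demo_set *set, const char *name)
{
    struct demo_dim *d;

    for (d = set->dim; d != NULL; d = d->next) {
        if (strcmp(d->name, name) == 0)
            return d;
    }
    return NULL;
}
```

A caller would pass a dataset pointer and a dimension name, then test the result against NULL before reading fields such as size.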
  Scaled Variable Data
     In original TeraScan datasets, information for converting 8-bit or 16-bit data to real values was stored in application-defined scaling attributes. Now, scaling attributes are built-in to all variables, regardless of datatype. var->scale and var->offset are used to convert stored data to its true form:

          true value = var->scale * stored value + var->offset

     Note that built-in attributes var->badval, var->usemin and var->usemax all refer to stored values. When presenting these attributes to users, applications may want to apply scaling to at least var->usemin and var->usemax.

     The most common use of scaling is to store real-valued data with a minimum yet appropriate number of significant bits. However, scaling can also be used to change variable units without changing actual data; e.g., to change from degrees Celsius to degrees Fahrenheit:

          gpputname(var->units, C_FAHRENHEIT);
          var->scale *= 1.8;
          var->offset = 1.8 * var->offset + 32.;

     Another benefit of built-in scaling is that it allows applications to pretend they are working with a single type of data: double precision. Variable read and write routines, that respectively scale and unscale data, are provided as part of the standard interface. This does not preclude the writing of applications that treat each type of variable differently.

  Dimension Coordinates
     Applications may use the coord, scale, and offset built-in dimension attributes to relate different dimensions. For example, if two dimensions have the same coord attribute, applications may choose to decide that the two dimensions are parallel. The scale and offset attributes can then be used to determine the exact correspondence between the two dimensions, assuming that correspondence is linear.

     Coordinate types GP_X_COORD, GP_Y_COORD, GP_Z_COORD, GP_TIME_COORD, and GP_NO_COORD are defined in include/gp.h for this purpose. Applications are not restricted to these coordinate types.
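The scaling rule under Scaled Variable Data, true value = scale * stored value + offset, can be illustrated with a small sketch. The struct and function names here (demo_var, demo_unscale, demo_scale, demo_celsius_to_fahrenheit) are hypothetical and do not come from include/gp.h. For a unit change, composing F = 1.8 * C + 32 with the scaling rule gives a new scale of 1.8 * scale and a new offset of 1.8 * offset + 32, so the offset must be rescaled as well as shifted.

```c
/* Hypothetical stand-in for the scaling fields of a TDF variable. */
struct demo_var {
    double scale;    /* models var->scale */
    double offset;   /* models var->offset */
};

/* Reading: true value = scale * stored value + offset. */
double demo_unscale(const struct demo_var *v, double stored)
{
    return v->scale * stored + v->offset;
}

/* Writing: the inverse map, stored value = (true value - offset) / scale. */
double demo_scale(const struct demo_var *v, double true_value)
{
    return (true_value - v->offset) / v->scale;
}

/* Re-express a Celsius variable in Fahrenheit without touching stored data.
 * F = 1.8 * C + 32 = (1.8 * scale) * stored + (1.8 * offset + 32). */
void demo_celsius_to_fahrenheit(struct demo_var *v)
{
    v->scale *= 1.8;
    v->offset = 1.8 * v->offset + 32.0;
}
```

With scale 0.5 and offset -10, a stored value of 40 reads as 10 degrees Celsius before the conversion and 50 degrees Fahrenheit after it, while the stored value itself is unchanged.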
  Unlimited (Growing) Dimensions
     Unlimited dimensions can be defined using a size of GP_UNLIMITED, found in include/gp.h. The following guidelines apply when working with datasets with unlimited dimensions:

     - Only one dimension in a dataset can be growing; defining a second unlimited dimension will fix the size of the former growing dimension.
     - If a variable is defined with a growing dimension, that dimension must be the variable's leading dimension.
     - All variables to be defined with an unlimited leading dimension must be defined prior to writing any data corresponding to that dimension. The size of the unlimited dimension will be fixed at the point where the new variable is defined.

  Cloning Objects
     Cloning an object refers to the process of creating a like object with the same attributes, optionally with a new name.

     When a variable is cloned, the new variable is created with the same named dimensions. These dimensions must exist in the output dataset, but do not have to have the same sizes as the corresponding dimensions of the original variable. Similarly, when a relation is cloned, the new relation is created, associating the same named variables.

     When a dimension is cloned, its corresponding variable (if one is defined) is not carried over to the new dimension. This would present a chicken and egg problem, because the dimension could not be created without the variable, and the variable could not be created without the dimension.

  Definitions vs. Variable Data
     Everything about a dataset with the exception of variable data is maintained in virtual memory until the dataset is closed or synced. If a dataset is opened for read access and then is closed, nothing is written to disk regardless of whether the application changed attribute values or defined new objects.
     If a dataset is opened with write access and then is closed, all object definitions and attributes are saved to disk. Saving definition and attribute changes can be suppressed by aborting the dataset rather than closing it.

     However, changes to variable data occur at the whim of the underlying file system. Variable data is not maintained in virtual memory, but is written directly to the file system. Aborting a dataset in the midst of writing variable data will leave the dataset in an undefined, probably unreadable state.

     TeraScan datasets support random hypercube access to variable data. A hypercube is defined by a starting 0-relative coordinate (i1,i2,...) and a cube size (n1,n2,...). Variable indexing is similar to array indexing under C; i.e., the index of the last dimension is the fastest moving.

  Link Subsets and Assemblies
     Any array data that can support random hypercube access can be linked to a TDF variable. For example, data for a variable or variable hypercube in one TeraScan dataset can be linked to a variable in another (or the same) TeraScan dataset. This link mechanism allows data from one or more datasets to be linked to a single dataset without instantiation, i.e., without moving any data around. The following TeraScan applications take advantage of this link mechanism.

     subset      Creates a variable and/or dimension subset of input datasets.
     assemble    Gathers selected variables from input datasets into a single output dataset.
     burst       Slices variables along any dimension, creating link variables for each of the slices.
     impbin      Imports structured array data from non-TDF files.

     This link mechanism is similar to the UNIX facility for creating symbolic file links. One drawback of using links is that links can be orphaned. If data in file X is linked to a variable V in dataset A, and then X is removed, then the link variable V is orphaned.
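The hypercube access described earlier in this section (a 0-relative start coordinate plus a cube size, with the last dimension moving fastest) follows ordinary C array ordering. The sketch below illustrates that ordering on a plain in-memory array. It is not the TDF read routine, and the function names demo_linear_index and demo_read_cube2 are hypothetical.

```c
#include <stddef.h>

/* Linear offset of element idx[] in an array with sizes dims[],
 * using C ordering: the last dimension is the fastest moving. */
size_t demo_linear_index(int ndims, const size_t *dims, const size_t *idx)
{
    size_t off = 0;
    int d;

    for (d = 0; d < ndims; d++)
        off = off * dims[d] + idx[d];
    return off;
}

/* Copy the 2-D hypercube with start (i1,i2) and size (n1,n2) out of a
 * source array that has `cols` elements per row. dst receives n1*n2
 * elements in the same C ordering. */
void demo_read_cube2(const double *src, size_t cols,
                     size_t i1, size_t i2, size_t n1, size_t n2,
                     double *dst)
{
    size_t r, c;

    for (r = 0; r < n1; r++)
        for (c = 0; c < n2; c++)
            dst[r * n2 + c] = src[(i1 + r) * cols + (i2 + c)];
}
```

For a 3 x 4 array, the element at coordinate (1,2) sits at linear offset 1 * 4 + 2 = 6, and a 2 x 2 cube starting at (1,1) picks up offsets 5, 6, 9, and 10.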
     As a special case, a NULL file can be linked to a TDF variable. In this case, all stored values for the variable are assumed to be 0.

  Automatic Uncompression
     Datasets that have been compressed using the UNIX compress function can be uncompressed automatically by TeraScan. TeraScan uses the UNIX zcat function to uncompress datasets, redirecting the output to the scratch directory defined by the environment variable UNCOMPRESSDIR. If UNCOMPRESSDIR is undefined, uncompression is not attempted.

     A list of automatically uncompressed files is kept in the Registry file in the UNCOMPRESSDIR. This file is ASCII but is not intended to be edited. For each automatically uncompressed file, the following information is shown: true path name of original, full path name of uncompressed copy, last modification time of original in seconds, and the max idle time in seconds.

     Idle time is defined to be the difference between the current time and the last access time of the original. The environment variable UNCOMPRESSIDLE specifies the maximum idle time in minutes for automatically uncompressed files. If UNCOMPRESSIDLE is not set, the maximum idle time is assumed to be 60 minutes. Different files can have different maximum idle times.

     The environment variable UNCOMPRESSMAX specifies the maximum space in megabytes to be allocated in the UNCOMPRESSDIR for automatically uncompressed files. If UNCOMPRESSMAX is not set, the maximum is assumed to be 10 megabytes.
     This maximum is only a rough limit; see the algorithm outlined below:

          Given input compressed file F

          If UNCOMPRESSDIR is not defined, can't uncompress F

          If F is in Registry, F's last modification time matches what's in the Registry, and F's uncompressed copy still exists, use it

          Delete all entries in Registry if original no longer exists, original's last modification time does not match Registry, uncompressed copy does not exist, or idle time (i.e., current time - last access time of original) exceeds the max idle time

          While the total space occupied by uncompressed copies plus the size of F (not its uncompressed copy!) exceeds UNCOMPRESSMAX, delete the entry in Registry closest to exceeding its max idle time

          Uncompress F and put it in the Registry, setting its max idle time to UNCOMPRESSIDLE.

  Hard Limits
     There are currently only two hard limits for TeraScan datasets: length of names and number of variable dimensions. The name length limit applies not only to names, but to such built-in attributes as var->units and rel->kind. Arbitrary name lengths were not implemented for the following reasons:

     - Applications are invariably written assuming a maximum name length, which may as well be constant across customer sites.
     - If names have unlimited length, built-in attributes var->units and rel->kind also would have unlimited length.
     - Unlimited length names mean more extensive use of malloc, which has been avoided.

  Error Handling
     Pipeline processing applications, interactive display applications, and application subsystems (e.g., TeraScan earth transform) have very different error handling requirements:

     - Pipeline processing applications typically take a very brutal approach to errors; i.e., abort!
     - Interactive display applications must always return control to the user, even on such show stopping errors as running out of disk space or memory.
     - Application subsystems must always return control to the application after converting lower level error codes into higher level ones (e.g., no such attribute => dataset does not have earth location).

     In order to support these different cases, a switchable error handler is used by all the dataset interface routines. (See CALLING SEQUENCES.) An application subsystem can switch its own error handler in and out several times while an application is running.

     The default error handler simply sets the TeraScan global variable terrno to the appropriate error code. In addition to UNIX file open and memory allocation errors, the following errors may be encountered. These error codes are defined in include/gp.h.

FILES
     include/gp.h, lib/libcdf.a, lib/libutils.a, /usr/include/errno.h

SEE ALSO
     gpatt(3), gpdim(3), gperr(3), gpio(3), gplink(3), gpname(3), gprel(3), gpset(3), gptype(3), gpvar(3), dirfile(3), misc(3), terrno(3), open(2), close(2), malloc(3), compress(1)

     One of the strong points of the TDF and its programming interface is that applications do not depend on the physical layout of data on disk. The physical layout of a typical dataset is as follows:

     - dataset header of 644 bytes (historical)
     - data for non-link variables
     - file descriptions for link variables
     - dataset attributes
     - dimension descriptions
     - variable descriptions and attributes
     - relation descriptions and attributes

     The start of data for a given variable is defined by var->datastart. Data for non-link variables is guaranteed either to be completely contiguous or row-wise contiguous. The ith row of array A is defined to be all elements of A with leading index i. The distance between rows is var->dimdist[0].
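The row-wise layout described above can be turned into a simple offset calculation. The sketch below assumes that var->datastart and var->dimdist[0] are both byte counts measured within the file, which the overview does not state explicitly. The names demo_layout, demo_row_offset, and demo_is_contiguous are hypothetical.

```c
/* Hypothetical stand-in for the two layout fields named above.
 * ASSUMPTION: both are expressed in bytes. */
struct demo_layout {
    long datastart;   /* models var->datastart */
    long dimdist0;    /* models var->dimdist[0] */
};

/* File offset of the first element of row i, i.e. of all elements
 * whose leading index is i. */
long demo_row_offset(const struct demo_layout *v, long i)
{
    return v->datastart + i * v->dimdist0;
}

/* Under the same assumption, the data is completely contiguous when the
 * distance between rows equals the number of bytes in one row. Otherwise
 * it is only row-wise contiguous. Returns 1 or 0. */
int demo_is_contiguous(const struct demo_layout *v, long row_bytes)
{
    return v->dimdist0 == row_bytes;
}
```

A reader that handles only row-wise contiguous data would seek to the offset of each row in turn and read one row's worth of bytes, rather than assuming the whole variable can be read in a single call.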