DATA STRUCTURES FOR BIOLOGICAL RECORDING

Specifications for databases to record occurrences of living organisms

THE LOCALITY DATABASE CLOX

Introductory notes

To the average field biologist, only one location is important in recording, and that is the location where the organism was observed. Unfortunately, and particularly for fungi, other locations may also be important. A fungus may be collected in Borneo, isolated into pure culture in Japan, intercepted by quarantine officials in Australia, the USA and Argentina, and stored in a culture collection in the UK. The fungus has travelled in a living condition to each of these locations, albeit sometimes with the help of that useful symbiont Homo sapiens, and very often the only locality information available will be that of place of isolation or place of quarantine intercept. It is thus necessary to recognize that a many-to-one relationship can exist between relevant locations and an individual Collections Database observation. The Collections Locality Cross-reference Table permits this many-to-one relationship to be expressed. The BMS databases permit only one location, that of collection, to be recorded for each record, thus enabling them to remain flat-field rather than relational database systems. For their purpose as data-entry or feeder systems, this is adequate, but the need to store resulting information in a proper relational structure should now be recognized by the BMS.

[CloxLink_N]

A link to the unique observation identifier, 8 characters indexed

This field stores the numeric link between the Collections Locality Cross-reference Table and other tables in the Collections Database.

[CloxLocnoN]

A unique locality identifier, 8 characters indexed

This field stores a number which is unique for each record in the Collections Locality Cross-reference Table. The field is useful for example in cases where more than one quarantine intercept has been made of the current material, since it permits an unambiguous link between the locality of each quarantine intercept and records in the Collections People Cross-reference Table with information about who made the interception, and when.

[CloxLoclkN]

A link to the Geography Database, 8 characters

This field permits a numeric link for the current locality with the Geography Database, to provide access to, for example, the boundaries of the area referred to (e.g. `Rocky Mountains', `Westerness' etc.), temporal limits to those boundaries (e.g. `USSR, 1945-1991') and other centralized information on that locality. It is not yet in use.

[CloxCategA]

The category of location, 1 character

One character is allocated to this field, to record the category into which the current location falls. At present three categories are in use: `c', collection; `i', isolation; `q', quarantine intercept. Others will be made available as the need for them becomes apparent. Because the BMS databases are flat-field databases, they have no need for this field. All localities in those databases are assumed to belong in the `c' category. As the BMS keyboards data relating to British fungi from its various publications, it will inevitably encounter records of fungi isolated rather than directly collected, and problems in distinguishing these different categories of records will then arise: you have been warned!
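The many-to-one relationship described in the introductory notes can be illustrated with a minimal sketch. It assumes an SQLite table named `clox` (a hypothetical name), SQL column types chosen for illustration, and an invented observation number 1001; only the field names and the three category codes come from this specification. [CloxOrplaA] is described later in this document.

```python
import sqlite3

# In-memory database; table name and column types are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE clox (
        CloxLocnoN INTEGER PRIMARY KEY,  -- unique locality identifier
        CloxLink_N INTEGER NOT NULL,     -- link to the observation identifier
        CloxCategA TEXT NOT NULL CHECK (CloxCategA IN ('c', 'i', 'q')),
        CloxOrplaA TEXT                  -- original place information
    )
    """
)
conn.execute("CREATE INDEX clox_link_idx ON clox (CloxLink_N)")

# One observation (hypothetical number 1001) with five relevant localities,
# following the Borneo example in the introductory notes.
localities = [
    (1, 1001, "c", "Borneo"),     # collection
    (2, 1001, "i", "Japan"),      # isolation
    (3, 1001, "q", "Australia"),  # quarantine intercepts
    (4, 1001, "q", "USA"),
    (5, 1001, "q", "Argentina"),
]
conn.executemany("INSERT INTO clox VALUES (?, ?, ?, ?)", localities)

# Count the localities of each category attached to the one observation.
counts = dict(
    conn.execute(
        "SELECT CloxCategA, COUNT(*) FROM clox "
        "WHERE CloxLink_N = ? GROUP BY CloxCategA",
        (1001,),
    )
)
print(counts)
```

A flat-field structure would have to discard four of these five rows, or squeeze them into a notes field.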

[CloxLat__A]

Latitude, 7 characters

[CloxLong_A]

Longitude, 8 characters

To locate any locality associated with a fungus on a map, it is necessary to have co-ordinates which are as accurate as possible. The cardinal co-ordinates of latitude and longitude, based on the Greenwich meridian, are in almost world-wide use, and information on these co-ordinates should be added to Collections Database records wherever possible. [CloxLat__A] is the first of two fields storing this information. It contains information about the latitude of collection in the following format: the first character must be either `N' or `S', and identifies the northern or southern hemisphere respectively; characters two and three are reserved for the number of degrees of latitude (`00' to `90'); characters four and five are similarly reserved for the number of minutes (`00' to `59'); characters six and seven are reserved for the number of seconds (`00' to `59').

Information should be entered into this field to the greatest possible accuracy: it should be noted that, for example, `N04' implies four degrees north while `N4' implies somewhere between forty and fifty degrees north. The addition of `x' to indicate null data (e.g. `N45xxxx') is also permissible. For making maps, the user will then be able to define the permissible level of accuracy for the current map (e.g. only records accurate to better than ten minutes), and will be able to instruct the computer how to deal with less accurate data (e.g. place dot on the centre of a notional square, or place dot at intersect of lowest common denominator co-ordinate etc.). [CloxLong_A] is the second of these two fields, and stores the longitude, following similar rules to [CloxLat__A] except that hemispheres are defined as `E' and `W', and three characters are reserved for degrees of longitude (`000' to `180'). As the technology for recording exact position by satellite becomes more available, the provision of accuracy only down to the nearest second may become inadequate, and the sizes of these two fields may need to be revised.
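As a rough illustration of how mapping software might interpret these two fields, the sketch below converts a coded value into the lowest and highest decimal degrees consistent with it. The function name `coordinate_bounds` is hypothetical, and two points are assumptions rather than part of this specification: missing trailing characters are treated like `x', and null data may appear only at the end of the value.

```python
import re


def coordinate_bounds(value: str, is_longitude: bool = False) -> tuple[float, float]:
    """Return (lowest, highest) signed decimal degrees consistent with a coded field.

    South and west are returned as negative numbers.
    """
    deg_digits = 3 if is_longitude else 2
    deg_limit = 180 if is_longitude else 90
    hemispheres = "EW" if is_longitude else "NS"
    width = 1 + deg_digits + 4  # hemisphere + degrees + minutes + seconds

    # Assumption: a short value such as 'N4' is padded with null data.
    padded = value.ljust(width, "x")
    if len(padded) != width or not re.fullmatch(rf"[{hemispheres}]\d*x*", padded):
        raise ValueError(f"badly formed co-ordinate: {value!r}")

    digits = padded[1:]
    parts = [
        (digits[:deg_digits], deg_limit),               # degrees
        (digits[deg_digits:deg_digits + 2], 59),        # minutes
        (digits[deg_digits + 2:], 59),                  # seconds
    ]

    def span(part: str, limit: int) -> tuple[int, int]:
        # Replace null data with the smallest and largest digits it could stand for.
        smallest = int(part.replace("x", "0"))
        largest = min(int(part.replace("x", "9")), limit)
        if smallest > limit:
            raise ValueError(f"value out of range in {value!r}")
        return smallest, largest

    (d0, d1), (m0, m1), (s0, s1) = [span(part, limit) for part, limit in parts]
    low = d0 + m0 / 60 + s0 / 3600
    high = min(d1 + m1 / 60 + s1 / 3600, float(deg_limit))

    if padded[0] in "SW":
        low, high = -high, -low
    return low, high


print(coordinate_bounds("N45xxxx"))         # somewhere within the 45th degree north
print(coordinate_bounds("N4"))              # between forty and fifty degrees north
print(coordinate_bounds("W0001530", True))  # an exact longitude west of Greenwich
```

The width of the returned interval gives a direct measure of accuracy, so a map could, for example, plot only those records whose interval is narrower than ten minutes.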

The BMS databases have no provision for this information. Most foray records are generated within the British Isles, for which the UK and Irish National Grids are available for mapping. The relationship between these grids and latitude and longitude is mathematical, but rather complex, so for British foray records, the absence of latitude and longitude data may be rectifiable by machine. It seems a shame, nevertheless, that forayers are not being given the opportunity to record latitude and longitude information which is so readily available from Ordnance Survey maps. For foray records held outside Britain, which now number several thousand, the absence of provision to store this information is more serious.

[CloxGridlA]

The national grid 100 km square identifier, 2 characters

[CloxGrideA]

The national grid easting identifier, 3 characters

[CloxGridnA]

The national grid northing identifier, 3 characters

Many countries employ a national grid to provide an easy system of identifying locations. The UK and the Irish national grids, and those of several other countries, particularly those with a British tradition of surveying (e.g. Zimbabwe), have similar formats. These three fields enable locality information to be stored in these formats, as accurately as possible, thereby facilitating distribution map production specific for those countries. Two characters are allocated to [CloxGridlA], and they identify the relevant 100km square of the grid. Three characters are allocated to [CloxGrideA] to enable eastings to be identified to within 100m. Three characters are allocated to [CloxGridnA] to enable northings to be identified to within 100m. Consult mapping systems of individual countries for further information. The BMS Foray Records Database and the BMS/JNCC Database provide all three fields to the same specifications ([Grid Ref A], [Grid Ref B], [Grid Ref C] and [Grid reference A], [Grid reference B], [Grid reference C] respectively).
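The arithmetic behind the three grid fields is simple: each three-digit easting or northing counts units of 100m from the south-west corner of the 100km square. The sketch below shows this, with a hypothetical function name and an invented grid reference. It assumes that the square identifier consists of one or two letters and that the numeric fields contain digits only; this specification does not say whether null data is permitted in these fields.

```python
def grid_offsets(gridl: str, gride: str, gridn: str) -> tuple[str, int, int]:
    """Return the square identifier plus easting and northing offsets in metres.

    Offsets are measured from the south-west corner of the 100 km square,
    to the 100 m precision that three digits allow.
    """
    square = gridl.strip().upper()
    # Assumption: one or two letters identify the 100 km square.
    if not (1 <= len(square) <= 2 and square.isalpha()):
        raise ValueError(f"badly formed square identifier: {gridl!r}")
    for name, field in (("easting", gride), ("northing", gridn)):
        if len(field) != 3 or not field.isdigit():
            raise ValueError(f"badly formed {name}: {field!r}")
    # Each unit of a three-digit easting or northing represents 100 m.
    return square, int(gride) * 100, int(gridn) * 100


# An invented reference, used only to show the arithmetic.
print(grid_offsets("TQ", "185", "772"))
```

Converting these offsets to latitude and longitude requires the projection parameters of the grid concerned and is not attempted here.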

[CloxVc___A]

Vice-county (for records from the British Isles only), 3 characters

The British vice-counties are a system devised long ago to divide the British Isles into subnational units for biological recording. Although regarded by many as now archaic, they are still in widespread use among field biologists. This field enables the vice-county of the current record to be noted in the form of the conventional number allocated to that vice-county. The BMS Foray Records Database and the BMS/JNCC Database provide this field to the same specifications ([Vice-County] and [VC] respectively).

[CloxAlt__A]

A free text description of altitude, 200 characters

The altitude at which an organism occurred is often of interest from an ecological point of view. Many older collections of fungi contain information about altitude on the packet, and similar data can be encountered in many records in the literature. This older information is often unstructured, or uses measurement systems such as the imperial which are no longer acceptable to science internationally. The present field permits the storage of this original information. The BMS databases provide no equivalent. As records of fungi in literature published by the BMS are computerized, the absence of this field may be an occasional problem.

[CloxMinalA]

Minimum altitude, 4 characters

[CloxMaxalA]

Maximum altitude, 4 characters

These two fields are allocated to store a structured statement or an editorial opinion about the altitude at which the observation was made. In practice, these observations are often made over a range of altitudes, and the provision of two fields enables that range to be reflected. Where the source of information provides only one figure, or where the observation was made at only one altitude, both fields will contain the same information.

Each field contains data representing the number of metres above or below sea-level at which the observation was made. The size of the fields permits observations between -999 and 9999. [CloxMinalA] contains the minimum altitude of the observation: [CloxMaxalA] contains the maximum. The figure in [CloxMinalA] must never be greater than that in [CloxMaxalA]. Where the altitude is not known to the nearest metre, it is possible to enter null data to indicate a level of uncertainty; thus `11xx' would indicate that the collection had been made between 1100 and 1199 metres.
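The rules for these two fields can be sketched as follows. The function names are hypothetical. The handling of null data after a minus sign is an assumption, as is the way the minimum and maximum are compared: a pair is accepted here if the lowest altitude implied by [CloxMinalA] does not exceed the highest altitude implied by [CloxMaxalA].

```python
import re


def altitude_bounds(value: str) -> tuple[int, int]:
    """Return (lowest, highest) altitude in metres consistent with a coded field."""
    # Up to four characters: optional minus sign, digits, then trailing null data.
    if len(value) > 4 or not re.fullmatch(r"-?\d+x*", value):
        raise ValueError(f"badly formed altitude: {value!r}")
    negative = value.startswith("-")
    body = value.lstrip("-")
    near = int(body.replace("x", "0"))  # magnitude nearest to sea-level
    far = int(body.replace("x", "9"))   # magnitude furthest from sea-level
    # Assumption: below sea-level the interval is mirrored.
    return (-far, -near) if negative else (near, far)


def altitude_range_ok(minal: str, maxal: str) -> bool:
    """Check that the minimum altitude cannot exceed the maximum altitude."""
    return altitude_bounds(minal)[0] <= altitude_bounds(maxal)[1]


print(altitude_bounds("11xx"))
print(altitude_range_ok("250", "11xx"))
print(altitude_range_ok("1200", "11xx"))
```

A stricter check is possible, for example requiring both fields to carry the same level of uncertainty, but this specification does not call for one.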

The BMS Foray Records Database and the BMS/JNCC Database each provide one field for altitude ([Altitude] and [Altitude (m)] respectively). For a flat-field database accumulating only new data, and with high quality maps generally available to the field mycologists supplying the data, this is unlikely to be problematic.

[CloxOrmajA]

Original locality major area information, 80 characters

[CloxOrcouA]

Original locality country information, 80 characters

[CloxOrstaA]

Original locality state, oblast or county information, 80 characters

[CloxOrplaA]

Original locality exact place information, 500 characters

Storing original locality data can be problematic. In practice the information can be divided and allocated to four fields, of which [CloxOrmajA] is the first. This stores any original information identifying the continent or other major area of the current locality. Common major areas encountered are, for example, the Caribbean, Oceania, the Middle-East, Scandinavia, the Andes, the Himalaya. The second field, [CloxOrcouA], identifies the original country. Over the past century, many countries have changed their names and boundaries. Storage of the original name, together with the relevant date (located in the Collections People Cross-reference Table), enables the user to have some idea of the boundaries in operation at the time of collection, isolation or quarantine intercept. Much the same comments apply to the third field, [CloxOrstaA], which stores the name of the original state, county, Land, departement or other subnational division, and the fourth field, [CloxOrplaA], which stores the original description of the exact area.

The BMS Foray Records Database provides one field, [Place Name], 15 characters long, for storing all textual locality information. The BMS/JNCC Database provides a field with the same name 30 characters long. Neither has provision to store original information separate from the current editorial opinion. This is an area where the structures used by the BMS need urgent upgrading.

[CloxAcmajA]

Accepted locality major area information, 80 characters

[CloxAccouA]

Accepted locality country information, 80 characters

[CloxAcstaA]

Accepted locality first level internal political boundary (state, oblast, provincia or county information etc.), 80 characters

[CloxAcparA]

Accepted locality second level internal political boundary (parish, raion, municipio, USA county etc. information), 80 characters

[CloxAcplaA]

Accepted locality exact place information, 500 characters

These five fields form the editorial counterpart of the four original locality fields. They store the current editorial opinion of the continent or major area ([CloxAcmajA]), country ([CloxAccouA]), state, county etc. ([CloxAcstaA]), parish or other second level internal division ([CloxAcparA], which has no counterpart among the original fields), and the current description of the exact place ([CloxAcplaA]).

In the last of these fields, [CloxAcplaA], where there is a string of localities, wherever possible these localities are stored in decreasing order of size, with each unit delimited by a comma, thus `Kew, Royal Botanic Gardens, 200m north of the pagoda' is preferable to `200m north of the pagoda, Royal Botanic Gardens, Kew'. Abbreviations such as `Mt' (mountain), `sw' (south-west) etc. are discouraged as confusing to those whose native language is not English. Imperial and other units used in the original data should be converted to their SI equivalents.

[CloxEcol_A]

A free text description of the ecosystem, 1000 characters

A free text account of the prevailing ecosystem is quite frequently encountered when abstracting data from dried reference collection packets, literature and other sources and, if requested, is also quite frequently provided by forayers contributing data. [CloxEcol_A] provides space to store this information. The BMS databases both provide a field of 25 characters [Ecosystem], which the BMS may wish to enlarge.

In older flat-field databases with separate fields for the names of one fungus and one associated organism, this field, or its equivalent, has often been used to store the names of any other associated organisms. For example, in the case of `Amanita muscaria, on soil under birch, oak and hazel', the associated organism field might quite arbitrarily contain `birch', while `oak and hazel' will have been relegated to this ecology field. Of course this use of the field has been preferable to losing this information altogether, but when such databases are re-structured to a more durable form, editorial time will have to be allocated to deal with this problem.

[CloxEcocoA]

Encoded description of the ecosystem, 25 characters

In the same way that free text and coded fields are provided for developmental state and abundance, this field permits a structured statement of the prevailing ecosystem of the current location. Both BMS databases provide an analogous field 10 characters long [NCC Ecocode]: the difference in length is unlikely to be significant given the codes currently in use.

Both the IMI system and the BMS system use the habitat codes devised by the former, and now dismembered, British Nature Conservancy Council (NCC). By and large, for recording in much of north-west Europe, these codes work pretty well, though there are no subterranean or aerial habitats, and the marine habitat is not subdivided. Further afield, the codes tend to fall down: coverage of categories of Acacia scrubland, or volcanic slopes, for example, tends to be pretty minimal! The BMS should look for far wider habitat codes, but until found, the NCC habitat codes remain quite a good stop-gap.

[CloxInnotA]

Notes about localities not intended for publication, 1000 characters

Space for notes associated with each locality, but not intended for publication, has been provided, as there is frequently a need during editing of problematic locality information to record temporarily possible alternative correct identities for a given place. There is however no provision for notes about localities intended for publication. Neither BMS database has a field for this purpose.

