DATA STRUCTURES FOR BIOLOGICAL RECORDING

Specifications for databases to record occurrences of living organisms

THE PEOPLE DATABASE CLOX

Introductory notes

Many people can be associated with a Collections Database observation: several people may contribute to making the observation or collection, others may make the original identification, yet others may wish to comment on that identification, or redispose the record under a different name, or isolate the fungus into pure culture, or intercept it for quarantine purposes, and so on. The list includes all the possibilities which make the relationship of localities to individual observations many-to-one, and many others besides. To permit the expression of the many-to-one relationship, a Collections People Cross-reference Table is necessary. Furthermore, since most events during the existence of an observation only occur because of the action of humans, this table seems as good a place as anywhere to store the many dates associated with the observation.

The BMS databases, having a flat-field design, cannot cope properly with the many-to-one relationship recognized here. Each stores an indication of the identities of some of the people associated with each record, and the date of collection of the current fungus, but no more. Even some of this information is in a form which some observers might find unsatisfactory, and the possibility that more than one person might simultaneously be involved in, for example, the same collection or identification is not recognized.

[CpexLink_N]

A link to the unique observation identifier, 8 characters, indexed

This field stores the numeric link between the Collections People Cross-reference Table and other tables in the Collections Database.

[CpexClox_N]

A link to the Collections Locality Cross-reference Table, 8 characters, indexed

This field stores the numeric link between the current record and the appropriate locality in the Collections Locality Cross-reference Table. For an explanation of its use, see the earlier notes on [CloxLocnoN].

[CpexPeolkN]

A link to the People Database, 8 characters

This field provides a numeric link from the current person to the People Database, giving access to shared information about that person, for example their full name, address, interests, and status as a referee of identifications. It is not yet in use. Before it enters use, and before any data are added to the People Database, the provisions of data protection legislation would need to be taken into account.

[CpexPeoplA]

The name of the current person as received by the database, 160 characters

Only one person is permitted for each record in this table, and this field stores the name of that person. It takes the form of original information, so it appears in an unedited form. The editorial opinion of the identity of that person will, when it becomes available, be expressed through the previous field, [CpexPeolkN]. The BMS database fields [Collector], [Determiner] / [Identifier], [Confirmer] and [Current Referee], each 4 characters long, store merely initials. The BMS should urgently consider upgrading its provisions for these fields, so that full names can be stored, as there is a great danger of losing the identities of many of the people behind the less frequent initials.

[CpexCategA]

The category into which the action of the current person falls, 1 character

One character is allocated to this field, to record the category into which the action of the current person falls. At present six categories are in use which are of interest in Collections Database recording: `c', collector; `o', original identifier; `i', current identifier; `f', former or other identifier; `s', isolator; `q', quarantine interceptor. Others either exist but are not relevant to foray recording, or will be made available as the need for them becomes apparent.

In their current format, the BMS databases have no need for this field. The BMS databases recognize the need to store information identifying the collector, original identifier, a `confirmer' (this is usually the same person as the original identifier, or is a different identifier with a higher standing as a taxonomist than the original identifier), and a `referee'. These categories are all catered for in the present structure. When information from the BMS database is restructured to a more durable form, data from these different fields will form separate records in the present table.
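The restructuring described above can be sketched in Python. This is only an illustration: the dictionary layout of the flat record, the function name, and the mapping of [Collector] to `c' and [Determiner] to `o' are assumptions, and [Confirmer] and [Current Referee] are deliberately left unmapped because the text does not say which category codes they would receive.

```python
# Hypothetical mapping from BMS flat fields to [CpexCategA] codes.
# Only two fields are mapped; the codes for [Confirmer] and
# [Current Referee] are not stated in the text, so they are omitted.
BMS_FIELD_TO_CATEGORY = {
    "Collector": "c",
    "Determiner": "o",
}


def split_flat_record(link, flat_record, mapping=BMS_FIELD_TO_CATEGORY):
    """Return one cross-reference row per person field that holds a value."""
    rows = []
    for field, category in mapping.items():
        name = (flat_record.get(field) or "").strip()
        if name:
            rows.append({
                "CpexLink_N": link,      # link back to the observation
                "CpexPeoplA": name,      # name exactly as received
                "CpexCategA": category,  # category of the person's action
            })
    return rows


# Initials here are invented placeholders, not real BMS data.
example = {"Collector": "ABC", "Determiner": "DEF", "Confirmer": ""}
rows = split_flat_record(42, example)
```

Each person field that holds a value becomes its own row, so one flat record can yield several records in the present table.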

[CpexPeonoA]

Personally-allocated collection and observation numbers, 50 characters

Many collectors have personal systems for allocating numbers to the collections and other observations they make. These numbers are usually different from the numbers allocated to material when it arrives in, for example, a national herbarium, or from the numbers allocated to observations when they are entered into a computerized database. [CpexPeonoA] provides space for these personal numbers. It is worth noting that, when several different people participate in making one collection, it is quite possible that several different personal collection numbers might get allocated to the same collection. That eventuality is no problem for this database structure: each individual collector is allocated a different record in this table in respect of that collection. The BMS databases do not allocate space for such personal numbers.
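A minimal sketch of how one collection can carry several personal numbers, using Python's built-in sqlite3 module. The table name `cpex`, the column types, the index name, and the sample names and numbers are all assumptions made for illustration; only the field names come from this specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cpex (
        CpexLink_N INTEGER NOT NULL,  -- link to the observation
        CpexPeoplA TEXT NOT NULL,     -- name as received
        CpexCategA TEXT NOT NULL,     -- category code, 'c' for collector
        CpexPeonoA TEXT               -- personal collection number, if any
    )
    """
)
conn.execute("CREATE INDEX cpex_link ON cpex (CpexLink_N)")

# Two collectors share observation 1001, each with a personal number.
conn.executemany(
    "INSERT INTO cpex VALUES (?, ?, ?, ?)",
    [
        (1001, "A. Collector", "c", "AC 57"),
        (1001, "B. Collector", "c", "BC 1203"),
    ],
)

# All personal numbers attached to the same observation.
numbers = [
    row[0]
    for row in conn.execute(
        "SELECT CpexPeonoA FROM cpex WHERE CpexLink_N = ? ORDER BY CpexPeoplA",
        (1001,),
    )
]
```

Because each collector occupies a separate row, neither personal number overwrites the other.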

[CpexSday_A]

Date fields: start day of action, 2 characters

[CpexSmontA]

Date fields: start month of action, 2 characters

[CpexSyearA]

Date fields: start year of action, 4 characters

[CpexFday_A]

Date fields: finish day of action, 2 characters

[CpexFmontA]

Date fields: finish month of action, 2 characters

[CpexFyearA]

Date fields: finish year of action, 4 characters

Dates are important things to record. By them we can tell what is the earliest record of an organism, and when that organism was last observed. We can use dates to keep track of changes in distribution or abundance through time, and to relate developmental stages observed to the seasons of the year. There are many things we can do with dates. Not surprisingly, many computerized databases provide specialized date fields to cope with date information. Unfortunately these are generally not suitable for our purposes. Many only provide date services back to the start of the twentieth century, whereas biological recording has a need to store dates going back at least to the middle of the eighteenth century. The other problem with using specialized date fields is that they cannot usually cope with null data, while a very large number of biological records exist in which, for example, the month is known but not the day or year, or the year is known, but not the day or month, etc.

To get round that problem for biological recording, it is necessary to express any date through three fields: one for the day, another for the month, and the third for the year. Even then, the problem of null data has some bearing on the choice of field. There are many records which are known to date from the nineteenth century, but for which the exact year is not known. The fact that the information is over one hundred years old is itself valuable, and needs to be stored. `1800' is unsuitable, since it implies an exact year, as does `18'. What is needed is `18xx'. Similar considerations apply to the month and day fields. As a result, each of these fields needs to be alphanumeric rather than numeric.
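The value of a field such as `18xx' can be shown with a short Python sketch that turns a year field into the range of years it implies. The function name is hypothetical, and the rule that unknown digits appear only as trailing lower-case `x' characters is an assumption; the text gives `18xx' as an example but does not define the full convention.

```python
import re


def year_bounds(year_field):
    """Return (earliest, latest) year implied by a 4-character year field.

    Assumes unknown digits are written as trailing 'x' characters, as in
    '18xx'. Returns None when nothing at all is known ('xxxx') and raises
    ValueError when the field does not follow the assumed pattern.
    """
    if len(year_field) != 4 or not re.fullmatch(r"[0-9]*x*", year_field):
        raise ValueError(f"unexpected year field: {year_field!r}")
    if year_field == "xxxx":
        return None
    earliest = int(year_field.replace("x", "0"))
    latest = int(year_field.replace("x", "9"))
    return earliest, latest


nineteenth_century = year_bounds("18xx")
exact_year = year_bounds("1882")
```

A purely numeric field could not distinguish `1800' from "some year in the 1800s", which is why the alphanumeric form is needed.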

The third difficulty is that, for a significant proportion of dates associated with records (and not a particularly small one), the original information supplied is a spread of dates: `May-July 1882' or `23-24 February 1933' etc. To deal with that eventuality, two sets of date fields, a start set ([CpexSday_A], [CpexSmontA], [CpexSyearA]), and a finish set ([CpexFday_A], [CpexFmontA], [CpexFyearA]), are needed. Where (as is usual) the source information comprises only a single date, both sets of fields contain the same data. The BMS databases provide one set of three date fields for the date of collection / observation ([Day], [Month], [Year]). They make no provision for recording any other date.
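The use of the start and finish sets can be sketched as follows. The function name is hypothetical, and two conventions are assumed rather than taken from the text: months are stored as two-digit numbers, and an unknown day is written `xx'.

```python
def date_fields(start, finish=None):
    """Build the six date fields from (day, month, year) string tuples.

    When no finish date is given, the finish set repeats the start set,
    as described for records with a single date.
    """
    if finish is None:
        finish = start
    s_day, s_month, s_year = start
    f_day, f_month, f_year = finish
    return {
        "CpexSday_A": s_day, "CpexSmontA": s_month, "CpexSyearA": s_year,
        "CpexFday_A": f_day, "CpexFmontA": f_month, "CpexFyearA": f_year,
    }


# 'May-July 1882': day unknown, written here as 'xx' (assumed convention).
spread = date_fields(("xx", "05", "1882"), ("xx", "07", "1882"))
# '23-24 February 1933'
two_days = date_fields(("23", "02", "1933"), ("24", "02", "1933"))
# A single date (invented example): both sets hold the same data.
single = date_fields(("14", "10", "1921"))
```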

[CpexTaxnaA]

Text link for alternative identifications, 100 characters

[CpexTaxlkN]

Numeric link for alternative identifications, 8 characters

Apart from the original name given to the organism when the observation or collection was made, and apart from the currently accepted name provided through the central editor of the database, other people may wish to express an opinion about the identity of the organism, and there may be a need to record the history of earlier currently accepted names which are no longer acceptable as opinion shifts.

[CpexTaxnaA] provides space to make a text link to the Nomenclature & Taxonomy Database, thus linking an identification of the organism with the person of the current record. As for [Cco0AccnaA], the contents of this field must conform to the standards for the text link field of the Nomenclature & Taxonomy Database. Records with information in this field will all relate to identifications other than the original or current identifications, and so must all contain `f' in [CpexCategA]. [CpexTaxlkN], which is not yet in use, has the same function as [CpexTaxnaA], but makes the link through a number rather than by text.
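The rule that any record with a text link in [CpexTaxnaA] must carry `f' in [CpexCategA] lends itself to a simple consistency check. The sketch below assumes each record is held as a Python dictionary keyed by field name; the function name and the taxon names in the example are hypothetical.

```python
def check_alternative_identification(record):
    """Return a list of problems found in one cross-reference record."""
    problems = []
    has_text_link = bool((record.get("CpexTaxnaA") or "").strip())
    if has_text_link and record.get("CpexCategA") != "f":
        problems.append("CpexTaxnaA is filled, so CpexCategA must be 'f'")
    return problems


# Placeholder records for illustration only.
good = {"CpexCategA": "f", "CpexTaxnaA": "Agaricus example"}
bad = {"CpexCategA": "c", "CpexTaxnaA": "Agaricus example"}
plain_collector = {"CpexCategA": "c", "CpexTaxnaA": ""}

good_problems = check_alternative_identification(good)
bad_problems = check_alternative_identification(bad)
```

Records with an empty [CpexTaxnaA] pass regardless of category, since the rule applies only when the field is filled.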

The BMS databases have no provision for these fields. Other flat-field collections databases exist which have a field called [Redisposition history] or something similar, in which the histories of different opinions as to the identity of the organisms are stored. Such fields are better than nothing, but represent an inadequate attempt to deal with the informational problem. A history of different opinions will inevitably build up over time: examination of packets in any dried reference collection will show the polite crossings-out and re-identifications which are the evidence that our paper-based databases record such information. For properly structured long-term storage, our computerized databases need the facility to store such histories too.

[CpexBiblkA]

Text link to a bibliographic source, 100 characters

[CpexBiblkN]

Numeric link to a bibliographic source, 8 characters

Revisionary work on the systematics of organisms is being published all the time and, particularly in the case of the fungi, this can result in large-scale name changes which may or may not be adopted by the editors of the database. Specialists who wish to keep track of the opinions of others about the organisms in their group may wish to record these different opinions, even if they are not accepted. These fields enable a link to be made in such cases between the current record and the record in the Bibliography Database representing that revisionary work. They should not be confused with the facility for bibliographic links provided through the Collections Bibliography Cross-reference Table, which is far more general in its application.

As with links to the Nomenclature & Taxonomy Database, there are arguments for and against text versus numeric links. At present, at IMI, links between the Collections Database and the Bibliography Database are textual rather than numeric, though numeric links are a long-term aim. A discussion about the editorial standards used in creating textual links with the Bibliography Database can be found later in the discussion on [CbixBiblkA]. The BMS databases have no provision for either of these or the following field.

[CpexPage_A]

Identification of the exact page within that bibliographic source, 20 characters

This field is used in the case of a link between the current record and the Bibliography Database. It identifies the exact page or page spread on which, within the publication, the identification to which the current record relates was made.

