DATA STRUCTURES FOR BIOLOGICAL RECORDING

Specifications for databases to record occurrences of living organisms

THE SUPPLEMENTARY CORE DATABASE CCO0

Introductory notes

.

[Cco0Link_N]

A unique record identifier, 8 characters, indexed

.

[Cco0OrinaA]

The original name of the organism, 100 characters

At the heart of any Collections Database record is the statement `an organism was observed'. For each record generated, the Collections Database needs to store information about the identity of that organism. At first sight, the information might seem to comprise a name and nothing else, but things are not that simple. One organism can have many different names, some vernacular and others scientific (e.g. `raspberry' and `Rubus idaeus'). In English there are many vernacular names for the same organism (`lapwing', `peewit', `green plover'), and English is not the only language in the world nor is the Latin alphabet unique! Scientific or quasi-scientific names may be used (e.g. `Fuchsia' or `fuchsia'), and it may be difficult to distinguish which is meant. Often the person generating the information will have used such a name unaware of the potential problem.

Even if the name received is scientific, it may not be the only name in use for the organism (e.g. `Rhytisma acerinum' and `Melasmia acerina' both quite correctly refer to the same fungus), or its meaning may not be clear (`Euphrasia officinalis', `Rubus fruticosus' and `Taraxacum officinale' are three examples of binomials used for species aggregates), or the name may be used in different senses (`sensu lato', `sensu stricto' etc.), and again the person supplying the information may be unaware of such subtleties.

Schemes designed to collect fresh records try to get round this problem by encouraging participants to supply only correct scientific names. This certainly helps to ensure that incoming records meet certain basic standards, but even such schemes can only compel by imposing the artificial constraint of a monolithic taxonomy on their participants. This is undesirable in that the nuances of different opinions themselves comprise a valuable part of many records. In any case, sooner or later all schemes start to become interested in the question of assimilating the vast numbers of floristic records which were generated years before such standards were erected.

The result of all of this is that, when designing a Collections Database, it is necessary to recognize that those who curate the data have little control over incoming information until it has been received. It is therefore necessary to have a field in the Collections Database devoted to the organism name as it was received [Cco0OrinaA]. That field contains the primary source of data on the identity of the organism to which the current record relates, and it should not be subject to editorial control. Both BMS databases have a field for this information, [Name of Fungus] (though at 40 characters it may be a little short), and another field of 30 characters ([Associated Organism] in the Foray Records Database, and [Associated with] in the BMS/JNCC Database) to store the original name of the associated organism, data which on restructuring would be placed in [Cco0OrinaA]. In the IMI system, this field is located in the Collections Core Subsidiary Table ([Cco0......]) rather than the Collections Core Table ([Ccor......]) because it contains text of variable length.

[Cco0AccnaA]

A text link to the currently accepted organism name, 100 characters, indexed

You cannot easily use [Cco0OrinaA] to search for records relating to a particular organism, however, because the information it contains does not conform to any editorial standards. For that job, a different field is needed, which stores an editorial opinion on the identity of the current organism. In its simplest form, this field contains the accepted scientific name of the organism as a piece of text. If the field is indexed, searching for records relating to a particular organism becomes a simple matter. Nevertheless, such a solution is problematic, because of homonyms (`Hypoderma' is an ascomycete and an insect, `Oenanthe' is a flower and a bird, `Hypoderma brachysporum Speg.' refers to a different organism from `Hypoderma brachysporum Rostr.'). Names with accented characters (`Naïs', `Elsinoë', `Oïdium' etc.) present a further problem: a decision has to be made about whether or not to use these special characters, and after that it is very difficult in practice to ensure that keyboarders remember to adhere to the decision.

A more satisfactory solution is to have a field which makes a link between that Collections Database record and the unique and correct record in the Nomenclature & Taxonomy Database. Through such a link field the user can be provided with access not only to the correct name, correctly spelt, without confusions over homonyms, but also to a lot of other information relating to that name and its use, e.g. its authors, the date and place of publication, its status and its taxonomic position.
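The idea of such a link can be sketched with a small in-memory example. This is only an illustration under stated assumptions: the table names `collections' and `nomenclature', the columns `link', `authors' and `status', and the sample rows are all hypothetical and are not taken from the IMI or BMS systems; only the Cco0 field names come from this specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the Nomenclature & Taxonomy Database.
    CREATE TABLE nomenclature (
        link    TEXT PRIMARY KEY,   -- unique link text for one name
        authors TEXT,
        status  TEXT
    );
    -- Hypothetical stand-in for the Collections Core Subsidiary Table.
    CREATE TABLE collections (
        Cco0Link_N TEXT PRIMARY KEY,
        Cco0OrinaA TEXT,            -- name exactly as received, never edited
        Cco0AccnaA TEXT REFERENCES nomenclature(link)
    );
    CREATE INDEX idx_accna ON collections (Cco0AccnaA);
""")

conn.execute(
    "INSERT INTO nomenclature VALUES (?, ?, ?)",
    ("Rubus idaeus", "L.", "accepted"),
)
conn.executemany(
    "INSERT INTO collections VALUES (?, ?, ?)",
    [
        ("00000001", "raspberry", "Rubus idaeus"),
        ("00000002", "Rubus idaeus", "Rubus idaeus"),
    ],
)

# Search on the indexed link field, and pull nomenclatural details
# through the link rather than storing them in every record.
rows = conn.execute(
    """
    SELECT c.Cco0Link_N, c.Cco0OrinaA, n.link, n.authors
    FROM collections AS c
    JOIN nomenclature AS n ON n.link = c.Cco0AccnaA
    WHERE c.Cco0AccnaA = ?
    ORDER BY c.Cco0Link_N
    """,
    ("Rubus idaeus",),
).fetchall()
```

Both records are found by the one search, whatever name was originally received, and the original names remain untouched in [Cco0OrinaA].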

There are two options for such a link field. The first is to make the link using a unique piece of text based on the scientific name of the organism. Since most scientific names are not homonyms, and do not contain accented characters, almost all of the links will be text comprising the scientific name in an unedited form, and nothing else. In the case of scientific names containing accented characters, the edited form of the name used at IMI to make the link is simply the scientific name with the accent removed from the character (`Elsinoë' for example becomes `Elsinoe'). For names where the rank is normally indicated, this rank is included in the link data (e.g. `Entoloma hirtipes forma bisporicum'). In the case of homonyms, the link comprises the scientific name (minus any accents) plus the name(s) of its author(s) in the standard abbreviated form prescribed by Brummitt & Powell (1992) or, for fungi, the subset of those data provided by Kirk & Ansell (1992). Thus `Hypoderma brachysporum Speg.' is the link for this fungus, to distinguish it from `Hypoderma brachysporum Rostr.'. In the very rare case of homonyms described by the same author(s), the year of publication and, if necessary, a further distinguishing factor are added. In the case of links at ranks higher than species, only the scientific name is used, without the addition of `sp.', `gen. indet.' or similar words.
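The rules for building a text link can be sketched as follows. This is a minimal sketch, not the procedure actually used at IMI: the function names `strip_accents' and `make_link' are hypothetical, the caller is assumed to know already whether a name is a homonym, and the position and format of the year within the link are assumptions, since the text above does not specify them.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Return the name with accents removed from accented characters."""
    # NFD splits an accented letter into base letter + combining mark,
    # so dropping the combining marks leaves the plain letter behind.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_link(name: str, authors: str = "", year: str = "") -> str:
    """Build a text link from a scientific name.

    authors: supply only for homonyms (standard abbreviated form).
    year:    supply only for homonyms by the same author(s);
             appending it after the authors is an assumption.
    """
    parts = [strip_accents(name)]
    if authors:
        parts.append(authors)
        if year:
            parts.append(year)
    return " ".join(parts)
```

With these definitions, make_link("Elsinoë") gives `Elsinoe', and make_link("Hypoderma brachysporum", authors="Speg.") gives `Hypoderma brachysporum Speg.'. Names which are neither homonyms nor accented pass through unchanged.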

[Cco0AccnaA] is the field used at IMI for this link information. At IMI a large number of Collections Database records are linked to the Nomenclature & Taxonomy Database using this option. The BMS has no Nomenclature & Taxonomy Database to accompany its Foray Records Database (nor should it consider duplicating the efforts of IMI and other bodies to set up such a database), but it would be a comparatively easy task to check and edit [Current Name] (the analogous field, 40 characters long, in the BMS databases) so that it conformed to the link standards used at IMI (terms like `sp.' would have to be removed). There is no field in the BMS Foray Records Database to store the accepted name of an associated organism, though if the necessity of the major restructuring already pointed out is faced, such a place to store that information would become available. If the BMS databases had data conforming to the link standards in use at IMI, the 2 character long field [Order Code] would become superfluous. In the IMI system, [Cco0AccnaA] is located in the Collections Core Subsidiary Table ([Cco0......]) rather than the Collections Core Table ([Ccor......]) because it contains text of variable length.

[Cco0DevtdA]

A text description of the developmental state of the observed organism, 200 characters

Organisms are encountered in different developmental states. Fungi are probably no less complex than most in this respect: from symptomless occurrence as, for example, an endophyte, through the symptoms of a plant or animal pathogen, to mycelial features such as stromatic lines, then one or more different anamorphs, and finally a teleomorph, different states can occur, often in more than one combination. This field provides the opportunity for the developmental state of the organism observed to be noted in free text form. It is particularly valuable as a place to store original information about developmental states. Neither BMS database provides this field.

[Cco0DevtcA]

An encoded description of the developmental state of the observed organism, 20 characters

This field permits the storing of a standard or editorial opinion of what developmental states were observed. Because the contents of this field are structured, it can be used for mechanical searching and manipulation in a way not possible with [Cco0DevtdA]. There seems to be no universally agreed coding system for recording developmental states in the fungi, and only with the rusts is any system widely used. The BMS Foray Records Database and the BMS/JNCC Database each have a field 1 character in length ([Morph Code] and [Morph] respectively), which permit the keyboarder to note whether the anamorph (`a'), teleomorph (`t') or holomorph (`h') was present, but that system seems rather simplistic.

At IMI, for rusts, the codes commonly used within that group to note developmental states are used. For ascomycetes, other basidiomycetes and zygomycetes, the following simple annotation is employed: symptomless occurrence (`a'), symptoms (`b'), mycelial features (`c'), anamorph (`d'), and teleomorph (`e'); after `d' and `e' three suffixes may be used: not yet sporulating (`1'), sporulating (`2'), and sporulation finished (`3'). Thus `cd3e2' would indicate that a mycelial feature (for example a stroma), an anamorph, and a teleomorph had all been observed, and that the anamorph appeared to be effete, while the teleomorph was actively sporulating.
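Because the annotation is structured, it can be decoded mechanically. The following is a rough sketch of a decoder for the non-rust annotation only; the function name `parse_devt_code' and the decision to reject any string not matching the annotation are assumptions, and the rust codes are not handled.

```python
import re

STATES = {
    "a": "symptomless occurrence",
    "b": "symptoms",
    "c": "mycelial features",
    "d": "anamorph",
    "e": "teleomorph",
}
SPORULATION = {
    "1": "not yet sporulating",
    "2": "sporulating",
    "3": "sporulation finished",
}

# 'a', 'b' and 'c' stand alone; 'd' and 'e' may carry one suffix.
_TOKEN = re.compile(r"[abc]|[de][123]?")


def parse_devt_code(code: str) -> list:
    """Decode a [Cco0DevtcA] value into (state, sporulation) pairs."""
    if len(code) > 20:
        raise ValueError("code exceeds the 20 characters of the field")
    decoded = []
    pos = 0
    while pos < len(code):
        match = _TOKEN.match(code, pos)
        if match is None:
            raise ValueError(f"unrecognized code at position {pos}: {code!r}")
        token = match.group()
        # token[1:] is empty when no suffix was given, which maps to None.
        decoded.append((STATES[token[0]], SPORULATION.get(token[1:])))
        pos = match.end()
    return decoded
```

For `cd3e2' this yields mycelial features (no suffix), anamorph with sporulation finished, and teleomorph sporulating, matching the reading given above. A string such as `a1' is rejected, since suffixes are permitted only after `d' and `e'.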

It is clear that this code also has deficiencies, particularly for fungi with more than one anamorph. The BMS should recognize that its own field for storing this information is inadequate, and should help to devise either a more effective code which has universal use in the fungi, or a series of codes which have effective application within individual groups of fungi.

[Cco0AbundA]

A text description of the abundance of the observed organism, 200 characters

Many observations of fungi are accompanied by a comment by the observer on abundance. These comments are almost invariably subjective (`rare', `very rare', `common' etc.), but even in this condition they have a value worth preserving. [Cco0AbundA] provides a location to store these comments in free text form. Like [Cco0DevtdA] this field is particularly valuable as a place to store original information. Neither BMS database provides this field.

[Cco0AbuncA]

An encoded description of the abundance of the observed organism, 2 characters

This field permits the storing of a standard or editorial opinion of the abundance of the observed organism. Because the contents of this field are structured, it can be used for mechanical searching and manipulation in a way not possible with [Cco0AbundA]. There seems to be no universally agreed coding system for recording abundance in the fungi, and the subject seems to be rather controversial. An often-quoted criticism of abundance recording in the fungi is that the observation of, say, fifty basidiocarps may genetically represent the observation of only one individual united by mycelium beneath the soil. To get round this problem, the first step is to ensure that anyone scanning the data can see on what basis the assessment of abundance was made.

To do that, the following main factors need to be noted: what was observed (ascocarps, basidiocarps, colonies, conidiophores, pycnidia, stromata, symptoms, zone lines etc.), how many of each item were observed, and how long it took to observe that number. Thus, the user of the data can distinguish the abundance assessment of a recorder who observed fifty toadstools, from another who observed one colony. Since what was observed is already (at least in potential) the subject of the field for recording developmental stages in code, it remains for the present field to record the number of items observed, and the time taken in a similar codified format. In doing so, the practicalities of gathering data in the field need to be taken into account.

At IMI I use a system experimentally which requires two characters of data. The first is a score from zero to five representing the base 10 logarithm of the approximate number observed (`0', no items observed; `1', 1 to 9; `2', 10 to 99; `3', 100 to 999; `4', 1000 to 9999; `5', more than 10,000). The second represents the time taken to make that observation (`1', less than 1 minute; `2', 1 to 5 minutes; `3', 5 to 15 minutes; `4', 15 to 30 minutes; `5', more than 30 minutes). Since chance is a major factor in observing many fungi, particularly those with ephemeral and unpredictable fruiting, a further option is permitted for the second character: `s' denotes a serendipitous observation. The BMS Foray Records Database and the BMS/JNCC Database both have a field 1 character in length [Abundance] which is not yet in use. The idea of using it to record the base 3 logarithm of the approximate number observed has been discussed, but there is no provision to record the time taken to make that observation, and there is no explicit relationship between the information in that field and information about what was observed.
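The two-character IMI code can be generated from a count and a duration roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and boundary values which the scale leaves ambiguous (exactly 10,000 items; exactly 5, 15 or 30 minutes) are assigned here by assumption, to `5' for the count and to the lower time band in each case.

```python
def count_score(count: int) -> str:
    """First character: order of magnitude of the number observed."""
    if count < 0:
        raise ValueError("count cannot be negative")
    if count == 0:
        return "0"
    # 1-9 has one digit, 10-99 two, and so on; capped at 5.
    return str(min(len(str(count)), 5))


def time_score(minutes=None, serendipitous: bool = False) -> str:
    """Second character: time taken, or 's' for a chance observation."""
    if serendipitous:
        return "s"
    if minutes is None or minutes < 0:
        raise ValueError("a non-negative duration is required")
    if minutes < 1:
        return "1"
    # Upper limits of bands '2' to '4' (inclusive here, by assumption).
    for code, upper in (("2", 5), ("3", 15), ("4", 30)):
        if minutes <= upper:
            return code
    return "5"


def encode_abundance(count: int, minutes=None, serendipitous: bool = False) -> str:
    """Return the 2-character value for [Cco0AbuncA]."""
    return count_score(count) + time_score(minutes, serendipitous)
```

Fifty items found in ten minutes would thus be coded `23', and a single item stumbled upon by chance would be coded `1s'.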

Neither of these two systems is ideal. The system used at IMI represents an adaptation which favours recording microfungi, where far larger numbers of items are likely to be encountered. The system discussed by the BMS favours the recording of larger fungi, with the consequent greater numeric precision at the lower end. In cases where, say, an anamorph and a teleomorph are both recorded, neither system can distinguish the relative amounts of each state. The BMS should recognize that its own field for storing this information is inadequate, and should help to devise either a more effective code which has universal use in the fungi, or a series of codes which have effective application within individual groups of fungi, making sure that such a code distinguishes between deliberate searches and serendipitous discoveries.

[Cco0Foss_A]

A flag to mark fossil records, 1 character

Fossil fungi have occupied a dark corner in the past of traditional mycology! Records are infrequent, and compilations have generally treated them separately from those of living fungi. Even the names used for genera have tended to be different, `Rhytismites' for example being used for fossils looking like present day species of Rhytisma. This particular habit of traditional mycology does not fit well with other biological disciplines and, since the Collections Database deals with records of all organisms, it needs to be scrutinized.

It is well known that taxa originally known as living organisms can also be found as fossils. The examples of the Ginkgo and coelacanth, among others, show that the reverse can also be true: taxa originally known as fossils can subsequently be discovered alive. It follows that the practice in mycology of using different scientific names for fossils, just because they are fossils, is suspect. There is no such thing as an intrinsically fossil genus: only individual records in the Collections Database can be labelled as fossil or non-fossil.

[Cco0Foss_A] was introduced to deal with this possibility. The default condition of an empty field indicates that the current observation does not relate to a fossil. Each year a few fungal taxa are described with fossils as their types, and this field is used at IMI to distinguish them. Neither BMS database makes provision for such a field.

[Cco0InnotA]

Internal notes, 2000 characters

.

[Cco0ExnotA]

External notes, 2000 characters

.
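The fields specified above can be gathered into a single record structure for checking purposes. What follows is a sketch only: the class name `Cco0Record', the treatment of every field as text, and the use of `y' as a fossil flag value are assumptions; the field names and lengths are those given in this section, as is the rule that an empty [Cco0Foss_A] means a non-fossil record.

```python
from dataclasses import dataclass

# Maximum lengths in characters, as given for each field above.
MAX_LENGTHS = {
    "Cco0Link_N": 8,
    "Cco0OrinaA": 100,
    "Cco0AccnaA": 100,
    "Cco0DevtdA": 200,
    "Cco0DevtcA": 20,
    "Cco0AbundA": 200,
    "Cco0AbuncA": 2,
    "Cco0Foss_A": 1,
    "Cco0InnotA": 2000,
    "Cco0ExnotA": 2000,
}


@dataclass
class Cco0Record:
    Cco0Link_N: str        # unique record identifier (text is an assumption)
    Cco0OrinaA: str = ""   # original name, stored exactly as received
    Cco0AccnaA: str = ""   # text link to the accepted name
    Cco0DevtdA: str = ""   # developmental state, free text
    Cco0DevtcA: str = ""   # developmental state, encoded
    Cco0AbundA: str = ""   # abundance, free text
    Cco0AbuncA: str = ""   # abundance, encoded
    Cco0Foss_A: str = ""   # empty means the record is not a fossil
    Cco0InnotA: str = ""   # internal notes
    Cco0ExnotA: str = ""   # external notes

    def __post_init__(self):
        # Reject any value longer than the length specified for its field.
        for name, limit in MAX_LENGTHS.items():
            if len(getattr(self, name)) > limit:
                raise ValueError(f"{name} exceeds {limit} characters")

    @property
    def is_fossil(self) -> bool:
        return self.Cco0Foss_A != ""
```

A record created with only an identifier and an original name is accepted and reports itself as non-fossil; a value longer than its field allows is rejected when the record is created.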

