DATA STRUCTURES FOR BIOLOGICAL RECORDING

Specifications for databases to record occurrences of living organisms

THE SUBSTRATUM DATABASE CSUX

Introductory notes

Dealing with information about the substratum on or in which a fungus is observed is tricky. The material on which the fungus is observed is always its physical location, but it may or may not also be the source of its nutrition. Thus this table has to cope with information about the rocks on which lichens are found, the soil bearing toadstools, the leaves carrying ascomycetes and many other very diverse materials, and it has to cope with them in a way which will allow at least some meaningful information to be extractable mechanically.

In every record in this table, by definition, a substratum must be linked with an observation of an organism: the fungus, insect, plant etc. which was found on the substratum. An example might be the link: fungus-soil. A further complication, however, is that for many records, and certainly most records involving fungi, the substratum is part of another organism. As a result, a second link to an observation of that organism needs to be made. The result is a double link: e.g. fungus-leaf-plant. These two links, the first mandatory, the second optional, are made with different records in the Collections Core Table. One must (arbitrarily) be designated the `driver' link, and the other (if it exists) the `slave' link.

It often happens that in a single observation a fungus, for example, is noted on the leaves, twigs and bark of a plant. Under the structure adopted here, that would come out as three records: fungus-leaf-plant, fungus-twig-plant, fungus-bark-plant. Similarly, if a fungus were observed on the thorax of a beetle which was on the leaves of a plant, that would result in two records: fungus-thorax-beetle, beetle-leaf-plant. In that case, the fact that the fungus was also associated with the plant would not be lost. The two observations would share the same unique collection identifier [CcorColnoN], but as the association did not involve a substratum linking the two, there would be no need to create a record for the association in this table. In a third example, that of a fungus on the thorax of a beetle which was on the leaves of a plant, but where the fungus also spread out over the leaves of the plant, three records would have to be made: fungus-thorax-beetle, beetle-leaf-plant, fungus-leaf-plant.
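
The three cases above can be sketched in a few lines of Python. This is only an illustration of how one observation fans out into several records: the class and function names are hypothetical, and the real table stores numeric links to observations rather than the plain words used here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SubstratumRecord:
    """One row of the cross-reference table (names are illustrative only)."""
    driver: str                  # organism observed on the substratum
    substratum: str              # material or organ on which it was found
    slave: Optional[str] = None  # organism the substratum is part of, if any


def records_for(driver: str, substrata: List[str],
                slave: Optional[str] = None) -> List[SubstratumRecord]:
    """Make one record per substratum noted in a single observation."""
    return [SubstratumRecord(driver, s, slave) for s in substrata]


def label(record: SubstratumRecord) -> str:
    """Render a record in the driver-substratum-slave shorthand used above."""
    parts = [record.driver, record.substratum]
    if record.slave is not None:
        parts.append(record.slave)
    return "-".join(parts)


# First case: one fungus noted on three parts of one plant.
first = records_for("fungus", ["leaf", "twig", "bark"], "plant")

# Third case: fungus on the beetle, beetle on the plant, fungus also on the plant.
third = (records_for("fungus", ["thorax"], "beetle")
         + records_for("beetle", ["leaf"], "plant")
         + records_for("fungus", ["leaf"], "plant"))

# A record with no slave organism, such as a fungus on soil.
soil = records_for("fungus", ["soil"])
```

Calling `label` on each record in `first` gives fungus-leaf-plant, fungus-twig-plant and fungus-bark-plant, matching the text.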

It may be seen from these examples that a many-to-many relationship exists between substrata and the organisms with which they are linked. The provision of this table and its structure permit very complex ecological inter-relationships to be recorded. For long-term structured storage this table is essential. It is possible to construct a flat-field feeder database which avoids these complications for most records during the data-entry stage, but even then it is likely that there will be a cost in terms of data lost simply because there is no provision for picking it up. The BMS databases each allocate one field ([Medium]), 20 characters long, for the whole of this area of data.

[CsuxDrilkN]

`Driver' link to the unique observation identifier, 8 characters, indexed

[CsuxSlalkN]

`Slave' link to the unique observation identifier, 8 characters, indexed

[CsuxDrilkN] stores the numeric link for the `driver' observation, and [CsuxSlalkN] the numeric link for the `slave' observation. These links connect the Collections Substratum Cross-reference Table with other tables in the Collections Database.

[CsuxComb_N]

A combined identifier of `driver' and `slave' links, 8 characters, indexed

This field stores a number calculated using the equation [CsuxComb_N]=([CsuxDrilkN]*1000000)+[CsuxSlalkN]. This number is unique to the current combination of `driver' and `slave' observations, enabling such records to be identified quickly and mechanically through the index. This combination number is needed when outputting data, for example lists of substrata associated with particular species, or indexes of species found on or providing particular substrata.
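
As a minimal sketch, the calculation and its inverse might look like this in Python. The function names are hypothetical, and so is the use of 0 for a missing `slave' link. Note one consequence of the equation itself: the result is unique only while the `slave' link is below 1000000, so the sketch rejects larger values rather than guess how they should be handled.

```python
SLAVE_SLOT = 1_000_000  # multiplier taken from the equation above


def combined_id(driver_link: int, slave_link: int = 0) -> int:
    """Return (driver_link * 1000000) + slave_link.

    Assumption: a record with no `slave' observation uses 0 here.
    """
    if driver_link < 0:
        raise ValueError("driver link must not be negative")
    if not 0 <= slave_link < SLAVE_SLOT:
        # A larger slave link would spill into the driver's digits,
        # e.g. (1, 1000000) and (2, 0) would both give 2000000.
        raise ValueError("slave link must be between 0 and 999999")
    return driver_link * SLAVE_SLOT + slave_link


def split_combined_id(comb: int) -> tuple:
    """Recover (driver_link, slave_link) from a combined identifier."""
    return divmod(comb, SLAVE_SLOT)


example = combined_id(12, 345)       # 12000345
pair = split_combined_id(example)    # (12, 345)
```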

[CsuxDesc_A]

A free text description of the substratum, 1000 characters

In this field, the original text describing the substratum is stored. The best example I have come across to date was `dead beatle'!

[CsuxAcdesA]

An edited description of the substratum, 160 characters

This field, the nearest equivalent to [Medium] in the BMS Foray Records Database, stores an edited opinion of the meaning of the original text in [CsuxDesc_A]. A well-defined structure and a comprehensive thesaurus of accepted terms are necessary. Both the IMI and the BMS database systems attempt to provide these. The two attempts are similar, though not exactly alike, and neither is fully adequate. Both systems recommend the use of a thesaurus term (a noun for the BMS system, a noun or a phrase for the IMI system) first as a descriptor, followed by qualifying adjectives or nouns in apposition (or also phrases in the IMI system). In the BMS system, adjacent elements are separated by a space. In the IMI system they are separated by a comma and a space: because the IMI system admits phrases as thesaurus terms, the comma is a necessary part of the separator. These differences are really quite minor.
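
The effect of the two separators can be shown with a short Python sketch. The function names are hypothetical, and the sample descriptions (`wood rotten', `leaf litter, wet') are invented for illustration: they are not taken from either thesaurus.

```python
def split_bms(description: str):
    """BMS style: elements separated by a space, thesaurus term first."""
    parts = description.split()
    if not parts:
        raise ValueError("empty description")
    return parts[0], parts[1:]


def split_imi(description: str):
    """IMI style: elements separated by a comma and a space, term first."""
    parts = [p.strip() for p in description.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty description")
    return parts[0], parts[1:]


bms = split_bms("wood rotten")              # ('wood', ['rotten'])
imi = split_imi("wood, rotten")             # ('wood', ['rotten'])

# A phrase used as a thesaurus term survives only with the comma separator.
phrase_imi = split_imi("leaf litter, wet")  # ('leaf litter', ['wet'])
phrase_bms = split_bms("leaf litter wet")   # ('leaf', ['litter', 'wet'])
```

The last two lines show why a space alone cannot serve as the separator once phrases are admitted as terms.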

The IMI system also contains an added tier of initial descriptors which enable certain otherwise diffuse categories to be brought together for indexing: `substance' always precedes any naturally occurring non-biological object (e.g. `substance, soil', `substance, water'); `artefact' precedes any object which has been produced through the skill or creativity of a living organism (e.g. `artefact, concrete', `artefact, plaster', `artefact, nest'); `food' precedes any foodstuff used as a fungal substratum (e.g. `food, bread', `food, fruit'). It might be worth pointing out that `fruit' would imply fruit on, say, a tree, but `food, fruit' would imply fruit in the greengrocer's shop or something similar. Once at IMI, the term `artefact, dung' was entered by a keyboarder: most of us felt this was going a little too far, and at IMI we index `dung' as though it were a part of the organism that produced it!
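
A rough Python sketch of how the initial descriptors gather entries for an index follows. It uses only the three descriptors and the example entries quoted above; the function name and the use of None as the key for entries with no initial descriptor are my own assumptions, not part of the IMI system.

```python
from collections import defaultdict

# The three initial descriptors named in the text.
INITIAL_DESCRIPTORS = ("substance", "artefact", "food")


def index_by_descriptor(descriptions):
    """Group edited descriptions under their initial descriptor.

    Entries with no initial descriptor are filed under None
    (an assumption made for this sketch).
    """
    index = defaultdict(list)
    for text in descriptions:
        elements = [e.strip() for e in text.split(",")]
        if len(elements) > 1 and elements[0] in INITIAL_DESCRIPTORS:
            index[elements[0]].append(", ".join(elements[1:]))
        else:
            index[None].append(text)
    return dict(index)


entries = [
    "substance, soil", "substance, water",
    "artefact, concrete", "artefact, plaster", "artefact, nest",
    "food, bread", "food, fruit",
    "fruit",
]
grouped = index_by_descriptor(entries)
```

Here `fruit' on its own stays outside the `food' group, which preserves the distinction drawn above between fruit on a tree and fruit in a shop.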

Without this extra tier, it is impossible to produce a substratum index which gathers together such miscellaneous items into meaningful groups. This extra tier is, furthermore, merely a continuation of a long tradition in paper-based recording systems of such a classification. Efforts should be made to bring together the BMS and IMI systems. These efforts should include a standardizing (and extension) of thesaurus terms, an acceptance that phrases may sometimes be necessary as thesaurus terms, and hence that individual components of a statement should be separated by more than just a space, and the introduction to the BMS system of the extra tier of descriptors.

[CsuxPositA]

The position of the `driver' organism in relation to the `slave' organism, 8 characters

For some fungi, the position a fruitbody or colony occupies in relation to its substratum is a specific character which remains the same in all collections encountered. For others though, for example Diatrype stigma, the position varies from collection to collection: the teleomorph stroma may be on the surface of bare wood here, but under bark there. This field allows the user to specify what position was observed in the case of the current record. Strictly speaking, the field allows for a description of the position of the `driver' organism in relation to the `slave' organism: the syntax is `driver organism' is `position' the `slave organism'.

[CsuxRelatA]

The ecological relationship of the `driver' organism to the `slave' organism, 1000 characters

For records where there is both a `driver' and a `slave' organism, there is frequently ecological information about the relationship of those two organisms. The fungus may be parasitic on the plant, or mutualistic with the plant etc. This field is set aside for a free-text description of that relationship. The BMS databases make no formal provision for such recording, nor for the recording of symptoms, and in the IMI system this and the following three fields have been introduced comparatively recently.

[CsuxAcrelA]

An edited description of the relationship between the `driver' organism and the `slave' organism, 100 characters

This field contains an edited opinion on the meaning of the data in [CsuxRelatA]. Like [CsuxAcdesA] this field is used for mechanical retrieval and processing of information, so the description must be structured. The field therefore stores a thesaurus term describing the ecological relationship of the `driver' organism with the `slave' organism. Thus, if the `driver' organism is Puccinia oxalidis and the `slave' is Oxalis, the field might read `parasitic'. If, however, the `driver' is Oxalis and the `slave' Puccinia oxalidis (there is no logical reason why this should not be the case), the field should read `parasitized'. There is a need to develop or identify and adopt a thesaurus of ecological relationship terms.
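
Because the term depends on which organism is the `driver', swapping the two roles means replacing the term with its converse. A small Python sketch, assuming a hypothetical lookup table: only the pair `parasitic' and `parasitized' comes from the text, and the entry treating `mutualistic' as its own converse is my assumption. No real thesaurus is implied.

```python
# Hypothetical table of converse terms.
CONVERSE = {
    "parasitic": "parasitized",
    "parasitized": "parasitic",
    "mutualistic": "mutualistic",  # assumed symmetric
}


def swap_roles(driver: str, slave: str, relation: str):
    """Return (new_driver, new_slave, new_relation) with the roles exchanged."""
    if relation not in CONVERSE:
        raise ValueError(f"no converse known for relation term {relation!r}")
    return slave, driver, CONVERSE[relation]


swapped = swap_roles("Puccinia oxalidis", "Oxalis", "parasitic")
# ('Oxalis', 'Puccinia oxalidis', 'parasitized')
```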

[CsuxSymp_A]

The symptoms caused by the `driver' organism to the `slave' organism, 1000 characters

For records where there is both a `driver' and a `slave' organism and their relationship involves parasitism, there is frequently information about the symptoms caused by the `driver' to the `slave'. This field is set aside for a free-text description of those symptoms.

[CsuxAcsymA]

An edited description of the symptoms caused by the `driver' organism to the `slave' organism, 100 characters

This field contains an edited opinion on the meaning of the data in [CsuxSymp_A]. Like [CsuxAcdesA] and [CsuxAcrelA] this field is used for mechanical retrieval and processing of information, so the description must be structured. The field therefore stores a thesaurus term describing the symptoms caused by the `driver' organism to the `slave' organism, following a syntax similar to that of the preceding fields. There is a need to develop or identify and adopt a thesaurus of symptom terms.
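
The text-field lengths given in this specification can be collected into a simple check at data entry. The following Python sketch is an illustration only: the field names and lengths are those listed above, but the function name and the idea of holding a record as a dictionary are assumptions.

```python
# Maximum lengths of the text fields, as stated in this specification.
TEXT_FIELD_LIMITS = {
    "CsuxDesc_A": 1000,
    "CsuxAcdesA": 160,
    "CsuxPositA": 8,
    "CsuxRelatA": 1000,
    "CsuxAcrelA": 100,
    "CsuxSymp_A": 1000,
    "CsuxAcsymA": 100,
}


def overlong_fields(record: dict) -> list:
    """Return the names of text fields whose content exceeds its limit."""
    return sorted(
        name
        for name, limit in TEXT_FIELD_LIMITS.items()
        if len(record.get(name, "")) > limit
    )


record = {
    "CsuxDesc_A": "dead beatle",  # original text is stored as written
    "CsuxPositA": "underneath",   # 10 characters, over the 8 allowed
}
problems = overlong_fields(record)  # ['CsuxPositA']
```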

