The Plant List — A working list for all plant species

About The Plant List

The Plant List is a working list of all known plant species. Version 1.1, released in September 2013, aims to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). It does not include algae or fungi. Version 1.1 contains 1,293,685 scientific plant names of which 350,699 are accepted species names. It includes no vernacular or common plant names.

Collaboration between the Royal Botanic Gardens, Kew and Missouri Botanical Garden enabled the creation of The Plant List by combining multiple checklist datasets held by these institutions and other collaborators.

The Plant List provides the Accepted Latin name for most species, with links to all Synonyms by which that species has been known. It also includes Unresolved names for which the contributing data sources did not contain sufficient evidence to decide whether they were Accepted or Synonyms, or where there were conflicting opinions that could not be readily resolved.

A description of the content, creation and use of The Plant List follows.

Overview

The Plant List is a widely accessible working list of known plant species and has been developed and disseminated as a direct response to the Global Strategy for Plant Conservation, adopted in 2002 by the 193 governments who are Parties to the Convention on Biological Diversity. The GSPC was designed as a framework for action to halt the loss of plant diversity. Target 1 of the Strategy called for the completion by 2010 of a widely accessible working list of all known plant species, as a step towards a complete world Flora. Released in December 2010, Version 1 of The Plant List aimed to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). This is consistent with the initial focus of the GSPC. Target 1 of the GSPC was revised in December 2010 and now calls for “An online flora of all known plants” by 2020.

This release of The Plant List, Version 1.1, resolves errors known to occur in Version 1.0; improves and extends our algorithms for detecting and resolving conflicting opinions; updates records from existing data sources; and includes a set of data for Rosaceae from Richard Pankhurst at the Royal Botanic Garden Edinburgh. Sadly, Richard passed away in early 2013, and we will all miss his enthusiasm, energy and insights. Please refer to Changes between The Plant List 1.0 and 1.1 for further detail on differences between versions.

The data was harvested for this release in May 2012. Many more partners wished to provide data, but we simply did not have the resources to broaden the scope for this release. Our aim is to produce future versions more dynamically, which will allow us to be more inclusive. This will also be important to ensure that The Plant List is able to contribute to the developing World Flora Online.

The Plant List is not perfect and represents work in progress. Our aim was to produce a ‘best effort’ list to demonstrate progress and stimulate further work (see Limitations).

The Plant List was produced as a collaborative venture coordinated by the Royal Botanic Gardens, Kew and the Missouri Botanical Garden and involving collaborators worldwide.

Data records from numerous existing global checklist databases (derived from primary taxonomic publications) were brought together and combined with regional and national checklist data and other records from Tropicos. These resources were then complemented by the inclusion of additional names found in IPNI (for Angiosperms, Gymnosperms and Fern & Fern Allies). The Plant List may omit some names and may include some duplicate names. Furthermore, those names derived from nomenclators may not include any indication of whether they are Accepted names or Synonyms. Our purpose has been to detect inconsistencies between overlapping data sources and resolve them where possible.

The Plant List does not seek to duplicate the efforts of collaborators that have contributed data to the creation of The Plant List. This version will not be edited, but feedback will be forwarded to our collaborators so that they can extend and improve their original data (see Enhancing The Plant List and Recreating The Plant List). Feedback will arise from our own analysis of the data (and its comparison with other resources) and from users of The Plant List (see How to Submit Feedback).

In the future we hope to:

  1. include improved and extended versions of the data sets included in this version of The Plant List;
  2. include other data sets which we were unable to include in Version 1; and
  3. refine the procedures that were used to create The Plant List: e.g. for locating duplicate name records, for resolving inconsistencies, and for detecting conflicting opinions expressed within alternative data sets and then selecting from among those opinions (see How The Plant List was Created).

We welcome comments on the content of The Plant List, and offers of contributions for inclusion in the next edition.

Changes between The Plant List 1.0 and 1.1

Statistics relate to the final product. Each figure is given for Version 1.1, followed by the corresponding Version 1.0 figure in parentheses. The Plant List contains:

  • 1,293,685 (1,244,871) scientific name records of all ranks:
    • This represents an increase of 52,000 name records compared with Version 1.0.
    • The overall increase hides larger increases in some areas and reductions in others.
  • 1,064,035 (1,040,426) names of species rank, of which:
    • 350,699 (298,900) are recognised as accepted species names;
  • 642 (620) plant families; and
  • 17,020 (16,167) plant genera.

The status of the 1,064,035 species names is as follows:

Status        Version 1.1           Version 1.0
Accepted      350,699 (33.0%)       298,900 (28.7%)
Synonym       470,624 (44.2%)       477,601 (45.9%)
Unresolved    242,712 (22.8%)       263,925 (25.4%)
Total         1,064,035 (100%)      1,040,426 (100%)
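As a check on the table, each percentage is that status's count divided by the total number of species-rank names in the same version. The short Python sketch below uses only the counts shown above; the dictionary layout and the function name `status_shares` are illustrative choices, not part of The Plant List.

```python
# Species-rank name counts by status, copied from the status table.
STATUS_COUNTS = {
    "1.1": {"Accepted": 350_699, "Synonym": 470_624, "Unresolved": 242_712},
    "1.0": {"Accepted": 298_900, "Synonym": 477_601, "Unresolved": 263_925},
}


def status_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each status as a percentage of all species-rank names (1 d.p.)."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


for version, counts in STATUS_COUNTS.items():
    # Print the version, its species-rank total and the percentage shares.
    print(version, sum(counts.values()), status_shares(counts))
```

Run as written, the totals come to 1,064,035 and 1,040,426, and the shares are 33.0%, 44.2% and 22.8% for Version 1.1 and 28.7%, 45.9% and 25.4% for Version 1.0, each set summing to 100%.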

Thus there are

  • 46k more accepted names than in version 1.0
  • 26k more synonyms and
  • 20k fewer unresolved records

The levels of confidence associated with these records have also changed. There are:

  • 16k more records of high confidence
    • an increase of >4% over Version 1.0
  • 27k more records of medium confidence
    • an increase of >7%
  • 26k fewer records of low confidence
    • a reduction of >8%

Navigating between Versions of The Plant List

  • Visitors to www.theplantlist.org will now access The Plant List 1.1 rather than The Plant List 1.0.
  • The older version (The Plant List 1.0) will remain visible and can be accessed at www.theplantlist.org/1/. However, The Plant List 1.0 will be masked, and visitors will be made aware that it has been superseded.
  • Third party websites which linked to individual pages within The Plant List 1.0 will continue to work.

Website

Other major differences include:

  • Individual records indicate the date on which the data was supplied by its owner;
  • Browsing is now possible for all (potential) taxa: i.e. you will find both accepted and unresolved names in the taxonomic hierarchy;
  • Names modified during TPL data processing are now flagged;
  • Individual name records missing particular data values have, where possible, been completed using values taken from IPNI;
  • Species pages now link to their infra-taxa and vice versa;
  • Genus pages include statistics for infraspecific names and taxa;
  • The “Creative Commons” licence has been replaced by a more explicit Terms of Use describing how the data can be used;
  • The licence Terms of Use specifically permit production of derivative works for non-commercial use.

Limitations

The Plant List is not perfect and represents work in progress. Data comes from a variety of sources which are both monographic (global) and regional in scope. These data sources vary in the extent to which comprehensive synonymy is included, their stage of development (proximity to publication) and the degree to which they have been exposed to peer review. The Plant List indicates the confidence which can be given to the status of a particular name record using a star rating. Around 20% of names are unresolved, indicating that, considered collectively, the data sources included provided insufficient evidence as to whether the name should be treated as accepted or not.

The Plant List is static. It is neither updated regularly from the original data sources, nor edited directly. Data was extracted from source databases in May 2012 and thus records included here may differ from their current equivalent records in the source database from which they were taken. Where you suspect errors in The Plant List, please first check the source databases where corrections may have already been made. Feedback and corrections pertaining to records in The Plant List are passed on to the source database for consideration. If accepted by the source database they may be incorporated in a future version of The Plant List. We do not edit or revise the content of The Plant List directly.

There are other reliable, authoritative sources of taxonomic opinion for some groups or some regions which we simply did not have the time or resources to include in this version of The Plant List. Our ambition is for future versions to be more inclusive and comprehensive. Although The Plant List may be the most comprehensive single information resource covering all plants, it is imperfect, and not all the taxonomic decisions it contains derive from a peer-reviewed, curated, authoritative source. The Plant List should therefore be treated as advisory only. Other, more authoritative lists may exist for particular regions or taxa.

Target audience

The name of a plant is the key to communicating about it and to finding information about its uses, conservation status, relationships and place within ecosystems. The Plant List provides a tool for resolving or verifying the spelling of plant names, and a means of finding, from a global perspective, the botanically accepted name for a plant and all of its synonyms. Since the ability to plan the sustainable use of plants (essential resources for food, medicines and ecosystem services) depends on effective retrieval of information about plants, there is a broad constituency of potential users of The Plant List.

Scope

The Plant List is a working list of known plant species, which aims to be comprehensive in coverage at species level for all names of mosses and liverworts and their allies (Bryophytes) and of Vascular plants which include the flowering plants (Angiosperms), conifers, cycads and their allies (Gymnosperms) and the ferns and their allies including horsetails and club mosses (Pteridophytes).

For each name at species level we aim to provide, from data resources held by Kew, by Missouri Botanical Garden and by our collaborators, the author of the name, the original place of publication and an assessment of whether the name is accepted or is a synonym of another name. Wherever possible, each name also links to the original online database record, to its corresponding entry in IPNI and to further sources of information about that plant.

The names of some subspecies or varieties of plant are also included in The Plant List primarily where they are synonyms or accepted names for species names and where they were available from the contributing data sets. The Plant List does not aspire to comprehensive coverage of infraspecific taxa (subspecies, varieties, forms etc.).

What does The Plant List not contain?

Version 1.1 of The Plant List does not contain:

  • scientific names for fossil plants, algae or fungi;
  • common (or vernacular) names for the plants included;
  • the geographic distribution or any other data about the plants included (though such data may be obtained from the source databases in many instances).

Description of The Plant List data set

Taxonomic coverage

The Plant List includes all known species of the following major plant groups:

  • Angiosperms
  • Gymnosperms
  • Pteridophytes
  • Bryophytes

Genera and species are presented in families which follow the source database(s), except in the case of Angiosperms, where we have, wherever possible, allocated accepted genera to the families recognised by the Angiosperm Phylogeny Group.

Angiosperms

Angiosperms (Subclass Magnoliidae Novák ex Takht.). Subclass level classification follows Chase, M.W. & Reveal, J.L., 2009. A phylogenetic classification of the land plants to accompany APG III. Botanical Journal of the Linnean Society, 161, 122–127.

Genera and species of Angiosperms are presented in families following family circumscriptions in The Angiosperm Phylogeny Group, 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society, 161, 105–121.

Within the Angiosperms, data quality varies widely, reflecting the patchiness of the taxonomic and geographic coverage of the source databases. Coverage is believed to be most comprehensive and consistent for Monocotyledons, where The Plant List benefited from the existence of comprehensive checklists fully reviewed by experts (see WCSP and GrassBase). For Angiosperms other than Monocotyledons, expert-reviewed lists of similar quality provide comprehensive and consistent coverage for certain major families. Otherwise coverage is more patchy and likely to be less consistent, as the name records have been assembled from regional lists and/or other sources not fully reviewed by specialist systematists. Coverage is probably least reliable for areas for which regional lists were not available for incorporation, especially South East Asia, and for genera whose names begin with the letters H–Z (as genera beginning with the letters A–G benefited from earlier compilation effort as part of development of the World Checklist of Selected Plant Families).

Gymnosperms

Gymnosperms — Conifers, Cycads, Ephedras, Gnetum, Ginkgo and Welwitschia (including Subclass Ginkgooidae Engl.; Subclass Cycadidae Pax; Subclass Pinidae Cronquist, Takht. & Zimmerm.; Subclass Gnetidae Pax, following Chase and Reveal, 2009)

Gymnosperm records derive primarily from WCSP and incorporate the 2001 World Checklist of Conifers by A.Farjon. Coverage is thought to be comprehensive.

Pteridophytes

Pteridophytes — ferns, horsetails and club mosses (including Subclass Equisetidae Warm.; Subclass Marattiidae Klinge; Subclass Ophioglossidae Klinge; Subclass Polypodiidae Cronquist, Takht. & Zimmerm.; Subclass Psilotidae Reveal; Subclass Lycopodiidae Beketov)

No peer-reviewed global list of any family of ferns or other Pteridophyte has been incorporated. The data presented are compiled from regional and nomenclatural sources not reviewed by experts and are therefore likely to be less comprehensive and consistent than those for Angiosperms and Gymnosperms.

Bryophytes

Bryophytes — mosses, liverworts and hornworts (including Subclass Anthocerotidae Engl.; Subclass Bryidae Engl.; Subclass Marchantiidae Engl.)

Nomenclatural coverage

The Plant List aims to provide all the scientific names for species in these plant groups. A breakdown of the numbers of plants and names included in each plant group is provided (see Statistics). Coverage of infraspecific taxa (subspecies, varieties, forms etc.) is not comprehensive; they are included primarily where they are synonyms or accepted names for species names.

Coverage and data quality are primarily influenced by the source data sets used to build The Plant List. We are aware of additional data sets which, had they been included, would have enriched and improved the final product. We hope to include such data sets in later releases.

The Status of name records

Each name record included within The Plant List is assigned one of the statuses listed below. The Status of each name is derived primarily from the data source from which that name record comes (see Derivation of Name Status). The decision, for example, that one name is the accepted name of a given plant is based upon a taxonomic opinion recorded within the cited data source. Such decisions were automated using a rules-based approach which, where necessary, selected from among alternative taxonomic opinions expressed within or between different data sources. For an explanation of how these decisions were reached, see How The Plant List was Created.

Accepted Name

This is the name which should be used to refer to the species (or to a subspecies, variety or forma).

For each name with Status of ‘Accepted’ The Plant List aims to provide:

  • the name currently accepted as the one which should be used in preference to refer to this species (or subspecies, variety or forma);
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a reference to the source database supplying this name record that recorded the opinion that it is an accepted name (with, where possible, a link to that record in the source database);
  • other names (synonyms) considered to refer to that species;
  • the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
  • an assessment of the Confidence that The Plant List attaches to the name being accepted. (This is an indication of the confidence that the Status of the name is correct).

Synonym

A Synonym is an alternative name which has been used to refer to a species (or to a subspecies, variety or forma) but which The Plant List does not consider to be the currently Accepted name. The decision to assign the Status of Synonym to a name record is based upon a taxonomic opinion recorded in the cited data source (selected using an automated rules-based approach; see How The Plant List was Created).

Synonymy can be derived directly from the source data (identical to the source data) or indirectly using the automated decision rules (e.g. if source 1 says that A is a synonym of B and source 2 says that B is a synonym of C, then The Plant List will show A to be a synonym of C).
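The indirect derivation in the example above amounts to following synonym links until a name is reached that is not itself a synonym. The Python sketch below illustrates only that idea; it is not The Plant List's actual rule set, and the function name `resolve_accepted` and the pooled `synonym_of` mapping are hypothetical. It assumes conflicting opinions about the same synonym have already been settled, so each synonym has exactly one target, and it raises an error on a cycle rather than guessing.

```python
def resolve_accepted(name: str, synonym_of: dict[str, str]) -> str:
    """Follow synonym links until reaching a name that is not itself a synonym.

    `synonym_of` (hypothetical) maps each synonym to the name it is treated
    as a synonym of, pooled from all sources. A cycle such as A -> B -> A
    signals conflicting opinions, so it raises ValueError instead.
    """
    seen = {name}
    while name in synonym_of:
        name = synonym_of[name]
        if name in seen:
            raise ValueError(f"synonym cycle involving {name!r}")
        seen.add(name)
    return name


# Source 1: A is a synonym of B. Source 2: B is a synonym of C.
links = {"A": "B", "B": "C"}
print(resolve_accepted("A", links))  # C
```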

For each name with Status of Synonym The Plant List aims to provide:

  • the name;
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a link to its Accepted name;
  • a reference to the source database supplying this name record and expressing the opinion that it is a Synonym (with, where possible, a link to that record in the source database);
  • the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
  • an assessment of the Confidence that The Plant List attaches to the Status of the name being Synonym.

Unresolved Name

Unresolved names are those to which it is not yet possible to assign a Status of either ‘Accepted’ or ‘Synonym’. For an explanation of how names were assigned a status please refer to How The Plant List was Created. Unresolved names fall into two sub-classes:

Unassessed names

for which there is no evidence within any of the contributing data sources that the status of this name had been evaluated by the data owners. None had recorded that it was either ‘accepted’ or a ‘synonym’. None had recorded that they had attempted such an evaluation. Since, by definition, a name is accepted by the publishing author at the time of publication, it could be argued that all names are putatively Accepted until such time as they are demonstrated to be Synonyms.

Unplaced names

for which the data source supplying that record indicated positively that the data owners had sought to resolve its status and had not been able to come to a conclusion so as to place it either in synonymy or as the accepted name of a new species. This is often the case if the name has an insufficient description and no herbarium specimens are known, or where one or more nomenclatural acts are required to provide the accepted name for which the unplaced name would be a synonym.

Among Unresolved names, Unassessed names are much more numerous than Unplaced names.
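The distinction between the two sub-classes can be expressed as a small decision function. This is a hypothetical Python sketch: it assumes each source's opinion on a name has been reduced to `"accepted"`, `"synonym"`, `"unplaced"` or `None` (no evaluation recorded), which simplifies the real source data considerably. A result of `None` means only that the name is not Unresolved on these grounds alone; the build rules can still override a recorded opinion.

```python
from typing import Optional


def unresolved_subclass(opinions: list[Optional[str]]) -> Optional[str]:
    """Classify a name from its sources' (hypothetical, simplified) opinions.

    Each entry is one source's opinion: "accepted", "synonym", "unplaced",
    or None when that source recorded no evaluation of the name.
    """
    # Some source took a position, so the name is not Unassessed or Unplaced.
    if any(o in ("accepted", "synonym") for o in opinions):
        return None
    # A source tried to resolve the name but could not place it.
    if "unplaced" in opinions:
        return "Unplaced"
    # No source recorded either an opinion or an attempted evaluation.
    return "Unassessed"


print(unresolved_subclass([None, None]))        # Unassessed
print(unresolved_subclass([None, "unplaced"]))  # Unplaced
print(unresolved_subclass(["synonym", None]))   # None
```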

It is also important to note that in a small number of cases the status ‘Unresolved’ was assigned to a name record during creation of The Plant List despite a taxonomic opinion having been recorded in the contributing data source. This occurs where following that opinion would have conflicted with opinions recorded in other data sources, and following both would have resulted in inconsistencies within the working list of plants. In such cases:

  • the status of the record on this website is indicated with an ‘*’ to indicate that it derives from the procedures used to build The Plant List and
  • the original status of the name (as recorded in the source database) is indicated on the details page for that name.
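A minimal Python sketch of the flagging described above. The `NameRecord` class, its field names and the example name "Genus alpha" are hypothetical; the sketch assumes the website compares the status assigned while building the list with the status recorded in the source database, appends ‘*’ when they differ, and shows the original status on the details page.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical name record; field names are illustrative only."""
    name: str
    source_status: str  # status recorded in the contributing data source
    tpl_status: str     # status assigned while building the list

    def display_status(self) -> str:
        # Append '*' when the build procedure overrode the source's opinion.
        if self.tpl_status != self.source_status:
            return self.tpl_status + "*"
        return self.tpl_status

    def details(self) -> str:
        # The details page also reports the original status from the source.
        return f"{self.name}: {self.display_status()} (source status: {self.source_status})"


record = NameRecord("Genus alpha", source_status="Synonym", tpl_status="Unresolved")
print(record.display_status())  # Unresolved*
print(record.details())         # Genus alpha: Unresolved* (source status: Synonym)
```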

For each name with Status of ‘Unresolved’ The Plant List aims to provide:

  • the name;
  • the author(s) credited with publishing that name;
  • the place and date of original publication of the name where this was supplied;
  • a reference crediting the source database providing the name (with, where possible, a link to that record in the source database);
  • the IPNI identifier (linking the name record to the International Plant Names Index which will provide full original publication details for this name);
  • an indication of Confidence: Unresolved names are generally flagged as ‘Low Confidence’ entries.

Misapplied Names

Some data sets which contributed to The Plant List record not only how plant names should be used but also where in the published literature a given name may previously have been used inappropriately (to refer erroneously to another species). Recording such misapplication of names helps users to avoid pitfalls when interpreting the literature. The decision that a record represents the misuse of a name is derived from the cited data source (see How The Plant List was Created).

For each reported misapplication of a plant name we aim to provide:

  • the name;
  • the author(s) that published that name and wherever possible an indication of where or by whom this was misused (e.g. ‘sensu Smith’ may appear after the publishing author);
  • a link to the Accepted name of the species to which this name has been previously and erroneously applied;
  • a reference crediting the source database recording this misuse of the name (with a link to that record in the source database and hence the publication details of where this name was misapplied);
  • an assessment of the Confidence that The Plant List attaches to this name having been erroneously applied to the other species.

Annotation of names

On relatively few occasions, sources that contributed name records to The Plant List included additional information about individual names beyond their status as Accepted or Synonym. Where possible, this information is retained within The Plant List and made visible to users as annotations attached to the relevant name record.

Invalid and Illegitimate Names

Some of the names in The Plant List were recorded by the contributing data sets to be either invalidly or illegitimately published according to the rules of the International Code of Botanical Nomenclature.

Spelling variants (or Orthographic variants)

Some data sources include names which are recorded as ‘Orthographic variants’ (or spelling variants) of another name. These misspelt names may not have been validly published, but they are nevertheless used in the literature and are therefore included in The Plant List to guide those who encounter them.

Confidence Levels

For each name record The Plant List offers an indication of the confidence that the Status of the name record is correct. Our confidence assessments are based primarily on the nature and taxonomic integrity of the source data.

High Confidence level

is applied to the Status of name records derived from taxonomic datasets which treat the whole of the taxonomic group in question on a global basis and have been peer reviewed (e.g. ILDIS, WCSP, see collaborators).

Medium Confidence level

is applied to the Status of name records derived from:

  • Either national or regional databases via a rules-based automated process, reflecting the challenges inherent in resolving taxonomic differences between different name data sets for the same species for different geographic areas. Regional datasets used as sources for The Plant List are primarily those stored within Tropicos (see collaborators for details).
  • Or taxonomic datasets which treat the whole of the taxonomic group in question on a global basis but which have not yet undergone peer review (e.g. GCC and WCSP (in review); see Collaborators).

Low Confidence level

is applied to the Status of:

  • name records that were recorded as unresolved in the contributing data set they came from;
  • name records whose status has been inferred from (sometimes conflicting) information from more than one source database;
  • records derived from nomenclatural resources such as IPNI, which do not contain opinions about the status of the name and which were assigned a status of Unresolved in The Plant List.
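The three confidence levels can be read as a small set of rules. The Python sketch below is one possible encoding under stated assumptions: each record's source is described by a hypothetical `scope` value (`"global"`, `"regional"` or `"nomenclatural"`) and a peer-review flag, and the precedence used here (any Low condition wins, then High, otherwise Medium) is an assumption rather than documented behaviour.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


SCOPES = {"global", "regional", "nomenclatural"}  # hypothetical source categories


def confidence(scope: str, peer_reviewed: bool, source_status: str,
               inferred_from_several_sources: bool = False) -> Confidence:
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    # Low: unresolved in the source, inferred from more than one source,
    # or taken from a nomenclator that records no opinion on status.
    if (source_status == "Unresolved" or inferred_from_several_sources
            or scope == "nomenclatural"):
        return Confidence.LOW
    # High: a peer-reviewed, global treatment of the whole group.
    if scope == "global" and peer_reviewed:
        return Confidence.HIGH
    # Medium: national/regional data, or a global treatment not yet reviewed.
    return Confidence.MEDIUM


print(confidence("global", peer_reviewed=True, source_status="Accepted").value)  # High
print(confidence("global", peer_reviewed=False, source_status="Synonym").value)  # Medium
print(confidence("nomenclatural", peer_reviewed=False, source_status="Unresolved").value)  # Low
```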

Contributing data sets

The data resources used to build Version 1 of The Plant List are listed here and we are grateful to the many collaborators listed below that made their data available.

We welcome offers of additional data sets for inclusion in the future editions of The Plant List (see Contributions).

Global species resources

  • World Checklist of Selected Plant Families

    This large database of global monographic treatments was supplied to The Plant List as two separate data sets which were treated slightly differently:

    1. WCSP

      Peer-reviewed treatments are available online for 151 Seed Plant families. WCSP gives information on the accepted scientific names and synonyms of selected plant families. It includes more than 320,000 names and allows the user to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The data set draws on the collaboration, over 16 years, of 132 specialists from 25 countries who have contributed data or acted as reviewers.

    2. WCSP (in review)

      In addition to the published family checklists the World Checklist database contains data for many other families which have either been completed and await review by specialists or are still being compiled. The Plant List also incorporates these unpublished data which include more than 290,000 additional names.

  • GrassBase – The Online World Grass Flora

    The nomenclatural component of this database currently holds over 60,000 names, lists names for any given genus, geographical region or genus within a geographical region, and links to the GrassBase description for any species. The nomenclatural data from GrassBase is made available through the WCSP system.

  • The Global Compositae Checklist

    is an integrated database of nomenclatural and taxonomic information for the second largest vascular plant family in the world. This checklist is published by the International Compositae Alliance and compiled from many contributed datasets. The database will be continually updated. Additional information such as references, distribution and infraspecific taxa is available on the website. All species are marked as ‘provisionally accepted names’ in the Beta version. The data set has not yet been fully peer-reviewed and may contain some errors. More than 100,000 records derived from The Global Compositae Checklist are included in The Plant List.

  • The International Legume Database and Information Service

    is a long-term programme of co-operation among legume specialists worldwide to create a biodiversity database for the Leguminosae (Fabaceae) family. The database provides a taxonomic checklist plus basic factual data on distribution, common names, life-forms, uses, literature references to descriptions, illustrations and maps. More than 40,000 records derived from ILDIS are included in The Plant List.

  • Royal Botanic Garden Edinburgh, Richard Pankhurst — Rosaceae

    The Rosaceae Database is a dataset of over 55,000 names with classification status for this economically and ecologically important plant family. This extensive and highly polished dataset was developed by Dr Richard Pankhurst, with the help of colleagues at the Royal Botanic Garden Edinburgh, using his PANDORA database system. Because of the horticultural and agricultural importance of this family, many taxon names are of dubious validity and were published in inaccessible literature. However, Richard tracked down the original publication for ca. 95% of the names. Richard died in March 2013, and to facilitate the continued use of the data, the dataset has been included within the World Checklist of Selected Plant Families. The Rosaceae dataset is now being kept up to date as part of WCSP, ensuring a lasting legacy for Richard’s life’s work.

  • The iPlants project

    developed and tested the processes and procedures that would be required during production of an authoritative, global online list of plant names. The project was a collaboration between The Royal Botanic Gardens, Kew, the Missouri Botanical Garden and the New York Botanical Garden and was funded from April 2004 to May 2006 by the Gordon and Betty Moore Foundation. Checklists for the following families were made available for The Plant List: Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae. More than 11,000 records derived from iPlants are included in The Plant List.

  • The International Organization for Plant Information

    aims to provide a series of computerised databases summarizing taxonomic, biological, and other information on plants of the world. IOPI’s mission is to develop an efficient and effective means of providing basic plant information to users, and to guide them toward sources of authoritative data. Their checklist currently holds over 200,000 names, from which The Plant List includes over 1,000 name records for Juncaceae compiled by J. Kirschner (Institute of Botany, Pruhonice).

  • Missouri Botanical Garden

    The Bryophyte information was primarily gathered from A Checklist of Mosses and ongoing projects dealing with mosses and liverworts to create World Checklists for these groups. Some liverwort names were not yet available from data sources but are expected to be added in future versions.

Floristic Datasets

  • Missouri Botanical Garden

    the botanical information system at the Missouri Botanical Garden, Tropicos, contains information on over one million plant names and 3.9 million herbarium specimens. The system was developed through the actions of a wide variety of floristic, nomenclatural, and bibliographic projects both at the Garden and in collaboration with other institutions. All of this information is available on the Internet through the Garden’s web site.

    Tropicos provides access to the accumulated data on vascular plants and bryophytes as authority files for the development of floras and checklists that provide a synthesis of local and regional vegetation. Included within each of these syntheses are indications of acceptance, synonymy and misapplication of names within a floristic region. This information was used to evaluate plant names from these regions for The Plant List.

    The project data held by Tropicos and used in the development of The Plant List include:

    Information was also gleaned from recent published literature when the acceptance or synonyms have been recorded in Tropicos.

    More than 240,000 records derived from Tropicos were included in The Plant List.

  • African Plant Database

    is a database originating from a collaboration between the Conservatory and Botanical Gardens of the City of Geneva (CJB) and the South African National Biodiversity Institute (SANBI) to bring together all names of vascular plants for sub-Saharan Africa. Further important contributions cover North Africa (Alain Dobignard) and Madagascar (Missouri Botanical Garden). It encompasses information on all vascular plant species from Africa, including synonymy and information on the ecology and distribution of species. The database currently (2013) comprises more than 198,000 names of African plants, with their nomenclatural status, corresponding to some 57,000 accepted species. Data are updated regularly, following the literature.

  • Madagascan endemics

    The iPlants project also provided a checklist for Madagascan endemics.

Plant nomenclatural resources

  • The International Plant Names Index

    is a database of the names and associated basic bibliographical details of seed plants, ferns and fern allies. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardised and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community. IPNI is the product of a collaboration between the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium.

  • Uncompiled name data records derived from Kew’s checklist databases.

  • Uncompiled name data records from Missouri’s Tropicos database.

How The Plant List was Created

Development of The Plant List has been a collaborative venture coordinated at the Royal Botanic Gardens, Kew and Missouri Botanical Garden, relying on the generosity of many collaborators who manage significant taxonomic data resources. The purpose was to merge the best of the nomenclatural information available in these diverse data resources into a single consistent database through a defined and automated process. In summary, development involved merging many taxonomic data sources: accepted names and synonymy relationships were taken first from the global checklist datasets, then augmented with additional names and synonymy relationships from regional and national floristic datasets following a set of decision rules. Species names not accounted for in any of the previously incorporated data sets were then added from nomenclatural resources, ensuring the list is comprehensive for all plant names. Finally, a further set of rules was applied to the merged data set to resolve inconsistencies and conflicting or overlapping statuses, and to correct logical data errors.

The Sequence for Merging Data Sets

The starting point was the set of global, peer-reviewed family checklists published within the World Checklist of Selected Plant Families (WCSP). Families available through the WCSP from other sources, including GrassBase, iPlants (Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae) and IOPI (Juncaceae), were also included. To these were added further global checklists from collaborating partners: The Global Compositae Checklist from the International Compositae Alliance and The International Legume Database and Information Service. Also incorporated were all compiled WCSP data records for families not yet published (i.e. those still being compiled or under peer review), referred to as WCSP (in review).

The second category of information sources was various national and regional checklists. Missouri Botanical Garden’s Tropicos system primarily provided data from about ten digital flora projects. Each of these national or regional floras or checklists was created at a different time by a different team of botanists and considers only plant specimens found within that area’s borders. As a result, these floras and checklists contain different subsets of plants (and plant names) and often record conflicting opinions as to which are the accepted names for particular plants or what their synonyms are. In building The Plant List, a significant task was therefore to automate procedures that trawl each of these data sets for new information about names and synonymy, detect and resolve conflicting opinions among them, and add this additional information to the merged data set. A set of decision rules was employed to differentiate between and select from among the diverse opinions expressed within these national and regional data sets.

Finally, there were many scientific plant names (recorded in IPNI or included in Tropicos or WCSP as uncompiled records) that had not been included in any of the data sets consulted up to that point. The combined set of global and regional data was therefore compared with the IPNI database to detect names missing from our merged data set so that they could be added to our final product. Names derived from IPNI (and other nomenclatural data sets consulted) were included as ‘Unresolved’, since these data sets did not indicate whether these were Accepted names for plants not yet represented in the merged data set or whether they were Synonyms of plants already in the merged list.
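The three-stage merge order described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `NameRecord` fields and the `merge_in_sequence` function are hypothetical, and the sketch only adds names not already present rather than applying the decision rules used to arbitrate between regional opinions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class NameRecord:
    # Hypothetical minimal record; real name records hold many more fields.
    name: str
    status: str                          # "Accepted", "Synonym" or "Unresolved"
    accepted_name: Optional[str] = None  # target of a synonym link
    source: str = ""


def merge_in_sequence(
    global_checklists: List[Iterable[NameRecord]],
    regional_checklists: List[Iterable[NameRecord]],
    nomenclators: List[Iterable[NameRecord]],
) -> Dict[str, NameRecord]:
    """Merge sources in priority order (hypothetical helper)."""
    merged: Dict[str, NameRecord] = {}

    # 1. Global checklists supply accepted names and synonymy first.
    for source in global_checklists:
        for rec in source:
            merged.setdefault(rec.name, rec)

    # 2. Regional and national floras contribute names not yet present.
    #    The decision rules for conflicting opinions are omitted here.
    for source in regional_checklists:
        for rec in source:
            merged.setdefault(rec.name, rec)

    # 3. Names known only from nomenclators carry no taxonomic opinion,
    #    so they enter the list as 'Unresolved'.
    for source in nomenclators:
        for rec in source:
            if rec.name not in merged:
                merged[rec.name] = NameRecord(rec.name, "Unresolved", None, rec.source)
    return merged
```

Because earlier sources are consulted first, a name already supplied by a global checklist keeps that source's status even if a later regional flora disagrees; in the real process that is the point at which the decision rules would be consulted.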

A significant component of this and later phases of the creation of The Plant List involved the matching of names between different data sets to identify whether a name was unique to one data set or included in multiple data sets. The algorithms employed to perform name matching varied depending upon the requirements at each stage in the process.
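The kind of name matching involved can be illustrated with a simple matcher. This sketch assumes that differences in case, diacritics, punctuation, minor spelling and author abbreviation are the main sources of mismatch; `normalise`, `authors_compatible` and `probable_match` are hypothetical helpers, and the 0.9 similarity threshold is an arbitrary illustrative value, not one used by the project.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def authors_compatible(a: str, b: str) -> bool:
    """True when the author strings have the same number of tokens and each
    pair of tokens is a prefix match, e.g. 'Benth.' vs 'Bentham'."""
    ta, tb = normalise(a).split(), normalise(b).split()
    if len(ta) != len(tb):
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(ta, tb))


def probable_match(name_a: str, author_a: str,
                   name_b: str, author_b: str,
                   min_ratio: float = 0.9) -> bool:
    """Hypothetical matcher: near-identical Latin names plus compatible authors."""
    ratio = SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()
    return ratio >= min_ratio and authors_compatible(author_a, author_b)
```

A short abbreviation can prefix-match the wrong author, so a matcher like this only flags candidates; additional evidence, such as the place of publication where available, would be needed to confirm them.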

Derivation of Name Status

The procedures used to build The Plant List were designed to follow the taxonomic opinions recorded within the contributing data sets. Where necessary these procedures selected from among alternative and conflicting opinions recorded between data sets so as to achieve a coherent taxonomic consensus.

Consistent application of the decision rules resolved most conflicts between data sources, so that most species names can be clearly presented either as an accepted name or as a synonym, with reference to a data source in which that status is recorded. Note that the set of synonyms pointing to a given accepted name in The Plant List may have originated from more than one data source, i.e. some synonyms for a given species may derive from a data set other than the one that supplied the accepted name record.

Approximately 98% of all Status values within The Plant List derive directly from the data source which supplied that name record.

The Status of the remaining 2% of name records in The Plant List has been modified from that stored in the source data set as a result of the conflict resolution processes. Such changes were made only where necessary to avoid illogical conflicts detected within the data sets supplied or within the merged data set (i.e. they were made to improve the consistency of The Plant List). Where such changes were made, these were primarily to downgrade name records recorded as having a status of ‘Accepted’ in the source database to having a status of ‘Unresolved’ in The Plant List.

Any name records whose status was modified during the creation of The Plant List are labelled (using an asterisk) and the original status in the data source is also indicated. The Confidence level of any record modified by these procedures was set to ‘Low Confidence’.
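The bookkeeping for modified statuses (an asterisk, the retained source status and a low confidence level) might be modelled as below. The `StatusRecord` class, its `"High"` default confidence, `set_status` and `modified_share` are hypothetical; only the asterisk, the retained original status and the rule that modified records get low confidence come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatusRecord:
    # Hypothetical record; field names are illustrative only.
    name: str
    source_status: str        # status as supplied by the contributing data set
    status: str = ""          # status shown in the merged list
    confidence: str = "High"  # assumed default confidence label

    def __post_init__(self) -> None:
        # By default the merged status simply follows the source.
        if not self.status:
            self.status = self.source_status

    def label(self) -> str:
        """Modified statuses get an asterisk plus the original status."""
        if self.status == self.source_status:
            return self.status
        return f"{self.status}* (source: {self.source_status})"


def set_status(rec: StatusRecord, new_status: str) -> None:
    """Change the merged status; any departure from the source lowers confidence."""
    rec.status = new_status
    if rec.status != rec.source_status:
        rec.confidence = "Low"


def modified_share(records: List[StatusRecord]) -> float:
    """Fraction of records whose status differs from the source."""
    changed = sum(1 for r in records if r.status != r.source_status)
    return changed / len(records) if records else 0.0
```

Run over the whole list, `modified_share` would correspond to the roughly 2% of records described above whose status was changed by conflict resolution.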

Decision Rules to arbitrate between Conflicts of Opinion

A set of decision rules was employed to differentiate between and select from among the diverse opinions expressed within all of the data sources consulted. These rules were developed by the team at Kew and Missouri in an attempt to mimic the decision-making rationale a botanist might use on encountering conflicting taxonomic treatments in the literature without being in a position to resolve the question by examining the original material. For example:

  • monographic treatments which consider the group in question in its entirety throughout its distribution are given priority over geographically defined treatments which can result in a single species being treated under different names in different parts of its range;
  • synonym relationships reported in more recent treatments are given priority over those published earlier;
  • publication dates are used to assist in detecting likely illegitimate names;
  • author details are used to detect likely orthographic variants (alternative spellings of the same name);
  • the decision rules are informed by the principles embedded in the International Code of Botanical Nomenclature.
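Two of the rules listed above (monographic over geographically limited treatments, and recent over earlier ones) reduce to an ordering, and the use of publication dates to spot likely illegitimate names can be approximated by flagging later homonyms, i.e. the same name published later by a different author. This is a simplified sketch under those assumptions; `Opinion`, `preferred_opinion` and `likely_later_homonyms` are hypothetical, and the real rule set was more extensive.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Opinion:
    # Hypothetical: one data source's opinion about a single name.
    source: str
    monographic: bool              # True for a treatment of the whole group
    year: int                      # publication year of the treatment
    status: str                    # "Accepted" or "Synonym"
    accepted_name: Optional[str] = None


def preferred_opinion(opinions: List[Opinion]) -> Opinion:
    """Monographic beats geographically limited; within a tier, newer wins."""
    return max(opinions, key=lambda o: (o.monographic, o.year))


def likely_later_homonyms(
    names: List[Tuple[str, str, int]]
) -> Set[Tuple[str, str, int]]:
    """Flag (name, author, year) entries that reuse a name published earlier
    by a different author, i.e. probable later homonyms."""
    by_name = defaultdict(list)
    for name, author, year in names:
        by_name[name].append((year, author))
    flagged = set()
    for name, entries in by_name.items():
        entries.sort()
        first_year, first_author = entries[0]
        for year, author in entries[1:]:
            if year > first_year and author != first_author:
                flagged.add((name, author, year))
    return flagged
```

Real homonym detection would also need to consider rank and type, which this sketch ignores; a differing author is only a proxy.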

Data analysis of logical inconsistencies and data integrity issues

The data set created by merging records from the various data sources as described above was found initially to be inconsistent and logically incongruous for a variety of reasons.

Each of the taxonomic data sets incorporated into The Plant List is itself still being developed and improved upon by its owners and editors. None, therefore, can be considered complete or entirely up to date, nor would their owners claim that these data sets are free of inconsistencies, gaps or data errors. Furthermore, these databases use terminology in different ways, which necessitated some standardisation. Some contained fossil plant names, or names at taxonomic ranks not intended to be included in The Plant List, which might nevertheless link to names in the merged data set. Careful filtering of the record set was therefore needed.

Inevitably, bringing many different data sets together added a further layer of complexity. For example, it is not straightforward to automate reliable recognition of a particular Latin binomial across data sets, given that plant name authors may have been cited or abbreviated differently, subtle differences in spelling and punctuation occur between the data sources, and not all sources included the place of publication of a name to help resolve suspected matches. This added a degree of uncertainty even before other complexities, such as homonyms and misapplied names, were dealt with. As a result, in certain circumstances our procedures treated a few names inconsistently in the merged data set, based on unreconciled records derived from different sources.

The goal of The Plant List project is to create a single, internally coherent view rather than a set of alternative views. The final stage of development therefore involved rigorous logical analysis of the data set. Steps were taken, for example, to identify likely duplicates used in different senses, and to detect chains of Synonyms that link to one another but lack any link to an Accepted name, illegitimate names assigned Accepted status, and subspecies included in the data set whose parent species is itself absent.
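The logical checks described above can be expressed as a single pass over the merged records. In this hypothetical sketch each record carries a status, an optional synonym link, an optional parent species for infraspecific names and an illegitimacy flag; the `Rec` class, its field names and `find_inconsistencies` are assumptions, not The Plant List's schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rec:
    # Hypothetical merged record; field names are illustrative only.
    name: str
    status: str                           # "Accepted", "Synonym" or "Unresolved"
    synonym_of: Optional[str] = None      # link target for a Synonym
    parent_species: Optional[str] = None  # set for infraspecific names
    illegitimate: bool = False


def find_inconsistencies(recs: Dict[str, Rec]) -> Dict[str, List[str]]:
    problems: Dict[str, List[str]] = {
        "synonym_without_accepted": [],
        "missing_parent_species": [],
        "illegitimate_but_accepted": [],
    }
    for rec in recs.values():
        if rec.status == "Synonym":
            seen = set()
            current: Optional[Rec] = rec
            # Follow synonym links until an Accepted name, a dead end or a loop.
            while (current is not None and current.status == "Synonym"
                   and current.name not in seen):
                seen.add(current.name)
                current = recs.get(current.synonym_of) if current.synonym_of else None
            if current is None or current.status != "Accepted":
                problems["synonym_without_accepted"].append(rec.name)
        if rec.parent_species is not None and rec.parent_species not in recs:
            problems["missing_parent_species"].append(rec.name)
        if rec.illegitimate and rec.status == "Accepted":
            problems["illegitimate_but_accepted"].append(rec.name)
    return problems
```

Each list then feeds one of the automated resolution steps rather than being fixed in place, which keeps detection and correction separate.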

Resolution of logical inconsistencies and data integrity issues

For each type of data inconsistency detected, solutions were derived from the concepts and principles outlined above and used in the previous stages. Additional decision rules were created, and new automated steps introduced, to perform the following actions on the merged data set:

  1. Standardisation of terminology
  2. Standardisation, Selection and Filtering of name records
  3. Deduplication of names
  4. Resolving referential integrity regarding linkages among synonyms.
  5. Resolving referential integrity regarding taxonomic relationships.
  6. Standardisation of the names of Families and Major Groups so as to create the taxonomic hierarchy necessary to support browsing of The Plant List.
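Step 4 can be illustrated by rewiring each synonym to point directly at the Accepted name at the end of its chain. The sketch below is an assumption about how such a step might work, not the project's procedure; `resolve_synonym_links` is hypothetical, and chains that loop or dead-end map to `None`, marking names that would need some other treatment (for example, being set to 'Unresolved').

```python
from typing import Dict, Optional, Set


def resolve_synonym_links(links: Dict[str, str],
                          accepted: Set[str]) -> Dict[str, Optional[str]]:
    """Map each synonym to the Accepted name its chain ends in, or to None
    when the chain loops or ends at a name that is not Accepted."""
    resolved: Dict[str, Optional[str]] = {}
    for synonym, target in links.items():
        seen = {synonym}
        # Keep following links while the target is itself a synonym.
        while target in links and target not in seen:
            seen.add(target)
            target = links[target]
        resolved[synonym] = target if target in accepted else None
    return resolved
```

After such a pass, a user looking up any resolvable synonym reaches its accepted name in a single step.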

Online Publication of The Plant List

Target 1 of the GSPC was to achieve a "widely accessible" working list of all known plant species by 2010. To meet that aspect of Target 1, this website was created to enable worldwide access to the working list. The final merged and resolved data set of all plant species is accessible through the search and browse features offered here.

Next Steps

As a result of the data analysis and conflict resolution steps described above, it is now intended to give each collaborator that contributed a dataset detailed feedback, providing them with enriched data records, information on the inconsistencies detected and comparisons with other relevant data sets. Details of the data processing involved in creating The Plant List are to be published for broader discussion. Interest in the process and suggestions for refinements to the decision rules are welcome.

The project team

The Plant List owes its origins to a three-day workshop at Missouri Botanical Garden in May 2008. Bob Allkin, Eimear Nic Lughadha and Alan Paton (Kew) joined Bob Magill and Chuck Miller (MO) to plan how existing resources could best be combined to produce a best-efforts working list to meet the 2010 deadline. The principles underlying our approach were agreed at that time, along with many of the decision rules and initial drafts of workflows for the data processing required. Translating that initial plan into action and refining the process to improve the product involved many more people over many months, with datasets, e-mails and occasionally people moving back and forth between Kew and St Louis.

Contributors working at Kew

  • Bob Allkin – Project Manager
  • Abigail Barker – Applications Development Manager
  • Matthew Blissett – Lead Developer Web Application
  • Charlotte Couch – Support Families and Genera Index
  • Paramjit Dhaliwal – IT Operations team
  • Jeff Eden – Graphic Designer
  • Rafaël Govaerts – Editor of the World Checklist of Selected Plant Families
  • Graham Hawkes – Developer responsible for The Plant List data and procedures
  • Chris Hopkins – Developer Web Application
  • Eimear Nic Lughadha – Senior Responsible Owner
  • Nicky Nicolson – Developer responsible for IPNI
  • Alan Paton – Assistant Keeper, Herbarium
  • John Stone – Graphic Designer
  • Julius Welby – Data administration
  • Ian Wright – IT Operations team leader

Contributors working at Missouri

  • Bob Magill – Senior Vice President of Science & Conservation
  • Chuck Miller – Vice President of Information Technology
  • Chris Freeland – Director of Center for Biodiversity Informatics
  • Jay Paige – Application and Database Developer
  • Heather Stimmel – Application and Database Developer
  • Craig Geil – Application and Database Developer

Enhancing The Plant List

No further changes will be made to The Plant List. The Plant List Version 1.1 was used to populate the initial taxonomic backbone of The World Flora Online, and is now being enhanced by more recent data sources and Taxonomic Expert Networks (TENs). See http://about.worldfloraonline.org. The World Flora Online will continue to link to names in TPL v1.1 as an online reference.

Relationships to other resources

The taxon pages in The World Flora Online link to names in TPL v1.1 as an online reference.

Of the data resources that were used to create The Plant List, many of the previously published global monographic datasets are also available through the Catalogue of Life, which provides peer-reviewed information for many plant families.