Vesper: Visualising species archives

Graham, Martin; Kennedy, Jessie

Authors

Martin Graham



Abstract

Vesper (Visual Exploration of SPEcies-referenced Repositories) is a tool that visualises Darwin Core Archive (DwC-A) datasets, and is aimed at reducing the amount of time and effort expended by biologists to ascertain the quality of data they are generating or using. Currently, DwC-A quality checking is limited to table outputs of data ‘existence’ and compliance with DwC-A format guidelines via the online DwC-A archive validator and reader. While these tools thoroughly examine the presence of data, and the correctness of data structure against the DwC-A schema, they do not give any insight into the underlying quality of the data itself.
Built on top of the D3 JavaScript library, Vesper analyses and displays DwC-A datasets in three fundamental dimensions - taxonomic, geographic and temporal - with a visualisation dedicated to each of these aspects of the data. By viewing a dataset’s composition in these dimensions, a data consumer can judge whether it is suitable for the tasks or analyses they have in mind, while a data provider can identify where a dataset they have constructed may fall short in terms of data quality, i.e. does it contain data that is obviously incorrect, such as the classic longitude inversion that places North American specimens in China? A further visualisation of the taxonomic dimension can reveal the subtaxa distribution of reference taxonomies, while a simple table reveals the presence or absence of certain data types for each record to give an overall data ‘existence’ profile for the dataset. Selections of parts of a dataset within one visualisation are linked to the other visualisation displays for that dataset, permitting the discovery of whether data quality issues are restricted to identifiable sub-portions of the dataset.
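The longitude inversion mentioned above can also be illustrated outside a visualisation. The following is a minimal sketch in Python, not Vesper's own method (Vesper exposes such errors visually on a map): it flags records that fall outside an expected region but would fall inside it if the sign of the longitude were flipped. The function names and the rough North American bounding box are assumptions made for illustration; `decimalLatitude` and `decimalLongitude` are the standard Darwin Core terms.

```python
from typing import Dict, Iterable, List

# Rough bounding box for North America. These limits are illustrative
# assumptions, not values taken from the paper.
NORTH_AMERICA = {"min_lat": 5.0, "max_lat": 85.0, "min_lon": -170.0, "max_lon": -50.0}


def in_box(lat: float, lon: float, box: Dict[str, float]) -> bool:
    """Return True if the point lies inside the bounding box."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])


def suspected_longitude_inversions(records: Iterable[Dict[str, str]],
                                   box: Dict[str, float]) -> List[Dict[str, str]]:
    """Flag records outside the box that would be inside it with longitude negated."""
    flagged = []
    for rec in records:
        lat = float(rec["decimalLatitude"])
        lon = float(rec["decimalLongitude"])
        if not in_box(lat, lon, box) and in_box(lat, -lon, box):
            flagged.append(rec)
    return flagged
```

A check like this only suggests candidates; whether a flagged record is truly an error still depends on what the dataset is expected to contain.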
Vesper can handle client-side datasets of a million entities within a browser by judicious use of data filtering, as many of the data types within individual records are not necessary to judge the geographic, temporal or taxonomic distribution and extent of a dataset. Thus, many of the more verbose fields in the file can simply be passed over during an initial data decompression stage. Furthermore, it can provide limited name and structure matching of a dataset against DwC-A packaged reference taxonomies to indicate data quality relative to sources outside the archive. A selection of annotated example scenarios shows how Vesper can reveal data quality issues in DwC-A archives.
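The idea of passing over verbose fields while decompressing can be sketched as follows. This is a hedged Python illustration of the general technique, not Vesper's implementation (which runs as JavaScript in the browser). It assumes a tab-delimited data file with a header row inside the zipped archive; in a real DwC-A the delimiter, header presence and column mapping are declared in `meta.xml`. The function name `read_filtered` and the choice of retained columns are hypothetical.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator, Sequence

# Columns assumed sufficient for taxonomic, geographic and temporal views.
# This selection is an illustrative assumption.
KEEP = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")


def read_filtered(archive, member: str, keep: Sequence[str] = KEEP,
                  delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Stream rows from a file inside a zip, retaining only the wanted columns.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        # Decode the decompressed byte stream lazily, one row at a time.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.DictReader(text, delimiter=delimiter)
        for row in reader:
            # Verbose fields (remarks, descriptions, etc.) are dropped here.
            yield {name: row[name] for name in keep if name in row}
```

Because rows are streamed and trimmed as they are read, the unneeded fields never accumulate in memory, which is the property the abstract relies on for large datasets.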

Citation

Graham, M., & Kennedy, J. (2014). Vesper: Visualising species archives. Ecological Informatics, 24, 132-147. https://doi.org/10.1016/j.ecoinf.2014.08.004

Journal Article Type: Article
Acceptance Date: Aug 17, 2014
Online Publication Date: Aug 30, 2014
Publication Date: 2014-11
Deposit Date: Sep 2, 2014
Publicly Available Date: Sep 2, 2014
Print ISSN: 1574-9541
Electronic ISSN: 1878-0512
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 24
Pages: 132-147
DOI: https://doi.org/10.1016/j.ecoinf.2014.08.004
Keywords: Information Visualisation, Data Quality, Darwin Core Archive, Open Source, Biodiversity
Public URL: http://researchrepository.napier.ac.uk/id/eprint/7129
