A curated, ontology-based, large-scale knowledge graph of artificial intelligence tasks and benchmarks

Validation and evaluation of a knowledge graph and/or ontology aims to assess whether the resource adequately and accurately covers the domain it intends to model, and whether it enables efficient execution of the tasks it was designed for.

Commonly used criteria to evaluate ontologies based on these aspects include accuracy, clarity, completeness, conciseness, adaptability, computational efficiency and consistency22.

Accuracy indicates whether the definitions and descriptions of elements in an ontology are correct. Clarity measures whether the ontology’s elements are clearly defined and labeled, and understandable for the user. In ITO, high accuracy and clarity were ensured through an extensive manual curation effort lasting several months.

The criterion of completeness is concerned with whether the domain to be modeled is adequately covered by the ontology, while conciseness indicates to which extent the ontology covers only elements relevant to the domain. Both criteria are ensured in ITO through a bottom-up development approach that builds on existing data (i.e., benchmarks extracted from preprint servers) and concepts relevant to the domain of AI processes, instead of a top-down approach that starts with a blank slate. Relying on existing data sources, such as the PWC database, which combines automated extraction of benchmarks from papers on preprint servers with crowd-sourced annotation by several thousand contributors, enables high domain coverage. Completeness was further tested by using ITO to annotate a collection of over 450 datasets and AI benchmarks in the biomedical domain, with ITO being found to cover all concepts required to annotate all datasets13.

Adaptability is concerned with whether the ontology meets the requirements defined by the range of use cases for which it was built. The practical usability of ITO for its intended applications has been validated in two recently conducted studies (Barbosa-Silva et al., manuscript in preparation)12.

Computational efficiency indicates whether the ontology’s anticipated tasks can be fulfilled within reasonable time and performance frames using the available tools. Even complex queries related to the use cases described above can be executed within a few seconds on standard hardware when using the high-performance Blazegraph graph database.
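As an illustration, such queries can be issued programmatically against a local Blazegraph instance loaded with the ITO data. The minimal sketch below uses Python with SPARQLWrapper; the endpoint URL and the task IRI are illustrative assumptions, not the identifiers actually used by ITO.

```python
# Minimal sketch: querying a local Blazegraph instance loaded with ITO.
# The endpoint URL and the task IRI below are placeholders/assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:9999/blazegraph/sparql"  # assumed default Blazegraph endpoint

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?task ?label WHERE {
  ?task rdfs:subClassOf+ <https://example.org/ito#NaturalLanguageProcessing> ;  # placeholder IRI
        rdfs:label ?label .
}
LIMIT 20
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Print each retrieved subtask IRI together with its human-readable label
for binding in results["results"]["bindings"]:
    print(binding["task"]["value"], "-", binding["label"]["value"])
```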

Finally, consistency requires the ontology to be free from any contradictions. Internal consistency was checked using Protégé v5.5.0 and the ELK 0.4.3 reasoner23,24.
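For readers who prefer a scripted workflow over Protégé, the following minimal sketch performs a comparable consistency check with the Owlready2 Python library. Note that Owlready2 bundles the HermiT reasoner rather than ELK, so results may differ for constructs outside the OWL EL profile, and the file path is a placeholder.

```python
# Minimal sketch of a programmatic consistency check on a local copy of ITO.
# The file path is a placeholder; the paper itself used Protégé 5.5.0 with ELK,
# whereas Owlready2 runs the bundled HermiT reasoner.
from owlready2 import get_ontology, sync_reasoner, default_world

onto = get_ontology("file:///path/to/ito.owl").load()  # placeholder path

with onto:
    sync_reasoner()  # classify the ontology with HermiT

unsatisfiable = list(default_world.inconsistent_classes())
print(f"Unsatisfiable classes: {len(unsatisfiable)}")
for cls in unsatisfiable:
    print(" -", cls)
```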

Furthermore, common pitfalls in ontology design and creation have been described in the literature; these include, for example, the creation of unconnected ontology elements, missing human-readable annotations, and cycles in class hierarchies25,26,27,28. ITO was checked for these pitfalls with the ontology quality checking tool ‘OOPS!’27, and identified issues were resolved.
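One of these pitfalls, missing human-readable annotations, can also be screened for with a few lines of code. The sketch below uses rdflib and a placeholder file name; it covers only this single check and is not a substitute for the full OOPS! pitfall catalogue.

```python
# Minimal sketch: list named classes that lack a human-readable rdfs:label.
# "ito.owl" is a placeholder file name; the serialization format is inferred
# from the file extension by rdflib.
from rdflib import BNode, Graph, OWL, RDF, RDFS

g = Graph()
g.parse("ito.owl")  # placeholder path

unlabeled = [
    cls
    for cls in g.subjects(RDF.type, OWL.Class)
    if not isinstance(cls, BNode) and (cls, RDFS.label, None) not in g
]
print(f"Classes without rdfs:label: {len(unlabeled)}")
```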

Ontology evaluation metrics were calculated with the Ontometrics tool29 and were used for ontology quality evaluation, following the example of Carriero et al.30. Ontology metrics are reported in Table 4.

Table 4 Ontology evaluation metrics.

The inheritance number of 1.73 is low, suggesting that ITO is a deep ontology, i.e., the class hierarchy is well grouped and covers the domain in a detailed manner. The relationship richness of 0.002, as calculated by the Ontometrics algorithm, is low; this, however, is due to the fact that the vast majority of relationships in ITO are captured at the level of OWL individuals rather than classes. The axiom/class ratio is high, indicating a richly axiomatized ontology. The average population number of 5.62 indicates a good balance between the count of individuals (i.e., mostly benchmark results) and the number of classes in the class hierarchy used to structure those results. The class richness of 0.49 suggests that roughly half of the classes in the ontology are not instantiated by individuals; this is due to the Dates, Data format and Topic branches of the ontology, which are primarily used for defining attributes of other classes rather than being instantiated themselves. The average depth value of 5.36 is within the normal range for an ontology of the given size. The maximum breadth (4590) and absolute sibling cardinality (9037) are very high. This is caused by the modeling decision of creating a process class called Benchmarking, which is the direct superclass of the large number of classes representing benchmarks in the ontology. This design choice also led to a high tangledness metric, i.e., a large number of classes with multiple superclasses, since benchmark classes have both a specific AI task and the Benchmarking class as direct superclasses. While this particular design choice deviates from best practices of ontology design, it proved favorable for ease of querying the ontology, which was an important design goal.
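To make the discussion above more concrete, the following sketch approximates two of the reported metrics, average population and class richness, from a local copy of the ontology using rdflib. The file name is a placeholder, and the exact Ontometrics definitions may differ in detail (for example, in how anonymous or imported classes are counted), so the resulting numbers should be read as approximations.

```python
# Minimal sketch: approximate two ontology metrics from Table 4.
#   average population = individuals per class
#   class richness     = fraction of classes with at least one direct instance
# "ito.owl" is a placeholder file name; counts may deviate from Ontometrics.
from rdflib import BNode, Graph, OWL, RDF

g = Graph()
g.parse("ito.owl")  # placeholder path

classes = {c for c in g.subjects(RDF.type, OWL.Class) if not isinstance(c, BNode)}
individuals = set(g.subjects(RDF.type, OWL.NamedIndividual))

# Classes that are the direct type of at least one individual
instantiated = {t for i in individuals for t in g.objects(i, RDF.type) if t in classes}

print(f"Classes: {len(classes)}, individuals: {len(individuals)}")
print(f"Average population: {len(individuals) / len(classes):.2f}")
print(f"Class richness: {len(instantiated) / len(classes):.2f}")
```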

Other data sources and related work

Besides PWC, several other projects aiming to track global AI tasks, benchmarks and state-of-the-art results have been initiated in recent years, and we investigated them as potential data sources. Among these, the AIcollaboratory31 and State of the art AI (https://www.stateoftheart.ai/) stood out as the most comprehensive and advanced resources.

‘AIcollaboratory’ is a data-driven framework enabling the exploration of progress in AI. It is based on data from annotated AI papers and open data from, e.g., PWC, AI metrics and OpenML. Similar to the projects described above, benchmark results are organized hierarchically and can be compared per task. In addition, the platform provides summary diagrams that combine all benchmark results per top-level task class, e.g., ‘Natural language processing’, and display progress over time. We found that the relevant data in AIcollaboratory were already covered by PWC, and that the project did not appear to be actively maintained at the time of writing.

‘State of the art AI’ collects AI tasks, datasets, models and papers, building on data from PWC, arXiv, DistillPub and others. Similar to PWC, it organizes AI tasks, allows for a comparison of results per task, and makes them available on a web-based platform. However, the data are not available for download at the time of this writing, and the relevant data were already covered by PWC.

There are several ontologies and taxonomies related to ITO. The Computer Science Ontology (CSO)32 is a large-scale ontology created through literature mining that captures research areas and their relations in computer science. WikiCSSH provides a large-scale, hierarchically organized vocabulary of subjects in computer science that was derived from Wikipedia33. Compared to ITO, CSO and WikiCSSH have lower coverage of the domain of AI tasks. Outside of the domain of computer science, the Cognitive Atlas Ontology provides concepts of human cognition that partially overlap with concepts from AI34.

Several related projects aim to capture scientific results through knowledge graphs. The Artificial Intelligence Knowledge Graph (AI-KG) contains a large collection of research statements mined from AI manuscripts35. The Open Research Knowledge Graph (ORKG)36 captures research statements across multiple scientific domains. The Academia/Industry Dynamics (AIDA) Knowledge Graph describes 21 million publications and 8 million patents and utilizes CSO for annotations.

There are also multiple partially related initiatives towards creating large, integrated knowledge graphs in the life sciences. The decentralized nanopublication infrastructure captures and integrates research statements and their provenance, particularly in the domain of the life sciences37. More centralized ontology-based knowledge graphs that were recently published include OpenBioLink38, Hetionet39 and PheKnowLator40.

Maintenance and future development

To ensure content validity and keep up with the fast-paced developments in the field of AI, newly available data will be imported periodically. Furthermore, the underlying ontological model will be subject to continuous refinement, and future developments will also focus on creating mappings between ITO and other thematically relevant ontologies and knowledge graphs, particularly AI-KG, ORKG and CSO.
