Ranking Wikidata items based on a keyword query




Wikidata is a knowledge base whose information is expressed in a graph model. Each statement in Wikidata is a triple (subject, predicate, object) that describes a data item. To date, Wikidata holds more than 78 million data items, which can be read and edited by humans as well as machines. Because it is openly editable, the number of data items keeps growing every day.
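As a rough illustration of the triple structure, statements can be held as plain (subject, predicate, object) records and filtered by subject and property. This is only a sketch: the class and method names (TripleSketch, Statement, objectsOf) are hypothetical and not part of any Wikidata API, and the identifiers are a handful of well-known Wikidata IDs written out by hand rather than fetched from the service.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TripleSketch {
    // One statement: subject item, property, and value (hypothetical helper type).
    static final class Statement {
        final String subject;
        final String predicate;
        final String object;

        Statement(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Collects the objects of all statements with the given subject and predicate.
    static List<String> objectsOf(List<Statement> statements, String subject, String predicate) {
        return statements.stream()
                .filter(st -> st.subject.equals(subject) && st.predicate.equals(predicate))
                .map(st -> st.object)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Statement> statements = Arrays.asList(
                new Statement("Q42", "P31", "Q5"),       // Douglas Adams, instance of, human
                new Statement("Q42", "P106", "Q36180"),  // Douglas Adams, occupation, writer
                new Statement("Q64", "P31", "Q515"));    // Berlin, instance of, city
        System.out.println(objectsOf(statements, "Q42", "P31")); // prints [Q5]
    }
}
```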

Searching for items in Wikidata is difficult, because a query may return overlapping, closely related, and even contradictory items. In this project, you will evaluate some popular Information Retrieval ranking algorithms for filtering the relevant items from an initial set of matches for a keyword query. The outputs of these ranking models will be judged by human experts. You will also present observations and recommendations, derived from the analysis of your experiments, on selecting the most suitable ranking model for filtering items from a search result set.
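To make the ranking task concrete, the sketch below scores a small candidate set against a keyword query using Okapi BM25, one widely used Information Retrieval ranking function. It is an illustration under stated assumptions, not the project's prescribed method: the project description does not name specific algorithms, the class name Bm25Sketch and its methods are hypothetical, the parameter values k1 = 1.2 and b = 0.75 are just common defaults, and the candidate strings are hand-written stand-ins for the label and description text of matched items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class Bm25Sketch {
    // Common default values for the BM25 free parameters (an assumption).
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one BM25 score per candidate, in the same order as the input list.
    static double[] score(String query, List<String> candidates) {
        int n = candidates.size();
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        double totalLength = 0;
        for (String candidate : candidates) {
            List<String> tokens = tokenize(candidate);
            tokenized.add(tokens);
            totalLength += tokens.size();
            // Count each term once per candidate for the document frequency.
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        double avgLength = n == 0 ? 0 : totalLength / n;

        double[] scores = new double[n];
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            int df = docFreq.getOrDefault(term, 0);
            if (df == 0) {
                continue; // the term occurs in no candidate
            }
            double idf = Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
            for (int i = 0; i < n; i++) {
                List<String> tokens = tokenized.get(i);
                int tf = Collections.frequency(tokens, term);
                if (tf == 0) {
                    continue;
                }
                // Length normalisation: longer candidates need more matches.
                double norm = 1 - B + B * tokens.size() / avgLength;
                scores[i] += idf * tf * (K1 + 1) / (tf + K1 * norm);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "Berlin capital and largest city of Germany",
                "Berlin city in New Hampshire, United States",
                "Irving Berlin American composer and lyricist");
        double[] scores = score("berlin germany capital", candidates);
        for (int i = 0; i < scores.length; i++) {
            System.out.println(scores[i] + "  " + candidates.get(i));
        }
    }
}
```

With this query, the first candidate matches all three query terms while the other two match only "berlin", so it receives the highest score. In an evaluation, rankings like this one would be compared against the relevance judgements of the human experts.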


Department of Finance


Strong skills in Java are required.
Knowledge of Information Retrieval techniques is desired.
A basic background in graph models, or in the Semantic Web in general, is desired.


Denny Vrandecic, Markus Krötzsch: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10): 78-85 (2014)
Armin Haller, Axel Polleres: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99 (2020)

Updated: 10 August 2021