This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Faculty of Physics, University of Belgrade, Belgrade, Serbia
Wikipedia is among the best-known collections of publicly available data on the Internet, containing millions of articles in many languages on a wide variety of topics. Complete dumps of all texts in the Wikipedia database are published monthly in XML format. In this paper, the content available on Wikipedia in the official languages of the former Yugoslavia is analysed and integrated into a single knowledge base. Although this data collection contains over 10 million articles, the number of distinct terms and topics described is considerably smaller, because many articles merely redirect to other articles, some are user discussions, and some are article templates. A detailed classification of articles, terms, and topics was performed, and their mutual links were extracted (the English edition of Wikipedia was used as an auxiliary dataset). Detailed statistical, structural, and cluster analyses were carried out on the resulting graph of relationships among articles, terms, and topics. Finally, using force-directed graph layout algorithms, a comprehensive map of the knowledge base was produced and visualized.
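The page classification described above (separating genuine articles from redirects, user discussions, and templates) can be sketched as follows. This is a minimal illustration only, assuming a simplified element layout; real MediaWiki dumps wrap pages in an XML namespace and are large enough to require streaming parsing (e.g. `iterparse`), and the sample pages below are hypothetical.

```python
# Minimal sketch: classifying pages from a Wikipedia-style XML dump.
# Real dumps use the MediaWiki export namespace; this sample omits it
# for brevity, and the page titles here are made up for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """<mediawiki>
  <page><title>Graph</title><ns>0</ns></page>
  <page><title>Graf</title><ns>0</ns><redirect title="Graph"/></page>
  <page><title>Talk:Graph</title><ns>1</ns></page>
  <page><title>Template:Infobox</title><ns>10</ns></page>
</mediawiki>"""

def classify(page):
    """Label a <page> element as article, redirect, talk, template or other."""
    if page.find("redirect") is not None:
        return "redirect"          # page only points to another article
    ns = int(page.findtext("ns", "0"))
    if ns == 1:
        return "talk"              # user discussion namespace
    if ns == 10:
        return "template"          # article template namespace
    return "article" if ns == 0 else "other"

root = ET.fromstring(SAMPLE)
counts = {}
for page in root.iter("page"):
    label = classify(page)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # -> {'article': 1, 'redirect': 1, 'talk': 1, 'template': 1}
```

Counting pages by category in this way is what separates the raw article count (over 10 million) from the much smaller number of pages that actually describe a term or topic.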
The statements, opinions, and data contained in the journal are solely those of the individual authors and contributors and not of the publisher or the editor(s). The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.