Human and machine judgements for Russian semantic relatedness / Panchenko A., Ustalov D., Arefyev N., Paperno D., Konstantinova N., Loukachevitch N., Biemann C. // Communications in Computer and Information Science. - 2017. - V. 661. - P. 221-235.

ISSN:
1865-0929
Type:
Conference Paper
Abstract:
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (word_i, word_j, similarity_ij). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy. © Springer International Publishing AG 2017.
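As an illustration of how a resource made of (word_i, word_j, similarity_ij) triples could be used to evaluate a system, the sketch below scores a system against gold triples with Spearman's rank correlation. The abstract does not name the evaluation measure, so the choice of Spearman's correlation is an assumption, and the word pairs, scores and function names are invented for the example.

```python
from typing import Dict, List, Tuple

# One benchmark entry: (word_i, word_j, similarity_ij)
Triple = Tuple[str, str, float]


def rank(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else 0.0


def spearman(gold: List[Triple], system: Dict[Tuple[str, str], float]) -> float:
    """Spearman correlation between gold scores and system scores.

    Pairs missing from the system output get a score of 0.0 (an assumption).
    """
    gold_scores = [score for _, _, score in gold]
    system_scores = [system.get((a, b), 0.0) for a, b, _ in gold]
    return pearson(rank(gold_scores), rank(system_scores))


# Hypothetical gold triples and system scores, for illustration only.
gold = [("кошка", "собака", 0.8), ("кошка", "стол", 0.1), ("чай", "кофе", 0.9)]
system = {("кошка", "собака"): 0.7, ("кошка", "стол"): 0.2, ("чай", "кофе"): 0.6}
print(spearman(gold, system))
```

A rank-based measure is used here because it compares only the ordering of pairs, so gold and system scores need not share a scale.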
Author keywords:
Crowdsourcing; Distributional thesaurus; Evaluation; Language resources; Semantic relatedness; Semantic similarity
Index keywords:
Crowdsourcing; Image analysis; Natural language processing systems; Thesauri; Evaluation; High-accuracy; Language processing systems; Language resources; Lexical database; Semantic relatedness; Semant
DOI:
10.1007/978-3-319-52920-2_21
View in Scopus:
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85014186549&doi=10.1007%2f978-3-319-52920-2_21&partnerID=40&md5=9a3f3de648e4e5623423064377b3c920
Co-authors in МНС:
Other fields
Field Value
Affiliations TU Darmstadt, Darmstadt, Germany; Ural Federal University, Yekaterinburg, Russian Federation; Moscow State University, Moscow, Russian Federation; University of Trento, Rovereto, Italy; University of Wolverhampton, Wolverhampton, United Kingdom
Funding Details DFG, Deutsche Forschungsgemeinschaft; 15-04-12017, RHF, Russian Humanitarian Foundation; 16-37-00354, RFBR, Russian Foundation for Basic Research; 283554, ERC, European Research Council
Funding Text We would like to acknowledge several funding organisations that partially supported this research. Dmitry Ustalov was supported by the Russian Foundation for Basic Research (RFBR) under research project no. 16-37-00354. Denis Paperno was supported by the European Research Council (ERC) 2011 Starting Independent Research Grant no. 283554 (COMPOSES). Natalia Loukachevitch was supported by the Russian Foundation for Humanities (RFH), grant no. 15-04-12017. Alexander Panchenko was supported by the Deutsche Forschungsgemeinschaft (DFG) under the project “Joining Ontologies and Semantics Induced from Text (JOIN-T)”.
Correspondence Address Panchenko, A.; TU Darmstadt, Germany; email: panchenko@lt.informatik.tu-darmstadt.de
Editors Loukachevitch N.; Panchenko A.; Vorontsov K.; Labunets V.G.; Savchenko A.V.; Ignatov D.I.; Nikolenko S.I.; Khachay M.Y.
Sponsors Exactpro; IT Centre; OK.Ru (Mail.Ru Group)
Publisher Springer Verlag
Conference name 5th International Conference on Analysis of Images, Social Networks and Texts, AIST 2016
Conference date 7 April 2016 through 9 April 2016
Conference code 189269
ISBN 9783319529196
Language of Original Document English
Abbreviated Source Title Commun. Comput. Info. Sci.
Source Scopus