Eliminating fuzzy duplicates in crowdsourced lexical resources / Kiselev Y., Ustalov D., Porshnev S. // Proceedings of the 8th Global WordNet Conference, GWC 2016. - 2016. - P. 161-167.

ISSN:
no data
Type:
Conference Paper
Abstract:
Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low price. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that solves the deduplication problem fully automatically, with quality comparable to the expert-based approach.
Author keywords:
Index keywords:
Data cleansing; De duplications; Error prones; High quality; Lexical resources; Time span; Ontology
DOI:
no data
View in Scopus:
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84962832689&partnerID=40&md5=d96f9107244d8e454a92e0411d58ce7a
Co-authors in MNS:
Other fields
Field Value
Link https://www.scopus.com/inward/record.uri?eid=2-s2.0-84962832689&partnerID=40&md5=d96f9107244d8e454a92e0411d58ce7a
Affiliations Yandex, Yekaterinburg, Russian Federation; Ural Federal University, Yekaterinburg, Russian Federation
References Babenko, L.G., (2011) Dictionary of Synonyms of the Russian Language, AST: Astrel, Moscow, Russia; Bernstein, M.S., Little, G., Miller, R.C., Hartmann, B., Ackerman, M.S., Karger, D.R., Crowell, D., Panovich, K., Soylent: A word processor with a crowd inside (2010) Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, pp. 313-322, New York, NY, USA. ACM; Braslavski, P., Ustalov, D., Mukhin, M.Yu., A spinning wheel for YARN: User interface for a crowdsourced thesaurus (2014) Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 101-104, Gothenburg, Sweden. Association for Computational Linguistics; Deng, J., Krause, J., Fei-Fei, L., Fine-grained crowdsourcing for fine-grained recognition (2013) Computer Vision and Pattern Recognition (CVPR) 2013 IEEE Conference on, pp. 580-587; Guarino, N., Welty, C.A., An overview of OntoClean (2009) Handbook on Ontologies, International Handbooks on Information Systems, pp. 201-220, Steffen Staab and Rudi Studer, editors, Springer Berlin Heidelberg; Kiselev, Y., Krizhanovsky, A., Braslavski, P., Russian lexicographic landscape: A tale of 12 dictionaries (2015) Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference "Dialogue", 1, pp. 254-271, RGGU, Moscow; Loukachevitch, N.V., (2011) Thesauri in Information Retrieval Tasks, Moscow University Press, Moscow, Russia; Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J., Introduction to WordNet: An on-line lexical database (1990) Lexicography, 3, pp. 235-244; Pavlick, E., Post, M., Irvine, A., Kachaev, D., Callison-Burch, C., The language demographics of Amazon Mechanical Turk (2014) Transactions of the Association for Computational Linguistics, 2, pp. 79-92; Sagot, B., Fišer, D., Cleaning noisy wordnets (2012) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey; Sajous, F., Navarro, E., Gaume, B., Prévot, L., Chudy, Y., Semi-automatic enrichment of crowdsourced synonymy networks: The WISIGOTH system applied to Wiktionary (2013) Language Resources and Evaluation, 47 (1), pp. 63-96; Snow, R., O'Connor, B., Jurafsky, D., Ng, A.Y., Cheap and fast-but is it good?: Evaluating non-expert annotations for natural language tasks (2008) Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pp. 254-263, Stroudsburg, PA, USA. Association for Computational Linguistics; Ustalov, D., Kiselev, Y., Add-remove-confirm: Crowdsourcing synset cleansing (2015) Application of Information and Communication Technologies (AICT) 2015 IEEE 9th International Conference on, pp. 143-147, IEEE; Ustalov, D., A crowdsourcing engine for mechanized labor (2015) Proceedings of the Institute for System Programming, 27 (3), pp. 351-364; Wang, J., Kraska, T., Franklin, M.J., Feng, J., CrowdER: Crowdsourcing entity resolution (2012) Proc. VLDB Endow., 5 (11), pp. 1483-1494
Editors Forascu C.; Fellbaum C.; Vossen P.; Mititelu V.B.
Sponsors Oxford University Press; PIM; Qatar Airways
Publisher Global WordNet Association
Conference name 8th Global WordNet Conference, GWC 2016
Conference date 27 January 2016 through 30 January 2016
Conference code 119284
ISBN 9789730207286
Language of Original Document English
Abbreviated Source Title Proc. Glob. WordNet Conf., GWC
Source Scopus