LMF version of the SenSem Spanish Data Base

Welcome to the UPF Digital Repository

Grup de Recerca Interuniversitari en Aplicacions Lingüístiques (GRIAL); Ana Fernandez Montraveta; Glòria Vázquez; Irene Castellón; Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA). LMF version of the SenSem Spanish Data Base. 2012
To cite or link this document: http://hdl.handle.net/10230/17088
dc.contributor.author Grup de Recerca Interuniversitari en Aplicacions Lingüístiques (GRIAL)
dc.contributor.author Ana Fernandez Montraveta
dc.contributor.author Glòria Vázquez
dc.contributor.author Irene Castellón
dc.contributor.author Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
dc.date.accessioned 2012-10-10T07:34:22Z
dc.date.available 2012-10-10T07:34:22Z
dc.date.issued 2012-05-10
dc.identifier.uri http://hdl.handle.net/10230/17088
dc.description This is the LMF version of the SenSem database created by the Spanish inter-university research group GRIAL. As part of the SenSem project, a corpus of sentences annotated at the semantic and syntactic levels was created. The source corpus consists of around 13 million words extracted from the online editions of a Spanish newspaper. From this corpus, 25,000 sentences were randomly selected: 100 for each of the 250 most frequent verbs in current Spanish. Each sentence has been labelled with the verb sense it exemplifies, the type of complements the verb takes (arguments or adjuncts), their syntactic category and function, and, finally, a semantic role for each argument. Each sentence has also been annotated for its semantics, covering both aspectual information and the type of construction expressed. From this annotated corpus, a lexical database of verbs was built that gathers all of this information. The unit of description for verbs is the sense. Each verb description includes its argument structure: subcategorization patterns with their frequencies, semantic roles, and information on sentence semantics. The lexicon and the corpus are linked at the sense level and together make up what we call the data bank of sentential semantics of Spanish verbs. Both resources are available on the web and are intended as a significant source of linguistic information, which we hope will be useful in various areas of natural language processing and in linguistic research in general. The LMF conversion was carried out by the Universitat Pompeu Fabra.
dc.language.iso spa
dc.publisher Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
dc.rights This resource is licensed under a GNU General Public License version 3.0 (http://www.gnu.org/licenses/gpl.html)
dc.subject language resources, lexical conceptual resource, monolingual lexicon
dc.title LMF version of the SenSem Spanish Data Base
dc.date.modified 2012-08-23T10:22:44Z
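The description says the lexicon is organized by verb sense, with each sense carrying subcategorization patterns and their frequencies. As a rough sketch of how such an LMF export might be read, the Python below parses a small hypothetical fragment with the standard-library `xml.etree.ElementTree` and turns raw pattern counts into relative frequencies per sense. The layout loosely follows ISO 24613 (LMF) conventions (`LexicalEntry`, `Lemma`, `Sense`, `SyntacticBehaviour`, `<feat att="..." val="..."/>`), but the feature names `subcatPattern` and `frequency`, the `senses` linkage, and all values are assumptions, not the actual structure or content of the SenSem file; check the released XML before reusing it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical LMF-style fragment. Element/feature names and all counts are
# illustrative assumptions, NOT taken from the real SenSem LMF file.
SAMPLE = """
<LexicalResource>
  <Lexicon>
    <feat att="language" val="spa"/>
    <LexicalEntry id="le_abrir">
      <feat att="partOfSpeech" val="verb"/>
      <Lemma><feat att="writtenForm" val="abrir"/></Lemma>
      <Sense id="abrir_1"/>
      <SyntacticBehaviour senses="abrir_1">
        <feat att="subcatPattern" val="subject+directObject"/>
        <feat att="frequency" val="30"/>
      </SyntacticBehaviour>
      <SyntacticBehaviour senses="abrir_1">
        <feat att="subcatPattern" val="subject"/>
        <feat att="frequency" val="10"/>
      </SyntacticBehaviour>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def feats(elem):
    """Collect the direct <feat att=.. val=..> children of elem into a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def patterns_by_sense(xml_text):
    """Map (lemma, sense id) -> [(pattern, relative frequency)], most frequent first."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for entry in root.iter("LexicalEntry"):
        lemma = feats(entry.find("Lemma")).get("writtenForm")
        counts = defaultdict(dict)  # sense id -> {pattern: raw count}
        for sb in entry.findall("SyntacticBehaviour"):
            f = feats(sb)
            # A behaviour may point at several senses (space-separated IDREFS).
            for sense_id in sb.get("senses", "").split():
                counts[sense_id][f["subcatPattern"]] = int(f["frequency"])
        for sense_id, pats in counts.items():
            total = sum(pats.values())
            result[(lemma, sense_id)] = sorted(
                ((p, n / total) for p, n in pats.items()),
                key=lambda pn: -pn[1],
            )
    return result


if __name__ == "__main__":
    for (lemma, sense), pats in patterns_by_sense(SAMPLE).items():
        print(lemma, sense, pats)
```

With the invented counts above, sense `abrir_1` yields `subject+directObject` at 0.75 and `subject` at 0.25. Keeping frequencies per sense rather than per lemma mirrors the record's statement that the sense is the unit of description; semantic roles could be read the same way from additional `feat` elements if the real file encodes them so.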
