LEXICAL NEEDS IN THE LIGHT OF THE STATISTICAL ANALYSIS OF THE TEXT CORPUS OF THE "BREF" PROJECT: THE "BDLEX" LEXICON OF WRITTEN AND SPOKEN FRENCH

FERRANE, M. de CALMES, D. COTTO, PECATTE, G. PERENNOU

IRIT, Université Paul Sabatier
118 route de Narbonne, 31062 TOULOUSE Cedex, FRANCE

ABSTRACT

In this paper, we describe the lexical needs of surface processing of spoken and written French, such as automatic text correction, speech recognition and speech synthesis.

We present statistical observations made on a vocabulary compiled from real texts such as articles. These texts were used to build a recorded speech database called BREF. Developed by LIMSI within the research group GDR-PRC CHM (Groupe de Recherche, Programme de Recherches Concertées, Communication Homme-Machine; in English: Research Group, Concerted Research Program, Man-Machine Communication), this database is ...

By comparing the vocabulary provided (LexBref, 84,900 items, mainly distinct inflected forms) with the forms generated from BDLEX, we find that about 62% of the forms are known, once some acronyms and abbreviations are taken into account.

We then address the question of unexpected words by examining the remaining 38% of forms. Among them are numerals, neologisms, foreign words and proper names, as well as further acronyms and abbreviations. To achieve broad text coverage, a lexical component must therefore take all these kinds of words into account and must ...
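The coverage measurement described in the abstract amounts to intersecting the corpus vocabulary with the set of forms generated from the lexicon. At the reported rate, roughly 52,600 of the 84,900 LexBref items would be known and about 32,300 left over. The Python sketch that follows is a minimal illustration under stated assumptions: the functions `coverage` and `classify_unknown`, the toy word lists, and the rules used to sort leftover forms are all hypothetical and are not taken from the BREF or BDLEX tools.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not the BREF/BDLEX tooling.

def coverage(vocabulary, lexicon_forms):
    """Split corpus forms into those the lexicon generates and leftovers.

    vocabulary    -- set of distinct inflected forms from the corpus (LexBref-like)
    lexicon_forms -- set of forms generated from a lexicon (BDLEX-like)
    Returns (known, unknown, share of known forms).
    """
    known = vocabulary & lexicon_forms
    unknown = vocabulary - lexicon_forms
    share = len(known) / len(vocabulary) if vocabulary else 0.0
    return known, unknown, share


def classify_unknown(form):
    """Assign a leftover form to a coarse category using assumed, simplistic rules."""
    if re.fullmatch(r"\d+(?:[.,]\d+)*", form):
        return "numeral"
    # Only unaccented capitals are matched here, a deliberate simplification.
    if re.fullmatch(r"[A-Z]{2,}", form) or re.fullmatch(r"(?:[A-Z]\.)+", form):
        return "acronym/abbreviation"
    if form[:1].isupper():
        return "proper name (guess)"
    return "other (neologism, foreign word, ...)"


# Toy data, invented for the example.
vocab = {"maison", "maisons", "parlait", "1992", "SNCF", "Toulouse", "zapping"}
lexicon = {"maison", "maisons", "parlait", "parler"}

known, unknown, share = coverage(vocab, lexicon)
print(f"known: {share:.1%}")  # 3 of 7 forms -> 42.9%
print(Counter(classify_unknown(f) for f in sorted(unknown)))
```

Set intersection gives an exact known/unknown split over distinct forms; a realistic classifier for the leftover forms would need much richer resources (accented capitals, hyphenated compounds, lists of proper names and foreign words) than these regular expressions.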