tutorial-SNASEL
3 pages
English




Web Content Mining

Václav Snášel, Miloš Kudělka

A Web page is like a family house. Each part has its meaning, determined by the purpose it serves. Every part can be named so that everybody imagines approximately the same thing under that name (living room, bathroom, lobby, bedroom, kitchen, balcony, ...). So that the inhabitants can find their way around the house, certain rules are kept. From the point of view of these rules, all houses are similar. That is why first-time visitors usually have no problem finding their way around a house. Thanks to the names, we can describe the house quite precisely. If we add further details to the description, such as location, sizes, colours and equipment, then a future visitor can get an almost perfect notion of what he will see in the house when he comes in for the first time.

We can describe a building other than a family house (a school, supermarket, office, etc.) in a similar way. The same applies to visitors in this case, and it is usually not a problem to find one's way around (of course this does not always have to be the case; just as there are bad Web pages, there are also bad buildings).

Let us look at the problem from the other side. If we visit a building with a blindfolded person, we can set that person basically three tasks. The first is to find out what the purpose of the building is. The second is to find out what parts (e.g. rooms) the building contains. The third can be linked, e.g., to the equipment of individual rooms. When solving these tasks, it is probably possible to start with any of them. There is another important issue. If the visitor completes some of the tasks and we ask him to describe the result, he will certainly use commonly used names, which describe the type of building, its parts and, finally, its equipment. The architect Christopher Alexander introduced a similar and, to a certain extent, formalized way of description.
In our tutorial, we work with a Web page in a similar way. We have shown that this way of looking at a Web page can, moreover, be a good tool for classifying some approaches in the field of Web content mining. Furthermore, in our own research we have verified experimentally that it is reasonable to describe a Web page by its named parts. This holds both for the design of methods for detecting page semantics and for a description of the page by the user that can be used technically.

Proposed duration: half day

Intended audience: Web mining, User interface

Prerequisite knowledge: Basic knowledge about Web page building
Detailed outline

Our tutorial is organized in the following way. In the first section, basic principles concerning Web usability are described. On the one hand, this is an explanation of how the Web page is perceived by the user and what problems he solves while using it. On the other hand, we deal with Web page authors, who aim to satisfy the user's needs. In the second section, we explain what is meant by Web content mining and what typical tasks are dealt with in this area. In the third section, we explain in detail what is meant by the term intention in relation to Web page content. In this section, we also introduce a new term, Named Object, as a basic abstraction related to the intention. The fourth section is devoted to a survey of approaches that relate in some way to our view of a Web page. The introduction of this section summarizes the methods used for preprocessing a Web page, namely with regard to unsupervised and template-independent methods. In the fifth section, we present our method Pattrio, which is focused on the detection of Named Objects. We describe experiments on how successfully the method can be used and on its results for partial tasks. The last section is devoted to a summary and prospects for further research.

Background information on the presenters

Names, affiliations, homepages and contact details

Václav Snášel, Miloš Kudělka
VSB-Technical University of Ostrava
708 33 Ostrava, Czech Republic
vaclav.snasel@vsb.cz
www.cs.vsb.cz/snasel

Short biographies

Prof. Václav Snášel, Ph.D., graduated from the Faculty of Science of Palacký University, Olomouc, Czech Republic, in 1981 and received his Ph.D. in algebra from Masaryk University, Brno, in 1991. Since 2001 he has been a visiting scientist at the Institute of Computer Science, Academy of Sciences of the Czech Republic. Since 2003 he has been vice-dean for Research and Science at the Faculty of Electrical Engineering and Computer Science. Snášel has published more than 300 papers on ontology, knowledge management, data mining, databases, multimedia, information retrieval, neural networks, data compression and file organization. He has supervised many Ph.D. students, including students from outside the Czech Republic (Jordan, Yemen, Slovakia, Vietnam). He serves on the editorial boards of many journals. According to the Erdős Number Project, his Erdős number is 3. He is a member of IEEE, ACM, SIAM and AMS.
Information about previous tutorials given by the same presenters (title, location, number of attendees)

GUI Patterns and Web Semantics, CISIM 2007, Elk, Poland, 80
Discrete data mining, IADIS European Conference on Data Mining 2008, Amsterdam, Holland, 150