Cardinal Stefan Wyszynski University in Warsaw - Central Authentication System

Algorithms text mining and web crawling

General data

Course ID: WF-R-PS-STAS
Erasmus code / ISCED: 14.4 / (0313) Psychology
The Erasmus course classification code consists of three to five digits: the first three give the field classification according to the list of field codes used in the Socrates/Erasmus programme, the fourth (so far usually 0) optionally refines the discipline, and the fifth gives the level of the course, based on the year of study for which it is intended. The ISCED (International Standard Classification of Education) code was designed by UNESCO.
Course title: Algorithms text mining and web crawling
Name in Polish: Seminarium tematyczne: Algorytmy text mining i web crawling
Organizational unit: Institute of Psychology
Course groups: University-wide courses - Doctoral students
Courses for doctoral students in psychology
Thematic seminars in psychology
ECTS credit allocation (and other scores): (not available)
Basic information on ECTS credit allocation principles:
  • the annual student workload required to achieve the expected learning outcomes for a given stage is 1500-1800 h, corresponding to 60 ECTS;
  • the student’s weekly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • one week of student work toward the assumed learning outcomes earns 1.5 ECTS;
  • the work required to pass a course that has been assigned 3 ECTS constitutes 10% of the semester workload.
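
The figures in the list above are mutually consistent, which a few lines of Python can confirm. This is only an illustrative check: the function names are hypothetical, the weekly figure assumes the upper bound of 30 hours per point, and a semester is assumed to be half of the 60-point academic year.

```python
# Figures quoted in the list above.
ANNUAL_HOURS = (1500, 1800)   # yearly workload range, in hours
ANNUAL_ECTS = 60              # credits per academic year
WEEKLY_HOURS = 45             # weekly workload, in hours


def hours_per_ects():
    """Return the (low, high) number of work hours behind one ECTS point."""
    low, high = ANNUAL_HOURS
    return low / ANNUAL_ECTS, high / ANNUAL_ECTS


def weekly_ects(hours_per_point=30):
    """ECTS earned in one week, assuming 30 hours of work per point."""
    return WEEKLY_HOURS / hours_per_point


def semester_share(course_ects, semester_ects=ANNUAL_ECTS / 2):
    """Fraction of a semester's load taken by one course.

    Assumes a semester carries half of the annual credits.
    """
    return course_ects / semester_ects


print(hours_per_ects())    # (25.0, 30.0)
print(weekly_ects())       # 1.5
print(semester_share(3))   # 0.1
```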

Language: Polish
Subject level:

intermediate

Learning outcome code/codes:

SD_ PS _W01

SD_ PS _W03

SD_ PS _U02

SD_ PS _U03

SD_ PS _K02

Short description:

The course introduces PhD students to Text Mining algorithms and to Web Crawling, a technique for searching the web. In the exercises, doctoral students learn practical approaches to data analysis (Text Mining) and data acquisition (Web Crawling), and how to combine these methods with other algorithms.
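
As a rough illustration of the data-acquisition side, the sketch below shows a breadth-first crawl in Python using only the standard library. The in-memory PAGES dictionary, the example.org URLs, and the names LinkExtractor and crawl are assumptions made so that the example runs offline; they are not taken from the course materials. A real crawler would fetch pages over HTTP and respect each site's robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.org/a.html": '<p>text mining</p> <a href="/b.html">B</a>',
    "http://example.org/b.html": '<p>web crawling</p> <a href="/">home</a>',
}


def crawl(start, fetch=PAGES.get, max_pages=10):
    """Breadth-first crawl from start; return the URLs in visiting order."""
    seen = {start}            # URLs already queued, to avoid revisiting
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:      # page not available
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("http://example.org/"))
```

The text of each visited page could then be passed on to a Text Mining step; the fetch parameter is where a real HTTP download function would be substituted.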

Full description:

1. Introduction to Text Mining algorithms, basic information about the method.

2. Using algorithms to count the words in documents and assign weights to them: Raw, Inverse Document Frequency - part one.

3. Using algorithms to count the words in documents and assign weights to them: Raw, Inverse Document Frequency - part two.

4. Presentation of the results by means of Principal Component Analysis.

5. Reporting the results of Principal Component Analysis.

6. Converting the database of words into numerical data.

7. Recoding of numerical data into new numerical variables.

8. Web Crawling - basic information.

9. Reporting the results of Web Crawling.

10. Combining qualitative data analysis methods using Text Mining (TM) algorithms with other algorithms: decision trees.

11. Reporting the results of combining TM with decision trees.

12. Combining qualitative data analysis methods using TM algorithms with other algorithms: Generalized k-means Cluster Analysis.

13. Reporting the results of combining TM with cluster analysis.

14. Combining qualitative data analysis methods using TM algorithms with other algorithms: neural networks.

15. Reporting the results of combining TM with neural networks.
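
Topics 2 and 3 above (word counts with Raw and Inverse Document Frequency weights) can be sketched as follows. This is a minimal illustration and not the course's own procedure: it assumes a simple regular-expression tokenizer and one common IDF variant, log(N / df), whereas statistical packages may apply a different formula. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def raw_counts(docs):
    """Raw weighting: how many times each word occurs in each document."""
    return [Counter(tokenize(doc)) for doc in docs]


def idf_weights(counts):
    """Inverse document frequency: log(N / number of documents with the word)."""
    n_docs = len(counts)
    # Iterating over a Counter yields each distinct word once per document.
    doc_freq = Counter(word for c in counts for word in c)
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(docs):
    """Multiply raw counts by IDF; one {word: weight} dict per document."""
    counts = raw_counts(docs)
    idf = idf_weights(counts)
    return [{word: n * idf[word] for word, n in c.items()} for c in counts]


docs = [
    "text mining finds patterns in text",
    "web crawling collects text from the web",
    "decision trees classify documents",
]
counts = raw_counts(docs)
weights = tf_idf(docs)

print(counts[0]["text"])               # 2
print(round(weights[0]["text"], 3))    # 0.811, i.e. 2 * ln(3 / 2)
print(round(weights[0]["mining"], 3))  # 1.099, i.e. 1 * ln(3 / 1)
```

Words that appear in fewer documents receive a larger IDF weight, so "mining" outweighs "text" here even though "text" occurs twice in the first document. The resulting word-by-document weights are the kind of numerical table that later topics pass to other algorithms.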

Bibliography:

Elder, J., Hill, T., Miner, G., Nisbet, B., Delen, D., & Fast, A. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Oxford: Elsevier.

Nisbet, R., Elder, J., & Miner, G. (2009). Handbook of statistical analysis and data mining applications. Burlington, MA: Academic Press (Elsevier).

Szymańska, A. (2017). Wykorzystanie algorytmów Text Mining do analizy danych tekstowych w psychologii [Usage of text mining algorithms to analyze textual data in psychology]. Socjolingwistyka, 33, 99–116.

Learning outcomes and ECTS description:

KNOWLEDGE:

- PhD students correctly use the terminology of the Text Mining and Web Crawling methods.

SKILLS:

- carry out analyses using Text Mining algorithms and principal component analysis, and search for data using Web Crawling

COMPETENCES:

- correctly interpret the results of the analyses performed
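
To make the principal component analysis mentioned under SKILLS more concrete, the sketch below extracts the first principal component of a small word-count table by power iteration, using only the Python standard library. It is an assumption-laden illustration rather than the course's method: the count table is invented, the function names are hypothetical, and practical work would normally rely on a statistics package.

```python
import math


def center(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (rows = documents, columns = words)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(cov, iterations=200):
    """Power iteration: leading eigenvector and eigenvalue of cov."""
    # Starting guess; it must not be orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


# Hypothetical counts of three words (columns) in four documents (rows).
counts = [[4, 0, 1], [3, 1, 0], [1, 3, 1], [0, 4, 0]]
cov = covariance(center(counts))
vector, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = eigenvalue / total_variance   # variance explained by component 1

print([round(x, 3) for x in vector], round(share, 3))
```

The loadings in vector show which words move together across documents, and share is the proportion of total variance carried by the first component, which is the kind of figure a results report would interpret.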

Description of ECTS credits

Participation in classes: 30 hours

Preparation for classes and preparation of reports, reading literature: 30 hours

Assessment methods and assessment criteria:

To pass the course, students submit two final reports presenting results obtained with the Web Crawling and Text Mining methods.

This course is not currently offered.
Course descriptions are protected by copyright.
Copyright by Cardinal Stefan Wyszynski University in Warsaw.