Algorithms text mining and web crawling
General data
Course ID: | WF-R-PS-STAS |
Erasmus code / ISCED: | 14.4 |
Course title: | Algorithms text mining and web crawling |
Name in Polish: | Seminarium tematyczne: Algorytmy text mining i web crawling |
Organizational unit: | Institute of Psychology |
Course groups: |
University-wide courses for doctoral students; Courses for doctoral students in psychology; Thematic seminars in psychology |
ECTS credit allocation (and other scores): | (not available) |
Language: | Polish |
Subject level: | intermediate |
Learning outcome code/codes: | SD_PS_W01 SD_PS_W03 SD_PS_U02 SD_PS_U03 SD_PS_K02 |
Short description: |
The aim of the course is to familiarize PhD students with Text Mining algorithms and with Web Crawling, a technique for searching the web to acquire data. During the exercises, doctoral students will learn practical approaches to data analysis (Text Mining) and data acquisition (Web Crawling), and how to combine these methods with other algorithms. |
Full description: |
1. Introduction to Text Mining algorithms: basic information about the method.
2. Using algorithms to count the words in documents and assign weights to them: raw frequency, Inverse Document Frequency (part one).
3. Using algorithms to count the words in documents and assign weights to them: raw frequency, Inverse Document Frequency (part two).
4. Presenting the results by means of Principal Component Analysis.
5. Reporting the results of Principal Component Analysis.
6. Converting the database of words into numerical data.
7. Recoding numerical data into new numerical variables.
8. Web Crawling: basic information.
9. Reporting the results of Web Crawling.
10. Combining qualitative data analysis using TM algorithms with other algorithms: decision trees.
11. Reporting the results of combining TM with decision trees.
12. Combining qualitative data analysis using TM algorithms with other algorithms: Generalized k-means Cluster Analysis.
13. Reporting the results of combining TM with cluster analysis.
14. Combining qualitative data analysis using TM algorithms with other algorithms: neural networks.
15. Reporting the results of combining TM with neural networks. |
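As an illustration of the word-counting and weighting topics above, the following is a minimal Python sketch of raw word frequencies and one common Inverse Document Frequency variant, tf * log(N / df). It is not taken from the course materials: the function names, the tokenization rule, and the sample documents are assumptions made for this example, and the software used in class may apply a different weighting formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens.

    Assumption: only the letters a-z count as word characters.
    """
    return re.findall(r"[a-z]+", text.lower())


def raw_counts(documents):
    """Return one Counter of raw word frequencies per document."""
    return [Counter(tokenize(doc)) for doc in documents]


def idf_weights(counts):
    """Weight each raw count by log(N / df).

    N is the number of documents and df is the number of documents
    that contain the word. This is one common variant, assumed here
    for illustration only.
    """
    n_docs = len(counts)
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())
    return [
        {word: tf * math.log(n_docs / doc_freq[word])
         for word, tf in doc_counts.items()}
        for doc_counts in counts
    ]


if __name__ == "__main__":
    # Hypothetical sample documents.
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    for weights in idf_weights(raw_counts(docs)):
        print(weights)
```

Under this weighting, a word that occurs in every document receives a weight of zero, while words confined to a few documents are weighted up; the resulting document-by-word table is the kind of numerical data that later steps such as Principal Component Analysis take as input.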
Bibliography: |
Elder, J., Hill, T., Miner, G., Nisbet, B., Delen, D., & Fast, A. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Oxford: Elsevier.
Nisbet, R., Elder, J., & Miner, G. (2009). Handbook of Statistical Analysis and Data Mining Applications. Burlington, MA: Academic Press (Elsevier).
Szymańska, A. (2017). Wykorzystanie algorytmów Text Mining do analizy danych tekstowych w psychologii [Usage of text mining algorithms to analyze textual data in psychology]. Socjolingwistyka, 33, 99–116. |
Learning outcomes and ECTS description: |
KNOWLEDGE: PhD students correctly use the terminology of the Text Mining and Web Crawling methods.
SKILLS: PhD students carry out analyses using Text Mining algorithms and Principal Component Analysis, and search for data using Web Crawling.
COMPETENCES: PhD students correctly interpret the results of the analyses performed.
Description of ECTS credits: Participation in classes: 30 hours. Preparation for classes, preparation of reports, and reading literature: 30 hours. |
Assessment methods and assessment criteria: |
The basis for passing the course is the submission of two final reports presenting the results prepared with the use of the Web Crawling and Text Mining methods. |
Copyright by Cardinal Stefan Wyszynski University in Warsaw.