Information Retrieval
Information Retrieval (IR) is the study of methods for capturing, representing, storing, organizing, and retrieving unstructured or loosely structured information. Its best-known branch is document retrieval: the process of indexing and retrieving text documents. However, the field covers almost any type of unstructured or semi-structured data, including newswire stories, transcribed speech, email, blogs, images, and even video. When the data consists of material found on the Web, Information Retrieval is a critical component of Web search engines.
CMPSCI 646 is a graduate-level class in Information Retrieval. It covers the basic ideas of IR to give students an intuition for how search engines work, why they succeed, and to some degree how they fail. The course touches on popular and important approaches to the problem, providing both historical context and state-of-the-art results.
Download Lectures
Class #  |    Topic  |   
Class canceled on account of graduate orientation.  |   |
1  |    |
2  |    Evaluation basics [pdf].  Please read CMS 8.1, 8.2, and 8.4 and/or MRS 8.1-8.4 beforehand.  (This reading was assigned late, so there will be no assumption that you read it in advance, but you should if you can.)  |   
3  |    Retrieval models [pdf].  Read CMS 7.1-7.3 and/or MRS 6.3,   11.1-11.3, 12.1-12.2.    |   
4  |    Retrieval models: wrapping up the sketch of language modeling, then on to vector space, including LSI [pdf].  |   
5  |    Retrieval models, binary independence model [pdf].   Also, the first quiz [Q1,pdf].  |   
6  |    Retrieval models: complete probabilistic models, then inference networks and logic models [pdf].  |   
7  |    Text statistics [pdf].  Read CMS 4.1-4.4 and the rest  of the chapter if it grabs your attention.  MRS 5.1   is also useful but less thorough.  |   
8  |    Guest lecture on relevance models [pdf] by  Niranjan   Balasubramanian.  |   
9  |    Complete text statistics [pdf], talking about estimating weights.   Start talking   about file organization--i.e., some issues involved in how   |   
No class today because the University is running a Monday   schedule.  |   |
10  |    Complete presentation of file organization, specifically inverted files [pdf].  A pop quiz [Q2,pdf].  Discussion of class projects.  Readings: For CMS, look at Chapter 5, particularly 5.3 for inverted lists.  For MRS, the inverted list is introduced in 1.2, built on in 2.3 and 2.4, and wildcards are in 3.2.  |   
11  |    Compression.  Handed back HW1 (solution [pdf]).  Readings: For CMS, look at Chapter 5, particularly 5.4 for compression.  For MRS, compression is 5.4 (though you'll need more of chapter 5 for background).  |   
12  |    Complete compression [pdf].  |   
13  |    Clustering for IR, largely from a vector space view [pdf]  |   
14  |    Clustering for IR, largely from a language modeling (including topic modeling) view.  |   
15  |    Web retrieval basics, including NDCG [pdf].   HW2 [pdf] is due but will be accepted until   Thursday.  |   
16  |    Midterm review with some discussion questions (but no answers) [pdf].  HW2 due today (here is the solution [pdf]).  |   
17  |    In-class, open-book midterm exam.  You may bring electronic devices, but only if you can convincingly demonstrate that they cannot access the outside world.  |   
18  |    Optional class.  Project "workshop",   helping groups clean up project  specifications and sort out the details of what needs to   be done.  |   
By 5:00pm, submit a project description (see Project tab   above).   Also prepare a project pitch presentation (one or two   slides) for Tuesday and send the presentation  to the professor before 7:00am tomorrow (Tuesday the 7th).      |   |
19  |    Project pitches.  Send your presentation (if any) to   the professor before 7:00am today.   Also bring a backup presentation mechanism in case it   won't work.   In particular, if you're a Keynote user there may be   issues....  |   
20  |    Guest lecture by Niranjan giving an overview of learning to rank [pdf, but note that it's 19 MB].  |   
21  |    MapReduce approach to massively   |   
No class today because  it is Thanksgiving   break.  |   |
22  |    |
23  |    Question answering [pdf].  Draft version of final project   writeup due to professor.  |   
24  |    Final project presentations, part 1 of 2.  (This is   National Computer Science Education Week!)  |   
25  |    Final project presentations, part 2 of 2.  Last   class.   (This is National Computer Science Education Week!)  |   
P2,  P3, and HW3 final submission deadline.  Remember   that you may elect  to skip one of them and get full credit for it (but you   must  explicitly skip it or not bother handing it in).  |   |
Final exam available for pickup.  See   "exams" tab for details.   |   |
Last possible time to pick up final exam and get a full 48   hours to complete it.  |   |
Last possible day and time to hand in final exam.    |   
