Machine Learning and Extracting Information from the Web by Tom Mitchell

Date: Thursday, February 22, 2001, 10.00 - 12.00 hours
Location: Universiteit Maastricht, Faculty of General Sciences, Tongersestraat 6, Maastricht. Room 3.002 (3rd floor)
Costs: no charge

On Thursday, February 22, Professor Tom Mitchell will give a talk at the Institute for Knowledge and Agent Technology (IKAT) in Maastricht. The talk is titled "Machine Learning and Extracting Information from the Web."

Abstract: Today's search engines can retrieve and display over a billion web pages, but their use is limited by the fact that they don't analyze the content of these pages in any depth.

What if these search engines could extract the factual content from the pages they retrieve? Then, instead of asking for pages that contain the keyword "java," we could ask directly for the facts we are after, such as "What Java programming jobs are available in Palo Alto?" or "Are there any evening courses on Java available in the Palo Alto area during spring 2001?"

This talk will describe research that has resulted in systems that answer these kinds of questions by extracting detailed factual information automatically from millions of web pages. Our approach relies heavily on machine learning algorithms to train the system to find and extract targeted information. For example, in one case we trained our system to find and extract job postings from the web, resulting in the world's largest database of job openings (over 600,000 jobs). The talk will cover machine learning algorithms for classifying and extracting information from web pages, including results of recent research on using unlabeled data and other kinds of information to improve learning accuracy.

This event is financially supported by BNVKI, the Belgian-Dutch Association for Artificial Intelligence.