About This Course
This course is a continuation of TA0105: Text Analytics 101. You'll learn about text analytics at scale: how do modern text analytics systems handle big data? We'll first walk you through the history and show you why earlier formalisms for text analytics languages suffer from fundamental limitations in both expressivity and runtime performance, leading to scalability, accuracy, and usability issues. We'll then dive into the declarative principles behind the SystemT Information Extraction system and its automatic Optimizer, and show you how SystemT addresses these limitations.
- Module 1 - Limitations in previous approaches to rule-based IE
- Module 2 - Declarative IE and the SystemT optimizer
- This course is free.
- It is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
Recommended skills prior to taking this course
Yunyao Li joined IBM Almaden Research Center in July 2007 after obtaining her Ph.D. in Computer Science & Engineering from the University of Michigan in April 2007. Before that, she was a graduate student in the Database Research Group, Department of Electrical Engineering and Computer Science, under the guidance of Professor H. V. Jagadish. Her primary research areas are Database Systems and Natural Language Processing.
She is particularly interested in designing, developing, and analyzing large-scale systems that can improve the accessibility of information for a wide spectrum of users. Her current research in this direction involves a number of disciplines, most notably natural language processing, databases, human-computer interaction, information retrieval, and machine learning.
Before starting her Ph.D. studies, she earned two master's degrees at the University of Michigan: an M.S.E. in Computer Science & Engineering and an M.S. in Information from the School of Information. She attended Tsinghua University in Beijing, China, graduating with dual degrees: a B.E. in Automation and a B.S. in Economics.
You can read about Yunyao's inspiring story from small-town China to Silicon Valley here.
Follow her on Twitter @yunyao_li.
Laura Chiticariu is a Research Staff Member in the Scalable Natural Language Processing group at IBM Research-Almaden. Her primary research is in Database Systems and Natural Language Processing. She is one of the core members of the SystemT team; SystemT is a declarative system for specifying NLP algorithms and executing them at scale. Her current research focuses on making information extraction systems transparent and easier to use, utilizing a range of techniques including data provenance, information integration, and machine learning. Laura has a Ph.D. in Computer Science from the University of California, Santa Cruz, and a B.S. in Computer Engineering with a major in Automation and Industrial Informatics from Politehnica University of Bucharest.
Marina Danilevsky is a Research Staff Member in the SystemT group at IBM Almaden Research Center in San Jose, California. She is interested in data mining, text mining, natural language processing, network ontologies, information networks, and other related areas.
She holds a Ph.D. in Computer Science, awarded in 2014 by the University of Illinois at Urbana-Champaign (UIUC), where her research in Data Mining was supervised by Professor Jiawei Han. She previously received an M.S. in Computer Science from UIUC in 2011 and a B.S. in Mathematics from the University of Chicago in 2007.
Huaiyu Zhu is a member of the Infrastructure for Intelligent Information Systems group at IBM Research-Almaden. His main research focus is on text analytics, natural language processing, machine learning, and statistical information processing.