About This Course
From social media to news articles to machine logs, text data is everywhere. In this class, you'll learn how to do text analytics, also known as Information Extraction: how to extract structured data from text in order to derive valuable insights. You will learn about applications of information extraction in many domains, including social media analytics, healthcare analytics, and financial risk analysis, as well as common tasks such as extracting entities and the relations between them, event extraction, and sentiment analysis. You'll then dive into "Declarative Information Extraction", a powerful approach to building high-performance, high-quality text analytics, and gain hands-on experience writing your own extractors with a tool called SystemT (available as IBM BigInsights Text Analytics).
- Module 1 - Getting to know Information Extraction (IE)
- Module 2 - Getting to know SystemT
- This course is free.
- It is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
Recommended skills prior to taking this course
Yunyao Li joined IBM Almaden Research Center in July 2007 after obtaining her Ph.D. in Computer Science & Engineering from the University of Michigan in April 2007. Before that, she was a graduate student in the Database Research Group, Department of Electrical Engineering and Computer Science, under the guidance of Professor H. V. Jagadish. Her primary research areas are Database Systems and Natural Language Processing.
She is particularly interested in designing, developing, and analyzing large-scale systems that can improve the accessibility of information for a wide spectrum of users. Her current research in this direction involves a number of disciplines, most notably natural language processing, databases, human-computer interaction, information retrieval, and machine learning.
Before starting her Ph.D. studies, she earned dual degrees at the University of Michigan: an M.S.E. in Computer Science & Engineering and an M.S. in Information, from Computer Science and Engineering and the School of Information respectively. She attended Tsinghua University in Beijing, China, and graduated with dual degrees: a B.E. in Automation and a B.S. in Economics.
You can read about Yunyao's inspiring story from small-town China to Silicon Valley here.
Follow her on Twitter @yunyao_li.
Laura Chiticariu is a Research Staff Member in the Scalable Natural Language Processing group at IBM Research-Almaden. Her primary research is in Database Systems and Natural Language Processing. She is one of the core members of the SystemT project, a declarative system for specifying NLP algorithms and executing them at scale. Her current research focuses on making information extraction systems transparent and easier to use, utilizing a range of techniques including data provenance, information integration, and machine learning. Laura has a Ph.D. in Computer Science from the University of California, Santa Cruz, and a B.S. in Computer Engineering with a major in Automation and Industrial Informatics from Politehnica University of Bucharest.
Marina Danilevsky is a Research Staff Member of the SystemT group at IBM Almaden Research Center in San Jose, California. She is interested in data mining, text mining, natural language processing, network ontologies, information networks, and other related areas.
She holds a Ph.D. in Computer Science, awarded in 2014 by the University of Illinois at Urbana-Champaign (UIUC). Her research was in the area of Data Mining, supervised by Professor Jiawei Han. She previously received an M.S. in Computer Science from UIUC in 2011 and a B.S. in Mathematics from the University of Chicago in 2007.
Huaiyu Zhu is a member of the Infrastructure for Intelligent Information Systems group at IBM Research-Almaden. His main research focus is on text analytics, natural language processing, machine learning, and statistical information processing.