Use customer behavior data and machine learning to improve search relevancy

Chao Han • Haystack 2018

Websites can now easily track and store user events such as queries, result clicks, and purchases. The question is how to use this collective behavior to guide us toward better search. In this talk, we will walk through several applications of this signal data, from common use cases such as clickstream signal boosting and recommenders to intelligent NLP tasks such as spell checking, synonym detection, phrase detection, and query rewriting. We will also demo how to generate these analytical results and use them to improve search relevancy with a system that combines the power of a search engine (in our case, Apache Solr) with the power of a fast distributed compute engine (Apache Spark) to bring data science into production.
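To make "clickstream signal boosting" concrete, here is a minimal sketch, assuming a simple in-memory list of `(query, doc_id, event_type)` events. The event weights, the `log1p` boost formula, and the function names `aggregate_signals` and `rerank` are all hypothetical illustrations, not the system demoed in the talk. In a production setup like the one described above, the aggregation would run as a Spark job and the boosts would be applied inside Solr; this sketch uses plain Python only to show the idea.

```python
import math
from collections import defaultdict

# Hypothetical weights: purchases are assumed to be a stronger relevance signal than clicks.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 2.0, "purchase": 5.0}


def normalize(query):
    """Lowercase and trim a query so that 'iPad ' and 'ipad' share signals."""
    return query.strip().lower()


def aggregate_signals(events):
    """Sum weighted user events into a {query: {doc_id: weight}} table."""
    table = defaultdict(lambda: defaultdict(float))
    for query, doc_id, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type, 0.0)
        if weight:  # ignore unknown event types
            table[normalize(query)][doc_id] += weight
    return table


def rerank(query, results, table):
    """Re-order (doc_id, base_score) pairs, boosting docs users engaged with for this query.

    The boost multiplier 1 + log1p(weight) dampens very popular docs so that
    behavior signals adjust, rather than completely override, the engine's score.
    """
    boosts = table.get(normalize(query), {})

    def boosted_score(item):
        doc_id, base_score = item
        return base_score * (1.0 + math.log1p(boosts.get(doc_id, 0.0)))

    return sorted(results, key=boosted_score, reverse=True)


if __name__ == "__main__":
    events = [
        ("ipad", "d2", "click"),
        ("iPad", "d2", "purchase"),
        ("ipad", "d1", "click"),
    ]
    table = aggregate_signals(events)
    results = [("d1", 1.0), ("d2", 0.9), ("d3", 1.2)]  # base scores from the search engine
    print([doc_id for doc_id, _ in rerank("IPad ", results, table)])
```

Running it prints `['d2', 'd1', 'd3']`: `d2` has the lowest base score but the most engagement (one click plus one purchase), so it moves to the top. The logarithmic damping is one common design choice; a linear or learned boost would also work.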

Chao is a data scientist with over 10 years of analytical experience in both academia and industry. She received a PhD in Statistics from Virginia Tech in 2012 (with 8 publications). After graduation, she worked at JPMorgan Chase R&D, supporting projects in transaction text mining, social media sentiment analysis, fraud detection, default prediction, and target marketing. She also initiated and led the "Robot Modeler" project, which reduced predictive modeling time from months to days. She joined SAS in 2015 to help develop a new platform: an in-memory, multi-threaded analytic engine that enables fast model implementation and calculation on a gridded network. Currently, Chao is the head of R&D at Lucidworks, where she helps build a new product called Fusion AI, with functionality such as recommendations, query analytics, automatic document clustering, and a QA system.

Chao Han

Chao is a data scientist with over 10 years of analytical experience in both academia and industry. She received a PhD in Statistics from Virginia Tech in 2012 (dissertation: Bayesian visual analytics for high-dimensional data; 8 publications). After graduation, she worked at JPMorgan Chase R&D, supporting projects in transaction text mining, social media sentiment analysis, fraud detection, default prediction, and target marketing. She also initiated and led the "Robot Modeler" project, which reduced predictive modeling time from months to days. She joined SAS in 2015 to help develop a new platform: an in-memory, multi-threaded analytic engine that enables fast model implementation and calculation on a gridded network. Currently, Chao works at Lucidworks, an enterprise search company, where she helps build a new product called Fusion AI, with functionality such as recommendations, query analytics, automatic document clustering, and a chatbot.