– Sponsored by Microsoft LUIS Research Team (01.2016 – 03.2016)

– Planned and conducted usability studies with 9 participants to evaluate two versions of the LUIS UI


– Moderated 3 sessions and took notes for 3 sessions

– Analyzed findings and provided suggestions to improve LUIS


– Xuan Liu

– Marina Makarechian

– Qianying Miao

– Yuhui Chen


How can we leverage the power of machine learning in software development without having to know how machine learning works? Microsoft Language Understanding Intelligent Service (LUIS) uses machine learning to handle text and speech processing and make developers’ lives easier. With LUIS, any mobile developer can build speech recognition and language understanding features into their applications.

However, LUIS is full of jargon because of the complex technology behind it. Its target users are developers who have limited knowledge of machine learning but want to use it for language processing. For these users, the interaction and workflow of LUIS should be simple and intuitive so that they can achieve their goals with little effort.


While LUIS was in beta, the LUIS team was building a new UI based on internal feedback. To improve the UI and UX of LUIS, the team requested a usability study to evaluate both versions. We therefore made the flow, labeling, navigation, and overall ease of use our primary focus, and defined the following research questions:

“How intuitive and efficient is the end-to-end application building process?”
“What is the overall ease of use of LUIS?”


Our approach started with identifying the problems and objectives. First, we created an interaction map to identify the key workflow and tasks in using LUIS. Based on the nature of the tasks, we chose the methods and tools for the tests and defined the metrics for data collection and evaluation.

Based on the study plan, we recruited participants and set up the testing environment. Before testing, we ran several pilot studies to make sure that the tools worked properly, the scenario was reasonable, and the metrics would answer our research questions.



Interaction Map

The interaction map describes the workflow of using LUIS to build an application. To get familiar with LUIS, each of us watched the demo video individually and took a screenshot whenever the video reached what we considered an important task. We then printed all the screenshots and built an affinity diagram to identify the key tasks and user flow.

Key Tasks

1. Build an application;

2. Train the application;

3. Publish and test the performance;

4. Review the performance.



Before the test, participants filled out a pre-test questionnaire reporting how many times they had watched the tutorial video and how useful they found it (rated from 1 to 5).


All 9 participants were asked to build a fitness application with LUIS on both the old UI and the new UI. Each participant was given a scenario and a set of tasks to finish. We randomly picked 4 participants to start with the old UI and had the remaining 5 start with the new one. This let us see how first impressions and first use affected users’ perception of LUIS. After each task, participants rated the difficulty of the task (from 1 to 5) and explained their rating.

After completing the test, participants filled out a post-task questionnaire to evaluate their experience with both interfaces, including the overall ease of use, the intuitiveness of the navigation, and their proficiency with the interface.
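The random split described above can be sketched in a few lines of Python. This is only an illustration, not the procedure we actually used: the participant IDs, the `assign_starting_ui` helper, and the seed are all hypothetical.

```python
import random


def assign_starting_ui(participant_ids, n_old_first, seed=None):
    """Randomly choose which participants see the old UI first.

    Returns a dict mapping each participant ID to the order in which
    that participant uses the two interfaces.
    """
    rng = random.Random(seed)          # a seed makes the split reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    old_first = set(shuffled[:n_old_first])
    return {
        pid: ("old", "new") if pid in old_first else ("new", "old")
        for pid in participant_ids
    }


# Hypothetical IDs P1..P9; 4 start with the old UI, the other 5 with the new UI.
participants = [f"P{i}" for i in range(1, 10)]
orders = assign_starting_ui(participants, n_old_first=4, seed=2016)

for pid, order in orders.items():
    print(pid, "->", " then ".join(order))
```

Every participant still uses both interfaces; only the starting interface differs between the two groups.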

Data Metrics

We measured participants’ task completion based on the following criteria:


1. Task completion success/failure rate

2. Number of prompts

3. Likert scale rating of user performance satisfaction


We also collected the following qualitative and self-reported data:

1. Satisfaction level based on verbal feedback

2. Comments, suggestions, concerns, quotes gathered during the study and reported in the post-task questionnaire by the participants

3. Body language observations during the study

4. Self-reported effort required to become proficient with an end-to-end application building process (learnability)
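As a rough illustration of how the countable measures (completion, number of prompts, and 1 to 5 ratings) could be summarized per interface, here is a minimal Python sketch. The `TaskRecord` structure and `summarize` function are assumptions made for this example, and the values are made-up placeholders, not study results (the findings are under NDA).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    participant: str
    ui: str            # "old" or "new"
    task: str
    completed: bool    # task completion success/failure
    prompts: int       # number of prompts given by the moderator
    rating: int        # 1-5 rating given by the participant


def summarize(records):
    """Aggregate completion rate, mean prompts, and mean rating per UI."""
    summary = {}
    for ui in sorted({r.ui for r in records}):
        rows = [r for r in records if r.ui == ui]
        summary[ui] = {
            "completion_rate": sum(r.completed for r in rows) / len(rows),
            "mean_prompts": mean(r.prompts for r in rows),
            "mean_rating": mean(r.rating for r in rows),
        }
    return summary


# Placeholder values for illustration only; NOT data from the study.
records = [
    TaskRecord("P1", "old", "Build an application", True, 1, 3),
    TaskRecord("P1", "new", "Build an application", True, 0, 4),
    TaskRecord("P2", "old", "Train the application", False, 2, 2),
    TaskRecord("P2", "new", "Train the application", True, 1, 4),
]

for ui, stats in summarize(records).items():
    print(ui, stats)
```

Keeping one record per participant, interface, and task makes it easy to slice the same data by task or by starting interface later.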

Environment and Equipment

All tests were conducted in the University of Washington’s Laboratory for Usability Testing and Evaluation (LUTE), where participants used a desktop computer to complete the tasks while Morae Recorder recorded each session. To gain more insight into user interface issues, we also used Tobii eye-tracking equipment to collect participants’ eye-tracking data.

Participant Recruitment

Since the target users of LUIS are developers who are not experts in machine learning, we used two screening criteria:

1. Students with mobile app development experience or developers who have been working in the field for at least 2 years.

2. Participants should not be experts in machine learning.

We used an online questionnaire to find candidates and selected 9 participants who met our criteria. Four were professional software developers and five were students with software development experience.


The findings and user profiles are subject to NDA. If you are interested in the details of the research process, feel free to contact me at xuanliu@uw.edu