Microsoft: Studio9

Transforming a chaotic legacy platform into a streamlined professional audio editing environment through data-driven UX strategy

 

Role | UX Designer

Company | Microsoft

Timeline | 2017

Focus Areas | Voice Technology UX, Enterprise Tool Design, Workflow Optimization

Impact | 60% reduction in task completion time, 45% increase in feature adoption, platform design scaled to support 5+ new AI voice models

 

Project Overview

Studio9 was Microsoft's internal text-to-speech tuning platform that powered voice synthesis across major products including Cortana, Skype, and various third-party applications. Despite serving critical functions for both internal teams and external developers, the three-year-old platform had become increasingly difficult to use and maintain.

I led a complete UX redesign that transformed a chaotic, popup-heavy interface into a streamlined, professional-grade audio editing environment. By combining systematic user research with data analytics from actual platform usage, we achieved a 60% reduction in task completion time while maintaining the advanced capabilities that power users required. The modular design system I created also enabled seamless integration of new AI voice models, supporting Microsoft's expanding speech synthesis capabilities.

 

Background

Microsoft Studio9 served as the text-to-speech tuning platform powering voice synthesis across critical Microsoft products including Cortana, Skype, WeChat integrations, and numerous third-party applications. Using SSML (Speech Synthesis Markup Language) controls, the platform enabled both internal teams and external developers to fine-tune AI-generated speech for natural-sounding audio output.
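
To make the SSML controls concrete, here is a minimal sketch of the kind of markup a tuning tool of this type produces. It uses only standard SSML elements (`speak`, `break`, `prosody`); the `build_ssml` helper and the sample sentences are hypothetical illustrations, not Studio9's actual code or output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(intro, phrase, rate="slow", pitch="+2st", pause_ms=300):
    """Hypothetical helper: speak `intro`, pause, then speak `phrase`
    with adjusted rate and pitch."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"{escape(intro)}"
        f'<break time="{int(pause_ms)}ms"/>'
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(phrase)}"
        "</prosody>"
        "</speak>"
    )


ssml = build_ssml("Your order has shipped.", "It arrives on Friday.")

# Parsing the result confirms the markup is well-formed XML.
root = ET.fromstring(ssml)
prosody = root.find(f"{{{SSML_NS}}}prosody")
print(ssml)
```

Editing attributes such as `rate`, `pitch`, and `time` by hand is the kind of fine-tuning the platform exposed through its interface.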

After three years of continuous use and feature additions, the platform had evolved organically without structured design oversight. Long-term users had developed workarounds for its inefficiencies, while the scattered interface made it increasingly difficult to onboard new team members and external developers.

1st Party Developers 3rd Party Developers
Cortana Global Chitchat
XiaoIce Chat and Story Teller
Skype
MS WeChat Post
Audio Factory
Tencent News
Audio Book
 

The Challenge

The existing platform suffered from fundamental usability issues that directly impacted developer productivity:

Core Problems

  • Disorganized editing functions with no clear hierarchy or logical grouping

  • Circuitous workflow patterns that forced users through unnecessary steps

  • Limited scalability: the popup-heavy design couldn't accommodate new AI model features

  • Inadequate editing capabilities for complex audio projects requiring precision control

  • Poor discoverability of advanced features buried in nested menus

These issues created a critical tension: the platform was essential for Microsoft's voice AI initiatives, yet its poor usability limited adoption and slowed development cycles. Any redesign needed to satisfy existing power users who had invested years learning the system while making the platform accessible to new developers.

 

Strategic Approach

This project required balancing the needs of long-term power users with the goal of expanding adoption. My approach combined systematic user research with rapid iteration cycles to validate structural changes before diving into specific feature design.

Design Principles

  1. Simplicity: Clear mental models for complex AI processes

  2. Efficiency: Streamlined workflows for frequent tasks

  3. Scalability: Flexible architecture for emerging voice synthesis capabilities

 

Research & Discovery

User Analysis
I conducted comprehensive research across Microsoft's global developer community, including engineers working on Cortana integrations, third-party app developers, and audio production teams. The research revealed that 73% of users primarily used just 4 core functions, while advanced features were scattered and hard to discover.

Usage Data Analysis
Analyzing platform telemetry data, I identified the most critical user journeys. This data-driven insight became crucial for prioritizing the interface redesign.

 

Design Strategy & Approach

Structural Redesign

Rather than iterating on existing layouts, I advocated for a complete structural overhaul. The platform needed to function more like professional audio software (Pro Tools, Audacity) than a basic web tool.

I proposed a three-phase approach:

  1. Problems → Overall Structure → Specific Functions

  2. Focus on overall information architecture first

  3. Validate with rapid prototyping and user testing

Key Design Decisions:

  • Eliminated popup-heavy interactions in favor of persistent, contextual panels

  • Created dedicated zones for text input, audio editing, and controls

  • Implemented a component-based design system for future AI model integrations

 

User Testing & Validation

Structural Testing
I ran A/B tests comparing 4 different layout approaches with internal Microsoft developers across different regions. The winning design showed a 40% improvement in task completion time and significantly reduced errors.

Visual System Testing
User interviews revealed that developers preferred a light theme over a dark one for extended editing sessions, contrary to our initial assumption. This reinforced the importance of testing assumptions early.

 

Technical Innovation

AI-Informed Design
Working closely with the speech research team, I designed interface elements that surfaced AI model confidence levels and suggested optimizations. This was an early example of AI-assisted UX design, where the interface adapts based on model performance.

Component Architecture
I created a modular component system that could accommodate new voice synthesis features as Microsoft's AI capabilities evolved. This forward-thinking approach proved valuable as the team integrated new language models and voice personas.

 

Final Design System

Interface Organization:

  • Primary Work Area: Text editor with inline SSML controls

  • Audio Preview: Persistent audio player with waveform visualization

  • Tool Palette: Contextual editing tools organized by function

  • Export Hub: Streamlined output options for different use cases

Interaction Patterns:

  • Real-time preview as users edit SSML markup

  • Keyboard shortcuts for power users

  • Progressive disclosure for advanced features

  • Undo/redo with branching history for complex edits
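
As an illustration of the last pattern, undo/redo with branching history can be modeled as a tree of revisions rather than a linear stack, so an alternative tried after an undo does not overwrite the earlier attempt. The sketch below is a minimal, hypothetical model (the `Revision` and `BranchingHistory` names are assumptions for illustration), not the platform's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Revision:
    """One saved state of the markup, linked to its parent and children."""
    text: str
    parent: Optional["Revision"] = None
    children: List["Revision"] = field(default_factory=list)


class BranchingHistory:
    """Hypothetical edit history in which undo never discards later edits."""

    def __init__(self, initial: str) -> None:
        self.current = Revision(initial)

    def edit(self, new_text: str) -> None:
        # A new edit always becomes a child of the current revision,
        # creating a new branch if the revision already has children.
        node = Revision(new_text, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def undo(self) -> str:
        # Move to the parent revision; stay put at the root.
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current.text

    def redo(self, branch: int = -1) -> str:
        # Move to a child revision; by default, the most recent branch.
        if self.current.children:
            self.current = self.current.children[branch]
        return self.current.text


history = BranchingHistory("<speak>Hello</speak>")
history.edit('<speak><prosody rate="slow">Hello</prosody></speak>')
history.undo()
history.edit('<speak><prosody pitch="high">Hello</prosody></speak>')
history.undo()

# Both alternatives remain reachable from the original revision.
slow_version = history.redo(0)
history.undo()
high_version = history.redo()
```

The trade-off is that the interface has to show which branch a redo will follow, which a linear stack never needs to do.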

 

Impact & Results

Immediate Outcomes:

  • 60% reduction in average time to complete common tasks

  • 45% increase in feature discovery for advanced tools

  • 89% positive feedback from existing power users during transition

Long-term Value: The modular design system I created supported the integration of 5 new AI voice models over the following year without requiring major interface changes. The component library became a template for other Microsoft developer tools.

Strategic Learning: This project demonstrated how understanding both user needs and underlying AI capabilities can create interfaces that feel intuitive while handling complex technical processes. That skill became increasingly valuable as AI tools proliferated across Microsoft's product suite.

 

Studio9 taught me that successful AI product design requires deep collaboration between UX design and research teams. The interface needed to abstract complex speech synthesis parameters while giving developers precise control when needed. This balance between simplicity and power became a recurring theme in my approach to designing AI-powered tools.

The project also reinforced the importance of data-driven design decisions. Usage analytics revealed user behavior patterns that contradicted our initial assumptions, leading to a more effective final design.

 