Tuck professor Lauren Xiaoyuan Lu analyzed a generative AI experiment at Alibaba and found it a powerful tool with some surprising outcomes.
Every day, more than 380 million people visit the e-commerce site Taobao, which is owned by the Chinese company Alibaba. Taobao is akin to the Amazon Marketplace: it’s a place where consumers are connected with millions of merchants. If a customer has a problem with their order, they can contact Taobao’s online customer service center, which first tries to solve it with a chatbot. If that doesn’t work, Taobao routes the customer to a human agent. The website churns through more than 400,000 of these after-sales customer service requests on an average day.
Finding a satisfactory solution to customer service inquiries is crucial to the success of an e-commerce business. Since there is no bricks-and-mortar store for the customer to visit, the human agents are the manifestation of the firm in consumers’ minds. If the agent can’t solve the customer’s problem, the customer may not come back. Scale that dynamic to hundreds of thousands of requests per day and you have a massive challenge to earn the trust and loyalty of shoppers.
Curious to see whether generative AI (gen AI) could improve customer service, Taobao conducted a large-scale randomized field experiment in early 2024. The company gave some of its after-sales agents access to a gen AI assistant, while other agents (the control group) did not have access. The assistant analyzed customer order data and previous customer interactions in real time, giving the agent a quick synopsis of the problem in the form of a written message that the agent could send to the customer directly. Simultaneously, the assistant prepared another message with a proposed solution to the customer’s issue. The agent had the option to use, modify, or disregard these messages.
Tuck professor Lauren Xiaoyuan Lu investigates the operational drivers of organizational performance in health care, retail, and supply chain settings. She teaches Supply Chain Management in the Tuck MBA program and teaches in Tuck Executive Education.
Tuck professor Lauren Xiaoyuan Lu, along with several colleagues in China, collaborated with Alibaba to analyze the data from this experiment. They report their findings in a new working paper titled “Generative AI in Action: Field Experimental Evidence on Worker Performance in E-Commerce Customer Service Operations.” In analyzing the experiment, Lu and her colleagues sought to address two questions that are now top of mind for academics in many fields, as well as practitioners in retail settings: “How does gen AI influence human agents’ performance in real-world customer service interactions, and are these effects uniform across different types of agents?”
Their first main finding was rather predictable: the deployment of the gen AI assistant improved both service speed and quality in this setting. This was evident in the reduced amount of time agents spent in customer chats and in an increase in customer satisfaction, which was reflected in higher customer ratings and lower dissatisfaction rates.
The rest of the findings were more nuanced, but no less significant. The researchers saw that the gen AI assistant served to augment the human agents rather than replace them. One might imagine that with an automated assistant providing all the answers, the human agents would slack off, but they didn’t. The agents’ typing time did not fall by a statistically significant amount, and the message volume from agents actually increased. “So, both of these suggest that the engagement level of these agents went up after deployment of the assistant, which was sort of a surprise,” Lu says. Lu plans to research this phenomenon further in future work; for now, her hypothesis is that the gen AI assistant reduced the burden of rote communication on both the customer and the agent, allowing the agent to focus more on the customer’s specific needs.
Lu’s third main finding was also a bit of a surprise. She found that the gen AI assistant reduced the performance gap between low and high performers, but the change did not run in one direction only. Yes, the gen AI assistant improved the performance of the low-performing agents, but it also made the high-performing agents a little worse. This is likely explained, Lu says, by the gen AI assistant’s recommendations being less sophisticated than those of the top-performing agents. So when those top agents relied on the assistant, the result was less optimal solutions. This presents a management challenge for e-commerce firms deploying gen AI. Since the gen AI assistant’s message formulation is heavily influenced by past messages from top-performing agents, if these agents dumb down the data by relying too heavily on the assistant, the quality of the assistant’s diagnoses and advice could stagnate. “Firms need to think about how to keep incentivizing their top performers to perform at their original level and even improve,” Lu suggests.
While the results of the Alibaba experiment generally bode well for e-commerce businesses, which can deploy gen AI to improve customer service speed and quality, Lu still senses a looming dark side to this technology when it comes to workers. In the e-commerce world, margins are very thin, and a big portion of a firm’s costs is in logistics, yet customers are expecting faster and faster service. Something’s got to give. Firms have already lowered their labor costs by offshoring customer service operations; as gen AI assistants improve, it will be tempting for firms to reduce their workforce even more.
“I do have a concern about the displacement of workers,” Lu says. “Firms should ponder how they can re-deploy these workers and upskill them for other value-added services.”