Date
|
Speaker
|
Topic
|
Faculty Host
|
11/8/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Behnam Mohammadi
Carnegie Mellon University
|
''Creativity Has Left the Chat: The Price of Debiasing Language Models'' and ''Wait, It’s All Token Noise? Always Has Been: Interpreting LLM Behavior Using Shapley Value''
-
Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Feedback (RLHF) reduce these issues, their impact on creativity, defined as syntactic and semantic diversity, remains unexplored. We investigate the unintended consequences of RLHF on the creativity of LLMs through three experiments focusing on the Llama-2 series. Our findings reveal that aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards ''attractor states'', indicating limited output diversity. These findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation. The trade-off between consistency and creativity in aligned models should be carefully considered when selecting the appropriate model for a given application. We also discuss the importance of prompt engineering in harnessing the creative potential of base models.
-
The emergence of large language models (LLMs) has opened up exciting possibilities for simulating human behavior and cognitive processes, with potential applications in various domains, including marketing research and consumer behavior analysis. However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain due to glaring divergences that suggest fundamentally different underlying processes at play and the sensitivity of LLM responses to prompt variations. This paper presents a novel approach based on Shapley values from cooperative game theory to interpret LLM behavior and quantify the relative contribution of each prompt component to the model’s output. Through two applications—a discrete choice experiment and an investigation of cognitive biases—we demonstrate how the Shapley value method can uncover what we term ''token noise'' effects, a phenomenon where LLM decisions are disproportionately influenced by tokens providing minimal informative content. This phenomenon raises concerns about the robustness and generalizability of insights obtained from LLMs in the context of human behavior simulation. Our model-agnostic approach extends its utility to proprietary LLMs, providing a valuable tool for marketers and researchers to strategically optimize prompts and mitigate apparent cognitive biases. Our findings underscore the need for a more nuanced understanding of the factors driving LLM responses before relying on them as substitutes for human subjects in research settings. We emphasize the importance of researchers reporting results conditioned on specific prompt templates and exercising caution when drawing parallels between human behavior and LLMs.
|
Ye Hu
|
11/1/2024
Melcher Hall 365A
12:30 PM - 2:00 PM
|
Kevin Lee
University of Chicago
|
Generative Brand Choice
-
Estimating consumer preferences for new products in the absence of historical data is an important but challenging problem in marketing, especially in product categories where brand is a key driver of choice. In these settings, measurable product attributes do not explain choice patterns well, which makes questions like predicting sales and identifying target markets for a new product intractable. To address this ''new product introduction problem,'' I develop a scalable framework that enriches structural demand models with large language models (LLMs) to predict how consumers would value new brands. After estimating brand preferences from choice data using a structural model, I use an LLM to generate predictions of these brand utilities from text descriptions of the brand and consumer. My main result is that LLMs attain unprecedented performance at predicting preferences for brands excluded from the training sample. Conventional models based on text embeddings return predictions that are essentially uncorrelated with the actual utilities. In comparison, my LLM-based model attains a 30% lower mean squared error and a correlation of 0.52; i.e., for the first time, informative predictions can be made for consumer preferences of new brands. I also show how to combine causal estimates of the price effect obtained via instrumental variables methods with these LLM predictions to enable pricing-related counterfactuals. Combining the powerful generalization abilities of LLMs with principled economic modeling, my framework enables counterfactual predictions that flexibly accommodate consumer heterogeneity and take into account economic effects like substitution by consumers and price adjustments by firms. Consequently, the framework is useful for downstream decisions like optimizing the positioning and pricing of a new product and identifying promising target markets. More broadly, these results illustrate how new kinds of questions can be answered by using the capabilities of modern LLMs to systematically combine the richness of qualitative data with the precision of quantitative data.
|
Ye Hu
|
10/25/2024
|
Fei Teng
Yale University
|
Honest Ratings Aren't Enough: How Rater Mix Variation Impacts Suppliers and Hurts Platforms
-
Customer reviews and ratings are critical to the success of online platforms in that they help consumers make choices by reducing uncertainty and strengthen supplier (worker) incentives. Existing literature has shown that rating systems face problems primarily due to fake or discriminatory reviews. However, customers also differ in their rating styles: some are generous and others are harsh. In this paper, we introduce a novel idea: even if raters are honest and unbiased, differences in the early rater mix (of generous and harsh raters) for a supplier can lead to biased ratings and unfair outcomes for suppliers. This is because platforms display past ratings to customers, whose own ratings and acceptance of suppliers are impacted by them, and because the platform uses past ratings for prioritization and recommendations; together, these create path dependence. Using data from a gig-economy platform, we estimate a structural model to analyze how early ratings affect long-term worker ratings and earnings. Our findings reveal that early ratings significantly impact future ratings, leading to persistent advantages for lucky early workers and disadvantages for unlucky ones. Further, the use of these ratings in the platform's prioritization algorithms magnifies these effects. We propose a neutral, adjusted rating metric that can mitigate these effects. Counterfactuals show that using the metric improves the accuracy of the rating system for customers, fairness in earnings for workers, and retention of high-quality workers for the platform. Absent such an adjustment, the resulting supplier turnover can leave platforms with a lower-quality supplier mix.
|
Bowen Luo
|
10/18/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Nguyen (Nick) Nguyen
University of Miami
|
DeepAudio: An AI System to Complete the Pipeline of Generating, Selecting, and Targeting Audio Ads
-
Audio advertising is a large industry, with reported billings of $14 billion in 2022 and a reach of up to 86.8% of the U.S. population. Reflecting the importance of audio advertising, AI startups are offering marketers generative AI tools to efficiently create multiple audio ads. In addition, ad targeting platforms like Spotify can deliver audio ads to targeted audiences. This raises a key question: which of the numerous ads should marketers launch on these targeting platforms? Marketers may rely on conventional methods such as A/B testing or multi-armed bandits to answer this question. However, these methods are slow and require significant resources, particularly when assessing numerous ad executions. Moreover, online audio platforms such as Spotify or iHeartRadio do not support A/B testing or multi-armed bandits. Given this background, the authors propose DeepAudio, an AI system that integrates insights from the behavioral literature on ad likeability with AI algorithms to automatically assess the likeability of audio ads. Benchmarking DeepAudio against different approaches, the authors find that integrating behavioral features into AI systems significantly increases system performance, robustness, and generalizability. By quickly assessing the likeability of multiple audio ads, DeepAudio enables marketers to select the most promising ad executions and fully harness the power of generative AI. Thus, DeepAudio completes the modern pipeline of generating, selecting, and targeting audio ads.
|
Bowen Luo
|
10/11/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Yanyan Li
USC
|
Understanding Privacy Invasion and Match Value of Targeted Advertising
-
Targeted advertising, advanced by behavioral tracking and data analytics, is now extensively used by firms to present relevant information to consumers, potentially enhancing consumer experience and marketing effectiveness. Despite these advantages, targeted advertising has raised significant privacy concerns among consumers and policymakers due to unintended consequences from the extensive collection and use of personal data. Consequently, understanding the tradeoff between enhanced match value and privacy concerns is crucial for the effective implementation of targeted advertising. In this research, we develop a structural model to empirically analyze this tradeoff, addressing a gap in the literature. We assume consumers form correlated beliefs about privacy invasion and match value from targeted advertising in a Bayesian fashion, and use these beliefs to decide whether to click an ad and whether to opt out of ad tracking. Consumers update their privacy-invasion beliefs by considering how each received ad corresponds to their clicked ads and update their match-value beliefs by considering how well each ad engages them, and they do so jointly due to the potential correlation between these two beliefs. Leveraging the Limit Ad Tracking (LAT) policy change with iOS 10 in September 2016, which allows consumers to opt out of ad tracking, we estimate the proposed model using panel ad-impression and consumer-response data from 166,144 opt-out and 166,144 opt-in consumers, spanning two months before and three months after the policy change. We find that consumers generally have a negative preference for privacy invasion and a positive preference for match value in their clicking decisions, with notable heterogeneity in these preferences. Consumers with higher uncertainty about privacy invasion are more likely to opt out of tracking. Upon opting out, highly privacy-sensitive consumers (about 20%) experience net benefits, while the majority face a loss from reduced match value that outweighs the gain from decreased privacy invasion. Through counterfactual analyses, we propose a probabilistic targeting strategy that balances match value and privacy concerns, and demonstrate that such a privacy-preserving targeting strategy can benefit consumers, advertisers, and the ad network.
|
Sesh Tirunillai
|
10/9/2024
Melcher Hall 126
11:00 AM - 12:30 PM
|
Maria Giulia Trupia
UCLA
|
''No Time to Buy'': Asking Consumers to Spend Time to Save Money is Perceived as Fairer than Asking Them to Spend Money to Save Time
-
Firms often ask consumers either to spend time to save money (e.g., Lyft’s ''Wait & Save'') or to spend money to save time (e.g., Uber’s ''Priority Pickup''). Across six preregistered studies (N = 3,631), plus seven reported in the Web Appendix (N = 2,930), we find that asking consumers to spend time to save money is perceived as fairer than asking them to spend money to save time (all else equal), with downstream consequences for word-of-mouth, purchase intentions, willingness-to-pay (WTP), and incentive-compatible choice. This is because spend-time-to-save-money offers reduce concerns about firms' profit-seeking motives, which consumers find aversive and unfair. The effect is thus mediated by inferences about profit-seeking and attenuates when concerns about those motives are less salient (e.g., for non-profits). At the same time, we find that spend-money-to-save-time offers (e.g., expedited shipping) are more common in the marketplace. This research reveals how normatively equivalent trade-offs can nevertheless yield divergent fairness judgments, with meaningful implications for marketing theory and practice.
|
Melanie Rudd
|
10/4/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Jasmine Y. Yang
Columbia University
|
What Makes For A Good Thumbnail? Video Content Summarization Into A Single Image
-
Thumbnails, reduced-size preview images or clips, have emerged as a pivotal visual cue that helps consumers navigate video selection while 'previewing' what to expect in the video. We study how thumbnails, relative to video content, affect viewers' behavior (e.g., views, watchtime, preference match, and engagement). We propose a video mining procedure that decomposes high-dimensional video data into interpretable features (image content, affective emotions, and aesthetics) by leveraging computer vision, deep learning, text mining, and advanced large language models. Motivated by behavioral theories such as expectation-disconfirmation theory and Loewenstein's theory of curiosity, we construct theory-based measures that evaluate the thumbnail relative to the video content and assess the degree to which the thumbnail is representative of the video. Using both secondary data from YouTube and a novel video streaming platform called ''CTube'' that we build to exogenously randomize thumbnails across videos, we find that aesthetically pleasing thumbnails lead to positive outcomes across measures (e.g., views and watchtime). Content disconfirmation between the thumbnail and the video, on the other hand, has opposing effects: it leads to more views and higher watchtime but lower post-video engagement (e.g., likes and comments). To further investigate how thumbnails affect consumers' video choice and watchtime decisions, we build a Bayesian learning model in which consumers' decisions to click on a video and to continue watching it are based on their priors (formed from the thumbnail) and their updated beliefs about the video content (the video's frames, characterized as multi-dimensional and correlated video topic proportions). Our results suggest that viewers overall watch videos longer when there is higher disconfirmation between their initial content beliefs formed from the thumbnail and their updated beliefs based on the observed video scenes (signals), suggesting that one role of thumbnails is to generate curiosity about what may come next in the video. In addition, viewers prefer less disconfirmation before observing the thumbnail, highlighting that the role of disconfirmation may change before and after the thumbnail is observed. Based on the model's estimates, we run a series of counterfactual analyses to propose optimal thumbnails and compare them with current thumbnail-recommendation practices to guide creators and platforms in thumbnail selection.
|
Sesh Tirunillai
|
9/27/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Hangcheng Zhao
University of Pennsylvania
|
Algorithmic Collusion of Pricing and Advertising on E-commerce Platforms
-
Firms have been adopting AI learning algorithms to automatically set product prices and advertising auction bids on e-commerce platforms. When firms compete using such algorithms, one concern is tacit collusion: the algorithms learn to settle on higher-than-competitive prices, which increase firm profits but hurt consumers. We empirically investigate the impact of competing reinforcement learning algorithms to determine whether they are always harmful to consumers, in a setting where firms learn to make two-dimensional decisions on pricing and advertising together. Our analysis uses a multi-agent reinforcement learning implementation of the Q-learning algorithm, which we calibrate to estimates from a large-scale dataset collected from Amazon.com. We find that learning algorithms can facilitate win-win-win outcomes that are beneficial for consumers, sellers, and even the platform when consumers have high search costs, i.e., the algorithms learn to collude on lower-than-competitive prices. The intuition is that the algorithms learn to coordinate on lower bids, which lowers advertising costs, leading to lower prices for consumers and enlarging demand on the platform. We collect and analyze a large-scale, high-frequency keyword product search dataset from Amazon.com and estimate consumer search costs. We provide policy guidance by identifying product markets with higher consumer search costs that could benefit from tacit collusion, and markets where regulation of algorithmic pricing might be most needed. Further, we show that even if the platform responds strategically by adjusting the ad auction reserve price or the sales commission rate, the beneficial outcomes for both sellers and consumers are likely to persist.
|
Sam Hui
|
9/20/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Kyeongbin (KB) Kim
Emory University
|
Generative Multi-Task Learning for Customer Base Analysis: Evidence from 1,001 Companies
-
Modeling the activity of a customer base is an inherently multi-objective problem. It requires understanding how many customers will be acquired over time, how many repeat purchases they will make, and how much they will spend per purchase (''upstream'' objectives), as well as how these behaviors come together into cohort- and company-level sales (''downstream'' objectives). Many other empirical settings likewise confront applied researchers with problems that involve modeling multiple customer behavioral processes jointly. This paper introduces a flexible and adaptable unified generative multi-task learning approach tailored for panel data, the customer-based multi-task transformer (CBMT), designed to jointly project upstream and downstream outcomes. Our methodology balances the trade-off between generalization and specialization by leveraging commonalities across upstream customer behavioral processes through shared layers while enabling predictions tailored to each objective via task-specific layers. By employing a multi-objective loss function that explicitly incorporates downstream objectives as auxiliary tasks, we obtain the best of both worlds: accurate predictions for each upstream outcome while also ensuring strong goodness of fit for the downstream outcomes. We validate the model on a one-of-a-kind dataset of 1,001 companies over a 37-month observation period. Our model significantly outperforms existing approaches, exceeding six benchmark models in predicting company-level revenue by 34% to 65% over a temporal holdout period. Through an ablation study and a performance heterogeneity analysis across various contextual factors, we provide insight into the drivers of this outperformance and into the conditions under which the proposed model performs relatively better or worse.
|
Sam Hui
|
9/13/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Arpit Agrawal
University of Houston
|
So Near Yet So Far: The Unexpected Role Of Near Misses In Salesperson Turnover
|
|
9/6/2024
Melcher Hall 365A
11:00 AM - 12:30 PM
|
Aprajita Gautam
University of Texas at Austin
|
Product Perfectionism: Defining and Measuring Consumers' Tendency to Hold Uncompromisingly High Expectations from Possessions and Consumption Experiences
-
Perfectionist tendencies have been on the rise in recent years. In this paper, we conceptualize and define a specific type of this tendency, called ''product perfectionism,'' and situate it within a broader nomological network that includes trait perfectionism, entitlement, materialism, and maximizing. We construct an eight-item Product Perfectionism Scale, which we use to predict consumption behaviors across the three stages of a typical consumer's journey: acquisition, consumption, and disposal (studies 1–7). We find that consumers higher (vs. lower) on product perfectionism are more susceptible to set-fit effects (study 1), more attracted to brands with personalities associated more (vs. less) with perfection (study 2), and willing to pay more for newer (vs. older) products (study 3). We also find that they derive lower enjoyment from less-than-perfect consumption experiences (study 4), are more attracted to product upgrades (study 5), replace both perishable and non-perishable goods faster for smaller flaws (study 6), and are more likely to dispose of, and more reluctant to repair, broken possessions (study 7). We conclude the paper with a discussion of the theoretical and substantive implications of our findings.
|
Melanie Rudd
|