Theses and Dissertations


NO.M.2022.08_10

음성비서의 음성유형과 시각표현이 사용자 경험에 미치는 영향 The influence of voice type and visual representation of voice assistants on user experience

  • Name : 요신양/Yao, Chen Yang
  • Info : Master's thesis / 2022.08
  • Adviser : 김세화/Kim, Se Hwa

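Abstract (translated from the Korean)

With advances in artificial intelligence, human-computer interaction has shifted from command input and output toward modes that resemble human communication, such as gesture and voice. Voice is the most direct means of communication between people, and 70-80% of human interaction takes place through spoken conversation. In conversation, the voice serves as a carrier of information and conveys cues about the speaker's gender, age, emotion, health, and even occupation, which makes interaction between people smoother. In human-computer interaction, a computer's synthesized voice likewise carries nonverbal information that the human listener can perceive.
A voice assistant is a mode of human-computer interaction that is built on artificial intelligence and mediated by voice. As speech synthesis technology matures, attention is shifting from the technical rendering of speech to emotional satisfaction. Among the attributes of a synthesized voice, its gender, pitch, speech rate, and loudness, together with how the voice is visualized, strongly affect user experience. This study compared voice gender, pitch, and visualization style experimentally in order to recommend voice and visual designs for voice assistants that raise user satisfaction.
The first experiment built on the finding of Stern et al. (2021) that voice pitch affects listeners' preference and perceived attractiveness, and examined how the gender and pitch of a voice assistant's voice affect user preference, attractiveness, perceived persuasiveness, perceived extraversion, and social presence. Using a text-to-speech (TTS) program, six stimulus voices were created: pitches were centered on the average male pitch (120 Hz) and the female pitch (225 Hz), with one higher and one lower pitch set for each, and participants rated each voice in a survey. Participants rated the medium-pitch female voice (235.4 Hz; average adult female pitch 225 Hz) and the low-pitch male voice (114.2 Hz; average adult male pitch 120 Hz) highly on preference, attractiveness, and perceived persuasiveness. The female voice was rated higher than the male voice. These results made it possible to propose male and female pitches suited to voice assistants.
The second experiment examined how the visual representation of a voice assistant affects user preference, attractiveness, persuasiveness, social presence, and pleasantness. The voices rated highest in the first experiment (male 114.2 Hz, female 235.4 Hz) served as the basis for the visualizations, and the visualization styles were compared experimentally. Four styles were designed: geometric shape, character, realistic 3D, and real image. Most participants gave high scores for preference, attractiveness, and social presence to voice assistants represented as a character or a real image, indicating a preference for those two styles. The geometric, equalizer-like image was rated lowest on pleasantness, and the realistic 3D image was rated lowest on attractiveness, which suggests that visual representations with a strongly digital look do not produce a positive user experience.
These findings on the audiovisual presentation of voice assistants can contribute to user interfaces that match increasingly intelligent, AI-driven voice assistant services, and can thereby strengthen interaction between users and voice assistants and give users a richer experience.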

Abstract

With the development of artificial intelligence technology, human-computer interaction has moved beyond early command input and mouse-and-keyboard input to include voice. Voice is the most direct method of communication between people, and 70-80% of human interaction takes place through spoken conversation. In conversation, the voice carries information and conveys social cues such as the speaker's gender, age, emotion, psychological traits, health status, and even occupation. This nonverbal information helps us understand the purpose of other people's interactions. In voice-based human-computer interaction, the computer's synthetic voice also carries nonverbal information that the listener can detect.
A voice assistant is a human-computer interaction method based on artificial intelligence technology. As speech synthesis technology develops, interest is shifting from the technical rendering of speech to emotional satisfaction. Among the many attributes of a synthetic voice, its gender, pitch, speech rate, and loudness, along with the visualization that accompanies the voice, affect the user's experience. This study therefore conducted experiments comparing voice gender, pitch, and visualization method in order to recommend voice and visualization designs that enhance the user experience of voice assistants.
In the first experiment, based on the results of Stern et al. (2021), the effects of the gender and pitch of a voice assistant's voice on user preference, attractiveness, persuasiveness, perceived extraversion, and social presence were studied. Six stimulus voices were created around the average male pitch (120 Hz) and the female pitch (225 Hz), with a higher and a lower pitch set for each, and participants evaluated each voice in an experimental survey. Users rated the medium-pitch female voice (235.4 Hz; average adult female pitch 225 Hz) and the low-pitch male voice (114.2 Hz; average adult male pitch 120 Hz) highly on preference, attractiveness, and persuasiveness. The female voice was rated higher than the male voice. These results made it possible to propose male and female voice pitches suitable for voice assistants.
In the second experiment, building on the finding of Oh et al. (2018) that visualizing a voice can increase users' social presence and, in turn, satisfaction with the user experience, the effects of the visual representation of voice assistants on user preference, attractiveness, persuasiveness, social presence, and pleasantness were studied. The visualizations were built on the voices rated highest in the first experiment, and the visualization methods were compared experimentally. Four types of visual representation were designed: figures (geometric shapes), characters, realistic 3D, and real images. Most participants gave high scores for preference, attractiveness, and social presence to voice assistants represented as characters or real images, indicating a preference for those representations. The figure-based image, such as an equalizer, was rated lowest for pleasantness, and the realistic 3D image was rated lowest for attractiveness, which indicates that visual representations with a strongly digital impression did not create a positive user experience.
Applying these results to the voice and visual design of voice assistants could support voice services tailored to the user, the usage environment, and the purpose of use, and thereby create a more positive user experience with voice assistants.
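The pitch values reported for the first experiment can be related to the reference averages with a short calculation. The sketch below is illustrative only: expressing the difference in semitones is an assumption made here (the abstract reports pitch only in Hz), the names `semitone_offset` and `stimulus_conditions` are hypothetical, and the level labels "low", "medium", and "high" are inferred from the description of one higher and one lower pitch per gender. It is not the authors' stimulus-generation procedure.

```python
import math

# Reference averages and the best-rated stimulus pitches reported in the abstract (Hz).
REFERENCE_HZ = {"male": 120.0, "female": 225.0}
BEST_RATED_HZ = {"male": 114.2, "female": 235.4}


def semitone_offset(pitch_hz: float, reference_hz: float) -> float:
    """Return the interval from reference_hz to pitch_hz in semitones (12 per octave)."""
    if pitch_hz <= 0 or reference_hz <= 0:
        raise ValueError("pitches must be positive")
    return 12.0 * math.log2(pitch_hz / reference_hz)


def stimulus_conditions() -> list:
    """Enumerate the assumed 2 x 3 design: each gender at a low, medium, and high pitch."""
    return [
        (gender, level)
        for gender in ("male", "female")
        for level in ("low", "medium", "high")
    ]


if __name__ == "__main__":
    print(len(stimulus_conditions()), "stimulus voices")
    for gender, hz in BEST_RATED_HZ.items():
        offset = semitone_offset(hz, REFERENCE_HZ[gender])
        print(f"{gender}: {hz} Hz is {offset:+.2f} semitones from {REFERENCE_HZ[gender]} Hz")
```

Under these assumptions, both best-rated voices sit within about one semitone of the reference averages: the male voice slightly below 120 Hz and the female voice slightly above 225 Hz.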

Keywords

  • #Voice assistant
  • #Gender
  • #Pitch
  • #Visualization
  • #Image
  • #Voice assistant image
  • #User experience
  • #Social presence

References

[Book]

1. Dr. Martin Schrepp, User Experience Questionnaire Handbook, Version 8, 2019
2. Nass C, Brave S, Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship, MIT Press, Cambridge, 2005
3. Rammohan, Raaghav, VOICE ASSISTANTS, 2022, pp. 46-51

[Dissertation]

1. 하승완, 「사회적 현존감을 높이는 상호작용 사진」 [Interactive photographs that enhance social presence], Master's thesis, Sogang University Graduate School, 2018
2. Andrea L. Guzman, Imagining the Voice in the Machine: The Ontology of Digital Social Agents, University of Illinois at Chicago, Thesis, 2015
3. Apple W, Streeter LA, Krauss RM, Effects of pitch and speech rate on personal attributions, J Pers Soc Psychol 37, 1979, pp. 715–727
4. Phil McAleer, Alexander Todorov, Pascal Belin, How Do You Say ‘Hello’? Personality Impressions from Brief Novel Voices, Published: March 12, 2014
5. 牛雷, 「助手还是伙伴?智能语音产品的用户偏好研究--基于性别陈规定型视角」 [Assistant or partner? A study of user preferences for intelligent voice products from the perspective of gender stereotypes], 2021, pp. 17-19
6. 雷葆华, 「语音用户界面平台的设计与评估」 [Design and evaluation of a voice user interface platform], 2002

[Academic Journal]

1. Arjan Geven, Johann Schrammel, and Manfred Tscheligi, Interacting with Embodied Agents That Can See: How Vision-enabled Agents Can Assist in Spatial Tasks, In Proceedings of the 4th Nordic Conference on Human-computer Interaction: Changing Roles (NordiCHI ’06), ACM, 2006, pp. 135–144
2. B. Borkowska, B. Pawlowski, Female voice frequency in the context of dominance and attractiveness perception, Animal Behaviour, 82, 2011, pp. 55-59
3. baike.baidu.com/item/%E8%AF%B4%E6%9C%8D%E5%8A%9B/1796229
4. Baylor, A.L., The impact of pedagogical agent image on affective outcomes, In Proceedings of Workshop on Affective Interactions, Intelligent User Interface International Conference, San Diego, CA, 2005
5. Biocca F, Harms C, Gregg J, The networked minds measure of social presence: Pilot test of the factor structure and concurrent validity, In: 4th Annual international workshop on presence, 2001, pp. 1–9
6. Burgoon J, Bonito JA, Bengtsson B, Cederberg C, Lundeberg M, Allspach L, Interactivity in human-computer interaction: a study of credibility, understanding and influence, Computers in Human Behavior, 2000, pp. 553–574
7. C.A. Klofstad, R.C. Anderson, S. Nowicki, Perceptions of competence, strength, and age influence voters to select leaders with lower-pitched voices, PloS One, 10, 2015
8. Carroll, J. M. and Thomas, J. C., Fun, SIGCHI Bulletin 19(3), 1988, pp. 21-24
9. Catherine S Oh, Jeremy N Bailenson, and Gregory F Welch, A systematic review of social presence: Definition, antecedents, and implications, Frontiers in Robotics and AI 5, 2018, article 114
10. Csikszentmihalyi, M, Beyond Boredom and Anxiety, Jossey-Bass, San Francisco, 1975
11. Duffy, BR, Anthropomorphism and the social robot, Robotics and Autonomous Systems, 2003, 42(3–4), pp. 177–190
12. Feinberg, D.R, Are human faces and voices ornaments signaling common underlying cues to mate value? Evol. Anthropol, 17, 2008, pp. 112-118
13. Hassenzahl, M, Platz, A, Burmester, M, & Lehner, K, Hedonic and ergonomic quality aspects determine a software’s appeal, In Proceedings of the CHI 2000 Conference on Human Factors in Computing Systems, 2000, pp. 201–208
14. Igbaria, M, Schiffman, S. J, Wieckowski, T. J, The respective roles of perceived usefulness and perceived fun in the acceptance of microcomputer technology, Behaviour & Information Technology 13(6), 1994, pp. 349-361
15. Jodi Forlizzi, John Zimmerman, Vince Mancuso, and Sonya Kwak, How Interface Agents Affect Interaction Between Humans and Computers, In Proceedings of the 2007 Conference on Designing Pleasurable Products and Interfaces (DPPI ’07), 2007, pp. 209–221
16. Jun’ichiro Seyama and Ruth S Nagayama, The uncanny valley: Effect of realism on the impression of artificial human faces, Presence: Teleoperators and virtual environments 16, 4, 2007, pp. 337–351
17. Ko SJ, Judd CM, Blair IV, What the Voice Reveals: Within- and Between-Category Stereotyping on the Basis of Voice, Personality and Social Psychology Bulletin, 2006, pp. 806-819
18. Kwan Min Lee, Presence, Explicated, Communication Theory, Volume 14, Issue 1, 1 February 2004, pp. 27–50
19. Leongómez JD, Mileva VR, Little AC, Roberts SC, Perceived differences in social status between speaker and listener affect the speaker's vocal characteristics, PLOS ONE 12(6): e0179407, 2017
20. medium.com/punchcut/the-state-of-voice-user-experience-48f4a0c3d628
21. Michael Bonfert, Nima Zargham, Florian Saade, Robert Porzel, and Rainer Malaka, An Evaluation of Visual Embodiment for Voice Assistants on Smart Displays, In CUI 2021 - 3rd Conference on Conversational User Interfaces (CUI '21), Association for Computing Machinery, 2021, pp. 1–11
22. Michael Lankes, Regina Bernhaupt, and Manfred Tscheligi, An Experimental Setting to Measure Contextual Perception of Embodied Conversational Agents, In Proceedings of the International Conference on Advances in Computer Entertainment Technology (Salzburg, Austria) (ACE '07), 2007, pp. 56-59
23. Mitchell W.J, Szerszen K.A. Sr, Lu A.S, Schermerhorn P.W, Scheutz M, MacDorman K.F, A mismatch in the human realism of face and voice produces an uncanny valley, I-Perception, 2 (1), 2011, pp. 10-12
24. Montepare JM, Zebrowitz-McArthur L, Perceptions of adults with childlike voices in two cultures, J Exp Soc Psychol 23, 1987, pp. 331–349
25. N. Dahlback, A. Jonsson, L. Ahrenberg, ‘Wizard of Oz Studies: Why and How,’ presented at Intelligent User Interfaces '93, 1993
26. Nick Yee, Jeremy N Bailenson, and Kathryn Rickertsen, A Meta-analysis of the Impact of the Inclusion and Realism of Human-like Faces on User Experiences in Interfaces, In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07), 2007, pp. 1–10
27. PAN Xiao-qin, LU Tian-liang, DU Yan-hui, TONG Xin, Overview of Speech Synthesis and Voice Conversion Technology Based on Deep Learning, Computer Science,  Issue (8): 200-208, 2021, pp. 5-6
28. Phil McAleer, Alexander Todorov, Pascal Belin, How Do You Say ‘Hello’? Personality Impressions from Brief Novel Voices, Published: March 12, 2014
29. Rabia Khan and Antonella De Angeli, The attractiveness stereotype in the evaluation of embodied conversational agents, In IFIP Conference on Human-Computer Interaction, Springer, Heidelberg, Germany, 2009, pp. 85-97
30. Rebecca Cherng-Shiow Chang, Hsi-Peng Lu, Peishan Yang, Stereotypes or golden rules? Exploring likable voice traits of social robots as active aging companions for tech-savvy baby boomers in Taiwan, Computers in Human Behavior, Volume 84, 2018, pp. 194-210
31. Riding, D., Lonsdale, D. & Brown, B, The Effects of Average Fundamental Frequency and Variance of Fundamental Frequency on Male Vocal Attractiveness to Women, J Nonverbal Behav 30, 2006, pp. 55–61
32. Sei Jin Ko, What the Voice Reveals: Within- and Between-Category Stereotyping on the Basis of Voice, 2016, p. 806
33. Sercan Ö. Arık, Mike Chrzanowski, et al., Deep Voice: Real-time Neural Text-to-Speech, Proceedings of the 34th International Conference on Machine Learning, PMLR 70, 2017, pp. 195-204
34. Shen J, Pang R, Weiss R, et al., Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, International Conference on Acoustics, Speech, and Signal Processing, IEEE, 2018, pp. 4779-4783
35. Yi Ren, Yangjun Ruan, et al, FastSpeech: Fast, Robust and Controllable Text to Speech, arXiv:1905.09263, 2019
36. Zanbaka, C., Goolkasian, P., Hodges, L., Can a virtual cat persuade you? The role of gender and realism in speaker persuasiveness, In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, Montreal, Quebec, Canada, 2006
37. Zibrek K, Kokkinara E., McDonnell R, The effect of realistic appearance of virtual characters in immersive environments-does the character’s personality play a role? IEEE Trans Vis Comput Graphics, 24 (4), 2018, pp. 1681-1690
38. Zuckerman M, Miyake K, Elkin CS, Effects of attractiveness and maturity of face and voice on interpersonal impressions, J Res Pers 29, 1995, pp. 253–272
39. Álvaro Hernández-Trapote, Beatriz López-Mencía, David Díaz, Rubén Fernández-Pozo, and Javier Caminero, Embodied Conversational Agents for Voice-Biometric Interfaces, In Proceedings of the 10th International Conference on Multimodal Interfaces (Chania, Crete, Greece) (ICMI ’08), 2008, pp. 305–312
40. 박주연, “인터페이스 음성비서와의 상호작용이 사용자의 심리적 경험에 미치는 영향” [The effect of interaction with an interface voice assistant on users' psychological experience], HCI Society of Korea Conference, 2007, pp. 1640-1647

[Magazine]

1. Ben Mimoun, MS, Poncin, I, Garnier, M, Case study: Embodied virtual agents: An analysis on reasons for failure, Journal of Retailing and Consumer Services 19(6), 2012, pp. 605–612
2. Coronavirus Lockdown is Upping Voice Assistant Interaction in the UK, Report, 2020
3. D. Berry, R. Driver, Vocal attractiveness and vocal babyishness: effects on stranger, self and friend impressions, Journal of Nonverbal Behavior, 14, 1990, pp. 141-153
4. Hassenzahl, M, The effect of perceived hedonic quality on product appealingness, International Journal of Human-Computer Interaction 13, 2002, pp. 479-497
5. Julia Stern, Christoph Schild, Benedict C. Jones, Lisa M. DeBruine, Amanda Hahn, David A. Puts, Ingo Zettler, Tobias L. Kordsmeyer, David Feinberg, Dan Zamfir, Lars Penke, Ruben C. Arslan, Do voices carry valid information about a speaker’s personality? Journal of Research in Personality, Volume 92, 2021
6. K.R. Scherer, Personality inference from voice quality: The loud voice of extroversion, European Journal of Social Psychology, 8, 1978, pp. 467-487
7. Nass, C., & Lee, K. M, Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction, Journal of Experimental Psychology: Applied, 7(3), 2001, pp. 171–181
8. Richard E Mayer, C Scott DaPra, An embodiment effect in computer-based learning with animated pedagogical agents, Journal of Experimental Psychology: Applied 18, 3, 2012, p. 239
9. Schrepp, Martin & Hinderks, Andreas & Thomaschewski, Jörg, Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S), International Journal of Interactive Multimedia and Artificial Intelligence, 2017
10. W. Apple, L.A. Streeter, R.M. Krauss, Effects of pitch and speech rate on personal attributions, Journal of Personality and Social Psychology, 37, 1979, pp. 715
11. 陈浩然,孙记明,刘牧寅,「Research on Key Technologies and Service Practices of AI Speech」 中讯邮电咨询设计院有限公司,北京 100048, 2021, pp. 7-9

[Website]

1. “Definition of persuasiveness”, baike.baidu.com/item/%E8%AF%B4%E6%9C%8D%E5%8A%9B/1796229
2. “The state of voice user experience”, medium.com/punchcut/the-state-of-voice-user-experience-48f4a0c3d628
3. “Analysis of a smart voice consumer survey”, www.cebnet.com.cn/20190311/102555791.html
4. “History of the development of smart voice”, www.grandsun.com/xingyeguanzhu/280.html