2 Literature Review
2.1 Introduction
Several recent events and technical developments have highlighted wisdom’s value in navigating the VUCA (Volatility, Uncertainty, Complexity, and Ambiguity) (Whiteman, 1998) environment, which is marked by fast change, unanticipated difficulties, and complex interdependencies. The well-being of our world is under threat from significant challenges: the September 11 attacks and conflicts in the Middle East pose severe challenges to global security. Likewise, recurring pandemics (such as SARS) and financial crises such as the one in 2008 undermine socio-economic and political systems. Natural disasters (like the Tohoku earthquake and the subsequent tsunamis) expose weaknesses in our preparedness for dealing with them. In the meantime, social movements such as Occupy Wall Street or Black Lives Matter, along with political upheavals like Brexit, contribute to increasing societal polarization and discontent.
Rapid technological advancement is another aspect of this VUCA environment. It has altered how we live, work, and communicate. Machine learning techniques, in particular deep learning, have allowed us to analyze vast amounts of data, leading to advances in various fields (Nguyen et al., 2019). Smartphones and social media platforms have revolutionized communication, while the IoT and cloud computing have created new opportunities and challenges in privacy and security (Stergiou, Psannis, Gupta, & Ishibashi, 2018). The development of blockchain technology has the potential to reshape businesses (Y. Chen & Bellavitis, 2020), and the rise of renewable energy and autonomous vehicles is a crucial step toward a more sustainable future (Chehri & Mouftah, 2019).
In this context, the importance of wisdom in navigating the VUCA environment cannot be overstated (Brooks, 2021). In the face of complexity and uncertainty, wisdom brings discretion, clarity, and understanding (McKenna, 2013). Leadership has been studied for centuries to identify the key characteristics that make a leader successful. In recent years, however, interest in contemporary trends in the field of leadership has grown (Bennis, 2007). Wisdom leadership, an old concept but a relatively new trend (Bachmann, Sasse, & Habisch, 2018), focuses on the idea that successful leaders possess unique qualities beyond traditional leadership attributes such as intelligence and charisma.
The Cambridge Dictionary of Philosophy defines wisdom as “an understanding of the highest principles of things that functions as a guide for living a truly exemplary human life” (Audi, 1995). However, upon delving deeper into wisdom’s definitions, its multifaceted complexity becomes apparent. Descriptions of wisdom are most likely based on individuals’ unique experiences, including the types of complex problems and best-perceived solutions they have encountered (Staudinger & Glück, 2011). Wisdom involves recognizing the limits of our knowledge (Taranto, 1989) and the unpredictability of the future (Ardelt, 2004b). It encourages humility, curiosity, and a commitment to learning from our mistakes and the experiences of others (Edmondson, 2011). In a world characterized by volatility, uncertainty, complexity, and ambiguity, wisdom is essential for individuals, organizations, and societies to thrive and overcome challenges.
In an effort to contribute to the field of wisdom leadership research, this study seeks to shed light on the dimensions of wisdom leadership and the qualities of wise leaders. To do this, in this chapter, we review the literature on text analysis techniques and their application, and we explore the literature on wisdom and its relationship with leadership. Through the examination of various text-mining strategies, such as topic modeling and sentiment analysis, and the study of significant articles and books published by leading scholars in the field, we delve into the dimensions of wisdom and its evolution over time. The research underscores that wisdom is neither a fixed nor unique condition. Instead, it is viewed as a multi-tool kit, a collection of behaviors, skills, and principles that enable individuals to deliberate on life’s important aspects and then translate those considerations into choices and actions that simultaneously enhance their own well-being and that of others.
2.2 The Origin and History of Wisdom
According to “Online Etymology Dictionary” (n.d.), the Germanic languages frequently utilized the composite term “wisdom” in their vocabulary. The word’s initial component, “wis,” is a derivative of the adjective “wise” and denotes “learned, sagacious, cunning, sane, prudent, discreet, experienced, and having the power of discerning and judging rightly.” The word’s second component, “-dom,” is an Old English suffix that is used to denote quality, state, or condition.
People have tried to pass on their knowledge to future generations throughout human history through various storytelling techniques, such as tales, poetry, pictographs, and paintings. This demonstrates how wisdom has been cherished and passed down for centuries and is considered a crucial part of human civilization. The idea of wisdom is therefore far from new. The Sumerians are believed to have first accepted the idea of wisdom around 2,500 BCE (Mitchell, Knight, & Pachana, 2017).
Although wisdom has long been a part of human culture, defining it has proven to be complicated. Wisdom has been an ongoing topic of philosophical debate, yet a precise and concise definition has remained elusive. Psychologists have recently started empirically investigating wisdom to pinpoint the psychological traits underpinning wise behavior (Bergsma & Ardelt, 2012). The following section combines psychological and philosophical concepts to offer a multifaceted definition of wisdom.
2.3 The Interdisciplinary Evolution of Wisdom Research: From Philosophy to Psychology
Some knowledge management academics have defined wisdom simply in terms of the “Knowledge Hierarchy” or “Knowledge Pyramid,” also referred to as the “Data, Information, Knowledge, and Wisdom Hierarchy” or the DIKW chain (Ackoff, 1989; Rowley, 2007). However, the concept of wisdom appears to be more nuanced and expansive than this.
Scholars from various disciplines have investigated and studied wisdom throughout history (McKenna, 2013). These academic fields include neuroscience, sociology, theology, and education, with philosophy and psychology being notable contributors. The philosophical perspective on wisdom examines it as a virtue, knowledge, or way of life. In contrast, psychologists investigate it as expert knowledge (Ardelt, 2004a; Baltes & Kunzmann, 2004), a personality type (Ardelt, Pridgen, & Nutter-Pridgen, 2019), or a cognitive and emotional process (Bluck & Glück, 2005). This multidisciplinary approach broadens our grasp of how wisdom plays into social relationships, decision-making (W. S. Brown, 2005), and well-being (Kunzmann & Baltes, 2005).
2.3.1 Philosophy: Tracing Wisdom from Ancient Thinkers to Contemporary Perspectives
The word “philosophy” is derived from the combination of two Greek words: “philo” (\(\phi\acute{\iota}\lambda o\)), meaning “love,” and “sophia” (\(\sigma o\phi\acute{\iota}\alpha\)), meaning “wisdom” or “knowledge.” When combined, “philosophy” can be understood as “the love of wisdom” or “the pursuit of knowledge.” Unsurprisingly, it has a long history of being associated with wisdom (Kenny, 2012). Durant (1961) stated that “Science without philosophy, facts without perspective and valuation, cannot save us from havoc and despair. Science gives us knowledge, but only philosophy can give us wisdom.” Leading thinkers like Plato, Aristotle, Confucius, and the Stoics had their own conceptions of wisdom and offered guidance on how to lead morally upright lives. In his Nicomachean Ethics, Aristotle (Ross, Aristotle, & Brown, 2009) introduced the idea of “practical wisdom,” emphasizing the value of moral and intellectual growth. Confucius, by contrast, strongly stressed forging close friendships and leading a morally upright life (Yuan, Chia, & Gosling, 2023).
For 600–700 years, Western philosophy was dominated by the writings of Plato and his students (Sternberg & Jordan, 2005). According to Sternberg & Jordan (2005), Plato, Socrates’ disciple, shared Socrates’ belief that wisdom was theoretical, abstract, and the preserve of a select few. In his work, Nicomachean Ethics (Ross et al., 2009), Aristotle describes the idea of practical wisdom. According to him, traits like loyalty, truthfulness, gentleness, fairness, friendliness, self-control, courage, and generosity must be learned and developed. Aristotle referred to these qualities as “excellences” or “virtues.” Still, he thought that “practical wisdom,” which is the ability to decide how to behave in particular situations with particular people, is the most crucial virtue because none of the other qualities can be effectively used without it. He thought that a lifetime of moral and intellectual growth might lead to wisdom.
In Nicomachean Ethics VI.6, Aristotle identified two types of wisdom: metaphysical wisdom and practical wisdom. He wrote that Phronèsis is “a true and reasoned state of capacity to act with regard to the things that are good or bad for man.” Hughes (2001) asserts that Aristotle distinguished between intellectual and moral virtues by highlighting two essential intellectual qualities. He refers to “Sophia” as the capacity to think clearly on scientific topics and “Phronèsis” or “practical wisdom” as the practice of doing so. While the DIKW hierarchy does not exactly translate into Aristotle’s division between Sophia and Phronèsis, we may argue that Sophia correlates more closely with the knowledge chain and appears more in line with its upper stage. The DIKW hierarchy is a simplistic and linear paradigm, whereas Aristotle’s notion of Phronèsis is more complicated and interrelated.
McKenna (2013) described “practical wisdom” as the capacity to behave morally under pressure. Later, Vaccarezza & De Caro (2021) collected fresh interpretations of the centuries-old virtue of Phronèsis, highlighting practical wisdom’s significance in moral philosophy.
Other viewpoints can be derived from Meacham (1983). He discussed the nature of wisdom and knowledge, especially in Socratic thoughts. Socrates believed that genuine wisdom derived from knowing one’s own limitations. His statement, “what I do not know, I don’t think I do,” implies that admitting one’s ignorance is a type of wisdom. Knowledge inevitably creates uncertainty since knowing something means questioning it. In a classical concept of knowledge, where knowing and not knowing are mutually incompatible, this may appear inconsistent. The confluence of knowledge and doubt is not a problem but rather an essential component of wisdom (Meacham, 1983).
Another important school of thought, with its own notion of wisdom, began to flourish in another corner of the planet, though not at the same time in ancient history. Confucius, the son of a low government official in China, began to lay the groundwork for a very practical curriculum of public behavior that is still used today as a guide to wisdom (Hall, 2011). Confucius underlined practical wisdom’s value and believed it comes from leading a morally upright life and developing healthy interpersonal relationships (Yuan, 2013). Confucianism defined ethics in terms of five fundamental qualities, or the “five constants.” According to Yuan et al. (2023), these are “benevolence and compassion (ren), righteousness (yi), ritual propriety (li), trustworthiness (xin) and wisdom (zhi).”
2.3.2 Psychology: The Multifaceted Nature of Wisdom
Since the publication of Clayton and Birren’s study in 1980, the psychology of wisdom has advanced significantly (McKenna, 2013). The Berlin School, Robert Sternberg’s methodology, and US Positivists are only a few of the distinct “schools” of wisdom study that have since arisen (McKenna, 2013). Each of these schools has contributed significantly to our knowledge of wisdom by providing distinctive viewpoints and models.
Wisdom, according to the Berlin School, is “expert knowledge in the fundamental pragmatics of life that permits exceptional insight, judgment, and advice about complex and uncertain matters” (Baltes & Staudinger, 2000). The Berlin School holds that wisdom is concerned with life’s crucial and challenging issues and associates it with five facets: “rich factual knowledge, rich procedural knowledge, lifespan contextualism, relativism of values, and awareness and management of uncertainty” (Pasupathi, Staudinger, & Baltes, 2001).
Renowned wisdom researcher Robert Sternberg created the Balance Theory of Wisdom, which strongly emphasizes values as an integral part of wisdom. According to Sternberg & Glück (2022), wisdom involves applying practical intelligence or tacit knowledge, mediated by values, toward realizing a common good through a balance of different interests and perspectives: “a person is wise to the extent that they use their skills and knowledge to (1) achieve a common good, by (2) balancing intrapersonal (their own), interpersonal (others’), and extrapersonal (larger) interests over (3) the long term as well as the short term, through (4) the utilization of positive ethical values, by (5) adapting to, shaping, and selecting environments.”
Sternberg & Glück (2022) describe three types of practical action. One can adapt to the environment as it is, sometimes recognizing that it is unsatisfactory but that there is nothing better available. Alternatively, one can choose to alter the setting for the betterment of both oneself and others. This aligns with the observation of Law & Staudinger (2016) that “Wisdom is heavily dependent on an understanding of the good life as involving self-transcendence and a concern for the good of others.” Finally, one can decide that the environment in which one lives does not promote widespread benefit and then look for another one. Sometimes situations are unchangeable, and one must choose whether to stay or leave.
Based on these concepts, Karami et al. (2020) carried out a systematic evaluation of 50 papers from the psychology, management, leadership, and education domains to look for areas of agreement among conceptions of wisdom. They put out the Polyhedron Model of Wisdom (PMW), which includes several elements, such as “the adequate use of knowledge, intelligence and creativity, self-regulation, openness and tolerance, altruism and moral maturity, and sound judgment to solve critical problems.”
Igor Grossmann and his associates proposed a single model of wisdom (Grossmann et al., 2020) that incorporates components of earlier models. According to them, wisdom is “morally-grounded excellence in social-cognitive processing.” Excellence in social-cognitive processing requires considering various contexts, perspectives, and short- and long-term effects, thinking reflectively and dialectically, and being aware of the limitations and subjectivity of thought, whereas moral grounding means balancing self-interest with the interests of others, valuing truth, and caring for humanity. Additionally, Sternberg & Karami (2021) tried to develop a thorough and organized model of wisdom, the so-called 6Ps: purpose, environmental/situational pressures, problems requiring wisdom, traits of wise people, psychological processes, and products of wisdom.
The study of wisdom has evolved over time, with contemporary moral philosophy and psychology engaging in an interdisciplinary dialogue that has brought new insights into the concept of practical wisdom. Vaccarezza & De Caro (2021) argue that, despite the extensive discussion of the topic, there is still room for further exploration, primarily due to “the resurgence of virtue ethics within contemporary moral philosophy and its recent dialogue with psychology.”
In conclusion, the understanding of wisdom has advanced via the shift from philosophical research to psychological analysis. The interdisciplinary discussion between philosophy, psychology, and other areas has shed light on the value of wisdom in our lives. To help us understand this lasting quality as we seek wisdom in our continuously changing environment, ongoing collaboration across various disciplines is essential.
2.4 Understanding Wisdom Leadership: Origins, Principles, and Applications
Numerous scholars have investigated the concept of wisdom leadership and the characteristics of wise leaders, contributing to our understanding in diverse ways. Sternberg & Karami (2021), Riggio, Zhu, Reina, & Maroosis (2010), Grossmann & Brienza (2018), Ardelt (2004a), Nonaka & Takeuchi (2019), and McKenna (2013) all discussed the multifaceted concept of wisdom and especially “practical wisdom,” which involves adapting to new situations, solving complex problems, and balancing competing interests while maintaining one’s values. Practical wisdom has its roots in the concept of Phronèsis discussed in Section 2.3.1.
Nonaka & Takeuchi (2019) explained that wisdom is cultivated through practice, transforming knowledge into wisdom. They argue that wisdom persists through generations and is essential for businesses, communities, and society’s sustainability. Adams (2007) identified wisdom leadership as a key component of success in fast-paced and complex environments. Schwartz & Sharpe (2010) stated that wise leaders balance short-term goals with long-term values and engage in ethical decision-making while inspiring and empowering others.
McKenna, Rooney, & Boal (2009) presented five principles of wisdom that can be used to measure and evaluate wise leadership: using reason and careful observation; allowing for non-rational and subjective elements when making decisions; valuing humane and virtuous outcomes; being practical and oriented towards everyday life; and being articulate, understanding the aesthetic dimension of work, and seeking intrinsic personal and social rewards.
Grossmann & Brienza (2018) proposed that wise reasoning, distinct from intelligence, helps solve societal problems. Sternberg’s WICS (Wisdom, Intelligence, and Creativity Synthesized) model (Sternberg, 2003) emphasizes the integration of practical, creative, and analytical abilities and the application of wisdom in decision-making. According to the model, effective leaders are those who can integrate these different components, such as the ability to communicate effectively, build relationships, and inspire others, in a balanced way, depending on the specific situation they are facing (Sternberg, 2003). Sternberg also identified five patterns among unwise leaders.
In their studies, Nonaka & Takeuchi (2011) showed that practical wisdom is crucial for prudent judgments and actions. They outlined six abilities of wise leaders: making decisions for the good of the organization and society; quickly perceiving and understanding people, things, and events; creating contexts for meaningful interactions; using metaphors and stories to convey tacit knowledge; employing political power to mobilize action; and mentoring and cultivating practical wisdom in others.
“Prudence” is the customary translation of “Phronèsis,” the word Aristotle used to describe the ability to find the balance between two extremes and make the appropriate decision that both minimizes harm and maximizes the good. Riggio et al. (2010) argued that prudence is essential for ethical leadership. Rooney, Küpers, Pauleen, & Zhuravleva (2021) proposed a model that helps leaders develop the skills and habits needed for wise decision-making in complex and uncertain environments.
To make informed decisions, managers must understand a company’s purpose and pursue the common good to ensure sustainability. Japanese companies practicing “Wise Capitalism,” as described by Nonaka & Takeuchi (2011), prioritize the common good, create both economic and societal value, and align with social entrepreneurship principles. This approach represents a shift away from the narrow, profit-maximizing view of capitalism and towards a more holistic and sustainable model of business that recognizes the interdependence of business and society. Rocha & Pinheiro (2021) explored the role of business education in addressing gaps in leaders’ awareness of organizational practical wisdom through 23 interviews. The research emphasizes the importance of Phronèsis in creating wise organizations.
Rooney & McKenna (2022) propose a unique application of wisdom in practical leadership scenarios, suggesting that wisdom operates both as a personal attribute and a collective duty. It is crucial to consider one’s part in preventing the projection of toxic leaders into influential roles within commercial or political spheres. Such harmful leaders frequently overemphasize their achievements and abilities. However, the responsibility for their placement in power principally lies with others. Regardless of one’s ability to embody the characteristics of a wise leader, one can play a crucial role in shaping an environment that discourages the rise of toxic leaders. This environment promises to yield the common good.
Various researchers, such as Ardelt (2003) and Webster (2003), have developed tests to measure wisdom, emphasizing the importance of reflection and of understanding life’s dialectical and uncertain nature as fundamental to wisdom. Ardelt (2003) described wisdom as a set of personality attributes that allow people to see things from other people’s viewpoints, overcome prejudices and blind spots, learn from life, and care for people. In her Three-Dimensional Wisdom Scale (3D-WS), she suggested that wisdom involves three personality dimensions: cognitive, affective, and reflective. Ardelt (2003) defined the cognitive dimension as the acquisition and application of knowledge, critical thinking, and the capacity to make informed judgments through problem-solving and decision-making. Compassion, empathy, and emotional control are examples of affective traits that allow a wise person to show concern and sensitivity for the well-being of others. Finally, the reflective component stresses self-awareness, introspection, and the ability to learn from previous experiences, which includes knowledge of one’s opinions, beliefs, and values. In his Self-Assessed Wisdom Scale (SAWS), Webster (2003) recognized five aspects of wisdom: “Experience, Emotional Regulation, Reminiscence and Reflectiveness, Openness, and Humor.” Both researchers highlight the significance of critical life experiences and reflection in developing wisdom.
2.5 Wisdom: A Mysterious and Multidimensional Concept
As mentioned, the idea of wisdom has endured throughout human history, drawing scholarly interest from many fields, including philosophy and psychology. Despite much investigation, a universal agreement on the definition of wisdom remains elusive. There are two explanations for this. First, the constraints of individual insight call for more interdisciplinary collaboration. Second, wisdom is complex in nature and inherently subjective.
2.5.1 Limitations of Individual Insight and the Need for Interdisciplinary Collaboration
Rumi’s parable of the elephant in the dark chamber is a helpful metaphor for examining the difficulties associated with the concept of wisdom and the limitations of personal insight.
In Rumi’s parable, several men in a dark room touch different parts of an elephant and try to describe it. Because each man’s description is constrained by his own perception and background, each only partially understands the elephant’s true character. This narrative highlights the importance of embracing openness to diverse points of view to develop a more thorough awareness of the world by showing how individual opinions and experiences can limit one’s sense of reality.
When this metaphor is applied to the study of wisdom, it becomes clear that the different viewpoints of academics from various fields may be the reason for the lack of a generally agreed-upon definition. Each academic discipline may offer its own distinctive insights into wisdom, much like each man touching the elephant, but a thorough understanding necessitates the confluence of diverse viewpoints. The introduction of light into the dark room signifies the potential of different types of perspectives and research methodologies. If light, in the form of new theories, interdisciplinary studies, or innovative analytical tools, were introduced, the true form of the elephant could be perceived. Scholars may advance a more comprehensive and nuanced understanding of wisdom by encouraging interdisciplinary collaboration and discussion, transcending the constraints imposed by unique experiences and academic boundaries.
2.5.2 The Complexity and Subjectivity of Wisdom
Gabriel Marcel’s view on the distinction between problems and mysteries is discussed in his book “The Mystery of Being” (Le Mystère de l’être), which was originally published in French in 1951 (Knepper, 2020). In this book, Marcel argues that there are two distinct ways of approaching reality: as a problem to be solved through scientific inquiry or as a mystery to be encountered through personal reflection and existential questioning.
Marcel believed that problems are situations that can be solved through rational inquiry, whereas mysteries are situations that resist rational investigation and can only be understood through intuition and personal experience. Problems are situations that we can solve through our own efforts and resources, while mysteries are situations that require us to look beyond ourselves and rely on something greater than ourselves. Marcel argued that recognizing the difference between problems and mysteries is crucial for understanding our place in the world and cultivating a sense of wonder and awe.
The lack of a clear and concise definition of wisdom could be due to a combination of factors, including the complexity of the concept and the subjective nature of its interpretation. While there have been numerous attempts to define and understand wisdom, it is possible that some aspects of wisdom may be inherently mysterious or elusive.
Gabriel Marcel’s distinction between problems and mysteries suggests that certain facets of the human experience could be inherently enigmatic and not entirely amenable to logical explanation. Even if this could be the case for some parts of wisdom, further study and multidisciplinary discussion may eventually result in a more precise understanding and more thorough description of wisdom.
By examining commencement speeches from successful individuals, this study aims to explore wisdom in leadership. It acknowledges the difficulty of defining wisdom, as illustrated by Rumi’s parable and Marcel’s distinction between problems and mysteries. The research uses text-mining tools to find characteristics of wise leadership that are present across many different professions. By bridging various points of view, the study contributes to a more varied and nuanced understanding of wisdom and of its role in effective leadership. It emphasizes the significance of multidisciplinary collaboration in generating a more thorough knowledge of wisdom while acknowledging the inherent subjectivity in defining it. As we continue to distinguish between problems and mysteries, our appreciation for the complexity of wisdom grows, and this fosters a curiosity that leads to a deeper comprehension of the idea.
In conclusion, the notion of wisdom is a broad and complicated construct that has been investigated in numerous branches of literature. It has cognitive, moral, and interpersonal components that represent the complexities of human thought and action. To summarize all of the numerous definitions presented above, we may cluster them into three major groups:
- Personal (Intrapersonal) Dimensions:
  - Expert knowledge acquisition, critical thinking, and awareness of ignorance
  - Integration of intelligence and creativity
  - Dialectical thinking and reflection
  - Integration of rational and non-rational elements
  - Emotional control
- Relational (Interpersonal) Dimensions:
  - Empathy, compassion
  - Communication, storytelling, and humor
  - Adaptability and environmental shaping
- Ethical and Contextual Dimensions:
  - Moral grounding
  - Focus on the common good
  - Long-term and short-term balance
  - Contextualism and relativism of values
These clusters emphasize the varied character of wisdom and the significance of integrating distinct components with practical knowledge, moral grounding, and interpersonal skills in order to promote the common good and navigate complicated circumstances. Thus far, we have examined the notion of wisdom, but in order to analyze our dataset, which contains transcripts of speeches delivered by notable individuals, we must enter an entirely new realm. The following sections describe the text mining techniques and algorithms that will aid us in our research.
2.6 Text Analysis
Due to the exponential expansion of digital data, it is crucial for academics and organizations to create techniques for effectively deriving insightful information from massive volumes of text data. Text mining, a branch of natural language processing (NLP), is a powerful method for analyzing and processing unstructured textual data to uncover relevant patterns and correlations. Topic modeling, a collection of primarily unsupervised algorithms that seek to reveal the hidden thematic structure of a corpus of texts, is an essential component of text mining. To better grasp underlying themes and attitudes, these strategies have been used in a variety of contexts, including news stories (K. Chen et al., 2023; Rajasundari, Subathra, & Kumar, 2017), scholarly publications (Y. Wang, Bowers, & Fikis, 2017; Glazkova, 2021), consumer evaluations (F. Wang, Yang, Tso, & Li, 2019), behavioral marketing (Dan, 2023), and political speeches (Müller-Hansen et al., 2021; Atiq, Abeed, Efat, & Momin, 2022).
The main goal of this part of the literature review is to examine the body of existing text mining research, with an emphasis on topic modeling and the discovery of themes in particular. Some prominent topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Top2Vec, and the Structural Topic Model (STM), will be discussed, along with their guiding principles, advantages, and disadvantages. The review will also go over text representation and preprocessing methods that are essential for successful topic modeling and look at the relationships between text mining findings and aspects of wisdom leadership. By combining the findings of prior research, this literature review intends to shed light on the potential of text mining and topic modeling approaches to enhance the understanding of wisdom leadership through the analysis of speeches.
2.7 Text Analysis vs. Text Mining: Convergence and Divergence
Text mining and text analysis have distinct roots but are converging as their methods and applications overlap. Text analysis has its origins in the social sciences and humanities, dating back long before the advent of computers. It involves the systematic analysis of word use patterns in texts, combining formal statistical methods with humanistic interpretive techniques. Major approaches to text analysis include Frame Analysis (Goffman, 1974), Grounded Theory Methodology (Strauss & Corbin, 1994), Discourse Analysis (Johnstone, 2017), (Qualitative) Content Analysis (Mayring, 2014), and Conversation Analysis (Sacks, 1992). These techniques establish procedures for analyzing textual data in the social sciences and may also fall into the area of qualitative data analysis (QDA).
Text mining, on the other hand, originates from computer science and involves information retrieval, natural language processing, part-of-speech tagging, syntactic parsing, named entity recognition (NER), and sentiment analysis. Knowledge discovery from text (KDT) or text mining was initially described by Feldman and Dagan (Girju, 2002). At the core of text mining lies natural language processing. Machine learning (ML) and natural language processing (NLP) algorithms are utilized in these techniques to automatically extract insights, patterns, and relationships from large volumes of unstructured text data (Moreno & Redondo, 2016). Applications of these techniques span various domains, such as business, healthcare, social media, and education (Deng & Liu, 2018).
2.8 Text Mining: A General Overview
2.8.1 Definition and Key Concepts
Although several researchers have given different explanations of text mining, Hotho, Nürnberger, & Paaß (2005) defined it clearly and concisely as “the application of algorithms and methods from the fields machine learning and statistics to texts with the goal of finding useful patterns.”
2.8.2 Common Text Mining Tasks and Techniques
In text mining, there are several typical tasks and techniques, including but not restricted to:
Text preprocessing is the first stage of text mining in the bag-of-words approach (Harris, 1954). It prepares raw text data for subsequent analysis by cleaning it and putting it into a structured format. Tokenization, stopword elimination, stemming, and lemmatization are examples of preprocessing steps (Nayak, Kanive, Chandavekar, & Balasubramani, 2016), each of which can be carried out with a variety of methods and algorithms (Vijayarani, Ilamathi, Nithya, et al., 2015).
Sentiment analysis identifies the emotions present in a text and classifies them as positive, negative, or neutral. In many circumstances, sentiment analysis helps in understanding public opinion, client feedback, or general mood (Medhat, Hassan, & Korashy, 2014).
Named entity recognition (NER) is the process of locating and categorizing named entities—such as individuals, groups, places, and dates—in texts. NER can assist in the extraction of structured data from unstructured data (Goyal, Gupta, & Kumar, 2018).
Text classification is the process of classifying texts into predetermined groups or labels based on their content (Kobayashi, Mol, Berkers, Kismihok, & Den Hartog, 2018). Supervised machine learning techniques, including Naive Bayes, Support Vector Machines, Nearest Neighbor, Decision Tree Induction, Centroid/ Association based Classifications, and Neural Network deep learning strategies, are frequently used for text categorization (Desai et al., 2015).
Discovering latent topics or themes within a collection of texts is the goal of the mostly unsupervised learning approach known as “topic modeling” (Kherwa & Bansal, 2019). Popular topic modeling techniques include Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003), Top2Vec (Angelov, 2020), and Structural Topic Model (STM) (Roberts, Stewart, & Tingley, 2019).
Text mining is a flexible and effective technology that may be used in a variety of fields to unearth important insights and guide decision-making. Because of developments in NLP, machine learning, and artificial intelligence, text mining techniques and approaches are continually changing.
Most researchers do not have the time to read thousands of pages. Moreover, because it relies on the subjective interpretation of texts, reading alone is considered too vulnerable to biased judgment to be scientifically reliable (Mayring, 2014). As mentioned above, while the debate between social researchers (text analysis) and computer scientists (text mining) continues, social scientists have increasingly adopted text mining tools in their research. Two ongoing trends have altered the way social scientists do qualitative data analysis: first, the rapid growth of digital text worthy of investigation, characterized by the 5Vs (Volume, Velocity, Variety, Veracity, and Value) of big data (Anuradha et al., 2015); and second, advances in text analysis technology that provide a range of tools and new capabilities (Wiedemann & Wiedemann, 2016). Some researchers seek to narrow the gap between these traditions by addressing theoretical, methodological, and empirical aspects of topic modeling and by studying how to improve the construction and deployment of these models for understanding social phenomena (Doogan, 2022).
Before turning to the different topic modeling strategies, it is important to consider the preprocessing and text representation procedures used to prepare textual data for analysis. The quality of preprocessing and text representation can strongly influence topic model performance and interpretability. The next section reviews some of the most prevalent preprocessing and text representation approaches in the literature.
2.9 Preprocessing Techniques
Text preprocessing is a vital step in text mining because it transforms raw, unstructured data into a format that can be effectively analyzed and interpreted. It involves processes such as tokenization, removal of stop words and punctuation, and stemming or lemmatization. Without this step, the analysis may be tainted by irrelevant information or overwhelmed by data variability, reducing the accuracy of subsequent tasks such as sentiment analysis or topic modeling.
2.9.1 Tokenization
The practice of breaking down a text into separate words or tokens is known as tokenization (Vijayarani, Janani, et al., 2016). Because it converts unstructured text into a more organized format, this is frequently the initial step in preparing text data for analysis. Tokenization methods include Lucene Analyzer, Byte Pair Encoding (BPE), whitespace-based, rule-based, and more complex algorithms that employ machine learning models or language-specific resources (Rai & Borah, 2021). The type of tokenization used might be influenced by the unique properties of the text data as well as the desired level of granularity (Boyd-Graber, Mimno, & Newman, 2014).
Although some researchers (Rahmoun & Elberrichi, 2007) classify n-grams and skip n-grams (Goodman, 2001) as text representation techniques, they may be better considered an extension of tokenization because they represent contiguous sequences of n words in a given text (Žižka, Dařena, & Svoboda, 2019). While simple tokenization usually breaks text into individual words (unigrams), n-grams capture local context and word order by generating tokens made up of two or more consecutive words. Bigrams (n=2), for example, represent pairs of successive words, trigrams (n=3) represent three-word sequences, and so on. The following example shows how the sentence “This is an example of tokenization.” can be tokenized.
## [1] "Unigram:"
## # A tibble: 6 × 1
## word
## <chr>
## 1 this
## 2 is
## 3 an
## 4 example
## 5 of
## 6 tokenization
## [1] "Bigram:"
## # A tibble: 5 × 1
## bigram
## <chr>
## 1 this is
## 2 is an
## 3 an example
## 4 example of
## 5 of tokenization
## [1] "Trigram:"
## # A tibble: 4 × 1
## trigram
## <chr>
## 1 this is an
## 2 is an example
## 3 an example of
## 4 example of tokenization
Text mining tasks that rely on word order or contextual information, such as sentiment analysis, machine translation, or language modeling, can benefit from using n-grams and even from combining n-grams of different lengths (Dashtipour et al., 2019). However, as the value of n and the window size of skip-grams rise, so do the number of potential n-grams and the size of the feature space, which increases computational complexity and memory requirements (Li, Wang, Stolfo, & Herzog, 2005). Selecting a suitable value of n for the task and the available resources is therefore critical.
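The n-gram construction described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library, not the code that produced the output shown earlier. The function names `tokenize` and `ngrams` and the simple regular-expression tokenizer are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (unigrams)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "This is an example of tokenization."
tokens = tokenize(sentence)

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

for name, grams in [("Unigram", unigrams), ("Bigram", bigrams), ("Trigram", trigrams)]:
    print(f"{name}: {grams}")
```

A sentence of k tokens yields k - n + 1 n-grams, but the number of distinct n-grams that can occur grows much faster: for a vocabulary of V words there are up to V^n possible n-grams, which is the feature-space growth noted above.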
2.9.2 Stemming and Lemmatization
Stemming and lemmatization reduce words to their base or root form. This can improve the consistency of word representations and reduce data dimensionality. Stemming strips word affixes (such as suffixes or prefixes), whereas lemmatization uses morphological analysis and linguistic resources, such as dictionaries or morphological databases, to map words to their base forms. The choice between stemming and lemmatization depends on the complexity of the language and the resources available for analysis. Although both are useful for reducing words to a basic form, applying both to the same text is not recommended, since they may yield contradictory or duplicate results. The following example shows the results of stemming and lemmatization for the words “running”, “ran”, “jumps”, “leaves”, “better”.
## Stemmed words: 'run' 'ran' 'jump' 'leav' 'better'
## Lemmatized words: 'run' 'run' 'jump' 'leave' 'good'
Schütze, Manning, & Raghavan (2008) compare and contrast stemming and lemmatization in the context of information retrieval. The authors point out that stemming is faster but less accurate than lemmatization and that the decision between the two approaches is dependent on the application’s unique demands (Schoemaker et al., 2018).
By considering the context and grammatical structure of the word, lemmatization is often more precise than stemming. As a result, lemmatization can yield more relevant and interpretable findings (Balakrishnan & Lloyd-Yemoh, 2014), which is especially crucial for applications like topic modeling and sentiment analysis (May, Cotterell, & Van Durme, 2016).
Stemming, on the other hand, is often faster and simpler than lemmatization since it only requires applying a set of predetermined rules (Schütze et al., 2008). However, because it does not examine the context or grammatical structure of a word, stemming can produce two kinds of error, over-stemming and under-stemming (Jivani et al., 2011). The choice between lemmatization and stemming therefore depends on the needs of the task or application. Lemmatization may be better suited when accurate and interpretable results matter most; stemming may be preferable when speed and simplicity are the priority.
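The contrast between the two approaches can be made concrete with a toy sketch. The stemmer below applies three hand-written suffix rules and the lemmatizer is a five-entry lookup table; both are hypothetical stand-ins written for this illustration and are far simpler than the established stemmers and lemmatizers used in practice. They are chosen only so that the five example words behave as in the output shown earlier.

```python
# Hypothetical, deliberately tiny rule set; real stemmers use many more rules.
SUFFIXES = ("ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, then undo a doubled final consonant."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]  # "runn" -> "run"
            return stem
    return word  # no rule applies, e.g. irregular forms such as "ran"

# Hypothetical lemma dictionary; a real lemmatizer would also use
# part-of-speech information (e.g. "leaves" as a noun maps to "leaf").
LEMMA_TABLE = {
    "running": "run",
    "ran": "run",
    "jumps": "jump",
    "leaves": "leave",
    "better": "good",
}

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)

words = ["running", "ran", "jumps", "leaves", "better"]
stemmed = [toy_stem(w) for w in words]
lemmatized = [toy_lemmatize(w) for w in words]
print("Stemmed:   ", stemmed)
print("Lemmatized:", lemmatized)
```

The rule-based function needs no linguistic resources but leaves “ran” and “better” untouched and produces the non-word “leav”, whereas the dictionary-based function resolves irregular forms at the cost of requiring a lexicon.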
2.9.3 Stopwords Removal
Stopwords are common words with minimal significance that can be eliminated from the text in order to decrease noise and computational complexity. Stopwords include articles, prepositions, and conjunctions (for example, “the,” “and,” and “in”) (Asghar, Khan, Ahmad, & Kundi, 2014). Stopword removal can be done with predetermined lists, frequency-based approaches, or more complex systems that take into account word distribution over the full document collection. The individual qualities of the data and the required level of text filtering might impact the choice of stopword removal strategy (Kaur & Buttar, 2018).
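Two of the strategies mentioned above, a predetermined list and a frequency-based rule, can be sketched as follows. The stopword list, the three-document corpus, and the 0.8 document-ratio threshold are illustrative assumptions, not recommended values.

```python
from collections import Counter

# Illustrative predetermined list; published stopword lists are much longer.
STOPWORD_LIST = {"the", "and", "in", "of", "a", "is"}

def remove_by_list(tokens, stopwords=STOPWORD_LIST):
    """Drop every token that appears in a predetermined stopword list."""
    return [t for t in tokens if t not in stopwords]

def frequent_terms(documents, max_doc_ratio=0.8):
    """Return terms occurring in more than max_doc_ratio of the documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    limit = max_doc_ratio * len(documents)
    return {term for term, df in doc_freq.items() if df > limit}

docs = [
    "the leader spoke in the hall".split(),
    "the crowd and the leader cheered".split(),
    "wisdom guides the leader".split(),
]

filtered = remove_by_list(docs[0])
corpus_stopwords = frequent_terms(docs)
print(filtered)
print(corpus_stopwords)
```

In this toy corpus the frequency-based rule flags “leader” as well as “the”, because both occur in every document. This illustrates why the choice of strategy depends on the data: a term that is uninformative in one collection may be central to the research question in another.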
2.9.4 Text Representation Techniques
Text representation approaches are ways of converting text input into a structured format that machine learning algorithms can readily understand and interpret. These approaches are critical for computers to be able to deal with human language and accomplish tasks like topic modeling, sentiment analysis, and information retrieval. Here are a few examples of typical text representation techniques:
The Bag-of-Words (BoW) concept, initially described by Harris (1954), is one typical approach. It represents a text as an unordered collection of words, ignoring syntax and word order while keeping track of how often each word occurs. Each document is represented as a vector in feature space, with one dimension per vocabulary word.
Term Frequency-Inverse Document Frequency (TF-IDF) (Jones, 1972; Jones, 1973; Jones, 2004) is a term-weighting scheme used for document comparison, text categorization, and information retrieval. It uses corpus statistics to estimate how important a token is to a document within a corpus. The weight has two parts: Term Frequency (TF) and Inverse Document Frequency (IDF). TF reflects how frequently a token appears in a document relative to all other tokens in that document. IDF is based on the number of documents in the corpus that contain the token and decreases as that number grows, so tokens that occur in almost every document receive a low IDF. The product of TF and IDF yields a weight for each token in each document. The TF-IDF principle is that a token is more important to a document the more frequently it occurs in that document and the less frequently it occurs across the rest of the corpus. This helps identify the key tokens of each document: TF corresponds to the counts used in a Bag-of-Words representation, and the IDF factor determines which of those tokens are distinctive for an individual document.
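As a minimal sketch of this representation, the following Python fragment builds count vectors over a shared vocabulary. The two example sentences and the helper name `bow_vector` are assumptions made for illustration.

```python
from collections import Counter

docs = ["wise leaders listen", "leaders listen and leaders act"]
tokenized = [d.split() for d in docs]

# One vector dimension per distinct word in the corpus, in alphabetical order.
vocabulary = sorted({token for doc in tokenized for token in doc})

def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vectors = [bow_vector(doc, vocabulary) for doc in tokenized]
print(vocabulary)
for vec in vectors:
    print(vec)

# Reordering the words leaves the vector unchanged.
shuffled = bow_vector("listen leaders wise".split(), vocabulary)
```

Because only counts are kept, the first sentence and its reordered version map to the same vector, which is exactly the loss of syntax and word order described above.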
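The weighting can be illustrated with a short sketch. It assumes one common textbook variant, in which TF is a token's count divided by the document length and IDF is the natural logarithm of the number of documents divided by the number of documents containing the token. Software libraries often use smoothed or normalized variants, so their numbers will differ. The three-document corpus is invented for the example.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Return one {token: weight} dict per document.

    Assumed variant: tf = count / document length,
                     idf = ln(N / number of documents containing the token).
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each token once per document

    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the leader shows wisdom".split(),
    "the leader shows courage".split(),
    "the crowd admires wisdom".split(),
]
weights = tf_idf(docs)
for doc_weights in weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

Under this variant, “the” occurs in every document and therefore receives a weight of zero, while “courage”, which occurs in only one document, receives the highest weight in the second document.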
Word Vectors, also known as Word Embeddings, are dense vector representations that encapsulate semantic meaning and word relationships within a continuous mathematical space (Salim & Mustafa, 2022). These representations have become crucial in the field of NLP, mainly due to the pioneering work of researchers such as Tomas Mikolov (Lopez-Martinez & Sierra, 2020).
Mikolov, Sutskever, Chen, Corrado, & Dean (2013) introduced Word2Vec, a technique that ignited a wave of research and development in word embeddings. The approach trains shallow neural networks on large text corpora to produce word vectors that capture complex semantic and syntactic relationships. Following Word2Vec’s success, Le and Mikolov extended the idea with Paragraph2Vec, also known as Doc2Vec (Le & Mikolov, 2014), which learns representations for larger pieces of text such as sentences and paragraphs.
Mikolov’s contribution to the field did not stop there. He also played a significant role in the development of FastText (Joulin et al., 2016). Like Word2Vec, FastText is designed to capture the syntactic and semantic meaning of words, but it additionally considers the internal structure of words through character n-grams, which allows it to better handle rare words and languages with complex morphology. Another notable method for generating word embeddings is GloVe (Global Vectors for Word Representation), developed by Pennington, Socher, & Manning (2014).
These models have significantly contributed to the development and refinement of word embedding techniques. They continue to play a substantial role in the way we represent and understand textual data in machine learning.
2.10 Topic Modeling
Topic modeling emerged in the 1980s as part of the field of “generative probabilistic modeling” (Liu, Tang, Dong, Yao, & Zhou, 2016). Researchers in many domains use topic models as a time-saving and unbiased alternative to reading extensive text collections (Jelodar et al., 2019). Latent Dirichlet Allocation (LDA), Top2Vec, and the Structural Topic Model (STM) are among the most widely used topic models. Because of its interpretability, topic modeling has found applications in a wide variety of areas (Abdelrazek, Eid, Gawish, Medhat, & Hassan, 2022). Topic models treat texts as a “bag of words,” recording word co-occurrences independent of syntax, narrative, or position within a text. A topic can be regarded as a group of words that tend to appear whenever a given subject is discussed and that therefore co-occur more often than they otherwise would. The TF-IDF reduction scheme was among the earliest approaches designed for this goal.
2.10.1 Latent Dirichlet Allocation (LDA)
Blei et al. (2003) established Latent Dirichlet Allocation (LDA) as an unsupervised topic modeling approach. LDA is a probabilistic generative model that seeks to uncover latent themes within a set of texts. The essential premise of LDA is that texts contain mixes of subjects, and each topic is defined by a distribution of words.
The number of topics in LDA is fixed in advance, and the model places Dirichlet priors (Minka, 2000) on both the topic-word distributions and the document-topic distributions that it estimates. As a consequence, each topic is represented by a list of words with associated probabilities (Blei et al., 2003).
Zvornicanin (2022) provided a more extensive but simple description of how LDA works, which is summarized in the following steps:
Initialization: LDA asks you to specify how many topics (K) you wish to extract from the texts. The model begins by allocating each word in the texts to one of the K subjects at random. Topic allocations are haphazard at this point and do not reflect relevant themes.
Iterative Process: The LDA model then iteratively refines the topic assignments using Gibbs sampling or variational inference. The topic allocations for each word are updated using two probabilities:
Document-Topic probability (P(topic | document)): The likelihood of a topic given the document. This reflects the proportion of words in the document that are currently assigned to that topic.
Topic-Word probability (P(word | topic)): The likelihood of a particular word given the topic. This reflects the proportion of all assignments to that topic, across every document in the corpus, that belong to the word in question.
Convergence: The model iteratively updates the topic assignments based on these two probabilities until the assignments become stable and the model converges. The convergence criterion is usually a threshold on the change between iterations or a maximum number of iterations.
Topic Extraction: Once the model has converged, the resulting topic assignments provide information on the words that make up each topic as well as the topic distribution of each document. Each topic is represented by a collection of words and their associated probabilities, with higher probabilities indicating terms that are more important to the topic. These high-probability terms are then used to interpret the topics.
Topic Assignment: LDA gives a probability distribution of topics for each document. The proportion of words in the document that pertain to each topic is shown by this distribution, and the document may be allocated to the topic with the highest proportion.
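The steps above can be sketched as a small collapsed Gibbs sampler. This is an illustrative implementation written for this review under stated assumptions, not the procedure of any particular software package: it runs a fixed number of sweeps (`n_iter`) rather than testing for convergence, uses symmetric Dirichlet hyperparameters `alpha` and `beta` with arbitrarily chosen values, and is applied to an invented four-document corpus.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                 # words per topic, by document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                               # total words per topic
    assignments = []

    # Initialization: assign every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Iterative process: resample each assignment given all the others.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Proportional to P(topic | document) * P(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Topic extraction and assignment: smoothed proportions from the counts.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
             for d, doc in enumerate(docs)]
    phi = [{w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta)
            for w in vocab} for k in range(n_topics)]
    return theta, phi

docs = [
    "budget tax economy tax".split(),
    "economy budget jobs tax".split(),
    "peace war treaty peace".split(),
    "treaty war peace army".split(),
]
theta, phi = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Here `theta` holds the document-topic distributions and `phi` the topic-word distributions. Because the sampler is stochastic, the topic labels and exact values depend on the seed, which is one reason results from LDA should be checked across several runs.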
It is vital to remember that LDA is an unsupervised learning approach (Blei, 2012), which means that it does not learn subjects using labeled data. Instead, it finds subjects entirely based on word patterns and co-occurrences in the texts. As a result, the quality and interpretability of the retrieved themes are significantly dependent on parameter selection (Griffiths & Steyvers, 2004), text preparation, and the structure of the underlying data (Blei et al., 2003).
It is critical to understand the difference between Linear Discriminant Analysis and Latent Dirichlet Allocation. Despite having the same abbreviation, Linear Discriminant Analysis (LDA) and Latent Dirichlet Allocation (LDA) are two completely different approaches utilized in separate fields of research. In a larger statistical framework, Linear Discriminant Analysis is employed for class separation or dimensionality reduction (Balakrishnama & Ganapathiraju, 1998).
2.10.2 Top2Vec
Top2Vec is another unsupervised topic modeling approach, proposed by Angelov (2020), that makes use of contemporary embedding and density-based clustering techniques. Top2Vec begins by creating dense document and word embeddings via methods such as Word2Vec (Mikolov et al., 2013), Doc2Vec (Le & Mikolov, 2014), or BERT (Bidirectional Encoder Representations from Transformers) (Devlin, Chang, Lee, & Toutanova, 2018). The HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) (Campello, Moulavi, & Sander, 2013) technique is then used to cluster the dense document embeddings based on their semantic similarity.
The resultant clusters indicate subjects, with each topic defined by the terms that are most semantically comparable to it. Top2Vec is remarkable for its ability to estimate the ideal number of topics based on the data’s intrinsic structure (Angelov, 2020). This method has demonstrated promising results in a variety of applications and provides an alternative to established approaches such as LDA (Egger & Yu, 2022).
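The final step, describing each cluster by its semantically closest words, can be sketched as follows. This is a heavily simplified illustration and not the Top2Vec implementation: the two-dimensional document and word vectors are invented, and the cluster labels are assigned by hand in place of the density-based clustering step. A topic is summarized here by the centroid of its document vectors, and the words nearest to that centroid by cosine similarity are taken as the topic words.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Invented embeddings in a shared 2-D space (real embeddings have hundreds
# of dimensions and are learned from the corpus).
doc_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
cluster_labels = [0, 0, 1, 1]  # hand-assigned stand-in for the clustering step
word_vectors = {
    "economy": [1.0, 0.0],
    "tax": [0.9, 0.2],
    "peace": [0.0, 1.0],
    "treaty": [0.2, 0.9],
}

def topic_words(label, top_n=2):
    """Rank words by similarity to the centroid of one document cluster."""
    members = [v for v, c in zip(doc_vectors, cluster_labels) if c == label]
    topic_vector = centroid(members)
    ranked = sorted(
        word_vectors,
        key=lambda w: cosine(word_vectors[w], topic_vector),
        reverse=True,
    )
    return ranked[:top_n]

topics = {label: topic_words(label) for label in sorted(set(cluster_labels))}
print(topics)
```

Because the number of clusters comes from the clustering step rather than from a user-supplied parameter, the number of topics in this scheme follows from the structure of the embedding space.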
2.10.3 STM (Structural Topic Model)
Roberts et al. (2019) developed the Structural Topic Model (STM) as an extension of LDA that incorporates document-level metadata into the topic modeling process. This enables STM to capture the links between document characteristics and the discovered topics, which is valuable for investigating how topics differ across contexts or time periods.
STM can incorporate a wide range of metadata, including authorship, publication date, and category labels. Including topical prevalence covariates, that is, metadata that explain how prevalent each topic is, in the modeling process can improve topic interpretation and give new insights into the underlying structure of the document collection (Roberts et al., 2019).
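The role of a prevalence covariate can be illustrated with a simplified sketch. It assumes the logistic-normal formulation of topical prevalence, in which the mean of a document's topic scores is a linear function of its covariates and the scores are mapped to proportions by a softmax. The random variation around that mean is omitted, and the coefficient matrix `GAMMA` is hypothetical rather than estimated from data.

```python
import math

def softmax(scores):
    """Map real-valued scores to proportions that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: one row per topic, columns = [intercept, covariate].
# The last row is all zeros and serves as the reference topic.
GAMMA = [
    [0.0, 1.5],   # topic 0 becomes more prevalent as the covariate increases
    [0.0, -1.5],  # topic 1 becomes less prevalent
    [0.0, 0.0],   # topic 2 (reference)
]

def prevalence_at_mean(covariate):
    """Topic proportions implied by the linear predictor, ignoring noise."""
    x = [1.0, covariate]
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in GAMMA]
    return softmax(scores)

before = prevalence_at_mean(0.0)  # e.g. a speech from an earlier period
after = prevalence_at_mean(1.0)   # e.g. a speech from a later period
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

With these made-up coefficients, moving the covariate from 0 to 1 shifts prevalence toward the first topic and away from the second. This is the kind of relationship between metadata and topics that STM is designed to estimate.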
2.10.4 A Synopsis of Supervised Methods
While unsupervised approaches such as LDA, Top2Vec, and STM dominate the topic modeling field, supervised strategies that use labeled data to guide the topic-finding process are also available. Supervised extensions of LDA, for example, attach a response variable to each document, allowing the model to predict that variable from the learned topic distributions.
Supervised approaches can be beneficial when labeled data is available or when certain subjects of interest are known ahead of time. However, this literature review focuses primarily on unsupervised techniques, as they are more generally applicable when labeled data is insufficient or when the goal is to uncover latent subjects without prior assumptions.
Table 2.1 gives a broad summary of the advantages and disadvantages of Latent Dirichlet Allocation (LDA), Top2Vec, and the Structural Topic Model (STM). Specific benefits and drawbacks may vary depending on the nature of the text data and the application’s requirements.
| LDA | Top2Vec | STM |
|---|---|---|
| Advantages | | |
| Generative probabilistic model | Combines document & word embeddings | Incorporates document-level metadata |
| Scalable to large corpora | Automatically determines number of topics | Flexible modeling approach |
| Interpretable topics (with proper tuning) | Provides topic, document, and word similarities | Can model relationships between topics |
| Wide range of applications | Faster than traditional topic modeling techniques | Directly models covariates |
| Disadvantages | | |
| Difficulty in selecting number of topics | Sensitive to the choice of word embeddings | Assumes linear effects of covariates |
| Parameter selection is crucial | Difficult to interpret dense embeddings | Requires proper metadata |
| No word-order or context information | Limited research on algorithm performance | Less interpretable than LDA (in some cases) |
| Assumes topic independence | Not as established as LDA or STM | Sensitive to tuning and preprocessing |