Gery W. Ryan
RAND
1700 Main Street, P.O. Box 2138
Santa Monica, CA 90407-2138
H. Russell Bernard
Department of Anthropology
1350 Turlington Hall
University of Florida
Gainesville, FL 32611
Key Words: Theme Identification, Exploratory Analysis, Open Coding, Text Analysis, Qualitative Research Methods
Abstract
Theme identification is one of the most fundamental tasks in qualitative research. It is also one of the most mysterious. Theme discovery is rarely described explicitly in articles and reports, and when it is, the description is often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, and their spread is often impeded by disciplinary or epistemological boundaries. During the proposal-writing phase of a project, investigators struggle to clearly explain and justify plans for discovering themes. These issues are particularly pressing when funding reviewers are unfamiliar with qualitative traditions. In this article we outline a dozen techniques that social scientists have used to discover themes in texts. The techniques are drawn from across epistemological and disciplinary boundaries. They range from quick word counts to laborious, in-depth, line-by-line scrutiny. Some methods work well for short answers to open-ended questions while others are more appropriate for rich, complex narratives. Novices and non-native speakers may find some techniques easier than others. No single technique does it all. To us, these techniques are simply tools to help us do better research.
Authors’ Statement
Gery W. Ryan is an Associate Behavioral Scientist at RAND in Santa Monica, California. H. Russell Bernard is professor of anthropology at the University of Florida. The research on which this article is based is part of a National Science Foundation grant on "Methods for Conducting Systematic Text Analysis" (SRB-9811166). We wish to thank Stephen Borgatti for his helpful suggestions and two anonymous reviewers for their invaluable comments on earlier drafts of this paper.
Introduction
At the heart of qualitative data analysis is the task of discovering themes. By themes, we mean abstract, often fuzzy, constructs which investigators identify before, during, and after data collection. Where do these themes come from?
They come from reviewing the literature, of course. Richer literatures produce more themes. They come from the characteristics of the phenomena being studied. And they come from already-agreed-upon professional definitions, from local common-sense constructs, and from researchers’ values, theoretical orientation, and personal experience with the subject matter (Bulmer 1979; Strauss 1987; Maxwell 1996).
Mostly, though, researchers who consider themselves part of the qualitative tradition in social science induce themes from texts. This is what grounded theorists call open coding, and what classic content analysts call qualitative analysis (Berelson 1952) or latent coding (Shapiro and Markoff 1997). There are many variations on these methods. Unfortunately, however, they are (a) scattered across journals and books that are read by disparate groups of specialists; and (b) often entangled in the epistemological wars that have divided the social sciences. Our goal in this paper is to cross these boundaries and lay out a variety of theme-dredging methods so that all researchers who deal with texts can use them to solve common research problems.
We outline here a dozen helpful techniques for discovering themes in texts. These techniques are based on: (1) an analysis of words (word repetitions, key-indigenous terms, and key-words-in-context); (2) a careful reading of larger blocks of texts (compare and contrast, social science queries, and searching for missing information); (3) an intentional analysis of linguistic features (metaphors, transitions, connectors); and (4) the physical manipulation of texts (unmarked texts, pawing, and cut and sort procedures).
The list is by no means exhaustive. Social scientists are an enterprising lot. Over the last century they have invented solutions to all kinds of problems for managing and analyzing texts, and they will continue to do so. These bursts of methodological creativity, however, are commonly described perfunctorily, or are relegated to footnotes, and get little notice by colleagues across disciplines. The dozen methods we describe here come from across the social sciences and have been used by positivists and interpretivists alike.
1. Word repetitions
We begin with word-based techniques. Word repetitions, key-indigenous terms, and key-words-in-context (KWIC) all draw on a simple observation: if you want to understand what people are talking about, look at the words they use.
Words that occur a lot are often seen as being salient in the minds of respondents. D'Andrade notes that "perhaps the simplest and most direct indication of schematic organization in naturalistic discourse is the repetition of associative linkages" (1991:294). He observes that "indeed, anyone who has listened to long stretches of talk, whether generated by a friend, spouse, workmate, informant, or patient, knows how frequently people circle through the same network of ideas" (1991:287).
Word repetitions can be analyzed formally and informally. In the informal mode, investigators simply read the text and note words or synonyms that people use a lot. For example, while conducting multiple in-depth interviews with Tony, a retired blue collar worker in Connecticut, Claudia Strauss (1992) found that Tony repeatedly referred to ideas associated with greed, money, businessmen, siblings, and "being different." These repetitions indicated to Strauss that these ideas were important, recurring themes in Tony’s life. Strauss displayed the relationships among these ideas by writing the concepts on a page of paper and connecting them with lines and explanations. Computer programs such as ATLAS.ti and Nud*ist let you do this kind of connect-the-dots exercise by computer.1
A more formal analysis of word frequencies can be done by generating a list of all the unique words in a text and counting the number of times each occurs. Computers can easily generate word-frequency lists from texts, and such lists are a quick and easy way to look for themes. Ryan and Weisner (1996) asked fathers and mothers of adolescents: "Describe your children. In your own words, just tell us about them." Ryan and Weisner produced a list of all the unique words in the set of responses and the number of times each word was used by mothers and by fathers. Mothers were more likely than fathers to use words like friends, creative, time, and honest; fathers were more likely than mothers to use words like school, good, lack, student, enjoys, independent, and extremely. Ryan and Weisner used this information as clues for themes that they would use later in actually coding the texts.
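The word-frequency comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure Ryan and Weisner actually used: the function name `word_counts`, the tokenizing rule, and the sample responses are all assumptions made up for the example.

```python
import re
from collections import Counter

def word_counts(responses):
    """Count how often each lower-cased word occurs across a list of responses."""
    counts = Counter()
    for text in responses:
        # Assumed tokenizer: runs of letters and apostrophes count as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Invented responses for illustration only; these are not data from the study.
mothers = ["She is creative and honest.", "He spends time with friends."]
fathers = ["He is a good student.", "She enjoys school and is independent."]

mother_counts = word_counts(mothers)
father_counts = word_counts(fathers)

# List words that one group uses more often than the other.
# The differences are clues for candidate themes, not themes in themselves.
for word in sorted(set(mother_counts) | set(father_counts)):
    if mother_counts[word] != father_counts[word]:
        print(f"{word:12s} mothers={mother_counts[word]} fathers={father_counts[word]}")
```

With real data the two groups usually differ in size, so raw counts would need to be converted to rates (for example, occurrences per respondent) before they are compared.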
2. Indigenous categories
Another way to find themes is to look for local terms that may sound unfamiliar or are used in unfamiliar ways. Patton (1990:306, 393-400) refers to these as "indigenous categories" and contrasts them with "analyst-constructed typologies." Grounded theorists refer to the process of identifying local terms as in vivo coding (Strauss 1987:28-32, Strauss and Corbin 1990:61-74).
Understanding indigenous categories and how they are organized has long been a goal of cognitive anthropologists. The basic idea in this area of research is that experience and expertise are often marked by specialized vocabulary. For example, Spradley (1972) recorded conversations among tramps at informal gatherings, meals, card games, and bull sessions. As the men talked to each other about their experiences, there were many references to making a flop.
Spradley combed through his recorded material and notes looking for verbatim statements made by informants about his topic. On analyzing the statements, he found that most of them could fit into subcategories such as kinds of flops, ways to make flops, ways to make your own flop, kinds of people who bother you when you flop, ways to make a bed, and kinds of beds. Spradley then returned to his informants and sought additional information from them on each of the subcategories. For other classic examples of coding for indigenous categories see Becker’s (1993) description of medical students’ use of the word crock, and Agar’s (1973) description of drug addicts’ understandings of what it means to shoot up.
3. Key-words-in-context (KWIC)
Key-words-in-context (KWIC) analysis is closely associated with indigenous categories. KWIC is based on a simple observation: if you want to understand a concept, then look at how it is used. In this technique, researchers identify key words and then systematically search the corpus of text to find all instances of the word or phrase. Each time they find an instance, they make a copy of it and its immediate context. Themes get identified by physically sorting the examples into piles of similar meaning.
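The search-and-copy step of KWIC can be sketched as follows. This is only an illustration under stated assumptions: the function name `kwic`, the fixed character window `width`, and the sample sentences are hypothetical, and the sorting of examples into piles of similar meaning remains a human task.

```python
import re

def kwic(texts, keyword, width=30):
    """Return (left context, match, right context) for every occurrence of keyword."""
    # Whole-word, case-insensitive match; the keyword is escaped so it is read literally.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((left, m.group(0), right))
    return hits

# Invented sentences for illustration only.
corpus = [
    "The critic offered a deconstruction of the new album.",
    "Deconstruction of the old plant begins next week.",
]

for left, word, right in kwic(corpus, "deconstruction", width=25):
    # Right-align the left context so the key word lines up in one column.
    print(f"{left:>25} [{word}] {right}")
```

Printing the key word in a fixed column makes it easier to scan the contexts and group similar uses by eye.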
The concept of deconstruction is an abstract and often incomprehensible term used by social scientists, literary critics and writers in the popular press. Jacques Derrida, who coined the term, refused to define it. To Derrida, the meaning of any text is inherently unstable and variable. Wiener (1997) was curious as to how the concept of deconstruction was used in the popular press. He used a text-based data set (such as Lexis/Nexis) to find instances of the word in popular publications. He found the term used in everything from Entertainment Weekly to the American Banker. Wiener concludes that:
Most often writers use "deconstruction" as a fancy word for "analysis" or "explanation," or else as an upscale synonym for "destruction." But in some genres, like rock music writing, the term isn't negative at all; it has become a genuinely floating signifier, a verbal gesture that implies a kind of empty intellectual sophistication.
Word-based techniques are typically fast and efficient ways to start looking for themes. We find that they are particularly useful at early stages of theme identification. These techniques are also easy for novice researchers to apply. Nothing, however, beats a careful scrutiny of the texts for finding themes that may be more subtle or that don’t get signified directly in the lexicon of the text. Scrutiny-based techniques are more time-intensive and require a lot of attention to details and nuances.
4. Compare and contrast
The compare and contrast approach is based on the idea that themes represent the ways in which texts are either similar to or different from each other. Glaser and Strauss (1967:101–116) refer to this as the "constant comparison method." [For other good descriptions of the technique see Glaser (1978:56–72) and Strauss and Corbin (1990:84–95).] Typically, grounded theorists begin by conducting a careful line-by-line analysis. They read each line or sentence and ask themselves, "What is this about?" and "How does it differ from the preceding or following statements?" This kind of detailed work keeps the researcher focused on the data themselves rather than on theoretical flights of fancy (Charmaz 1990).
This approach is like interviewing the text and is remarkably similar to the ethnographic interviewing style that Spradley talks about using with his informants (1979:160–172). Researchers compare pairs of texts by asking "How is this text different from the preceding text?" and "What kinds of things are mentioned in both?" They ask hypothetical questions like "What if the informant who produced this text had been a woman instead of a man?" and "How similar is this text to my own experiences?" Bogdan and Biklen (1982:153) recommend reading through passages of text and asking "What does this remind me of?" Like good journalists, investigators compare answers to questions across people, space, and time.
5. Social science queries
Besides identifying indigenous themes (themes that characterize the experience of informants), researchers are interested in understanding how textual data illuminate questions of importance to social science. Spradley (1979:199–201) suggested searching interviews for evidence of social conflict, cultural contradictions, informal methods of social control, things that people do in managing impersonal social relationships, methods by which people acquire and maintain achieved and ascribed status, and information about how people solve problems. Bogdan and Biklen (1982:156–162) suggested examining the setting and context, the perspectives of the informants, and informants’ ways of thinking about people, objects, processes, activities, events, and relationships. "Moving across substantive areas," says Charmaz, "fosters developing conceptual power, depth, and comprehensiveness" (1990:1163).
Strauss and Corbin (1990:158–175) urge investigators to be more sensitive to conditions, actions/interactions, and consequences of a phenomenon and to order these conditions and consequences into theories. To facilitate this, they offer a useful tool called the conditional matrix. The conditional matrix is a set of concentric circles, each level corresponding to a different unit of influence. At the center are actions and interactions. The inner rings represent individual and small group influences on these actions, and the outer rings represent international and national effects.
Querying the text as a social scientist is a powerful technique because investigators concentrate their efforts on searching for specific kinds of topics, any of which are likely to generate major social and cultural themes. When examining the data from a more theoretical perspective, however, researchers must be careful that they do not overfit the data, that is, find only that for which they are looking. There is a trade-off between bringing a lot of prior theorizing to the theme-identification effort and going at it fresh. Prior theorizing, as Charmaz says (1990), can inhibit the forming of fresh ideas and the making of surprising connections. Assiduous theory-avoidance brings the risk of not making the connection between data and important research questions. Novice researchers may be more comfortable with the tabula rasa approach. More seasoned researchers, who are more familiar with theory issues, may find the social science query approach more compatible with their interests.
6. Searching for missing information
The final scrutiny-based approach we describe works in reverse from typical theme identification techniques. Instead of identifying themes that emerge from the text, investigators search for themes that are missing in the text.
Much can be learned from a text by what is not mentioned. As early as 1959, propaganda analysts found that material not covered in political speeches was sometimes more predictive than material that was covered (George 1959). Sometimes silences indicate areas that people are unwilling or afraid to discuss. For instance, women with strong religious convictions may fail to mention abortion during discussions of birth control. In power-laden interviews, silence may be tied to implicit or explicit domination (Gal 1991). In a study of birth planning in China, Greenhalgh (1994) surveyed 1,011 ever-married women and gathered social and economic histories from 150 families. She conducted in-depth interviews with present and former officials (known as cadres), and collected documentary evidence from local newspapers, journals and other sources. Greenhalgh notes that "Because I was largely constrained from asking direct questions about resistance, the informal record of field notes, interview transcripts, and questionnaire data contains few overt challenges to state policy" (1994:9). Greenhalgh concludes, however, that
I believe that in their conversations with us, both peasants and cadres made strategic use of silence to protest aspects of the policy they did not like. Cadres, for example were loathe to comment on birth-planning campaigns; peasant women were reluctant to talk about sterilization. These silences form one part of the unofficial record of birth planning in the villages. More explicit protests were registered in informal conversations. From these interactions emerged a sense of profound distress of villagers forced to choose between a resistance that was politically risky and a compliance that violated the norms of Chinese culture and of practical reason (1994:9).
Other times, absences may indicate primal assumptions made by respondents. Spradley (1987:314) noted that when people tell stories, they assume that their listeners share many assumptions about how the world works and so they leave out information that "everyone knows." He called this process abbreviating. Price (1987) takes this observation and builds on it. Thus, she looks for what is not said in order to identify underlying cultural assumptions. Price finds the missing pieces by trying to translate what people say in the stories into something that the general public would understand.
Of all the scrutiny-based techniques, searching for missing information is the most difficult. There are many reasons people do not mention topics. In addition to avoiding sensitive issues or assuming the investigator already knows about the topic, people may not trust the interviewer, may not wish to speak when others are present, or may not understand the investigator’s questions. Distinguishing between when informants are unwilling to discuss topics and when they assume the investigator already knows about the topic requires a lot of familiarity with the subject matter.
In addition to word- and scrutiny-based techniques, researchers have used linguistic features such as metaphors, topical transitions, and keyword connectors to help identify themes.
7. Metaphors and analogies
Schema analysts suggest searching through text for metaphors, similes, and analogies (D’Andrade 1995, Quinn and Strauss 1997). The emphasis on metaphor owes much to the pioneering work by Lakoff and Johnson (1980) and the observation that people often represent their thoughts, behaviors, and experiences with analogies.
Naomi Quinn (1997) has analyzed hundreds of hours of interviews to discover concepts underlying American marriage and to show how these concepts are tied together. She began by looking at patterns of speech and at the repetition of key words and phrases, paying particular attention to informants' use of metaphors and the commonalities in their reasoning about marriage. Nan, one of her informants, says that "marriage is a manufactured product." This popular metaphor indicates that Nan sees marriage as something that has properties, like strength and staying power, and as something that requires work to produce. Some marriages are "put together well," while others "fall apart" like so many cars or toys or washing machines (Quinn 1987:174).
The object is to look for metaphors in rhetoric and deduce the schemas, or underlying principles, that might produce patterns in those metaphors. Quinn found that people talk about their surprise at the breakup of a marriage by saying that they thought the couple’s marriage was "like the Rock of Gibraltar" or that they thought the marriage had been "nailed in cement." People use these metaphors because they assume that their listeners know that cement and the Rock of Gibraltar are things that last forever.
But Quinn reasons that if schemas or scripts are what make it possible for people to fill in around the bare bones of a metaphor, then the metaphors must be surface phenomena and cannot themselves be the basis for shared understanding. Quinn found that the hundreds of metaphors in her corpus of texts fit into just eight linked classes that she calls: lastingness, sharedness, compatibility, mutual benefit, difficulty, effort, success (or failure), and risk of failure. For example, Quinn’s informants often compared marriages (their own and those of others) to manufactured and durable products ("it was put together pretty good") and to journeys ("we made it up as we went along; it was a sort of do-it-yourself project"). Quinn sees these metaphors, as well as references to marriage as "a lifetime proposition," as exemplars of the overall expectation of lastingness in marriage.
Other examples of the search for cultural schemas in texts include Holland’s (1985) study of the reasoning that Americans apply to interpersonal problems, Kempton’s (1987) study of ordinary Americans’ theories of home heat control, and Claudia Strauss’s (1997) study of what chemical plant workers and their neighbors think about the free enterprise system.
8. Transitions
Another linguistic approach is to look for naturally occurring shifts in thematic content. Linguistic forms of transition vary between oral and written texts. In written texts, new paragraphs are often used by authors to indicate either subtle or abrupt shifts in topics. In oral speech, pauses, changes in tone, or particular phrases may indicate thematic transitions. Linguists who have worked with precisely recorded texts in Native American languages have noticed the recurrence of elements like "Now," "Then," "Now then," and "Now again." These often signal the separation of verses and "once such patterning has been discovered in cases with such markers, it can be discerned in cases without them" (Hymes 1977:439).
For example, Sherzer (1994) presents a detailed analysis of a two-hour performance by Chief Olopinikwa of a traditional San Blas Kuna chant. The chant was recorded in 1970. Like many linguistic anthropologists, Sherzer had taught an assistant, Alberto Campos, to use a phonetic transcription system. After the chant, Sherzer asked Campos to transcribe and translate the tape. Campos put Kuna and Spanish on left- and right-facing pages (1994:907). By studying Campos’s translation against the original Kuna, Sherzer was able to pick out certain recurrent features. Campos left out the chanted utterances of the responding chief (usually something like "so it is"), which turned out to be markers for verse endings in the chant. Campos also left out so-called framing words and phrases (like "Thus" at the beginning of a verse and "it is said, so I pronounce" at the end of a verse). These contribute to the line and verse structure of the chant. Finally, "instead of transposing metaphors and other figurative and allusive language into Spanish" Campos "explains them in his translation" (Sherzer 1994:908).
In two-party and multiparty speech, transitions occur naturally. Conversation or discourse analysts closely examine linguistic features such as turn-taking and speaker interruptions to identify transitions in speech sequences. For a good overview, see Silverman (1993:114-143).
9. Connectors
A third linguistic approach is to look carefully at words and phrases that indicate relationships among things. For example, causal relationships are often indicated by such words and phrases as because, since, and as a result. Words such as if or then, rather than, and instead of often signify conditional relationships. The phrase is a is often associated with taxonomic categories. Time-oriented relationships are expressed with words such as before, after, then, and next. Typically negative characteristics occur less often than positive characteristics. Simply searching for the words not, no, none, or the prefix non- may be a quick way to identify themes. Investigators can discover themes by searching on such groups of words and looking to see what kinds of things the words connect.
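A connector search of this kind is easy to sketch in Python. The connector lists below are taken from the paragraph above, but the grouping labels, the `CONNECTORS` table, the function name `find_connectors`, and the sample sentences are assumptions for illustration. Note that then appears under two headings, as it does in the text, so one sentence can be returned for more than one relationship type.

```python
import re

# Connector phrases grouped by the kind of relationship they tend to signal.
CONNECTORS = {
    "causal": ["because", "since", "as a result"],
    "conditional": ["if", "then", "rather than", "instead of"],
    "taxonomic": ["is a"],
    "temporal": ["before", "after", "then", "next"],
    "negative": ["not", "no", "none"],
}

def find_connectors(sentences):
    """Map each relationship type to the sentences containing one of its connectors."""
    found = {kind: [] for kind in CONNECTORS}
    for sentence in sentences:
        for kind, phrases in CONNECTORS.items():
            # Whole-word, case-insensitive match on any phrase in the group.
            if any(re.search(r"\b" + re.escape(p) + r"\b", sentence, re.IGNORECASE)
                   for p in phrases):
                found[kind].append(sentence)
    return found

# Invented sentences for illustration only.
sample = [
    "I left because the pay was low.",
    "A flop is a place to sleep.",
    "We ate before the meeting.",
]

for kind, matches in find_connectors(sample).items():
    print(kind, matches)
```

The search only retrieves candidate passages. Deciding what kinds of things the connectors link, and whether a match is a real instance of the relationship (since can be temporal as well as causal), still requires reading the passages.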
What other kinds of relationships might be of interest to social scientists? Casagrande and Hale (1967) suggest looking for: attributes (e.g., X is Y), contingencies (e.g., if X, then Y), functions (e.g., X is a means of affecting Y), spatial orientations (e.g., X is close to Y), operational definitions (e.g., X is a tool for doing Y), examples (e.g., X is an instance of Y), comparisons (e.g., X resembles Y), class inclusions (e.g., X is a member of class Y), synonyms (e.g., X is equivalent to Y), antonyms (e.g., X is the negation of Y), provenience (e.g., X is the source of Y), and circularity (e.g., X is defined as X). [For lists of kinds of relationships that may be useful for identifying themes see Burton and Kirk (1980:271), Werner and Schoepfle (1987) and Lindsay and Norman (1972).]
Investigators often use the linguistic features described above unconsciously. Metaphors, transitions, and connectors are all part of a native speaker’s ability to grasp meaning in a text. By making these features more explicit, we sharpen our ability to find themes.
Finally, we turn to more tactile approaches for theme discovery. Each of the next three techniques requires some physical manipulation of the text itself.
10. Unmarked texts
One way to identify new themes is to examine any text that is not already associated with a theme (Ryan 1999). This technique requires multiple readings of a text. On the first reading, salient themes are clearly visible and can be quickly and readily marked with different colored pencils or highlighters. In the next stage, the search is for themes that remain unmarked. This tactic of marking obvious themes early and quickly forces the search for new and less obtrusive themes.
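When texts are marked up electronically rather than with highlighters, the unmarked stretches can be located mechanically. The sketch below assumes that each marked passage is stored as a (start, end) pair of character offsets, with the end offset exclusive; that representation and the function name `unmarked_spans` are hypothetical and are not part of the technique as described.

```python
def unmarked_spans(text_length, marked):
    """Return (start, end) ranges of the text not covered by any marked span."""
    gaps = []
    cursor = 0  # first character not yet known to be covered
    for start, end in sorted(marked):
        if start > cursor:
            gaps.append((cursor, start))
        # Marked spans may overlap, so never move the cursor backwards.
        cursor = max(cursor, end)
    if cursor < text_length:
        gaps.append((cursor, text_length))
    return gaps

# Invented example: a 100-character text with three highlighted passages.
highlighted = [(10, 20), (15, 30), (50, 60)]
for start, end in unmarked_spans(100, highlighted):
    print(f"unmarked: characters {start} to {end}")
```

The gaps returned are the passages to reread on the second pass, when the search is for less obtrusive themes.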
11. Pawing
We highly recommend pawing through texts and marking them up with different colored highlighter pens. Sandelowski (1995a:373) observes that analysis of texts begins with proofreading the material and simply underlining key phrases "because they make some as yet inchoate sense." Bernard (2000) refers to this as the ocular scan method, otherwise known as eyeballing. In this method, you get a feel for the text by handling your data multiple times. [Bogdan and Biklen (1982:165) suggest reading over the text at least twice.] Researchers have been known to spread their texts out on the floor, tack bunches of them to a bulletin board, and sort them into different file folders. By living with the data, investigators can eventually perform the interocular percussion test, which is where you wait for patterns to hit you between the eyes.
This may not seem like a very scientific way to do things, but it is one of the best ways we know of to begin hunting for patterns in qualitative data. Once you have a feel for the themes and the relations among them, then we see no reason to struggle bravely on without a computer. Of course, a computer is required from the outset if the project involves hundreds of interviews, or if it’s part of a multi-site, multi-investigator effort. Even then, there is no substitute for following hunches and intuitions in looking for themes to code in texts (Dey 1993).
12. Cutting and sorting
Cutting and sorting is a more formal way of pawing and a technique we both use quite a bit. It is particularly useful for identifying subthemes. The approach is based on a powerful trick most of us learned in kindergarten and requires paper and scissors. We first read through the text and identify quotes that seem somehow important. We cut out each quote (making sure to maintain some of the context in which it occurred) and paste the material on small index cards. On the back of each card, we then write down the quote’s reference: who said it and where it appeared in the text. Then we lay out the quotes randomly on a big table and sort them into piles of similar quotes. Then we name each pile. These are the themes. This can be done with tag and search software, but we find that nothing beats the ability to manually sort and group the cards.
There are many variations on this pile-sorting technique. The principal investigator on a large project might ask several team members to sort the quotes into named piles independently. This is likely to generate a longer list of possible themes than would be produced by a group discussion. In really large projects, pairs of coders could sort the quotes together and decide on the names for the piles. The pile-sorting exercise should be video- or audiotaped and investigators should pay close attention to discussions (between themselves and coders or between coders) about which quotes belong together and why. These conversations are about as close as we will ever get to witnessing the emergence of themes.
Barkin et al. (1999) interviewed clinicians, community leaders, and parents about what physicians could and did do to prevent violence among youth. These were long, complex interviews, so Barkin et al. broke the coding process into two steps. They started with three major themes that they developed from theory. The principal investigator went through the transcripts and cut out all the quotes that pertained to each of the major themes. Then four other coders independently sorted the quotes from each major theme into piles. Then, the pile sort data were analyzed with multidimensional scaling and cluster analysis to identify subthemes shared across coders. [See Patterson et al. (1993) for another example.]
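Multidimensional scaling and cluster analysis both start from a matrix of similarities among the quotes. One common way to build such a matrix from independent pile sorts is to record, for each pair of quotes, the proportion of coders who put them in the same pile. The sketch below shows that aggregation step only. It is an assumption about the general approach, not a description of what Barkin et al. computed, and the function name `cooccurrence` and the toy pile sorts are invented.

```python
from itertools import combinations

def cooccurrence(pile_sorts, n_items):
    """Proportion of coders who placed each pair of quotes in the same pile.

    pile_sorts holds one entry per coder; each entry is a list of piles,
    and each pile is a set of quote indices in the range 0..n_items-1.
    """
    sim = [[0.0] * n_items for _ in range(n_items)]
    for piles in pile_sorts:
        for pile in piles:
            for i, j in combinations(sorted(pile), 2):
                sim[i][j] += 1
                sim[j][i] += 1
    n_coders = len(pile_sorts)
    for i in range(n_items):
        for j in range(n_items):
            # A quote is always in the same pile as itself.
            sim[i][j] = 1.0 if i == j else sim[i][j] / n_coders
    return sim

# Invented example: two coders sorting four quotes.
sorts = [
    [{0, 1}, {2, 3}],   # coder 1 made two piles
    [{0, 1, 2}, {3}],   # coder 2 made two different piles
]
for row in cooccurrence(sorts, 4):
    print(row)
```

Quotes 0 and 1 were grouped together by both coders and so get a similarity of 1.0, while quotes 0 and 3 were never grouped together and get 0.0. A matrix like this can then be passed to whatever scaling or clustering routine the analyst prefers.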
Jehn and Doucet (1997) had short answers to open-ended questions. They found that several coders could easily sort these paragraph-length descriptions of inter- and intra-ethnic conflict. Then, like Barkin et al., Jehn and Doucet used multidimensional scaling and cluster analysis to identify subthemes of conflict.
Another advantage to the cutting and sorting technique is that the data can be used to systematically describe how such themes are distributed across informants. After the piles have been formed and themes have been named, simply turn over each quote and identify who mentioned each theme. (If the people sorting the quotes are unaware of who the quotes came from, this is an unbiased way of coding.)
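Turning over the cards amounts to building a theme-by-informant table. A minimal sketch, assuming each card is recorded as an (informant, theme) pair once the piles have been named; the function name `theme_by_informant`, the informant names, and the theme labels are invented for illustration.

```python
from collections import defaultdict

def theme_by_informant(cards):
    """Map each theme to the sorted list of informants who mentioned it."""
    table = defaultdict(set)
    for informant, theme in cards:
        # A set records each informant once per theme, however many quotes they gave.
        table[theme].add(informant)
    return {theme: sorted(people) for theme, people in table.items()}

# Invented cards for illustration: (who said the quote, name of the pile it landed in).
cards = [
    ("Ann", "trust"),
    ("Bob", "trust"),
    ("Ann", "money"),
    ("Ann", "trust"),
]

for theme, people in theme_by_informant(cards).items():
    print(f"{theme}: mentioned by {len(people)} informant(s): {', '.join(people)}")
```

Counting informants rather than quotes keeps one talkative informant from making a theme look more widely shared than it is.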
Discussion
The variety of methods available for coding texts raises some obvious questions:
(1) Which technique generates more themes?
Frankly, we don’t know. There are just too many factors that influence the number of themes that are generated, including the technique itself, who and how many people are looking for themes, and the kind and amount of texts being analyzed. If the goal is to generate as many themes as possible, which is often the case in initial exploratory phases of research, then more is better. This means using multiple techniques, investigators, and texts.
Nowhere is a multiple technique approach better exemplified than in the work of Jehn and Doucet (1996, 1997). Jehn and Doucet asked 76 U.S. managers who had worked in Sino-American joint ventures to describe recent interpersonal conflicts with business partners. Each person described a situation with a same-culture manager and a different-culture manager. First they generated separate lists of words from the intercultural and intracultural conflict narratives. They asked three expatriate managers to act as judges and to identify all the words that were related to conflict. They settled on a list of 542 conflict words from the intercultural list and 242 words from the intracultural list.
Jehn and Doucet then asked the three judges to sort the words into piles or categories. The experts identified 15 subcategories for the intercultural data (things like conflict, expectations, rules, power, and volatile) and 15 categories for the intracultural data (things like conflict, needs, standards, power, contentious, and lose). Taking into consideration the total number of words in each corpus, conflict words were used more in intracultural interviews and resolution terms were more likely to be used in intercultural interviews.
Jehn and Doucet (1996, 1997) also used traditional content analysis on their data. They had two coders read the 152 conflict scenarios (76 intracultural and 76 intercultural) and evaluate each (on a 5-point scale) on 27 different themes they had identified from the literature. This produced two 76 × 27 scenario-by-theme profile matrices, one for the intracultural conflicts and one for the intercultural conflicts. The first three factors from the intercultural matrix reflect: (1) interpersonal animosity and hostility; (2) aggravation; and (3) the volatile nature of the conflict. The first two factors from the intracultural matrix reflect: (1) hatred and animosity with a volatile nature and (2) conflicts conducted calmly with little verbal intensity.
Finally, Jehn and Doucet identified the 30 intracultural and the 30 intercultural scenarios that they felt were the most clear and pithy. They recruited fifty more expatriate managers to assess the similarities (on a 5-point scale) of 60–120 randomly selected pairs of scenarios. When combined across informants, the managers’ judgments produced two aggregate, scenario-by-scenario similarity matrices, one for the intracultural conflicts and one for the intercultural conflicts.
Multidimensional scaling of the intercultural similarity data identified fourdimensions: (1) open versus resistant to change, (2) situational causes versusindividual traits, (3) high_ versus low_resolution potential based on trust, and(4) high_ versus low_resolution potential based on patience. Scaling of theintracultural similarity data identified four different dimensions: (1) highversus low cooperation, (2) high versus low confrontation, (3) problem_solvingversus accepting, and (4) resolved versus ongoing.
The work of Jehn and Doucet is impressive because the analysis of the datafrom these tasks produced different sets of themes. All three emically inducedtheme sets have some intuitive appeal and all three yield analytic results thatare useful. They could have also used the techniques of grounded theory orschema analysis to discover even more themes.
(2) When are the various techniques most appropriate?
The choice of techniques depends minimally on the kind and amount of text, the experience of the researcher, and the goals of the project. Word-based techniques (e.g., word repetitions, indigenous categories, and KWIC) are probably the least labor intensive. Computer software such as Anthropac and Code-A-Text have little trouble in generating frequency counts of key words (see Note 2). A careful look at the frequency list and maybe some quick pile sorts are often enough to identify quite a few themes. Word-based techniques are also the most versatile. They can easily be used with complex texts such as the complete works of Shakespeare or the Bible, as well as with simple short answers to open-ended questions. They can also be used relatively easily by novice and expert investigators alike. Given their very nature, however, they are best used in combination with other approaches.
Scrutiny-based techniques (e.g., compare and contrast, querying the text, and examining absences) are most appropriate for rich textual accounts and tend to be overkill for analyzing short answer responses. Investigators who are just beginning to explore a new topical area might want to start with compare-and-contrast techniques before moving on to the more difficult tasks of querying the text or searching for missing information. We do not advise using the latter two techniques unless the investigator is fluent in the language in which the data are collected. If the primary goal of this portion of the investigation is to discover as many themes as possible, then nothing beats using these techniques on a line-by-line basis.
Like scrutiny-based techniques, linguistic-based approaches are better used on narrative-style accounts than on short answer responses. Looking for transitions is the easiest technique to use, especially if the texts are actually written by respondents themselves (rather than transcribed from tape recordings of verbal interviews). Searching for metaphors is also relatively easy once novices have been trained on what kinds of things to look for in the texts. Looking for connecting words and phrases is best used as a secondary wave of finding themes, once the investigator has a more definite idea of what kinds of themes he or she finds most interesting.
In the early stages of exploration, nothing beats a thorough reading and pawing through of the data. This approach is the easiest for novice researchers to master and is particularly good for identifying major themes. As the exploration progresses, investigators often find themselves looking for subthemes within these major themes. The cutting and sorting techniques are most helpful here. Investigators can identify all text passages that are related to a major theme, cut them out, and sort them into subthematic categories. Likewise, if they are marking texts for each newly discovered theme, then they can apply the unmarked text technique as they go. We have seen these three techniques applied successfully to both rich narrative data and simple responses to open-ended questions.
An even more powerful strategy would be to combine multiple techniques in a sequential manner. For example, investigators might begin by pawing through the data to see what kinds of themes just stick out. As part of this process, they might want to make comparisons between paragraphs and across informants. A quick analysis of word repetitions would also be appropriate for identifying themes at such an early stage of the analysis. If key words or indigenous phrases are present, researchers might follow up by conducting more focused KWIC analyses. If the project is examining issues of equality, investigators might also look for texts that are indicative of power differentials and access to resources. Texts representing major themes can be marked either on paper or by computer. Investigators can then search areas that are not already marked for additional themes or cut and sort marked texts into subthemes.
Researchers also might consider beginning by identifying all metaphors and similes, marking them, cutting them out, and sorting them into thematic categories. There is no single way to discover themes. In theme discovery, we assume that more is always better.
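A frequency list of the kind described above can be produced with very little code. The sketch below is only an illustration and does not reproduce what Anthropac or Code-A-Text do internally; the stopword list, the tokenizer, and the sample responses are assumptions.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "i", "my", "was"}

def word_frequencies(responses, stopwords=STOPWORDS):
    """Count non-stopword tokens across a list of free-text responses."""
    counts = Counter()
    for response in responses:
        tokens = re.findall(r"[a-z']+", response.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts

# Hypothetical short answers to an open-ended question.
answers = ["The money was tight and the money ran out", "My family helped"]
frequencies = word_frequencies(answers)
top_words = frequencies.most_common(3)  # candidates for a quick pile sort
```

The most frequent words are only candidates for themes; as the text notes, they still need a careful look and are best combined with other approaches.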
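When texts are marked by computer, the unmarked text technique amounts to finding the stretches of text that no theme code covers. A minimal sketch follows, assuming marked passages are stored as (start, end) character offsets with the end offset excluded; that representation and the function name are hypothetical.

```python
def unmarked_spans(text_length, marked):
    """Return the (start, end) gaps not covered by any marked passage.

    marked: list of (start, end) offsets, end exclusive; passages may overlap.
    """
    gaps = []
    cursor = 0  # first offset not yet known to be covered
    for start, end in sorted(marked):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < text_length:
        gaps.append((cursor, text_length))
    return gaps

# Hypothetical coded passages in a 100-character transcript.
coded = [(10, 30), (20, 40), (60, 70)]
to_reread = unmarked_spans(100, coded)
```

The returned gaps are the passages to reread for themes that earlier passes missed.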
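A focused KWIC analysis lists every occurrence of a key word together with the words around it. The following is a rough sketch under simple assumptions (whitespace-and-punctuation tokenizing, a fixed window of words on each side); the function name and the sample sentence are illustrative only.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, key word, right context) for each occurrence of keyword."""
    tokens = re.findall(r"[\w']+", text)
    key = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == key:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

# Hypothetical interview excerpt.
excerpt = "Trust takes time. Without trust the deal fell apart."
for left, word, right in kwic(excerpt, "trust", window=2):
    print(f"{left:>15} [{word}] {right}")
```

Sorting the resulting context lines into piles is one way to move from a key word to the themes it participates in.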
(3) How do you know when you’ve found all the themes?
There is no magic formula to answer this question. The problem is similar to asking members of a population to list all the illnesses they know. One can never be sure of the full range of illnesses without interviewing the entire population. This is true because there is always the possibility that the last person interviewed will mention a new disease. We can simplify the process considerably, however, if we are willing to miss rarely mentioned illnesses. One strategy would be to interview people until some number of respondents in a row (say five or more) fail to mention any new illnesses.
In text analysis, grounded theorists refer to the point at which no new themes are being identified as theoretical saturation (Strauss and Corbin 1990:188). When and how theoretical saturation is reached, however, depends on the number of texts and their complexity, as well as on investigator experience and fatigue, and the number of investigators examining the texts. Again, more is better. Investigators who have more experience finding themes are likely to reach saturation later than novices. Wilson and Hutchinson warn against premature closure where the researcher "fails to move beyond the face value of the content in the narrative" (1990:123).
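The stopping rule described above can be written down directly. This is a minimal sketch, assuming each interview yields a list of illness names; the function name, the run_length parameter, and the sample lists are hypothetical.

```python
def interviews_until_saturation(free_lists, run_length=5):
    """Apply the 'k respondents in a row with nothing new' stopping rule.

    free_lists: sequence of item lists, one per respondent, in interview order.
    Returns (number of interviews conducted, items seen); the count is None
    if the run of non-novel respondents never reaches run_length.
    """
    seen = set()
    run = 0  # consecutive respondents who added nothing new
    for n, items in enumerate(free_lists, start=1):
        new_items = set(items) - seen
        seen |= new_items
        run = 0 if new_items else run + 1
        if run >= run_length:
            return n, seen
    return None, seen

# Hypothetical free lists of illnesses.
lists = [["flu", "cold"], ["cold"], ["flu", "malaria"], ["cold"], ["flu"]]
stopped_at, illnesses = interviews_until_saturation(lists, run_length=2)
```

A larger run_length lowers the chance of missing a rarely mentioned illness at the cost of more interviews.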
Summary
Theme identification is one of the most fundamental tasks in qualitative research. It is also one of the most mysterious. Explicit descriptions of theme discovery are rarely provided in articles and reports and, when they are, are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists and are often impeded by disciplinary or epistemological boundaries. The lack of clear methodological descriptions is most evident during the grant-writing phase of research. Investigators (ourselves included) struggle to clearly explain and justify plans for discovering themes in the qualitative data. These issues are particularly pressing when funding reviewers are unfamiliar with qualitative traditions.
In this article we have outlined a dozen techniques that social scientists have used to discover themes in texts. The techniques are drawn from across epistemological and disciplinary boundaries. They range from quick word counts to laborious, in-depth, line-by-line scrutiny. Some work well for short answers to open-ended questions while others are more appropriate for rich, complex narratives. Novices and non-native speakers may find some techniques easier than others. No single technique does it all. To us, these techniques are simply tools to help us do better research.
Notes
1
ATLAS.ti (Scientific Software Development) and Nud•ist (Qualitative Solutions & Research) are qualitative analysis packages distributed in the United States by SCOLARI, Sage Publications, Inc., 2455 Teller Road, Thousand Oaks, CA 91320. Tel: (805) 499 1325. Fax: (805) 499 0871. E-mail: atlasti@scolari.com. Web: www.scolari.com.
2
Anthropac (Analytic Technologies) and Code-A-Text (Cartwright) are software packages that have the capacity to convert free-flowing texts into word-by-document matrices. Code-A-Text is distributed in the United States by SCOLARI, Sage Publications. Anthropac is created and distributed by Analytic Technologies, Inc., 11 Ohlin Lane, Harvard, MA 01451. Tel: (978) 456-7372. Fax: (978) 456-7373. E-mail: sales@analytictech.com. Web: www.analytictech.com.
References Cited
Agar, Michael.
1973 Ripping and running: A formal ethnography of urban heroin addicts. New York: Seminar Press.
Agar, Michael and Jerry Hobbs
1985 How to grow schemata out of interviews. In Directions in Cognitive Anthropology. Janet Dougherty, ed. Pp. 413-431. Urbana, IL: University of Illinois Press.
Barkin, Shari, Gery Ryan, and Lillian Gelberg
1999 What clinicians can do to further youth violence primary prevention: A qualitative study. Injury Prevention 5:53-58.
Becker, Howard
1993 How I learned what a crock was. Journal of Contemporary Ethnography 22:28-35.
1998 Tricks of the trade: How to think about your research while you’re doing it. Chicago: University of Chicago Press.
Berelson, Bernard
1952 Content analysis in communication research. Glencoe, IL: Free Press.
Bernard, H. Russell
2000 Social Research Methods: Qualitative and Quantitative Approaches. Thousand Oaks, CA: Sage Publications.
Bogdan, Robert, and Sari Knopp Biklen
1992 Qualitative Research for Education: An Introduction to Theory and Methods, 2d ed. Boston: Allyn and Bacon.
Borgatti, Stephen
1999 Elicitation Methods for Cultural Domain Analysis. In The Ethnographer's Toolkit, Volume 3. J. Schensul and M. LeCompte, eds. Pp. 115-151. Walnut Creek: Altamira Press.
Bulmer, Martin
1979 Concepts in the analysis of qualitative data. Sociological Review 27(4):651-677.
Charmaz, Kathy
1990 "Discovering" Chronic Illness: Using Grounded Theory. Social Science and Medicine 30:1161–1172.
2000 Grounded theory: Objectivist and constructivist methods. In Handbook of Qualitative Research, 2nd Edition. Norman Denzin and Yvonna Lincoln, eds. Pp. 509-536. Thousand Oaks, CA: Sage Publications.
D'Andrade, Roy
1995 The development of cognitive anthropology. Cambridge: Cambridge University Press.
Dey, Ian
1993 Qualitative Data Analysis: A User-Friendly Guide for Social Scientists. London: Routledge and Kegan Paul.
Gal, Susan
1991 Between speech and silence: The problematics of research on language and gender. In Gender at the crossroads of knowledge: Feminist anthropology in the postmodern era. Michaela di Leonardo, ed. Pp. 175-203. Berkeley: University of California Press.
George, A. L.
1959 Quantitative and qualitative approaches to content analysis. In Trends in Content Analysis. I. de Sola Pool, ed. Pp. 7-32. Urbana: University of Illinois Press.
Glaser, Barney G. and Anselm Strauss
1967 The Discovery of Grounded Theory: Strategies for Qualitative Research. New York: Aldine.
Gladwin, Christina
1989 Ethnographic Decision Tree Modeling. Newbury Park, CA: Sage Publications.
Greenhalgh, Susan
1994 Controlling births and bodies. American Ethnologist 21:3-30.
Henley, N.M.
1969 A Psychological Study of the Semantics of Animal Terms. Journal of Verbal Learning and Verbal Behavior 8:176-84.
Jehn, Karen A. and Lorna Doucet
1996 Developing Categories from Interview Data: Text Analysis and Multidimensional Scaling. Part 1. Cultural Anthropology Methods Journal 8(2):15–16.
1997 Developing Categories for Interview Data: Consequences of Different Coding and Analysis Strategies in Understanding Text. Part 2. Cultural Anthropology Methods Journal 9(1):1–7.
Lindsay, Peter H. and Donald A. Norman
1972 Human information processing: An introduction to psychology. New York: Academic Press.
Maxwell, Joseph
1996 Qualitative research design: An interactive approach. Thousand Oaks, CA: Sage Publications.
Miles, Matthew and A. Michael Huberman
1994 Qualitative Data Analysis, 2d ed. Thousand Oaks, CA: Sage Publications.
Patton, Michael Q.
1990 Qualitative Evaluation and Research Methods. Thousand Oaks, CA: Sage Publications.
Pool, Ithiel de Sola, ed.
1959 Trends in Content Analysis. Urbana: University of Illinois Press.
Price, Laurie
1987 Ecuadorian Illness Stories. In Cultural Models in Language and Thought. D. Holland and N. Quinn, eds. Pp. 313–342. Cambridge: Cambridge University Press.
Ryan, Gery
1999 Measuring the typicality of text: Using multiple coders for more than just reliability and validity checks. Human Organization 58(3):313-322.
Spradley, James
1972 Adaptive Strategies of Urban Nomads. In Culture and Cognition: Rules, Maps, and Plans. J. P. Spradley, ed. Pp. 235-278. New York: Chandler Publishing Company.
1979 The Ethnographic Interview. New York: Holt, Rinehart and Winston.
Strauss, Claudia
1992 What makes Tony run? Schemas as motive reconsideration. In Human motives and cultural models. R. D'Andrade and C. Strauss, eds. Pp. 191-224. Cambridge: Cambridge University Press.
Strauss, Claudia and Naomi Quinn
1997 A cognitive theory of cultural meaning. Cambridge: Cambridge University Press.
Wiener, Jon
1997 Deconstruction goes pop. (The increasing use of the word 'deconstruction'). The Nation, April 7, 264(13):43-45.
Wilson, Holly Skodol and Sally Hutchinson
1990 Methodologic mistakes in grounded theory. Nursing Research 45(2):122-124.
Wright, Joanne
1997 Deconstructing development theory: Feminism, the public/private dichotomy and the Mexican maquiladoras. The Canadian Review of Sociology and Anthropology 34(1):71-92.