In search of a hybrid approach for analyzing large textual corpora.


An example with the environmental assessment of wind power in Quebec.

[ This blog post is also available as a PDF, in English and in French. ]

Large sets of qualitative data: advantage or constraint?

Verbal and textual data are essential in social science research. However, exploring large corpora requires significant investments of time and money, which limits their use or makes it dependent on substantial funding. In this context, where the richness of these data paradoxically becomes an obstacle to knowledge, it seemed particularly interesting to examine the potential of hybrid approaches to content analysis (manual and computer-assisted) and of mixed methods to facilitate the analysis of large sets of textual data.

The project presented here is part of a broader effort to assess the potential of computer-assisted textual data analysis in our research fields. In a previous bibliographical analysis, we used automated lexical analysis to investigate the scientific publications of members of a research group in territorial development (Fournis and Dumarcher, 2017). The present project on environmental assessment aims (1) to deepen this exploration by examining the semi-automated analysis tools available in some QDA software, such as MaxQda, and (2) to explore combinations of computational tools, to see to what extent they optimize and facilitate the analysis of large corpora.

Let’s start with a few words about the project, before explaining in more detail how we used MaxQda…

The project: « Thinking like the BAPE. Expertise building at the Bureau d’Audiences Publiques sur l’Environnement (Quebec) »

In addition to the methodological goals we have just discussed, this project also aimed to increase our knowledge of the Bureau d’Audiences Publiques sur l’Environnement (BAPE), an independent environmental assessment and decision-making support agency of the Government of Quebec (Canada). More specifically, we wanted to study the structure of knowledge generated by public stakeholders during a development project – in this case, all reports produced by the BAPE related to wind farm projects in Quebec.

We conducted a case study on the wind energy sector, using the 21 reports produced during the public hearings (1997-2016). The analysis examines the joint evolution of the issues addressed (thematic content: technical, environmental, territorial issues) and of the expertise mobilized and referenced (bibliographic content: public, private, citizen, technical, scientific expertise, etc.). The aim is to identify the socio-technical frameworks favoured by public policy, as well as their variations and evolution across projects and periods.

This study was therefore carried out in two parts, issues and expertise, corresponding to two methods:

  1. To explore the issues addressed in the reports, we used automated textual data analysis (with Iramuteq software).
  2. To examine the kinds of expertise involved in the 21 reports, we identified (semi-automatically with MaxQda) and analyzed the 5867 bibliographic and documentary references listed, as well as their 9192 citations throughout the text.

Let’s look in more detail at how MaxQda was successfully used in this assessment of expertise…

How did we use MaxQda to retrieve, code and extract citations?

In practice, MaxQda was used to retrieve, code and extract the citations in the reports. The “extended lexical search” function allowed us to retrieve citations automatically, using wildcards and regular expressions. The citations were then coded with the “autocode” function according to the type of document they refer to. The BAPE classifies documents into categories according to their source and intended purpose (procedural documents, documents submitted by the sponsor or by participants, meeting transcripts, briefs, bibliography, etc.). These documents are referenced and classified in a relatively standardized format (chains of letters and numbers), which simplified both retrieval and automatic coding.
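For readers who want to try the same retrieval and coding steps outside MaxQda, here is a minimal Python sketch using the standard `re` module. It does not reproduce MaxQda’s “extended lexical search” or “autocode” functions, only the underlying idea: a regular expression finds document codes in the report text, and the letter prefix of each code determines its category. The code format (two capital letters followed by a number, e.g. DA12 or PR3.1) and the prefix-to-category mapping are hypothetical assumptions for illustration; the actual BAPE codes should be checked before any reuse.

```python
import re
from collections import Counter

# Hypothetical citation format: a two-letter prefix followed by a number with
# optional sub-numbers, e.g. "PR3.1", "DA12", "DM45". Real BAPE codes may differ.
CODE_PATTERN = re.compile(r"\b([A-Z]{2})(\d+(?:\.\d+)*)\b")

# Hypothetical mapping from prefix to document category (the "autocode" step).
CATEGORIES = {
    "PR": "procedural",
    "DA": "submitted by the sponsor",
    "DB": "submitted by participants",
    "DM": "brief",
    "DT": "transcript",
}


def retrieve_citations(report_id, text):
    """Return one record per recognized document code found in a report's text."""
    records = []
    for match in CODE_PATTERN.finditer(text):
        prefix, number = match.group(1), match.group(2)
        if prefix not in CATEGORIES:
            continue  # skip letter pairs that are not known document codes
        records.append({
            "report": report_id,
            "code": prefix + number,
            "category": CATEGORIES[prefix],
            "start": match.start(),  # character offsets of the matched code
            "end": match.end(),
        })
    return records


sample = ("The sponsor estimates the noise level at 40 dBA (DA12, p. 4). "
          "Several briefs disagree (DM45; DM46), citing the transcript (DT3).")
citations = retrieve_citations("report-01", sample)
print([c["code"] for c in citations])  # ['DA12', 'DM45', 'DM46', 'DT3']
print(Counter(c["category"] for c in citations))
```

The start and end offsets kept for each match play the same role as the segment locations included in MaxQda’s export.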

[ Our approach seems easily adaptable to the analysis of other semi-structured textual data (various administrative documents, scientific publications, etc.). We could also have used MaxQda to analyze the issues addressed in the reports but, in line with our objective of exploring automated and semi-automated content analysis approaches, we conducted an automated lexical analysis with the Iramuteq software. ]

The coding results were then further explored with MaxQda’s analysis and visualization tools: we examined similarities between reports (code matrix browser, similarity and distance matrices) and compared the internal structure of the reports (document portraits). This shed light on the most frequently cited types of documents and the associated trends, but the insight into our object remained limited because these results relied solely on the BAPE’s own document classification.
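MaxQda computes its similarity and distance matrices internally, with its own choice of coefficients. As a rough sketch of the underlying idea, assuming each report is summarized by how often each document category is cited, the following Python code compares reports with cosine similarity and Euclidean distance. These two measures are a swapped-in choice and may not match MaxQda’s coefficients; the report names and counts are invented for illustration.

```python
import math

# Invented citation counts per document category, one entry per report.
reports = {
    "report-A": {"procedural": 10, "sponsor": 25, "brief": 5},
    "report-B": {"procedural": 8, "sponsor": 20, "brief": 6},
    "report-C": {"procedural": 2, "sponsor": 5, "brief": 30},
}
codes = sorted({code for freqs in reports.values() for code in freqs})


def vector(name):
    """Code-frequency vector of a report, in a fixed code order."""
    return [reports[name].get(code, 0) for code in codes]


def cosine_similarity(u, v):
    """1.0 for identical citation profiles, lower as profiles diverge."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


names = sorted(reports)
similarity = {(a, b): cosine_similarity(vector(a), vector(b)) for a in names for b in names}
distance = {(a, b): math.dist(vector(a), vector(b)) for a in names for b in names}

# Print the similarity matrix row by row.
for a in names:
    print(a, " ".join(f"{similarity[a, b]:.2f}" for b in names))
```

Cosine similarity ignores differences in the total number of citations per report, whereas Euclidean distance does not; which one is more meaningful depends on whether the volume of citations matters for the comparison.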

Figure 1: Document portrait and distance matrix in MAXQDA.

We wanted to go further and examine the types of expertise mobilized and the kinds of stakeholders involved. To do so, we took advantage of MaxQda’s rich export capabilities to carry out further analysis in spreadsheet software. We built a database in the spreadsheet, working like a simplified relational database, that links three tables:

(1) the BAPE reports (and their features), 

(2) the bibliographical and documentary references listed in the reports (manually coded according to the type of expertise and stakeholder to which they refer: public, private, citizen, technical, scientific, etc.)

(3) the citations of these references throughout the reports’ text, retrieved and already coded once with MaxQda.
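The database above was built in a spreadsheet, but the same three-way link can be expressed as a small relational database. The following sketch uses Python’s built-in `sqlite3` module as a swapped-in alternative; the table and column names, and all rows, are hypothetical and purely illustrative. The join counts citations per year and type of expertise, the kind of aggregate needed to follow the evolution of expertise over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (
    report_id TEXT PRIMARY KEY,
    project   TEXT,
    year      INTEGER
);
CREATE TABLE refs (
    ref_id    TEXT PRIMARY KEY,     -- report id + document code
    report_id TEXT REFERENCES reports(report_id),
    code      TEXT,                 -- e.g. 'DA12' (hypothetical format)
    expertise TEXT                  -- manually coded type of expertise
);
CREATE TABLE citations (
    citation_id INTEGER PRIMARY KEY,
    ref_id      TEXT REFERENCES refs(ref_id),
    start_pos   INTEGER,            -- segment location from the QDA export
    end_pos     INTEGER
);
""")

# Illustrative rows only; these are not data from the study.
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", [
    ("r01", "project A", 2004), ("r02", "project B", 2012),
])
conn.executemany("INSERT INTO refs VALUES (?, ?, ?, ?)", [
    ("r01:DA12", "r01", "DA12", "private"),
    ("r01:DB3", "r01", "DB3", "public"),
    ("r02:DA5", "r02", "DA5", "private"),
    ("r02:DM7", "r02", "DM7", "citizen"),
])
conn.executemany("INSERT INTO citations (ref_id, start_pos, end_pos) VALUES (?, ?, ?)", [
    ("r01:DA12", 120, 124), ("r01:DA12", 900, 904), ("r01:DB3", 450, 453),
    ("r02:DA5", 300, 303), ("r02:DM7", 800, 803), ("r02:DM7", 1500, 1503),
])

# Count citations per year and type of expertise by joining the three tables.
rows = conn.execute("""
    SELECT r.year, f.expertise, COUNT(*) AS n
    FROM citations AS c
    JOIN refs AS f ON c.ref_id = f.ref_id
    JOIN reports AS r ON f.report_id = r.report_id
    GROUP BY r.year, f.expertise
    ORDER BY r.year, f.expertise
""").fetchall()
for year, expertise, n in rows:
    print(year, expertise, n)
```

In this sketch each reference is assumed to belong to a single report, so its identifier combines the report identifier and the document code; the same code cited in two reports thus stays distinct.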

MaxQda’s export of the retrieved segments also includes their location within the reports (start and end positions). This allowed us to trace the evolution of the kinds of expertise mobilized over time, as well as their variations across projects and project characteristics. Trends and changes became clearer, and distinct phases emerged as the wind energy sector took shape.

With some additional conversion and formatting work, we were also able to visualize the density with which the different types of expertise are mobilized throughout the text of the reports, as well as their co-occurrence. This showed which kinds of expertise are mobilized jointly, in the form of a “dialogue” (see Figure 2).

Figure 2: Examples of “dialogue” between different types of expertise, via co-occurrence and density of citations in reports.
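A minimal Python sketch of one way to derive density and co-occurrence from exported positions, assuming each citation comes with a start offset and the report length is known: the report is cut into equal slices, citations are counted by type in each slice (density), and two types co-occur when both are cited in the same slice. The slice count, offsets and report length are invented, and treating a shared slice as a proxy for “dialogue” is an assumption of this example, not a description of the procedure behind Figure 2.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented citations for one report: (start offset, type of expertise).
REPORT_LENGTH = 10_000
citations = [
    (150, "private"), (420, "public"), (480, "private"),
    (5200, "citizen"), (5350, "scientific"), (5400, "citizen"),
    (9100, "public"),
]
N_BINS = 10  # cut the report into 10 equal slices of text (arbitrary choice)


def bin_of(start, length=REPORT_LENGTH, n_bins=N_BINS):
    """Map a character offset to a slice index from 0 to n_bins - 1."""
    return min(start * n_bins // length, n_bins - 1)


# Density: number of citations of each type in each slice of the report.
density = defaultdict(Counter)
for start, expertise in citations:
    density[bin_of(start)][expertise] += 1

# Co-occurrence: pairs of expertise types cited in the same slice.
cooccurrence = Counter()
for counts in density.values():
    for pair in combinations(sorted(counts), 2):
        cooccurrence[pair] += 1

for b in range(N_BINS):
    print(b, dict(density.get(b, Counter())))
print(cooccurrence.most_common())
```

Smaller slices make co-occurrence stricter (types must be cited closer together) and larger ones make it looser, so the slice size is worth checking against a close reading of a few reports.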


The results show that, as expected, the structure of the mobilized expertise evolves over time together with the issues addressed. The wind energy sector thus undergoes a kind of progressive stabilization: successive issues emerge and stabilize through the mobilization of various kinds of expertise, with fluctuations in the extent of public participation. This seems to confirm the potential of mixed methods and hybrid approaches to content analysis, not only for political science and policy analysis (as in this project) but also for bibliographic and bibliometric analysis (at the intersection of reference analysis and thematic analysis).

What’s next?

We will further explore the potential of these methods in the next phase of the project, which will incorporate public participation: a content analysis of the briefs submitted during the hearings will be used to contrast the issues they raise with those outlined in the reports. We also plan to analyze BAPE reports and briefs for other energy and resource sectors.

This project was conducted in collaboration with Yann Fournis (Université du Québec à Rimouski) and Sébastien Chailleux (Université de Pau et des Pays de l’Adour), with the financial support of the Centre de Recherche sur le Développement Territorial (CRDT).

For more information and updates on the project, please visit: du-Bureau-dAudiences-Publiques-sur-lEnvironnement-Quebec-FR-EN

Bibliography:

FOURNIS Yann and DUMARCHER Amélie, 2017. Le territoire du CRDT. La construction d’un espace intellectuel, entre science et territoire. Rimouski (QC): Éditions GRIDEQ-CRDT, 161 p.

CHAILLEUX Sébastien, DUMARCHER Amélie and FOURNIS Yann, 2017. La construction de l’expertise du Bureau d’Audiences Publiques sur l’Environnement (Québec). Le cas des projets éoliens (1997-2016). Paper presented at the « 14ème Congrès de l’AFSP », Montpellier (France).
