6 Data Analysis, Results, and Discussion: Longitudinal Tracking Phase

The longitudinal phase of the LongSAL study examined students’ searching-as-learning behaviour during the research paper writing task, with the aim of investigating how students’ information search behaviours and learning outcomes change over time. This chapter presents participant information, the timeline of data collection, and the data analysis steps, followed by the results of the longitudinal phase. For narrative clarity, the findings from the initial and final phases are presented in the next chapter.

6.1 Participants and Timeline of Collected Data


Figure 6.1: Timeline of collected data.

Figure 6.1 shows the timeline of the data collected in the LongSAL study, which spanned the Spring 2022 semester at the University of Texas at Austin, USA. Eighteen participants signed up via the Recruitment Questionnaire (QSNR0). Sixteen took part in the initial phase and remained in the study until the mid-point of the semester (mid-term questionnaire, QSNR2). Six participants then dropped out, and ten completed the full study (i.e., stayed until the exit questionnaire, QSNR3). In Figure 6.1, participant names in bold (bottom 10) indicate those who completed the longitudinal study, while names in italics (top 8) indicate those who dropped out at various stages. For data analysis, we partitioned the semester into three phases along the green vertical dotted lines: beginning, middle, and end of the semester. Inspired by the Spanish TV series La Casa de Papel (English title: Money Heist)14, participants were assigned code names based on geographic locations.

6.2 Data Analysis Framework


Figure 6.2: Data analysis framework followed in this dissertation.

The general framework for analysing the data collected in the LongSAL study is shown in Figure 6.2. Two categories of data were collected: explicit responses via the Qualtrics survey platform, and implicit search behaviours via the YASBIL browsing logger. The Qualtrics data consisted primarily of written responses to the search tasks, together with responses to the individual-difference questionnaire instruments for motivation, metacognition, and self-regulation. Data from the questionnaires were used to divide participants into groups, as discussed in Section 6.3. Log data from YASBIL were cleaned, processed, categorized (Section 6.6), and analysed to examine differences in search behaviour between participant groups.

To test for statistical differences between groups, we employed the non-parametric Mann-Whitney U test for null-hypothesis significance testing (Mann & Whitney, 1947). This choice was made for several reasons: the sample sizes were often small, the groups were imbalanced, and/or the data did not satisfy the assumptions of parametric tests such as ANOVA. Using a single statistical test throughout also allows results from different categories to be compared easily.
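To make concrete what the test computes, the following is a minimal from-scratch sketch of a two-sided Mann-Whitney U test, using the large-sample normal approximation with tie and continuity corrections. It is illustrative only: the analysis in this chapter used Pingouin, and library implementations may compute exact p-values for small samples without ties, so p-values can differ. The function names `average_ranks` and `mann_whitney_u` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie and continuity corrections).

    Returns (u1, p), where u1 is the U statistic of the first sample x.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sd_u == 0:  # all observations identical
        return u1, 1.0
    z = (abs(u1 - mean_u) - 0.5) / sd_u  # 0.5 = continuity correction
    p = math.erfc(max(z, 0.0) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(z))
    return u1, min(p, 1.0)
```

For example, `mann_whitney_u(low_scores, high_scores)` (hypothetical score lists) returns the U statistic of the low group and the two-sided p-value. Ranking the pooled sample once and reading U off the first group's rank sum avoids comparing every pair explicitly.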

Additionally, we report the Common Language Effect Size (CLES) for each statistical test result. The CLES is the proportion of pairs in which \(x\) from the first group is greater than \(y\) from the second group; in other words, it is the probability that a score sampled at random from the first distribution will be higher than a score sampled at random from the second distribution. CLES was first introduced by McGraw & Wong (1992). The Python statistical library employed in the data analysis, Pingouin (Vallat, 2018), uses a brute-force version of the formula given by Vargha & Delaney (2000). The advantages of this method are twofold: first, the brute-force approach pairs each observation of \(x\) with every observation of \(y\), and therefore does not require normally distributed data; second, the formula takes ties into account and therefore works with ordinal data 15.
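The brute-force CLES described above can be sketched in a few lines. This is an illustrative re-implementation, not Pingouin's source code; `cles` is a hypothetical helper and the scores are made-up values. Ties count as half a favourable pair, which is what lets the measure handle ordinal data.

```python
def cles(x, y):
    """Brute-force Common Language Effect Size.

    Proportion of all (xi, yj) pairs with xi > yj, counting ties as half a pair.
    """
    favourable = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                favourable += 1.0
            elif xi == yj:
                favourable += 0.5
    return favourable / (len(x) * len(y))


# Hypothetical Likert-style scores, low group first (as in this chapter).
low = [2, 3, 3, 4]
high = [3, 4, 5, 5]
print(cles(low, high))  # 0.15625
```

Because the U statistic of the first sample equals the number of pairs with \(x > y\) plus half the ties, CLES equals \(U_1 / (n_1 n_2)\), and swapping the two groups gives \(1 - \text{CLES}\).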

6.3 Latent Profile Analysis


Figure 6.3: Mean Values of Indicator Variables for the two identified groups via Latent Profile Analysis (LPA). The grouping was based on the self-reported values of motivation (IMI), metacognition (MAI), self-regulation (SRQ), and a Memory Span task.

According to Ambrose et al. (2010), students’ motivation, metacognition, and self-regulation are critical factors that determine, direct, and sustain what they do to learn. Given our interest in understanding how these traits affect students’ searching-as-learning behaviour, we collected self-reports of all three constructs via the IMI, MAI, and SRQ questionnaires (Section 5.3.2). However, these constructs are not single binary variables that can easily be used to group individuals. Rather, they are complex, multidimensional measures that serve as observable indicators of a person’s underlying latent characteristics.

To cluster participants into meaningful groups based on these multiple constructs, we turned to the educational psychology literature. Latent Profile Analysis (LPA) is an increasingly popular statistical approach falling under the umbrella of person-centred techniques used in organizational psychology and child development research. It provides a framework for characterizing population heterogeneity in terms of differences across individuals on a set of behaviours or characteristics, as opposed to describing the variability of a single variable. By identifying latent subgroups within a population, LPA enables researchers to gain a more nuanced understanding of the complexity of human behaviour.

The person-centred approach underlying LPA is a departure from traditional variable-centred approaches such as multiple regression analysis. Instead of quantifying the role of particular variables in a study, LPA organizes a population into a finite number of mutually exclusive and exhaustive profiles, each comprising individuals who are similar to one another. In this way, LPA identifies distinct profiles of individuals who exhibit similar patterns of behaviour across multiple variables.

The identification and description of these latent profiles is a crucial step in LPA. Each profile represents a subgroup of individuals who share similar patterns of responses on a set of variables, which can provide insights into the underlying mechanisms driving their behaviour. Furthermore, the identification of the optimal number of profiles to represent a population is a critical issue in LPA. This involves balancing the complexity of the model with its ability to capture meaningful variability in the data, and requires careful consideration of both statistical and substantive criteria.
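To illustrate how profiles and their number can be estimated, the sketch below fits a latent profile model as a Gaussian mixture with diagonal covariances (the local-independence assumption commonly used in LPA) via the EM algorithm, and scores candidate numbers of profiles with the Bayesian Information Criterion (BIC). This chapter does not specify the software or model parameterisation used, so everything here is an assumption: the hypothetical functions `fit_lpa` and `bic`, the farthest-point initialisation, and the profile-specific variances. In practice, a library such as scikit-learn's `GaussianMixture` with `covariance_type="diag"` fits the same kind of model with multiple restarts.

```python
import math


def fit_lpa(data, k, n_iter=500, tol=1e-8, min_var=1e-6):
    """Fit a k-profile Gaussian mixture with diagonal covariances (local independence) by EM.

    data: list of rows, one per participant; each row holds that participant's indicator scores.
    Returns (weights, means, variances, responsibilities, log_likelihood).
    """
    n, d = len(data), len(data[0])
    # Deterministic farthest-point initialisation of the profile means.
    means = [list(data[0])]
    while len(means) < k:
        farthest = max(data, key=lambda r: min(sum((a - b) ** 2 for a, b in zip(r, m)) for m in means))
        means.append(list(farthest))
    # Start every profile with the overall variance of each indicator.
    col_mean = [sum(r[j] for r in data) / n for j in range(d)]
    col_var = [max(sum((r[j] - col_mean[j]) ** 2 for r in data) / n, min_var) for j in range(d)]
    variances = [list(col_var) for _ in range(k)]
    weights = [1.0 / k] * k
    log_lik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each profile for each participant.
        resp, new_ll = [], 0.0
        for row in data:
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (row[j] - means[c][j]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(val - top) for val in logs))  # log-sum-exp
            new_ll += log_norm
            resp.append([math.exp(val - log_norm) for val in logs])
        converged = abs(new_ll - log_lik) < tol
        log_lik = new_ll
        if converged:
            break
        # M-step: re-estimate weights, means, and per-profile variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(resp[i][c] * data[i][j] for i in range(n)) / nc for j in range(d)]
            variances[c] = [
                max(sum(resp[i][c] * (data[i][j] - means[c][j]) ** 2 for i in range(n)) / nc, min_var)
                for j in range(d)
            ]
    return weights, means, variances, resp, log_lik


def bic(log_lik, k, d, n):
    """BIC for the model above; free parameters = (k - 1) weights + k*d means + k*d variances."""
    return -2 * log_lik + ((k - 1) + 2 * k * d) * math.log(n)
```

For instance, with one row per participant holding hypothetical IMI, MAI, SRQ, and Memory Span scores, one could fit k = 1, 2, 3 and compare `bic(log_lik, k, d, n)` across fits, preferring lower values. With a sample as small as this study's, BIC is best weighed alongside the substantive interpretability of the profiles rather than used alone.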

The main advantage of LPA over traditional variable-centred approaches is that it can capture individuals who exhibit multiple, diverse patterns of behaviour across different sets of variables.

In the context of information search behaviour, LPA can help to identify distinct groups of individuals who engage in different search strategies or have different search motivations. This can be useful for understanding how people search for information online, what factors influence their search behaviour, and how search behaviour relates to other variables such as task performance, satisfaction, and learning outcomes.

The purpose of this study was to investigate the relationship between individual differences in motivation, metacognition, and self-regulation and search behaviour. We therefore employed LPA to identify latent profiles of participants based on their scores on the IMI, MAI, and SRQ questionnaires and a Memory Span task, and used these profiles to classify participants into high and low groups. LPA is particularly useful when the relationship between variables is not well understood, or when it is difficult to determine which variables should be used to classify individuals into groups.

The LPA identified two distinct groups (latent profiles) of participants based on their scores on the IMI, MAI, SRQ, and Memory Span task (MS): a high group and a low group (Figure 6.3). The high group had generally higher average scores on the IMI, MAI, SRQ, and MS than the low group, indicating that its members were more intrinsically motivated, more aware of their metacognitive processes, and more self-regulated.


Figure 6.4: Diagram illustrating, at different timepoints, how participants stayed within their same high/low LPA profiles (green) or changed profiles (orange). Grey trajectories indicate participants who dropped off and did not complete the study.

Figure 6.4 illustrates group membership at different points in time, and how one participant (P016_AUSTIN) changed group membership at the end of the semester. Twelve participants started the semester (QSNR1) in the high group and four in the low group. Group membership remained the same in the middle of the semester (QSNR2). By the end of the semester, four participants from the high group and two from the low group had dropped out, and one participant had transitioned from the high group to the low group. This resulted in seven participants in the high group and three in the low group at the semester-end timepoint (QSNR3), with no data for the six participants who dropped out.

In the discussion that follows, all effect sizes (ES) reported with the Mann-Whitney U tests compare the scores of the low group (first distribution) with those of the high group (second distribution). For example, an effect size of \(ES = 0.19\) means that there is a 19% chance that a randomly chosen score from the low group will be greater than a randomly chosen score from the high group.

6.4 Learning and Search Outcomes


Figure 6.5: Self-reported learning and search outcomes (a, b), and instructor assigned grades for the high and low groups (c).

Figure 6.5 shows the mean values of the self-reported (perceived) learning outcome (a), the self-reported search outcome (b), and the instructor-assigned grades (c) for the Ethical Dilemma research paper writing task over the semester. The self-reported learning and search outcome measures are inspired by the work of Collins-Thompson et al. (2016).

We see that the high group had higher perceived learning and search outcomes than the low group, and these differences were statistically significant: \((U = 115.0, p = .0005, ES = 0.19)\) for the learning outcome, and \((U = 151.0, p = .005, ES = 0.25)\) for the search outcome. (The effect sizes indicate the probability that a value chosen at random from the low group’s scores will be greater than a value chosen at random from the high group’s scores.)

The fact that the high group had significantly higher self-reported learning and search outcomes is consistent with their higher levels of motivation and self-regulation. Students who are motivated and self-regulated tend to be more efficient and effective in their information-searching behaviours, which in turn may lead to better learning outcomes. Students in the high group may also have been better at managing their time and resources, allowing them to engage in more thorough and comprehensive information-searching activities. Interview responses support this interpretation. For instance, when asked “How did you keep track of the sources you found?”, a participant in the low group responded:

…in the reference list, I put all the links that I found. I did [save the articles] at one point. And then before I knew it, I had 10 tabs (open), and I feel extremely unmotivated. I said, no, I can’t do this anymore. So I just closed all of them.

And I decided, okay, everything’s in the reference list, Whatever seems like its relevant. I’ll click on it, and I’ll see it later. So I don’t have to read 10 different papers. I relied too much on the reference list. I throw everything in there, like okay, I’ll deal with you later.

— P007_PARIS

In contrast, a participant in the high group responded:

I have a separate document with a table and three columns, one for the in-line citation, so I could just easily copy and paste it. And then the middle was direct quotes. And then the (last column) was notes that are like sentences, what I wanted to say. And so that’s how I organize my (sources).

— P021_JAVA

The response from the participant in the low group suggests a less organized approach to tracking sources. They mentioned initially saving articles and opening multiple tabs but eventually feeling overwhelmed and unmotivated, leading them to rely heavily on the reference list. This indicates a lack of systematic organization and a reliance on the reference list as a means of keeping track of sources, potentially leading to challenges in managing and accessing relevant information. In contrast, the participant from the high group described a more structured and organized method for tracking sources. They mentioned using a separate document with a table consisting of three columns for in-line citations, direct quotes, and notes. This approach demonstrates a deliberate and systematic way of organizing and capturing information from sources, allowing for efficient referencing and easy retrieval of relevant content during the writing process. These qualitative responses align with the quantitative findings of higher perceived learning outcome and perceived search outcome in the high group. The more organized and structured approach to tracking sources in the high group is likely to have contributed to their enhanced perception of learning and search outcomes.

The instructor-assigned grades for the high group also generally stayed higher than those of the low group (except at the Proposal stage), but the differences were small. This may indicate that the instructors’ grading criteria did not fully capture the impact of information-searching behaviours on learning outcomes. One possible reason is that the grading criteria focused on the content of the research paper rather than on the process of information searching; as a result, students who searched more effectively would not necessarily have received higher grades. Another possibility is that the grading was lenient, which would compress the differences between students.

Collins-Thompson et al. (2016) reported that “searchers’ perceived learning outcomes closely matched their actual learning outcomes” and this was also indirectly correlated with their information search behaviours in terms of dwell time on documents. Let us examine in the following sections how the findings from the LongSAL study compare and contrast with those reported by Collins-Thompson et al. (2016) and others.

6.5 Q: Query Formulations

6.5.1 Length and Count of Queries per Search Task


Figure 6.6: Lengths and count of queries for each search task in the Longitudinal Phase.

Query length was operationalized as the number of terms (words separated by spaces) in the search query that participants submitted to search engines or other information retrieval sites. Query length can vary from a single word to several phrases or a complete sentence. Longer queries may indicate a more specific or complex information need, while shorter queries may be more general or broad in scope. Query count per search task refers to the number of separate queries, or search attempts, that a participant issued to complete a task. This measure may vary with the complexity of the information need, the user’s expertise with the search system, and other factors.
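The operationalization above can be sketched as follows, assuming a hypothetical log format of (task, query) pairs in submission order; the actual YASBIL schema is not shown in this chapter, and `query_stats` is an illustrative name. Query length is counted as whitespace-separated terms, matching the definition in the text.

```python
from collections import defaultdict


def query_stats(query_log):
    """Per-task query count, total query length, and average query length.

    query_log: iterable of (task, query) pairs, e.g. ("outline", "gene editing ethics").
    Query length = number of whitespace-separated terms.
    """
    stats = defaultdict(lambda: {"count": 0, "total_length": 0})
    for task, query in query_log:
        stats[task]["count"] += 1
        stats[task]["total_length"] += len(query.split())
    for s in stats.values():
        s["avg_length"] = s["total_length"] / s["count"]
    return dict(stats)


log = [("outline", "ethics of gene editing"), ("outline", "CRISPR"), ("rough_draft", "gene editing consent")]
print(query_stats(log)["outline"])  # {'count': 2, 'total_length': 5, 'avg_length': 2.5}
```

Keeping the total length alongside the count makes it easy to derive the average query length per stage, which is how total length, query count, and average length relate in the comparison that follows.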

Figure 6.6 (a) and (b) show the total and average query lengths of the high and low groups, while Figure 6.6 (c) shows the number of queries issued at different stages of writing the research paper. The low group showed a zig-zag pattern in their total query length and query count over the semester: a low start at the proposal, a peak at the outline, a dip at the rough draft, and another peak at the final paper. The high group showed a steady increase in total query length and query count from proposal to outline to rough draft, followed by a very gentle dip (or plateau) at the final paper stage. Comparing total query length and query count with average query length (Figure 6.6 (b)), we see that the high group maintained a steady 4-5 terms per query throughout the semester, whereas the low group jumped to more than 10 terms per query at the rough draft stage.

The low group issued few short queries during the proposal, many short queries during the outline, very few long queries during the rough draft, and again many short queries during the final paper phase. On the other hand, the high group kept issuing a steadily increasing count of similar-length (short) queries throughout the semester.

Combining these results we can posit that the low group demonstrated signs of struggling throughout the semester (Hassan et al., 2014). These students may have struggled to effectively search for information at the beginning of the semester, but then increased their search efforts as the deadlines approached. The fact that they issued few long queries during the rough draft phase may indicate that they were not able to effectively refine their search strategies to find more relevant and valuable information. In contrast, the high group’s pattern of issuing a steadily increasing count of similar-length (short) queries throughout the semester suggests that these students may have had a more consistent and effective search strategy. They may have been better able to refine their search strategies over time, which allowed them to find more relevant and useful information throughout the different stages of the research paper writing process.

6.5.2 Query Reformulation Types (QRTs)


Figure 6.7: Number of different query reformulation types (QRTs) as per taxonomy proposed by C. Liu et al. (2010).

Query reformulation refers to the process of modifying or refining a search query in order to improve the relevance of search results and better match the user’s information needs (Section 3.2.1). Query reformulation typically occurs due to a searcher’s improved understanding of how to better translate their information need into a search query. Using the taxonomy proposed by C. Liu et al. (2010) (Figure 3.3), we classified each previous-next query pair issued by participants into one of the five query reformulation types (QRTs): New, Generalization, Specialization, Word Substitution, and Repeat.
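A simplified, rule-based sketch of classifying previous-next query pairs into the five QRTs follows. The rules (term overlap and relative query length) are one plausible reading of the taxonomy and are assumptions; the exact definitions in C. Liu et al. (2010) may differ in detail. The function names are hypothetical.

```python
def classify_reformulation(prev_query, next_query):
    """Assign a consecutive query pair to one QRT (simplified, assumed rules)."""
    prev_terms = prev_query.lower().split()
    next_terms = next_query.lower().split()
    if next_terms == prev_terms:
        return "Repeat"  # identical query (also how a back-button SERP reload appears in logs)
    if not set(prev_terms) & set(next_terms):
        return "New"  # no terms in common
    if len(next_terms) < len(prev_terms):
        return "Generalization"  # shared terms, fewer terms
    if len(next_terms) > len(prev_terms):
        return "Specialization"  # shared terms, more terms
    return "Word Substitution"  # shared terms, same length, some terms differ (incl. reordering)


def qrt_sequence(queries):
    """Classify every previous-next pair in a participant's ordered list of queries."""
    return [classify_reformulation(a, b) for a, b in zip(queries, queries[1:])]
```

Under these rules a reloaded SERP, logged as the same query, is indistinguishable from a deliberate repeat, which is the ambiguity discussed in the next paragraph.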

We faced a challenge in disentangling Repeat queries from “hub-and-spoke” behaviour, where the user goes back and forth between a SERP and different content pages using the browser’s forward and back buttons 16. Each press of the browser’s back button (to return to the SERP from a content page) sent a fresh HTTP GET request to the search engine, which YASBIL logged as a resubmission of the query. Therefore, in the discussion that follows, “Repeat” refers to repeat queries combined with hub-and-spoke behaviour.

The counts of the five QRTs are presented in Figures 6.7 (a) through (e). For the low group, the trend of counts followed similarly from their trends of query counts and total query lengths (Figures 6.6 (a) and (c)), with varying intensities: alternating between high and low values at successive points in the semester. For the high group, except Query Generalizations – which followed a zig-zag pattern – all the other QRTs showed an overall increase in count throughout the semester.

The high group issued the most new queries and generalized queries while writing the rough draft. In contrast, they had the highest counts of specializations and word substitutions while writing the final paper. The low group, on the other hand, had their lowest counts of all QRTs except Repeat while writing the rough draft. The most interesting trend is that of Query Generalizations (Figure 6.7(b)), where the high and low groups showed diametrically opposite behaviour: maxima at the outline and final paper for the low group, but minima at those stages for the high group. The high group also issued significantly fewer repeat queries (including hub-and-spoke behaviour) throughout the semester than the low group \((U = 228.0, p = .02, ES = 0.74)\).

The low group’s lower counts of all QRTs while writing the rough draft suggest that they may have struggled to reformulate their queries effectively across the stages of the research paper writing process, perhaps owing to the complexity and depth of research the tasks required. The low group may have had more difficulty refining and targeting their search queries, resulting in more new and repeated queries at the final stage. They may also have had more difficulty conceptualizing their research question or topic, leading to more generalizations and fewer specializations in their queries. Additionally, their higher use of repeat queries (or hub-and-spoke behaviour) may indicate that they relied on a limited set of sources or search terms, which may have limited their ability to find new and relevant information.

The high group, however, had a different pattern of query reformulation from the low group. They had their highest counts of new queries and query generalizations while writing the rough draft, and the most specializations, word substitutions, and repeat queries while writing the final paper. This indicates that their queries were more exploratory earlier in the semester and became more precise and refined later. They may have been more proactive in identifying new avenues for research earlier in the semester. The peak in query generalizations during the rough draft may suggest that they were better able to synthesize and generalize information from their sources at an earlier stage in the writing process. The high group may also have been better able to refine their search queries through word substitutions, which peaked while writing the final paper, indicating greater precision and focus in their information-seeking behaviour. In contrast, the low group had their highest counts of repeat queries during the outline and final paper, indicating that they may have had more difficulty finding and retaining relevant information throughout the research process.

Interview responses from the participants also support these quantitative results. When asked which stage of the project needed the most amount of searching – the proposal, the outline, the rough draft, or the final paper – a participant in the low group responded:

Definitely the final one. Not only do I have to find the extra 10 source material, because in the rough draft, I only need 10. So not only do I have to find 10 new ones, I need to go back to look at the old 10 sources that I had before because I don’t remember what they’re about anymore.

— P007_PARIS

This participant highlighted the challenges they faced during the final stage of the project, which they said required the most searching compared with the earlier stages (proposal, outline, and rough draft). This aligns with the quantitative finding that the low group’s query reformulations increased during the final paper stage. The participant’s response also sheds light on the reasons behind this increase. They mentioned the need to find additional sources, as the requirement was to include 20 sources in total. Furthermore, they had to revisit the 10 sources used in the rough draft because they no longer remembered what those sources were about. Having to find half of the required sources late in the process, and to re-read sources they had not kept track of, suggests limited awareness of, and planning for, the requirements of the final paper, which aligns with the notion of low metacognition.

On the other hand, a participant in the high group responded:

searching, probably in the outline stage, but the most analyzation of those sources came in the rough draft stage, and then the final is just expanding upon that.

— P021_JAVA

The response from the participant in the high group offers a contrasting perspective on the stage of the project that required the most searching. According to this participant, the outline stage involved the most searching, suggesting a different pattern of information-seeking behaviour from that of the participant in the low group. Furthermore, the participant mentioned that the rough draft stage involved the most analysis of the sources they had found. This suggests that during this stage, the high group participants were actively engaging with and evaluating the information they had gathered, potentially indicating a higher level of metacognition and critical thinking skills. Lastly, the participant noted that the final stage was primarily focused on expanding upon the analysis conducted during the rough draft stage. This implies that the high group participants had already established a foundation of information and analysis, and the final stage involved building upon that groundwork. This suggests a strategic approach to information use, with a focus on analysing and synthesizing the gathered sources during the rough draft stage, which is consistent with the high group’s peak in new queries and query generalizations at that stage.

From the above observations, we posit that the high group were more effective in their query reformulation strategies. Specifically, the high group were better able to identify new information needs as they worked on the rough draft, and then refine and specialise their queries as they worked on the final paper. This ability to adapt and refine their queries may have allowed them to find more relevant and useful information, which in turn may have contributed to their higher self-perceived learning and search outcomes.

6.5.3 Entropy of Query Reformulation Types


Figure 6.8: Stationary and transition entropies of query reformulation type (QRT) sequences.

Entropy is a measure of the diversity or unpredictability of a sequence of events. In the context of search queries, entropy can quantify the variability or randomness of the query reformulations issued by participants. Transition and entropy measures also allow behaviour to be compared across disparate tasks and activities. Inspired by previous work analysing eye-movement sequences (Krejtz et al., 2014, 2015) and search tactic sequences (He et al., 2016), we applied a similar entropy analysis to participants’ query reformulation sequences and search tactic patterns.

For query reformulations, the possible set of states comprised the five query reformulation types: Generalization, Specialization, Word Substitution, Repeat, and New. A sequence of query reformulations issued by a participant (e.g., New -> New -> Specialization -> Specialization -> Word Substitution -> Generalization) can be modelled as a first-order Markov chain, in which the next state depends only on the current state. Entropy analysis of these Markov chains quantifies how predictable the states are, and yields two uncertainty measures: transition entropy, \(H_t\), and stationary entropy, \(H_s\). Similar stationary and transition entropy measures can be obtained for sequences of search tactics.

Guided by the above, we conducted an entropy analysis of the query reformulation sequences produced by the participants. In the context of query reformulations, the maximum transition entropy, \(s \log s\), is reached when there is an equal probability of switching between each of the \(s = 5\) states, or QRTs (query reformulation types). The minimum transition entropy (0) is achieved in a fully deterministic Markov chain, where all transition probabilities are either 1 or 0. Thus, a higher transition entropy indicates more randomness in a participant’s transitions between different QRTs, i.e., no clear progression from one QRT to another, whereas a lower transition entropy indicates that the participant’s transitions between QRTs are highly predictable. Stationary entropy is calculated from the distribution of QRTs. A higher stationary entropy value indicates that the QRTs were used uniformly, while a lower stationary entropy indicates that some QRTs were preferred over others. Values of stationary entropy vary between 0 and \(\log s\), where \(s\) is the number of possible states (QRTs)17. All entropy values presented in this chapter were normalized by their theoretical maximums to allow comparison across different tasks.
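The two measures can be computed from a single participant's QRT sequence as sketched below. This assumes the stationary-distribution-weighted definitions used in gaze transition entropy analyses, \(H_s = -\sum_i \pi_i \log \pi_i\) and \(H_t = -\sum_i \pi_i \sum_j p_{ij} \log p_{ij}\), with \(\pi_i\) and \(p_{ij}\) estimated from the sequence itself. Under this weighting both measures are bounded by \(\log s\), so the sketch divides both by \(\log s\); if the row entropies are summed without the \(\pi_i\) weights, the bound on \(H_t\) becomes \(s \log s\) instead. The function name `qrt_entropies` is hypothetical.

```python
import math
from collections import Counter

QRT_STATES = ("New", "Generalization", "Specialization", "Word Substitution", "Repeat")


def qrt_entropies(seq, states=QRT_STATES):
    """Normalised stationary (H_s) and transition (H_t) entropy of one QRT sequence.

    Assumes a first-order Markov chain estimated from the sequence itself; both values are
    divided by log(s), the maximum of each measure under these definitions.
    """
    if len(seq) < 2:
        raise ValueError("need at least two reformulations to estimate transitions")
    s = len(states)
    # Stationary distribution: relative frequency of each QRT.
    counts = Counter(seq)
    h_s = -sum((counts[q] / len(seq)) * math.log(counts[q] / len(seq)) for q in states if counts[q] > 0)
    # Transition probabilities p_ij from consecutive pairs; rows weighted by how often state i is left.
    pairs = list(zip(seq, seq[1:]))
    pair_counts = Counter(pairs)
    out_counts = Counter(a for a, _ in pairs)
    h_t = 0.0
    for i in states:
        if out_counts[i] == 0:
            continue
        row = [pair_counts[(i, j)] / out_counts[i] for j in states if pair_counts[(i, j)] > 0]
        h_t += (out_counts[i] / len(pairs)) * -sum(p * math.log(p) for p in row)
    h_max = math.log(s)
    return h_s / h_max, h_t / h_max
```

For example, a participant who strictly alternates New and Repeat has a transition entropy of 0 (fully predictable) but a non-zero stationary entropy, because two of the five QRTs are used equally often.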

The transition entropy of query reformulations followed distinct patterns for the low and high groups (Figure 6.8(a)). The low group showed a V-shaped pattern: a decrease in transition entropy from proposal to outline, then an increase from outline to rough draft to final paper. The high group, on the other hand, showed a zig-zag pattern, with low transition entropy during the proposal, an increase during the outline, a decrease during the rough draft, and another increase during the final paper. Thus, the low group had the least randomness in query reformulation strategies during the outline, while the high group had the most randomness at this stage. Subsequently, the randomness in the high group decreased, whereas that in the low group increased. This suggests that during the outline stage, the low group had a more structured approach to query reformulation than the high group. However, as the semester progressed, the low group’s approach became more random, while the high group’s approach became more structured. These patterns suggest that the low group may have struggled to adapt their query reformulation strategies as they moved through the different stages of writing the paper, while the high group was better able to adjust their strategies and maintain a predictable structure in their approach.

The stationary entropy of query reformulations of the low and high groups varied over the different phases of the research paper writing process as well (Figure 6.8(b)). The low group’s stationary entropy reached its maximum value at the outline phase, whereas that for the high group became maximum at the rough draft phase. This indicates that these two groups had distinct information searching behaviours throughout the writing process. The low group tried out all possible types of query reformulations during the outline (as we saw from the query reformulation counts), and then settled on using repeat queries (or hub and spoke behaviour) more, during the rough draft phase. This lowered their stationary entropy at the rough draft phase, and may have limited their ability to find new and relevant information during the later stages of the writing process.

In contrast, the high group used all types of query reformulations with roughly equal probability during the rough draft phase (Figure 6.7), resulting in a higher stationary entropy. The increase in the high group’s stationary entropy from the outline to the rough draft suggests that they were exploring a broader range of topics and concepts at this stage of the writing process, which may have allowed them to identify more relevant and useful information. The subsequent decrease in stationary entropy from the rough draft to the final paper suggests that they were able to narrow their focus and consolidate their understanding of the subject matter as they progressed, which may have contributed to their higher self-perceived learning and search outcomes.
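For reference, the two entropy measures can be written in the standard form used in gaze transition entropy work such as Krejtz et al. (2015). The states (query reformulation types here, search tactics in Section 6.10) are treated as a first-order Markov chain with transition probabilities \(p_{ij}\) from state \(i\) to state \(j\) and stationary distribution \(\pi\); entropies are in bits, with \(0 \log_2 0\) taken as 0. This is a sketch of the general definitions; any normalization applied in the study is not shown.

```latex
H_s = -\sum_{i=1}^{n} \pi_i \log_2 \pi_i
\qquad
H_t = -\sum_{i=1}^{n} \pi_i \sum_{j=1}^{n} p_{ij} \log_2 p_{ij}
```

\(H_t\) is the expected uncertainty of the next state given the current one, and \(H_s\) reaches its maximum of \(\log_2 n\) when all \(n\) states are equally likely.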

6.6 URL Categorization for Analysing Interactions

In order to understand the relationship between users’ information search behaviour and the types of webpages they visit, we needed to categorize webpages into different types. To accomplish this, we developed a classification system based on URL patterns.

URL patterns were first extracted from the web browsing data collected in our study. These URL patterns contain information about the structure and content of each webpage visited by the users. Based on this information, we were able to classify each URL present in the log data into the following hierarchical taxonomy:

  • L: Search Result Pages, i.e., a List of Information Objects
    • L.PUB: Publication Search Results, e.g., on university library websites, digital libraries, Google Scholar, etc.
    • L.WEB: Web Search Engine Result Pages (SERPs)
  • I: Content pages, i.e., Individual Information Objects
    • I.PUB: Academic Publications
    • I.WEB: Webpages that have the potential to provide relevant (academic) information for the search task but are not publications, e.g., Wikipedia articles, relevant blog posts, and government and non-profit websites. Some of these were classified automatically (e.g., Wikipedia), while others were classified after manual inspection.
  • MISC: URLs for webpages that did not fit into any of the above categories

To identify search engine result pages, we looked for URLs that contained URL query parameters such as q (Google, Bing), search, query, or k (Yahoo) along with specific strings associated with popular search engines such as Google or Bing. We also identified content pages by looking for URLs that contained strings such as “article”, “blog”, or “news”. Scientific peer-reviewed publications were identified based on URLs that contained specific strings associated with academic publishers or databases (ACM DL, Elsevier, Scopus, Springer etc.), while Wikipedia articles were identified by their URLs containing the string “wikipedia” in the hostname. Library websites were identified by URLs that contained terms such as “library”, “catalogue”, or “database” as well as specific strings associated with major library systems (e.g., UT Austin uses Primo VE system from Ex Libris). Finally, we used the catch-all category MISC to identify other types of webpages that did not fit into any of the other categories.
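The rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the study’s classifier: the host and keyword lists (`WEB_SEARCH_HOSTS`, `PUBLISHER_HOSTS`, and so on) are hypothetical and heavily abbreviated, the order in which rules are checked is an assumption, and the manual inspection used for some I.WEB pages is not modelled.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical, abbreviated pattern lists for illustration only.
WEB_SEARCH_HOSTS = ("google.", "bing.", "yahoo.")
SEARCH_PARAMS = ("q", "search", "query", "k")
PUBLISHER_HOSTS = ("dl.acm.org", "sciencedirect.com", "scopus.com", "link.springer.com")
LIBRARY_TERMS = ("library", "catalogue", "database", "primo")
CONTENT_TERMS = ("article", "blog", "news")


def classify_url(url: str) -> str:
    """Map a URL to L.PUB, L.WEB, I.PUB, I.WEB, or MISC with simple string rules."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    has_search_param = any(p in parse_qs(parsed.query) for p in SEARCH_PARAMS)

    # L.PUB: Google Scholar, library catalogues, or publisher search pages.
    if host.startswith("scholar.google.") and has_search_param:
        return "L.PUB"
    if any(t in host or t in path for t in LIBRARY_TERMS) and has_search_param:
        return "L.PUB"
    if any(h in host for h in PUBLISHER_HOSTS) and (has_search_param or "search" in path):
        return "L.PUB"
    # L.WEB: a known web search engine host plus a query parameter.
    if any(h in host for h in WEB_SEARCH_HOSTS) and has_search_param:
        return "L.WEB"
    # I.PUB: any other page on a publisher or database host.
    if any(h in host for h in PUBLISHER_HOSTS):
        return "I.PUB"
    # I.WEB: Wikipedia articles and pages that look like articles, blogs, or news.
    if "wikipedia" in host or any(t in host or t in path for t in CONTENT_TERMS):
        return "I.WEB"
    return "MISC"


print(classify_url("https://www.google.com/search?q=distributive+justice"))  # L.WEB
```

Checking the more specific publication rules before the generic web-search rule keeps Google Scholar pages from being labelled as ordinary SERPs.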

The URL-based classification system provided a useful way to categorize webpages based on their type, allowing us to gain insights into how users’ search behaviour varies across different types of webpages. By analysing the patterns of webpage types visited by users during their information search process, we were able to identify which types of webpages were most commonly visited and how they related to users’ search behaviour.

6.7 L: Interaction with Lists / Search Results – Source Selection

6.7.1 Number of Clicks per Query

Figure 6.9: Number of clicks per query - longitudinal phase.

Number of clicks per query refers to the number of times a participant clicked a link on a search result page after conducting a search query. This metric reflects the level of interaction and engagement of the participant with the search results, as well as their ability to assess the relevance and usefulness of each search result presented to them. A higher value of clicks per query may indicate that the participant is more engaged and willing to explore a wider range of information sources, while a lower value may suggest a more focused and targeted search approach. Figure 6.9 shows the trend in total, average, and variability of clicks per query for the low and high groups at different points in the semester.
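A minimal sketch of how these three summaries could be computed, assuming a hypothetical simplified log of chronological `(event_type, page_category)` tuples in which each click on a search result page is attributed to the most recent query. The log format, the attribution rule, and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev


def clicks_per_query(events):
    """Count search-result clicks following each query.

    `events` is a chronological list of (event_type, page_category) tuples
    (a hypothetical format). A click counts toward the most recent query only
    if it was made on a search result page (category starting with "L.").
    """
    counts = []
    for event_type, category in events:
        if event_type == "QUERY":
            counts.append(0)
        elif event_type == "CLICK" and counts and category.startswith("L."):
            counts[-1] += 1
    return counts


def summarise(counts):
    """Total, average, and population standard deviation of clicks per query."""
    if not counts:
        return {"total": 0, "average": 0.0, "sd": 0.0}
    return {"total": sum(counts), "average": mean(counts), "sd": pstdev(counts)}


# Toy log for illustration, not study data.
log = [("QUERY", "L.WEB"), ("CLICK", "L.WEB"), ("CLICK", "L.WEB"),
       ("CLICK", "I.PUB"), ("QUERY", "L.PUB"), ("CLICK", "L.PUB")]
print(summarise(clicks_per_query(log)))
```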

The low group had a much higher total (Figure 6.9(a)) and average (Figure 6.9(b)) number of clicks per query than the high group at all stages of the semester (proposal, outline, and rough draft) except the last stage, writing the final paper. For the low group, total clicks per query peaked during the outline phase (about 100 clicks in total), while average clicks per query peaked during the rough draft phase (about 12 clicks per query). The high group maintained relatively stable clicks per query during the proposal, outline, and rough draft stages (fewer than four clicks per query), with a peak of 7 clicks per query during the final paper. Additionally, the high group had lower variability (standard deviation) in their number of clicks per query than the low group (Figure 6.9(c)).

The low group’s higher total, average, and standard deviation of clicks per query throughout most of the semester (except the final paper stage) suggest that they may have been less efficient in their searching, requiring more clicks and potentially spending more time on each query. They may have struggled to refine their search strategies, resulting in more clicks per query throughout the semester. This could have contributed to their lower self-perceived learning and search outcomes.

In contrast, the high group’s fewer clicks per query throughout most of the semester (except the final paper stage) suggest that they may have been more efficient and effective in their searching, requiring fewer clicks and potentially finding more relevant information per query. They may have been able to refine their search strategies as the semester progressed. This may indicate the high group’s ability to better assess the relevance and usefulness of search results, and to use their knowledge and cognitive strategies more effectively during the search process, which could have contributed to their higher self-perceived learning and search outcomes. The high group’s peak in clicks per query during the final paper stage suggests that they may have needed to invest more effort in finding the most relevant and useful information for their final paper. Even so, their average number of clicks per query at this stage was still lower than the low group’s peak during the rough draft stage.

It is also possible that the differences in average clicks per query between the two groups during different phases of the semester reflect differences in the complexity or specificity of the information needed for the different tasks. For example, the paper outline phase may have required broader, more exploratory searches, while the final paper phase may have required more targeted and specific searches.

6.7.2 Counts and Dwell Time on Search Results

Figure 6.10: Interactions with search results - Longitudinal Phase.

The Ethical Dilemma research paper writing task involved searching for 20 references and incorporating them into the narrative of the research paper. We therefore structure the discussion around visits to scholarly publication search results and (non-scholarly) web SERPs. Examples of scholarly publication websites are Google Scholar and digital libraries such as ACM DL, Springer, Elsevier, and PubMed. We examined how visits to these two categories of websites changed across the different stages of writing the research paper for each group. Figure 6.10 shows the counts of publication search results and web search results visited by the two groups, as well as total and average dwell time on such pages by both groups.
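A minimal sketch of one common way to approximate dwell time from a browsing log: the time from a page visit until the next page visit. This is an assumption, not necessarily how YASBIL or this study computed dwell time; the 30-minute cap (mirroring the session-break threshold used in Section 6.10) and the `(timestamp_in_seconds, category)` input format are hypothetical.

```python
from collections import defaultdict


def dwell_times(visits, cap_seconds=30 * 60):
    """Per-category total and average dwell time from chronological page visits.

    `visits` is a list of (timestamp_in_seconds, category) tuples. Dwell time on
    a page is approximated as the gap until the next page visit, capped at
    `cap_seconds`; the final visit is skipped because it has no successor.
    Both the gap-based approximation and the cap are assumptions of this sketch.
    """
    per_category = defaultdict(list)
    for (t, category), (t_next, _) in zip(visits, visits[1:]):
        per_category[category].append(min(t_next - t, cap_seconds))
    return {
        cat: {"count": len(d), "total": sum(d), "average": sum(d) / len(d)}
        for cat, d in per_category.items()
    }
```

Per-category totals and averages of this kind could then be compared between groups at each stage of the semester.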

For both publication search results and web search results, the high group visited more of these pages than the low group (Figure 6.10(a) and (b)). The high group’s visits to publication search results peaked during the outline phase, and their visits to web SERPs peaked during the rough draft phase. The low group, on the other hand, had a dip in visits to both categories of search results during the rough draft phase. However, the low group had significantly longer average dwell time on academic publication search results in all four stages than the high group \((U = 153.0, p = .01, ES = 0.77)\) (Figure 6.10(e)). The low group’s total dwell time was also longer, except at the final paper stage, when the high group surpassed the low group in dwelling on publication search results; this difference approached significance \((U = 138.0, p = .08, ES = 0.70)\) (Figure 6.10(c)). This suggests that the low group spent more time examining and considering scholarly publication search results than the high group. It is possible that the low group’s lower levels of self-regulation and metacognition led them to spend more time on search results, as they may have found it harder to quickly evaluate the relevance of search results to their task. The high group’s higher levels of those individual traits, on the other hand, may have enabled them to quickly identify relevant search results and move on to the next stage of their task. At the final paper stage, however, the high group may have spent more time on search results to ensure that they had not missed any important information and had thoroughly covered their topic.
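The group comparisons above use the Mann-Whitney U test (Mann & Whitney, 1947) together with an effect size. The plain-Python sketch below illustrates how U and a common language effect size (McGraw & Wong, 1992; Vargha & Delaney, 2000) relate: U counts the pairs in which a value from one group exceeds a value from the other, with ties counted as one half, and dividing U by the number of pairs gives the probability that a random value from the first group is larger. It is an illustration only: it does not compute p-values (a library routine such as `scipy.stats.mannwhitneyu` or Pingouin’s `mwu` would), and it does not assume which exact effect-size convention produced the ES values reported here.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x against sample y, counting ties as one half."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def common_language_effect_size(x, y):
    """Probability that a random value from x exceeds one from y (ties count half).

    This equals U / (n_x * n_y), i.e. the Vargha-Delaney A measure.
    """
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Illustrative toy values, not study data.
print(common_language_effect_size([5, 6, 7], [1, 2, 6]))  # 7.5 / 9, about 0.83
```

Because every pair is counted once, U for x against y plus U for y against x always equals the number of pairs, which is why the effect size lies between 0 and 1.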

In general, the high group engaged more with web search results and less with scholarly publication search results, while the low group demonstrated the opposite pattern. The high group may have relied more on web search engines and popular sources, such as news articles or blogs, to get a broader understanding of their research topic and its context. In contrast, the low group may have relied more on scholarly publications to find more in-depth and specialized information. They may have been more thorough in searching for and evaluating academic publications, which was arguably one of the main aspects of writing the research paper. This difference in strategy may have contributed to the difference in perceived learning outcomes between the two groups: the high group may have been able to find a broader range of information from a variety of sources, while the low group may have limited themselves to more specialized sources. It is also possible that the high group’s use of web search engines led them to encounter more diverse perspectives and interpretations of the research topic, which could have enhanced their critical thinking and analytical skills.

6.8 I: Interaction with Information Objects – Sources / Content Pages

Figure 6.11: Interactions with content pages - Longitudinal Phase.

We analysed interactions with content pages in the same manner as counts and dwell time on search results (Section 6.7.2), comparing visits to scholarly publications with visits to non-scholarly content pages. Figure 6.11 shows the counts of scholarly publications and non-scholarly content pages visited by the low and high groups, as well as total and average dwell time on such pages by both groups.

In a similar vein to publication vs web search results (Section 6.7.2), the high group engaged less with scholarly publications, and more with non-scholarly content pages, compared to the low group. Earlier in the semester, while writing the outline, the low group visited more academic publications, while the high group visited more content pages. The trend reversed later in the semester while writing the final paper, when the low group viewed fewer publications and more content pages, while the high group did the opposite (Figures 6.11(a) and (b)).

In terms of dwell time, the low group spent significantly more time viewing academic publications than the high group, both in total \((U = 219.0, p = .04, ES = 0.71)\) and on average \((U = 227.0, p = .02, ES = 0.74)\) (Figures 6.11(c) and (e)). While writing the rough draft, the high group spent more time viewing non-scholarly content pages.

The high group’s preference for non-academic content pages may reflect their use of web search engines to find relevant information, as such search engines may often prioritize non-academic content over scholarly publications. The high group seemed to be more focused on finding information that was relevant to their topic, regardless of its source. They also appeared to be more interested in exploring a wider range of topics and concepts, which is reflected in their higher stationary entropy of query reformulations during the rough draft phase.

On the other hand, the low group’s preference for academic publications may reflect their reliance on scholarly publication search engines, perhaps because they perceived finding such sources as the main requirement of the research paper. The low group seemed to be more concerned with finding information from scholarly publications, perhaps as a way to demonstrate the rigour of their research. They also appeared to have a more limited scope of inquiry, as reflected in their lower stationary entropy of query reformulations during the rough draft phase.

Overall, these findings suggest that the high group may have been more creative and adaptable in their information searching behaviours, while the low group may have been more rigid and focused on adhering to traditional academic norms.

6.9 Search Result Pages vs Content Pages

Figure 6.12: Differences in interactions with search results vs. content pages.

Figure 6.12 shows differences in visits to all search result pages versus all content pages (scholarly and non-scholarly combined) for the high and low groups. The high group visited more search result pages and content pages as the semester progressed, while the low group’s visits dropped after the outline phase (Figure 6.12 (a), (b)). The difference in the number of search result pages visited approached significance \((U = 136, p = .07, ES = 0.33)\). This suggests that the high group may have been more engaged and persistent in their information searching, while the low group may have experienced a decrease in motivation or self-regulation as the semester progressed, specifically before writing the rough draft. Notably, the low group’s page visits rebounded while writing the final paper, which could indicate that they re-engaged with the task and that their information searching improved as the final-paper deadline approached.

However, although the high group visited more webpages in general, they dwelt less on search results and more on content pages. The differences in total and average dwell time on search results are significant (Figure 6.12 (c) and (e)). This may indicate that the high group was more efficient at evaluating search result pages and quickly identifying relevant content. The low group, on the other hand, spent more time on search result pages, possibly because they found it harder to evaluate and select relevant results. For instance, when asked whether they had ever initially thought they had found a useful resource, only to find it almost useless upon later examination, a participant in the high group said:

I don’t think so. I feel like I used all the papers that I found in my outline in my final paper, like some sort of degree.

— P023_LONDON

Another participant said:

I did find some articles that weren’t useful, but I think I kind of quickly ruled them out all I’m looking like, from the title, they looked okay. Or sometimes even from the title, I just, like, didn’t even look at them. But if I like looked at them a little bit closer and just clicked into them. And I didn’t really find what they were saying very useful. Like, even in the introduction, I would just click out of them.

— P012_MIAMI

The responses from participants in the high group suggest that they were more efficient at evaluating search result pages and quickly identifying relevant content. The first participant, P023_LONDON, mentioned using all the papers found during the outline stage in their final paper, indicating a careful selection process early on and a successful integration of relevant sources throughout the writing process. The second participant, P012_MIAMI, described their approach of quickly ruling out articles that appeared less useful based on title or a cursory examination. This suggests a discerning approach to source evaluation and a proactive effort to focus only on the most relevant materials.

In contrast, the response from a participant in the low group highlights the challenges they faced in identifying relevant sources:

Yes. Quite a lot, actually. A lot of them, like I saw Google highlights certain parts of the paper where it has the key words that I put in, and then I read about it. And it only briefly mentioned about that word or a couple of times. The distributed justice part, there’s quite a lot. There’s a lot of papers that has distributive justice in it. And then it’s not really relevant. It just talks about distributive justice. Overall, it doesn’t talk about distributive justice, as in terms of event, ensuring privacy. So it was pretty irrelevant.

— P007_PARIS

The participant mentioned encountering papers that initially appeared useful based on Google’s highlighted keywords but, upon closer examination, did not actually address the topic of interest. This illustrates the difficulties this low group participant experienced in assessing the relevance and content of potential sources. Taken together, these qualitative responses point to differences in how the high and low groups evaluated and selected information. The high group participants seemed to have efficient strategies for quickly identifying relevant sources and filtering out irrelevant ones, whereas the low group participant struggled to judge relevance accurately, which is consistent with more time spent on search result pages and potentially more encounters with less relevant material. These differences in information behaviour may be related to the higher self-perceived learning and search outcomes of the high group, who were able to find and engage with relevant content efficiently.

6.10 Entropy of Search Tactic Sequences

Figure 6.13: Entropy of search tactic sequences for longitudinal phase.

Similar to the entropy analysis of Query Reformulation types (Section 6.5.3), we performed entropy analysis of search tactic transitions. This analysis is directly inspired by He et al. (2016), who in turn were inspired by Krejtz et al. (2015). To characterise the entropy of students’ search tactics, we used the following set of states, or tactics:

  1. QUERY: issuing a search query
  2. CLICK: mouse click
  3. IDLE: participant stays idle for more than one minute (adapted from Taramigkou et al. (2018))
  4. SESSION_BREAK: participant becomes idle for more than 30 minutes (Google Analytics defines a session break as 30 minutes of inactivity; see footnote 18)
  5. L.PUB: visiting a publication search result page
  6. L.WEB: visiting a web SERP
  7. I.PUB: visiting a scholarly publication
  8. I.WEB: visiting a non-scholarly content page
  9. TASKPAGE: visiting webpages related to the study, i.e. the Qualtrics questionnaires
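The state list above presupposes that raw log events are turned into a sequence of tactics. A minimal sketch, assuming chronological `(timestamp_in_seconds, event)` pairs in which page visits have already been labelled with the URL categories of Section 6.6 (and study pages labelled TASKPAGE): time gaps become IDLE (over one minute) or SESSION_BREAK (over 30 minutes) states. Treating MISC pages as resetting the idle timer without becoming a state, and keeping repeated consecutive states, are assumptions of this sketch.

```python
IDLE_SECONDS = 60                 # idle for more than one minute
SESSION_BREAK_SECONDS = 30 * 60   # idle for more than 30 minutes
STATES = {"QUERY", "CLICK", "L.PUB", "L.WEB", "I.PUB", "I.WEB", "TASKPAGE"}


def to_tactic_sequence(events):
    """Convert chronological (timestamp_seconds, event) pairs into tactic states.

    Gaps longer than 30 minutes become SESSION_BREAK and gaps longer than one
    minute become IDLE. Events outside STATES (e.g. MISC pages) still reset the
    idle timer but are not emitted as states.
    """
    sequence = []
    previous_time = None
    for timestamp, event in events:
        if previous_time is not None:
            gap = timestamp - previous_time
            if gap > SESSION_BREAK_SECONDS:
                sequence.append("SESSION_BREAK")
            elif gap > IDLE_SECONDS:
                sequence.append("IDLE")
        if event in STATES:
            sequence.append(event)
        previous_time = timestamp
    return sequence
```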

In the context of search tactic transitions, a higher value of transition entropy (\(H_t\)) indicates more randomness and uncertainty in the participant’s search behaviour (which is composed of different search tactics, and of transitioning or switching between those tactics). A lower value of transition entropy indicates that the search behaviour (i.e., tactic switching behaviour) is highly predictable. For stationary entropy of search tactics (\(H_s\)), a higher value indicates that participants use all the search tactics with more nearly equal probability, while a lower value suggests that certain search tactics are favoured over others. Figure 6.13 shows the trends in these values over the semester.
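Given a tactic sequence, both entropies can be estimated from observed transition counts. The plain-Python sketch below is one way to do so under a stated simplification: \(p_{ij}\) is estimated from counts of consecutive pairs, and \(\pi_i\) is approximated by how often state \(i\) appears as the source of a transition, rather than by solving \(\pi = \pi P\) for the stationary distribution of the transition matrix. Values are unnormalized and in bits; this is not necessarily the computation used in the study.

```python
from collections import Counter
from math import log2


def tactic_entropies(sequence):
    """Return (transition entropy H_t, stationary entropy H_s) in bits.

    p_ij is estimated from counts of consecutive pairs (i -> j). As a
    simplification, pi_i is estimated from how often state i occurs as the
    source of a transition, instead of being solved from pi = pi P.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0, 0.0
    pair_counts = Counter(pairs)
    source_counts = Counter(i for i, _ in pairs)
    n = len(pairs)

    # Stationary entropy: -sum_i pi_i log2 pi_i.
    h_s = -sum((c / n) * log2(c / n) for c in source_counts.values())
    # Transition entropy: -sum_i pi_i sum_j p_ij log2 p_ij.
    h_t = 0.0
    for (i, _j), c in pair_counts.items():
        pi_i = source_counts[i] / n
        p_ij = c / source_counts[i]
        h_t -= pi_i * p_ij * log2(p_ij)
    return h_t, h_s
```

A strictly alternating sequence such as QUERY, CLICK, QUERY, CLICK gives \(H_t = 0\) (the next tactic is fully predictable) but \(H_s = 1\) bit (two tactics used equally often), which mirrors the distinction drawn in the paragraph above.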

The low group demonstrated a sharp increase in both transition entropy and stationary entropy of search tactics from the proposal stage to the outline stage. Both entropies subsequently decreased in the final paper stage. The high group had a relatively stable trend in their transition entropy of search tactics, while their stationary entropy decreased as the semester progressed. Also, the high group’s stationary entropy was significantly lower than that of the low group \((U = 252.0, p = .04, ES = 0.70)\).

The low group’s sharp increase in transition and stationary entropy from the proposal to the outline stage suggests that they were exploring a wider range of search tactics and were less certain of which tactics to use during the early stages of the writing process. This is consistent with their high counts of query reformulations and their tendency to click on many search results and content pages. The subsequent decrease in both entropies in the final paper stage suggests that the low group became more focused and efficient in their search tactics, possibly due to the feedback they received during the earlier stages of the writing process.

The high group’s relatively stable transition entropy suggests that they were consistent in their use of search tactics throughout the writing process. At the same time, their decreasing stationary entropy suggests that they became more focused and efficient in their use of search tactics as the semester progressed, which is consistent with their fewer query reformulations and their tendency to dwell longer on content pages.

These quantitative findings align with the qualitative responses from the participants. When asked if they had a plan before starting the project, or starting to search, a participant in the low group responded:

Not really. My plan was just to find sources and see how they kind of fit in with my research question. And if I didn’t really find anything, I would just alter the research question.

— P012_MIAMI

In contrast, a participant in the high group responded:

whenever I start searching, I go in with the mindset of: this is what I’m searching for… I always try to reference back to that main goal to help guide my searching process,… like, Okay, I’m searching for content articles, right? And so I’m not gonna go ahead and get distracted by anything else, that’s going to be like my main goal. And as long as I’m able to go ahead and reference that back to my main goal, then I’m doing a good job.

I feel this is something that I’ve gotten used to being a college student. I realized, this is probably the most efficient way to search for things (and) not get overwhelmed. Breaking it down and having a very clear set goal makes things more manageable for me.

— P015_LIMA

The participant from the low group expressed a lack of a specific plan before starting the project or the search process. They mentioned their approach of finding sources and altering their research question if they did not find anything relevant. This suggests a more exploratory and adaptive approach to the search process, aligning with the higher entropy observed in the quantitative analysis. In contrast, the participant from the high group described a clear plan and a goal-oriented mindset before starting the search process. They emphasized the importance of having a main goal and using it as a reference point to guide their search process efficiently. This aligns with the lower entropy observed in the high group and suggests a more focused and directed approach to information searching.

Overall, these quantitative results from the entropy analyses, combined with the qualitative responses from the participants, provide further support and insights into the differences in search tactics and planning approaches of the high and low groups.

6.11 Summary of Longitudinal Phase

We summarise the findings from this chapter, on the longitudinal searching-as-learning task of writing a research paper, as follows.

Participants were divided into high and low groups based on their self-reported values of motivation, metacognition, self-regulation, and memory span. Their search behaviour and perceived learning and search outcomes were studied over the course of a semester. During the longitudinal phase, both the high and low groups exhibited changes and differences in their search behaviour from the proposal to the outline stage to the rough draft, and finally to the final paper stage.

Examining specific interaction patterns, we saw that the high group visited more search result pages but spent less time dwelling on them. They also visited more content pages in the later parts of the semester and spent more time dwelling on them. The low group visited fewer search result pages but spent more time evaluating search results.

Transition entropy measures the uncertainty of the next search tactic a user will apply, given the tactic they are currently using; in other words, it captures how predictable the user’s switching between search tactics is. Stationary entropy measures how evenly the user’s search tactics are distributed overall: higher values mean the tactics are used with more nearly equal probability, while lower values mean certain tactics are favoured. The low group showed sharp increases in both transition entropy and stationary entropy of search tactics from the proposal stage to the outline stage, followed by a decrease in both entropies in the final paper stage. This indicates that their search tactics became more diverse and unpredictable in the middle stages of the task, possibly due to uncertainty and lack of clarity about the topic. The high group showed a relatively stable trend, or a gentle increase, in the transition entropy of their search tactics, suggesting a consistent and well-defined search strategy. This group’s stationary entropy decreased as the semester progressed, suggesting that they became more focused and efficient in their search tactics over time. Additionally, the high group’s stationary entropy was significantly lower than that of the low group, suggesting a more structured and organized approach to their search tactics.

Regarding learning and search outcomes, the high group reported significantly higher perceived learning and search outcomes than the low group. Although the high group received slightly better grades on their research paper assignments, there was no significant difference between the two groups. Other factors beyond self-perceived searching and learning outcomes may have contributed to the overall grade.

References

Ambrose, S. A., Bridges, M. W., DiPietro, M., Lovett, M. C., & Norman, M. K. (2010). How Learning Works: Seven Research-Based Principles for Smart Teaching. John Wiley & Sons.
Collins-Thompson, K., Rieh, S. Y., Haynes, C. C., & Syed, R. (2016). Assessing learning outcomes in web search: A comparison of tasks and query strategies. Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, 163–172.
Hassan, A., White, R. W., Dumais, S. T., & Wang, Y.-M. (2014). Struggling or exploring? Disambiguating long search sessions. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, 53–62.
He, J., Qvarfordt, P., Halvey, M., & Golovchinsky, G. (2016). Beyond actions: Exploring the discovery of tactics from user logs. Information Processing & Management, 52(6), 1200–1226.
Krejtz, K., Duchowski, A., Szmidt, T., Krejtz, I., González Perilli, F., Pires, A., Vilaro, A., & Villalobos, N. (2015). Gaze transition entropy. ACM Transactions on Applied Perception (TAP), 13(1), 1–20.
Krejtz, K., Szmidt, T., Duchowski, A. T., & Krejtz, I. (2014). Entropy-based statistical analysis of eye movement transitions. Proceedings of the Symposium on Eye Tracking Research and Applications, 159–166.
Liu, C., Gwizdka, J., Liu, J., Xu, T., & Belkin, N. J. (2010). Analysis and evaluation of query reformulations in different task types. Proceedings of the American Society for Information Science and Technology, 47(1), 1–9.
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
Taramigkou, M., Apostolou, D., & Mentzas, G. (2018). Leveraging exploratory search with personality traits and interactional context. Information Processing & Management, 54(4), 609–629.
Vallat, R. (2018). Pingouin: Statistics in Python. Journal of Open Source Software, 3(31), 1026.
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132.