3 Background: Information Searching

This second chapter on background literature discusses relevant concepts from the discipline of Information Science, and more specifically Interactive Information Retrieval. First, we introduce some terminology around information behaviour, information need, and information relevance. Then we discuss relevant findings from various empirical studies through the lens of three-stage interactions in the information search process. Next, we discuss some general characteristics of information search behaviour and how they are linked to expertise and working memory. We then discuss how learning has been assessed in recent search-as-learning studies. We also discuss some limitations of current search systems in fostering learning, including the lack of a sufficient number of longitudinal studies. In the last section, we state the implications these findings had for shaping the longitudinal study conducted for this dissertation.

3.1 Terminology

Information retrieval (IR) is the process of obtaining information objects that are relevant to an information need from a collection of those objects (Wikipedia). Information objects are entities that can potentially convey information. They can take many forms, such as documents, webpages, facts, music, spoken words, images, videos, artefacts, and other forms of human expression. Information retrieval techniques are employed in search engines (e.g., web search, social search, and desktop search), media search (e.g., image, music, and video search), digital libraries, and recommender systems, as well as in domain-specific applications such as geographical information systems, e-commerce websites, and legal information search, among others.

Multiple perspectives exist on how users interact with information and IR systems. In the search engine application view, interactions are restricted to the search engine interface. In the human-computer interaction (HCI) view, interactions take place between a person and a system, but the system can go beyond supporting retrieval alone to supporting more complex tasks. In the cognitive view of IR, which is the broadest, the interactions for obtaining information can take place between a person and a system, as well as between people.

Figure 3.1: Nested model of information behaviour by T. D. Wilson (1999).

People’s behaviour around information can be modelled as a nested Venn diagram, as proposed by T. D. Wilson (1999) (Figure 3.1). Information behaviour is the most general field of investigation. Information-seeking behaviour can be seen as a subset of this field, concerned particularly with the variety of methods people employ to discover and gain access to information objects. Information search behaviour is in turn a subset of information-seeking behaviour, concerned with the interactions between the user and computer-based information systems. In this dissertation, we focus on information search rather than on the two broader concepts. This is because online IR systems, such as search engines or digital libraries, have become the primary source from which people obtain information today, and web search is becoming ever more pervasive in our day-to-day lives.

The field of interactive information retrieval (IIR) posits that IR systems should operate in the way good libraries do. Good libraries provide both the information a visitor needs and a partner in the learning process (the information professional) to help navigate that information, make sense of it, preserve it, and turn it into knowledge. As early as 1980, Bertram Brookes stated that searchers acquire new knowledge in the information seeking process (Brookes, 1980). Fifteen years later, Gary Marchionini described information seeking as “a process, in which humans purposefully engage in order to change their state of knowledge” (Marchionini, 1995). So we have known for quite a while that search is driven by the higher-level human need to gain knowledge. Information retrieval is thus a means to an end, not an end in itself. The ideal IR system should therefore not only help users locate information, but also help them bridge the gap between information and knowledge.

This brings us to the concept of information need. An information need is the desire to locate and obtain information to satisfy a conscious or unconscious human need. Most search systems today assume that the search query is an accurate representation of a user’s information need. However, Belkin et al. (1982) observed that in many cases, users of search systems are unable to formulate precisely what they need, because they lack some of the knowledge required to formulate their queries. As humans, we have difficulty asking questions about what we do not know. Belkin called this phenomenon an Anomalous State of Knowledge, or ASK. Later, Huang & Soergel (2013) identified an exhaustive set of criteria that should be considered in order to represent a user’s information need ideally. These criteria are highly dependent on the user context: user attributes, tasks or goals, and the situation the user is embedded in. This brings us to another closely related concept: information relevance.

Relevance is a fundamental concept of Information Science and Information Retrieval, and perhaps the most celebrated work in this area has been done by Tefko Saracevic (Saracevic, 1975, 2007a, 2007b, 2016). Webster’s dictionary defines relevance as “a relation to the matter at hand”. In most circumstances, relevance is a “y’know” notion: people apply it effortlessly, without anybody having to define for them what “relevance” is. This creates one of the most fascinating challenges in the information field: humans understand relevance intuitively, while representing relevance effectively for use by algorithmic systems remains an open research problem. The situation becomes more complex because relevance always depends on context, and the context is always dynamic, as the matter at hand changes.

3.2 Three-stage Interactions with Online Search Systems

Figure 3.2: Models of information search process, with our coloured annotations identifying the three stages: (1) query formulation, (2) list-item selection, and (3) item examination.

As we saw in the previous section, information search behaviour is (the study of) the interactions between a user and digital Information Retrieval (IR) systems. The field of Information Science/Studies has developed multiple models explaining how information search works (T. D. Wilson, 1999), a few of which are presented in Figure 3.2. Across many of these models, we observe that most major IR systems offer three fundamental ways for users to interact with the system and the underlying information: (1) an interface for entering search queries; (2) an interface for viewing and evaluating a list of retrieved information-objects, or search results; and (3) an interface for viewing and evaluating individual information-objects. For instance, Marchionini (1995)’s ISP model hints at these three interfaces in its fourth, sixth, and seventh stages, namely “formulate query”, “examine results”, and “extract info”. Spink (1997)’s model of the IR interaction process consists of sequential steps or cycles, where each cycle comprises one or more interactive feedback occurrences of user input (query), IR system output (list), and user interpretation and judgement (of individual information-objects). Consequently, findings from the large body of empirical research in interactive IR (especially research with web-based search systems) can be grouped around these three stages of interaction with search systems:

Stage 1: search query (re)formulation
Stage 2: list-item selection: search results evaluation (aka source selection)
Stage 3: item examination: content page evaluation (aka interacting with sources)

The discussions in the following subsections are organised around these three stages of interaction. The empirical studies discussed below generally follow some common principles of user studies in interactive IR (IIR) (Borlund, 2013; Kelly, 2009): participants are presented with a search task or search topic and are then asked to search the internet (or a simulation of the open web) for information. During the search, the various interactions (queries, clicks, webpages opened, etc.) are recorded, analysed, and correlated with other sources of data to answer the research questions.

3.2.1 Stage 1: Query (Re)formulation

How do users behave when submitting search queries (to an IR system)?

Figure 3.3: Comparison of Query Reformulation Types (QRTs) proposed by Boldi et al. (2009) and C. Liu et al. (2010).

Query formulation is the process of composing a search query that describes the information need of a searcher. Query reformulation refers to the act of either modifying a previous query or creating a new one. Query reformulation typically occurs as a searcher improves their understanding of how to translate their information need into a search query. The relationship between two successively issued queries has been classified in a number of ways; these classifications are called Query Reformulation Types, or QRTs. Among many others, Boldi et al. (2009) used cognitive aspects of the searchers issuing the query to propose a taxonomy of QRTs, while C. Liu et al. (2010) proposed a similar taxonomy focusing more on the linguistic properties of the two successive queries. These are compared and contrasted in Figure 3.3.
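To make the idea of a linguistic QRT more concrete, the following Python sketch labels a pair of successive queries purely by which terms were added or removed. It is a deliberately simplified heuristic in the spirit of the linguistic taxonomies above, not the taxonomy of Boldi et al. (2009) or C. Liu et al. (2010); the function names, the label set, and the whitespace tokeniser are all assumptions made for illustration.

```python
def terms(query: str) -> set[str]:
    """Lower-cased, whitespace-split term set (an assumed, very simple tokeniser)."""
    return set(query.lower().split())


def classify_reformulation(previous: str, current: str) -> str:
    """Assign a coarse, hypothetical reformulation label from term overlap alone."""
    old, new = terms(previous), terms(current)
    added, removed = new - old, old - new
    if not added and not removed:
        return "repeat"          # same terms (word order is ignored)
    if added and not removed:
        return "specialization"  # terms were only added
    if removed and not added:
        return "generalization"  # terms were only removed
    if old & new:
        return "substitution"    # some terms kept, some swapped
    return "new"                 # no terms in common


print(classify_reformulation("jaguar", "jaguar car"))            # specialization
print(classify_reformulation("jaguar car", "jaguar animal"))     # substitution
```

Real taxonomies also consider spelling corrections and semantic relations between terms (e.g., synonyms), which a set-overlap rule like this cannot capture; a pure reordering of terms is labelled a repeat here.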

Task-type, task-topic, task-goal, and domain-expertise were found to influence the query reformulation patterns of searchers (Eickhoff et al., 2015; Jiang et al., 2014; Mao et al., 2018). At first glance, a large portion of the query reformulation terms ($\sim86\%$) seemed to come from the task-description itself (Jiang et al., 2014; Mao et al., 2018). This was accompanied by significantly more fixations on the task-description than on SERP elements. Jiang et al. (2014) and Mao et al. (2018) investigated this phenomenon further: Jiang et al. (2014) controlled for the task-type and task-goal, using the faceted framework by Li & Belkin (2008), while Mao et al. (2018) controlled for the task-topic and the domain-expertise of the searchers.

If search tasks had factual goals, searchers relied heavily on the task-description for reformulating their queries (Jiang et al., 2014). For interpretive tasks (intellectual tasks with specific goals), users spent more time reading search result surrogates before reformulating their queries, as shown by increased eye fixations (indicative of visual attention) and dwell time on search result snippets (surrogates). For exploratory tasks, searchers fixated longest on query auto-completion (QAC) suggestions, indicating that they were possibly looking for help and suggestions based on their specific query, as the search task had non-specific (amorphous) goals.

Searchers also relied on the task-description for reformulating queries when the search task was outside their domain of expertise (Mao et al., 2018). For in-domain tasks, they used query terms from their own knowledge that they had not fixated on in visited SERPs and content pages. Eickhoff et al. (2015) reported that a significant share of new query terms came from visited SERPs and content pages, and that query reformulation (specialization) often did not literally re-use previously encountered terms, but used highly related ones ⁸ instead. These observations can possibly be explained by Mao et al. (2018)’s findings: when exploring a new domain, searchers may accumulate vocabulary and learn how to query during the search; when performing in-domain search tasks, they may have enough prior knowledge to come up with effective query terms. Searchers from the medicine domain also used more unread query terms for their in-domain search tasks than searchers from the politics and environment domains (Mao et al., 2018). This suggests that domain knowledge and expertise are more important for formulating good search queries in highly technical disciplines (e.g., medicine) than in less technical domains (e.g., politics).
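The kind of source analysis described above can be sketched as follows: each term that is new in a reformulated query is attributed to the task description, to text the searcher has already seen, or, failing both, to the searcher's own knowledge. This is a hypothetical simplification. Mao et al. (2018) relied on eye-tracking fixations to decide which terms had been read, whereas here any term in the supplied `seen_texts` counts as seen; the function names, the tokeniser, and the priority order of sources are all assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens (an assumed, simplistic tokeniser)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def attribute_new_terms(prev_query: str, new_query: str,
                        task_description: str, seen_texts: list[str]) -> dict[str, str]:
    """Map each term added in new_query to a hypothetical source label."""
    previous = set(tokenize(prev_query))
    task_terms = set(tokenize(task_description))
    seen_terms = {t for text in seen_texts for t in tokenize(text)}
    sources = {}
    for term in tokenize(new_query):
        if term in previous:
            continue                              # carried over, not a new term
        if term in task_terms:
            sources[term] = "task description"    # checked first (assumed priority)
        elif term in seen_terms:
            sources[term] = "seen content"        # e.g. SERP snippets, content pages
        else:
            sources[term] = "own knowledge"       # not encountered in the session
    return sources


def share_by_source(sources: dict[str, str]) -> dict[str, float]:
    """Fraction of new terms attributed to each source label."""
    counts = Counter(sources.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


sources = attribute_new_terms(
    "diabetes",
    "diabetes insulin resistance symptoms",
    "Find out how insulin resistance develops.",
    ["Symptoms of type 2 diabetes include thirst."],
)
print(sources)
print(share_by_source(sources))
```

Aggregating such shares over many reformulations is one way a figure like the $\sim86\%$ task-description share could be estimated, although the original studies used eye-tracking rather than plain text matching.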

Figure 3.4: Investigating user-interactions with queries: (a) Visualizing the distribution of retrieved search results prior to running a query, to help searchers understand their queries’ effectiveness (Qvarfordt et al., 2013). The visualization is a stacked column chart with ten columns, each representing ten search results: the first column represents results ranked 1-10, the second column results ranked 11-20, etc. Each column has three divisions, indicating the counts of results that have already been seen by the searcher (dark blue, top), that will be re-retrieved but have not been seen by the searcher (medium blue, middle), and that will be newly retrieved (bright teal blue, bottom). The system evaluates the searcher’s query continuously as it is being typed and updates the visualization in real time. (b) Interfaces for examining interactions with query auto-completion (QAC), by (i) Smith et al. (2016) and (ii) Hofmann et al. (2014) (overlaid with heatmaps of eye fixations for all participants). This figure is best viewed in colour.

Query Auto-Completion (QAC) is a feature that suggests possible queries to web search users from the moment they start typing a query. It is nearly ubiquitous in modern search systems and is thought to reduce the physical and cognitive effort of formulating a query. QAC suggestions are usually displayed as a list (Figure 3.4(b)), and users interact with the list in a variety of ways. Hofmann et al. (2014) observed a strong position bias among searchers who examined the QAC list: the top suggestions received the highest visual attention, even when the ordering of the suggestions was randomized, and average fixation time decreased consistently from the top suggestion to the bottom. Even when the ranking of suggestions was randomized, the time taken to formulate queries did not differ significantly.

Search topics were found to have a large effect on QAC usage (Jiang et al., 2014; Smith et al., 2016), and search was easiest for the topics with the highest QAC usage. Total eye-gaze duration was longest when visual attention was shared between the QAC suggestions and the search query input box; some of this additional time was probably due to deciding whether to use a QAC suggestion. Typing was faster when a QAC suggestion was not used. However, the IR system’s retrieval performance (measured using NDCG@3) was higher when QAC was used. Smith et al. (2016) therefore speculated that users realized the value of QAC suggestions later in the search session, through a reduction in the number of additional queries needed or an increase in the value of the information found.
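For readers unfamiliar with the metric, NDCG@k (normalized discounted cumulative gain at rank k) rewards rankings that place highly relevant results near the top. The sketch below uses one common formulation, with graded gains $2^{rel_i} - 1$ discounted by $\log_2(i + 1)$ and normalized by the DCG of the ideal ordering. The exact gain and discount used by Smith et al. (2016) may differ, and computing the ideal ordering from the retrieved list alone (rather than from all judged documents) is a simplifying assumption.

```python
import math


def dcg_at_k(gains: list[int], k: int) -> float:
    """Discounted cumulative gain of the first k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: list[int], k: int = 3) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted in ideal (descending) order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Graded labels (0-3) for a ranked result list, top result first.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2]), 3))
```

Here the score is about 0.96; swapping the second and third results would place the labels in ideal order for the top three positions and yield 1.0.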

Figure 3.5: Comparison of User behaviour profiles identified around Query Auto-Completion (QAC), from eye-tracking data, by Hofmann et al. (2014) and Smith et al. (2016).

Several user behavioural profiles were identified by exploring associations between visual attention from eye-tracking, search interactions from mouse and keyboard activity, and the use of QAC suggestions (Hofmann et al., 2014; Smith et al., 2016). These profiles are described in Figure 3.5. An interesting, yet common-sense observation was that participants’ touch-typing ability greatly influenced their interactions with QAC suggestions.

The native language of searchers was found to influence their overall querying and searching behaviour. Ling et al. (2018) explored this space using four variations of a multilingual search interface. They observed that participants strongly preferred to issue queries in their first (native) language, with a second (non-native) language as the next preferred choice. Mixing first and second languages occurred very rarely. In 80% of the 300 tasks (25 users $\times$ 4 interfaces $\times$ 3 task-types), participants used a single language for querying. In the remaining 20% of the tasks, participants switched languages for querying, most commonly from their first language to their second language.

3.2.2 Stage 2: Search Results Evaluation / List-Item Selection

How do users behave when examining a list of information-objects (returned by an IR system)?

After a user submits a query to an IR system, the next action they generally perform is examining and evaluating the list of search results returned. In this section, we discuss empirical studies that investigated information-searching behaviour around a list of information-objects, or representations of information-objects (also called surrogates). We identified some common themes in the research questions investigated, and the discussion below is grouped along these themes, as relationships between search behaviour and: (i) the ranking of search results; (ii) the information shown in search results; (iii) individual user characteristics; and (iv) relevance judgement and feedback.

Figure 3.6: Example interfaces for studying user-interactions with a search-engine results page (SERP): (a) a simplified SERP without query input facility, to judge relevance of search results (on a 4-level scale) for pre-determined search queries (in this case ‘why do airplanes have differently shaped wings?’), from Scharinger et al. (2016); (b) eye-tracking heatmap on an organic SERP from Buscher et al. (2010; Dumais et al., 2010), showing the F-shaped pattern of visual attention; (c) a multilingual SERP from Ling et al. (2018). This figure is best viewed in colour.

3.2.2.1 Ranking of search results

Most search engines display results in a rank-ordered list, with the results the algorithm deems most relevant placed at the top and other results ordered below. Granka et al. (2004; Lorigo et al., 2008) studied the eye-movement behaviour of searchers examining SERPs and reported observations from three user studies. In 96% of the queries, participants looked only at the first result page, containing the top 10 results, and no participant looked beyond the third result page for a given query. Participants looked primarily at the first few results, with nearly equal attention (dwell time) given to the first and second results. However, despite this equal attention, the first result was clicked 42% of the time, while the second was clicked only 8% of the time. If none of the top three results appeared relevant, users chose not to explore further results but issued a reformulated query instead. When the ranking of the search results was reversed (i.e., less relevant results were placed in the higher-ranked positions), participants spent considerably more time scrutinizing and comparing results (more fixations and regressions) before deciding to click or reformulate.

Gender was also found to influence SERP examination (Lorigo et al., 2008). Compared to male participants, female participants clicked on the second result twice as often and made more regressions, or repeat viewings of already visited abstracts. Male participants were more likely to click on lower-ranked results (entries 7 through 10), looked beyond the first 10 results significantly more often, and were more linear in their scanning patterns, with fewer regressions. Pupil dilation did not differ significantly between the gender groups.

Task-types and task-goals also influenced SERP examination behaviour. Guan & Cutrell (2007) used Broder (2002)’s taxonomy of navigational vs. informational searches. They reported that when users could not find the target results for navigational searches, they either selected the first result or switched to a new query. For informational searches, however, users rarely issued a new query and were more likely to try out the top-ranked results, even when those results had lower relevance to the task. This suggests that searchers had strong confidence in the search engine’s relevance ranking, even though they clearly saw target results at lower positions: people were more likely to discount their own judgement of relevance and follow the ranking determined by the search engine. Jiang et al. (2014) used Li & Belkin (2008)’s framework of search tasks and found that in tasks with specific goals, searchers fixated more on lower-ranked results after some time. For tasks with amorphous goals, on the other hand, there was a wider breadth in viewing the SERP and less effort spent viewing content pages. Fixations tended to decrease as the search session progressed, possibly indicating decreasing interest and increasing mental effort, which could reflect satisficing behaviour (Simon, 1956). A comprehensive overview of the behavioural traits associated with task-types and task-goals can be found in Jiang et al. (2014, Table 8).

3.2.2.2 Information Shown in Search Results (Surrogates)

The amount and quality of the information shown on SERPs also affected users’ information searching behaviour. Cutrell & Guan (2007) found that as the length of the surrogate information (result snippets) increased, users’ search performance improved for informational tasks but degraded for navigational tasks (Broder, 2002). Based on eye-tracking data, they posited that the difference arose because users paid more attention to the snippet and less attention to the URL located at the bottom of the search result, which hurt performance in navigational searches. Buscher et al. (2010) studied the effects of the quality of advertisements placed in SERPs (Figure 3.6(b)). Similar to the findings discussed above, they found a strong position bias of visual attention towards the top few organic result entries (the well-known F-shaped pattern of visual attention), which was stronger for informational than for navigational tasks. However, they also observed a strong general bias against sponsored links: even for informational tasks, where participants generally had a harder time finding a solution, the ads did not receive any additional attention. Lorigo et al. (2008) compared the visual attention patterns of searchers using two different search engines, Google and Yahoo!. Behavioural trends followed similar patterns for both search engines, even though all but one of the participants rated Google as their primary search engine. They found slight variations in some eye-tracking measures (reading time of surrogates, time to click results, and query reformulation time) and some self-reported measures (perceived ease of use, perceived satisfaction, and success rate), but none of these differences were statistically significant.

The novel query-preview interface by Qvarfordt et al. (2013) was introduced in Section 3.2.1 (Figure 3.4(a)). The authors also reported several observations about user behaviour on SERPs. The presence of the preview visualization enabled participants to look deeper into the result lists, and participants tried to use the preview as a navigation tool, although it was not designed as such. The tool increased the rate at which participants examined documents at middle ranks, and thus helped them discover more useful documents at those ranks than without the preview widget. The preview tool also increased the diversity of documents found in a search session, which could in turn lead to better performance in terms of recall and precision. The tool thus helped searchers overcome the strong position bias towards top-ranked results observed in the studies discussed previously.
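The stacked columns of the preview widget described in Figure 3.4(a) can be approximated with a simple counting procedure: split the ranked list for the query being typed into blocks of ten ranks and, within each block, count documents the searcher has already seen, documents retrieved by earlier queries but not yet seen, and documents that would be newly retrieved. This Python sketch is based only on the figure description; the function name, the use of document-ID sets, and the order of the membership checks are our assumptions, not details of Qvarfordt et al.'s (2013) implementation.

```python
def preview_columns(ranked_ids: list[str], seen: set[str],
                    previously_retrieved: set[str],
                    page_size: int = 10, n_columns: int = 10) -> list[tuple[int, int, int]]:
    """Return one (seen, re-retrieved but unseen, new) count triple per block of ranks.

    Hypothetical helper: ranked_ids is the result list the current (partial) query
    would retrieve, best first; seen and previously_retrieved hold document IDs
    from earlier in the session.
    """
    columns = []
    for start in range(0, page_size * n_columns, page_size):
        block = ranked_ids[start:start + page_size]
        n_seen = n_rerun = n_new = 0
        for doc in block:
            if doc in seen:
                n_seen += 1                 # already viewed by the searcher
            elif doc in previously_retrieved:
                n_rerun += 1                # retrieved before, never viewed
            else:
                n_new += 1                  # would be retrieved for the first time
        columns.append((n_seen, n_rerun, n_new))
    return columns


ranked = [f"d{i}" for i in range(1, 21)]
print(preview_columns(ranked, {"d1", "d2"}, {"d1", "d2", "d3", "d15"}, n_columns=2))
```

Each call only needs set-membership tests over at most `page_size * n_columns` documents, so a procedure like this could plausibly be re-run on every keystroke.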

Figure 3.7: Effects of differences in user characteristics on interactions with SERPs: (a) exhaustive or breadth-first user (User 1) vs. economic or depth-first user (User 2), examining mostly irrelevant results in Task A and mostly relevant results in Task B (both users followed the second link in Task B); the vertical axis denotes vertical location on the SERP, and the horizontal axis denotes the temporal order of result examination; from Aula et al. (2005) (similar patterns were identified by Bilal & Gwizdka (2016) in the SERP examination behaviour of children); (b) children vs. adults examining SERPs from a German search engine for children (left) and Google (right); unlike adults, children exhaustively explored all search results, paid more attention to thumbnails and embedded media, and read fewer text-only snippets; from Gossen et al. (2014). Observations similar to those for children were reported for searchers with dyslexia (Palani et al., 2020). This figure is best viewed in colour.

3.2.2.3 Individual User Characteristics

Individual traits of searchers also influence their patterns of interaction with a SERP, and these patterns can be revealed by analysing eye-tracking data. For instance, searchers have been classified as economic or exhaustive based on their style of evaluating SERPs (Aula et al., 2005). Economic searchers scanned fewer than half (three) of the results displayed above the fold before taking their first action (reformulating the query or following a link). Exhaustive searchers evaluated more than half of the visible results above the fold, or even scrolled the results page to view all of the results, before taking their first action. Thus, economic searchers demonstrated a depth-first search strategy, while exhaustive searchers favoured a breadth-first approach (Figure 3.7(a)). Dumais et al. (2010) demonstrated the use of unsupervised clustering to re-identify the economic and exhaustive user groups, based on differences in total fixation impact ⁹, scanpaths, task outcomes, and questionnaire data. The economic cluster was further divided into users who looked primarily at results (economic-results cluster) and users who viewed both results and ads (economic-ads cluster). All three groups spent the most time on the first three results, with the exhaustive group being substantially slower than the other two. The exhaustive and economic-results groups spent the second-highest amount of time on results four through six, while the economic-ads group spent this time on the main advertisements. This group spent more than twice as much time on the main ads as the economic-results group, and even more time on the main ads than the exhaustive group. This observation is at odds with the findings of Buscher et al. (2010), who observed a generally strong bias against viewing sponsored links. Abualsaud & Smucker (2019) conducted further analysis using these user types and, in general, reconfirmed the previous findings. They found that the results above the fold, and especially the first three search results, are particularly important, more so for economic users: after submitting a ‘weak’ query, if economic users did not find a correct result within the first three results, they abandoned examination and reformulated their query.
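The economic/exhaustive distinction can be expressed as a simple rule over eye-tracking data collected before a searcher's first action. The Python sketch below follows the thresholds described above (fewer than half vs. more than half of the results visible above the fold, or scrolling to view all results); the handling of ties at exactly half, the function name, and the parameter names are our assumptions. Dumais et al. (2010) instead derived the groups with unsupervised clustering over richer features, so this rule is only an approximation of the published classification.

```python
def classify_evaluation_style(viewed_ranks: list[int], visible_above_fold: int,
                              scrolled_before_action: bool = False) -> str:
    """Label one SERP visit as 'economic' or 'exhaustive' (hypothetical rule).

    viewed_ranks: ranks of results fixated before the first action
    (query reformulation or link click); repeat fixations count once.
    """
    if scrolled_before_action:
        return "exhaustive"      # scrolled to inspect the rest of the list first
    distinct = len(set(viewed_ranks))
    if 2 * distinct < visible_above_fold:
        return "economic"        # fewer than half of the visible results
    if 2 * distinct > visible_above_fold:
        return "exhaustive"      # more than half of the visible results
    return "unclassified"        # exactly half: not covered by the description


print(classify_evaluation_style([1, 2, 1], visible_above_fold=7))        # economic
print(classify_evaluation_style([1, 2, 3, 4, 5], visible_above_fold=7))  # exhaustive
```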

The age of searchers also influences SERP evaluation behaviour. Gossen et al. (2014) demonstrated differences in SERP evaluation between children and adults (Figure 3.7(b)). When answers were not found within the top search results, adults reformulated the query to start a new search, while young users exhaustively explored all ten results and used the navigation buttons between result pages to continue examining further. Children also paid more attention to thumbnails and embedded media and focused less on textual snippets. Children saw the query suggestions at the bottom of the Google SERP (because they navigated to the bottom), while adults did not. Bilal & Gwizdka (2016; Gwizdka & Bilal, 2017) investigated this phenomenon further and observed that even among children, age plays a role in SERP evaluation behaviour. Younger children (grade six, age 11) clicked more often on lower-ranked results than older children (grade eight, age 13). Older children’s clicking behaviour was more often based on reading result snippets, and not just on the ranked position of a result in a SERP, whereas younger children made less deliberate choices about which result to click and were more exhaustive in exploring results. Thus, using Aula et al. (2005)’s classification and Dumais et al. (2010)’s observations, it can be posited that (younger) children start out as exhaustive searchers and, with increasing age and maturity, evolve into economic searchers as older children and adults. Interestingly, behaviour patterns very similar to those of children (scrolling further down on SERPs, exhaustive exploration, etc.) were also recently observed for searchers with dyslexia (Palani et al., 2020).

Searchers’ native language also influenced SERP interaction behaviour (Ling et al., 2018) (Figure 3.6(c)). As discussed in Section 3.2.1, users strongly preferred issuing queries in a single language, especially their native language. However, while examining SERPs, they marked search results in their first and second languages as relevant to an equal degree. This supports the usefulness of search result pages that integrate results from multiple languages. Nevertheless, a clear separation of the search results by language was strongly preferred, and an ‘interleaved’ presentation (e.g., odd-numbered results in one language and even-numbered results in another) was least preferred.
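The two extremes of multilingual result presentation compared above can be sketched as list-merging strategies over two monolingual ranked lists. This is an illustrative assumption about how such layouts might be produced, not the interface code of Ling et al. (2018), who compared four interface variants; the function names are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so that any result value, including None, is kept


def separated(first_lang: list, second_lang: list) -> list:
    """All first-language results, followed by all second-language results."""
    return list(first_lang) + list(second_lang)


def interleaved(first_lang: list, second_lang: list) -> list:
    """Alternate languages (first list at 1-based odd positions) until the
    shorter list runs out, then append the remainder of the longer one."""
    merged = []
    for a, b in zip_longest(first_lang, second_lang, fillvalue=_MISSING):
        if a is not _MISSING:
            merged.append(a)
        if b is not _MISSING:
            merged.append(b)
    return merged


print(separated(["en1", "en2", "en3"], ["de1"]))    # ['en1', 'en2', 'en3', 'de1']
print(interleaved(["en1", "en2", "en3"], ["de1"]))  # ['en1', 'de1', 'en2', 'en3']
```

A separated layout keeps each language in a contiguous block, which matches the preference reported above.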


Figure 3.8: Google search engine result page (SERP) for the queries: (a) “coronavirus” (b) “toyota” (c) “evaporation”, and (d) “life of pie”. All screenshots are from ‘above-the-fold’, viewed on a $2560 \times 1440$ monitor. These examples highlight that modern SERPs have come a long way from a list of “ten blue links”. SERPs are becoming consumable information-objects in their own right, and thus require different kinds of cognitive processing and interaction than they did in the early days of the internet. Inspired and adapted from Wang et al. (2018). Accessed on May 5, 2020. This figure is best viewed in colour.

3.2.2.4 Relevance Judgement

Balatsoukas & Ruthven (2010, 2012) proposed a list of relevance criteria for understanding how searchers evaluate search results, or make relevance judgements. These criteria were developed from literature reviews and their own empirical findings from eye-tracking studies. The final list contains 15 relevance criteria (e.g., topicality, quality, recency, scope, and availability) and can be found in Balatsoukas & Ruthven (2012, Appendix B).

Search engines are increasingly adding different modalities of information to the SERP besides the “ten blue links”, including images, videos, encyclopaedic information, and maps (Figure 3.8). Z. Liu et al. (2015) studied the influence of these different forms of SERP information, called ‘verticals’, on searchers’ relevance judgements. A general observation was that verticals, when present in a SERP, created strong attraction biases. The attraction effect depended on the type of vertical, while the vertical’s quality (relevant or not) did not have a major impact. For instance, ‘images’ and ‘software download’ verticals received more visual attention, while news verticals received as much attention as the “ten blue links” search results.

3.2.3 Stage 3: Content Page Evaluation / Item Examination

How do users behave when examining a single information-object (e.g., a non-search-engine webpage, a.k.a. content page) obtained from an IR system?

In online information searching, searchers repeatedly interact with individual webpages, a.k.a. ‘content pages’ in IR terminology. These webpages can be visited by following links from a search engine, following links between different webpages, or directly typing the URL in the browser.

The first group of papers we discuss investigated users’ visual attention and reading behaviour on webpages. Pan et al. (2004) studied whether eye-tracking scanpaths on webpages varied with task-type, webpage type (business, news, search, or shopping), viewing order of webpages, and users’ gender. They found significant differences for all factors except task-type, which seemed to have no effect on scanpaths. However, their task-types were weak: remembering what was on a webpage vs. no specific task. In later work comparing informational and navigational search-tasks, they again saw a limited effect of task-type on visual attention (Lorigo et al., 2006). Findings from Josephson & Holmes (2002) suggested that users may follow habitually preferred scanpaths on a webpage, which can be influenced by factors such as webpage characteristics and memory. However, they used only three webpages, which makes the findings difficult to generalize. Goldberg et al. (2002) studied eye movements on Web portals during search-tasks and saw that header bars were typically not viewed before users focused on the main part of the page; they therefore suggested placing navigation bars on the left side of a page. Beymer et al. (2007) focused on a very specific webpage feature: images placed next to text content, and how they influence eye movements during a reading task. They found a significant influence on fixation location and duration, which depended on how the image contents related to the text contents (i.e., whether the images showed ads or text-related content).

Buscher et al. (2009) presented findings from a large-scale study in which users performed information-foraging and page-recognition tasks. They observed that in the first few moments, users quickly scanned the top left of the page, presumably looking for clues about the content, provenance, and type of information on that page. The elements normally displayed in the upper-left third of webpages (e.g., logos, headlines, titles, or perhaps an important picture related to the content) seemed to be important for recognizing and categorizing a page. After these initial moments, the influence of task-type set in. For page-recognition tasks, attention remained in the top-left corner of the webpage. For information-foraging tasks, however, fixations moved to the center-left region of the webpage, where the user was possibly trying to find task-specific information. The right third of webpages attracted almost no visual attention during the first second of each page view. Afterwards, too, most users seemed to ignore this region entirely, or to look at it only occasionally. This suggests that users had low expectations of information content or general relevance on the right side of most webpages. As many webpages display advertisements on the right side, this observation is plausible, and it is in line with the observed “F-shaped patterns”¹⁰ on webpages.
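Region-level findings like these can be quantified by aggregating fixation time over page regions. Below is a minimal Python sketch of such an aggregation. It assumes fixations are given as hypothetical (onset, x, duration) tuples and that a page is split into equal horizontal thirds. The tuple format, the 1,000 ms window, and the sample values are illustrative assumptions, not the procedure or data of Buscher et al. (2009).

```python
# Hypothetical sketch: share of fixation time per horizontal third of a page,
# restricted to fixations that start within the first `window_ms` of a page view.

def dwell_share_by_third(fixations, page_width, window_ms=1000):
    """fixations: list of (onset_ms, x_px, duration_ms); onset is relative to page-view start."""
    labels = ("left", "centre", "right")
    totals = {label: 0.0 for label in labels}
    for onset_ms, x_px, duration_ms in fixations:
        if onset_ms >= window_ms:
            continue  # only the initial moments of the page view are counted
        # Map the fixation's x coordinate to a third, clamped to the page edges.
        third = min(max(int(3 * x_px / page_width), 0), 2)
        totals[labels[third]] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label in labels}
    return {label: t / grand_total for label, t in totals.items()}


if __name__ == "__main__":
    # Invented fixations on a 1500 px wide page; the last one starts after 1 s and is ignored.
    sample = [(0, 100, 200), (250, 500, 300), (600, 1000, 500), (1200, 1200, 400)]
    print(dwell_share_by_third(sample, page_width=1500))
```

Assigning each fixation by its centre alone is a simplification; a fuzzy assignment such as the fixation impact measure below spreads a fixation over neighbouring regions instead.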

Buscher et al. (2009) also proposed an eye-tracking measure called fixation impact. This measure first places a circular Gaussian distribution around each fixation on a webpage, creating a fuzzy area of interest. Each webpage element is then assigned a distance impact value that reflects how much of this fixation circle it covers: if an element completely covers the fixation circle (Gaussian distribution), it gets a distance impact value of 1; if it covers the circle only partially, its distance impact value is smaller. Multiplying the distance impact value by the fixation duration gives the fixation impact for that element. Thus, an element that completely covers the fixation circle receives the full fixation duration as its fixation impact, while elements partially inside the circle receive a value proportional to the Gaussian distribution. The authors motivated the fixation impact measure with observations from human vision research indicating that fixation duration correlates with the amount of visual information processed: the longer a fixation, the more information is processed around the fixation centre. Using the fixation impact measure, Buscher et al. (2009) proposed a model for predicting the amount of visual attention that individual webpage elements may receive (i.e., visual salience).
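The following minimal Python sketch computes fixation impact as described above. It assumes one possible reading of the distance impact value: the share of the Gaussian weight inside the fixation circle that falls within an element’s rectangular bounding box, approximated on a grid. The `Fixation` and `Element` classes and the radius, sigma, and grid-step values are hypothetical choices for illustration. They are not parameters reported by Buscher et al. (2009).

```python
import math
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # fixation centre in page coordinates (px)
    y: float
    duration_ms: float


@dataclass
class Element:
    """Axis-aligned bounding box of a webpage element (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, px: float, py: float) -> bool:
        return self.left <= px <= self.right and self.top <= py <= self.bottom


def distance_impact(fix: Fixation, elem: Element,
                    radius: float = 50.0, sigma: float = 25.0, step: float = 2.0) -> float:
    """Share of the Gaussian weight inside the fixation circle that lies within `elem`.

    The Gaussian is truncated at `radius` and summed on a square grid, so an element
    that covers the whole circle gets exactly 1.0 and a distant element gets 0.0.
    """
    n = int(radius // step)
    total = covered = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            d2 = dx * dx + dy * dy
            if d2 > radius * radius:
                continue  # grid point lies outside the fixation circle
            w = math.exp(-d2 / (2.0 * sigma * sigma))
            total += w
            if elem.contains(fix.x + dx, fix.y + dy):
                covered += w
    return covered / total  # total > 0: the centre point is always included


def fixation_impact(fixations, elem: Element, **params) -> float:
    """Sum over fixations of distance impact times fixation duration."""
    return sum(distance_impact(f, elem, **params) * f.duration_ms for f in fixations)


if __name__ == "__main__":
    fixations = [Fixation(500, 500, 200), Fixation(520, 480, 300)]
    main_text = Element(0, 0, 1000, 1000)    # fully covers both fixation circles
    right_panel = Element(500, 0, 1000, 1000)  # left edge passes through the first fixation
    print(fixation_impact(fixations, main_text))
    print(distance_impact(fixations[0], right_panel))
```

Under this reading, an element that fully covers the circle collects the whole fixation duration, and an element whose edge runs through the fixation centre collects slightly more than half of it.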

Another group of studies investigated how users judged the relevance of webpages with respect to an assigned search-task or information need. Gwizdka and colleagues (Gwizdka, 2018; Gwizdka & Zhang, 2015a, 2015b) observed that relevant pages were read more carefully when revisited. Pupil dilations were significantly larger on visits and revisits to relevant pages, and just before relevance judgements were made. Certain conditions of visits and revisits also showed significant differences in EEG alpha frequency band power and in EEG-derived attention levels. The relevance of individual webpage elements has also been assessed as click-intention: whether users would click on an element they were looking at. Slanzi et al. (2017) used pupillometry and EEG signals to predict whether a mouse click was present for each eye fixation. EEG features included simple statistical features of the signals (mean, SD, power, etc.) as well as more sophisticated mathematical features (Hjorth features, fractal dimensions, entropy, etc.). A battery of classifier models was tested, but the results were not promising. Logistic regression had the highest accuracy (71%) but a very low F1 score (0.33), while neural-network-based classifiers had the highest F1 score (0.4). The authors suspected that the low sampling rates of their instruments (a 30 Hz eye-tracker and a 128 Hz, 14-channel EEG) impaired classifier performance. González-Ibáñez et al. (2019) compared relevance prediction performance with and without eye-tracking data, and argued that when eye-tracking data collection is not feasible, mouse left-clicks can serve as a good alternative indicator of relevance.
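A gap like 71% accuracy alongside an F1 score of 0.33 can arise when the positive class (fixations followed by a click) is rare. The Python sketch below computes both metrics from a confusion matrix. The counts are invented for illustration and are not taken from Slanzi et al. (2017).

```python
from __future__ import annotations


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Accuracy and F1 score (positive class = 'click') from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Invented counts: 100 fixations, of which only 20 are followed by a click.
    print(accuracy_and_f1(tp=7, fp=16, fn=13, tn=64))   # a weak classifier
    print(accuracy_and_f1(tp=0, fp=0, fn=20, tn=80))    # always predicts "no click"
```

With these invented counts, the weak classifier reaches 71% accuracy but an F1 score of about 0.33. A trivial classifier that never predicts a click scores higher on accuracy (80%) while detecting no clicks at all, which is why F1 is the more informative figure when clicks are rare.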

The ‘Competition for Attention’ theory states that items in our visual field compete for our attention (Desimone & Duncan, 1995). Djamasbi et al. (2013) studied web search and browsing from the perspective of this theory. Theoretical models suggest that in goal-directed searches, information salience and/or information relevance drive search behaviour (i.e., competition for attention does not hold), whereas exploratory search behaviour is influenced by competition among the stimuli that attract a user’s attention (i.e., competition for attention holds). In practice, however, information search behaviour often combines both types of visual search activity (Groner et al., 1984). Djamasbi et al. (2013) found that, despite the goal-directed nature of their search-task (finding the best snack place in Boston to take friends to), competition for attention had some effect at the content-page level: some of the users’ attention was diverted to non-focal areas on content pages. However, competition for attention had little effect on how results were viewed on SERPs. Users exhibited the familiar top-to-bottom viewing pattern (Section 3.2.2), paying the most attention to the top two entries.

3.3 Effects of Expertise and Working Memory on Search Behaviour

Figure 3.9: Literature reviews by Rieh et al. (2016) and Vakkari (2016) identified the following search behavioural traits as indicative of domain experts, or of novices who are learning to become experts.

Our focus in this dissertation is information searching and learning. As we saw in Chapter 2, learning and expertise are closely connected: expertise is an evolving characteristic of users that reflects learning over time, rather than a static property (Rieh et al., 2016; Sawyer, 2005). White (2016a, Chapter 7) considers three types of expertise that are relevant in information-seeking settings: (i) domain or subject-matter expertise; (ii) search expertise; and (iii) task expertise. Domain or subject-matter expertise describes people’s knowledge in a specialised subject area, such as a domain of interest. Search expertise refers to people’s skill at performing information-seeking activities, both in Web search and in other settings such as specialised domains. Task expertise describes people’s expertise in performing particular search tasks, potentially independent of domain. Although they are considered distinct, the boundaries between these types of expertise are quite blurred, which makes expertise difficult to estimate at search time and to model in a form that search systems can use.

Previous work on domain knowledge and expertise has linked¹¹ domain expertise to search behaviour in terms of metrics, behavioural patterns, and criteria (M. J. Cole et al., 2013; Mao et al., 2018; O’Brien et al., 2020; White et al., 2009). A representative summary, adapted from literature reviews by Rieh et al. (2016) and Vakkari (2016), is presented in Figure 3.9. Briefly, Wildemuth (2004) showed that novices converge toward the same search patterns as experts as they are exposed to a topic and learn more about it. X. Zhang et al. (2011) found that features such as document retention, query length, and the average rank of selected results could be predictive of domain expertise. M. J. Cole et al. (2013) showed that eye-gaze patterns could be used to predict an individual’s level of domain expertise, using estimates of the cognitive effort associated with reading. White et al. (2009) showed that measures such as diverse website visitation, narrower topical focus, less diversity (or entropy), more ‘branchiness’ of search sessions, shorter dwell time, and higher query and session complexity are indicative of expert knowledge and/or search behaviour.
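Diversity measures such as the entropy listed above can be operationalised as Shannon entropy, $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the share of a session’s page visits that go to domain $i$. The Python sketch below assumes this domain-level definition for illustration. It is not necessarily the exact feature computed by White et al. (2009), and the example domains are placeholders.

```python
from __future__ import annotations

import math
from collections import Counter


def visit_entropy(domains: list[str]) -> float:
    """Shannon entropy (bits) of the distribution of visits over domains in one session."""
    if not domains:
        return 0.0
    n = len(domains)
    counts = Counter(domains)
    # p * log2(1/p) is non-negative, so a single-domain session yields exactly 0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    focused = ["a.example", "a.example", "a.example", "a.example"]
    spread = ["a.example", "b.example", "c.example", "d.example"]
    print(visit_entropy(focused), visit_entropy(spread))
```

A session confined to one domain scores 0 bits, and a session spread evenly over four domains scores 2 bits. Lower values therefore indicate the more concentrated visiting pattern.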

In stark contrast, Zlatkin-Troitschanskaia et al. (2021) reviewed literature on higher education students’ information search behaviour. Students can be considered novices in all three respects: domain/subject matter, search skills, and task. The authors report that, across the literature, higher education students’ information search behaviour tends to follow some general patterns: (i) foraging: no explicit (task-specific) research plan and little understanding of the differences (pros/cons) between various IR systems; (ii) Google dependence: no intention to use any search tool other than Google, causing students to struggle to understand library information structures and to engage with scholarly literature effectively; (iii) rudimentary search heuristics: reliance on one and the same simple search strategy, regardless of search context; (iv) habitual topic changing: students change the search topic after rather superficial skimming, and before evaluating all search results; and (v) overuse of natural language: students type questions into the search box phrased as if posing them to a person. In addition, highly ranked online sources accessed via a well-known search engine were perceived as trustworthy.

Memory span and working memory capacity have also been found to influence search effort and search behaviour (Arguello & Choi, 2019; Bhattacharya & Gwizdka, 2019a; L. Cole et al., 2020; Gwizdka, 2013, 2017). Working memory (WM) is considered a core executive function and is defined as the ability to hold information in short-term memory when it is no longer perceptually present (Diamond, 2013; G. A. Miller, 1956). Bailey & Kelly (2011) showed that the amount of effort was a good indicator of user success on search tasks. Smith & Kantor (2008) studied searcher adaptation to poorly performing systems and found that searchers changed their search behaviour between difficult and easy topics in a way that could indicate satisficing. Rieh et al. (2012) found differences in search effort between different types of systems (higher effort invested in searching a library database than the Web). Two studies showed that the mental effort involved in judging document relevance is lower for irrelevant and higher for relevant documents (Gwizdka, 2014; Villa & Halvey, 2013). Gwizdka (2017) found that higher-WM searchers perform more actions, and that the most significant differences lie in the time spent reading results pages. The behaviour of high- and low-WM searchers was also found to change differently over the course of a search task.

3.4 Assessing Learning during Search

For IR systems to foster user learning at scale while respecting individual differences between searchers, we need measures that represent, assess, and evaluate the learning process, possibly in an automated fashion. Accordingly, prior studies have used a variety of assessment tools. These include self-reports, closed-ended factual questions (multiple choice), open-ended questions (short answers, summaries, essays, free recall, sentence generation), and visual mapping techniques using concept maps or mind maps. Each approach has its own advantages and limitations. Urgo & Arguello (2022) compare and contrast these assessment techniques extensively in their comprehensive literature review.

Self-reports ask searchers to rate their self-perceived pre-search and post-search knowledge levels (Ghosh et al., 2018; O’Brien et al., 2020). This approach is the easiest to construct and can be generalised to any search topic. However, self-perceptions may not objectively represent true learning. Closed-ended questions test searchers’ knowledge using factual multiple-choice questions (MCQs). The answer options can be fact-based responses (TRUE, FALSE, or I DON’T KNOW) (Gadiraju et al., 2018; Xu et al., 2020; Yu et al., 2018) or recall-based responses (I remember / don’t remember seeing this information) (Kruikemeier et al., 2018; Roy et al., 2020). Constructing MCQs can take time and effort, since they are topic-dependent; recent work on automatic question generation may be leveraged to overcome this limitation (Syed et al., 2020). Evaluating closed-ended questions is the easiest, and is generally automated in online learning platforms. Multiple-choice questions, however, have a limitation: they allow respondents to answer correctly by guesswork. Open-ended questions assess learning by letting searchers write natural language summaries or short answers (Bhattacharya & Gwizdka, 2018; O’Brien et al., 2020; Roy et al., 2021). Depending on the experimental design, prompts for writing such responses can be generic (least effort) (Bhattacharya & Gwizdka, 2018, 2019b) or topic-specific (some effort) (Syed et al., 2020). While this approach can provide the richest information about the searcher’s knowledge state, evaluating such responses is the most challenging and requires extensive human intervention (Kanniainen et al., 2021; Leu et al., 2015; M. J. Wilson & Wilson, 2013) (as discussed in Section 2.4.2). Visual mapping techniques such as mind maps and concept maps have also been used to assess learning during search (Egusa et al., 2010, 2014a, 2014b, 2017; Halttunen & Jarvelin, 2005); concept maps were discussed at length in Section 2.3.1. Learning has also been measured in other ways, such as users’ familiarity with concepts and the relationships between them (Pirolli et al., 1996), gains in users’ understanding of the topic structure, e.g., via conceptual changes described in pre-defined taxonomies (P. Zhang & Soergel, 2016), and users’ ability to formulate more effective queries (Chen et al., 2020; Pirolli et al., 1996).
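One conventional way to discount guessing on closed-ended items is formula scoring, $S = R - W/(k-1)$. Here $R$ and $W$ are the numbers of right and wrong answers, $k$ is the number of answer options, and omitted items (here, I DON’T KNOW) score zero. The Python sketch below applies this to TRUE/FALSE items ($k = 2$) and takes knowledge gain as the post-search score minus the pre-search score. Both choices are assumptions for illustration, not the scoring procedure used in the studies cited above.

```python
IDK = "I DON'T KNOW"


def formula_score(responses, key, n_options=2):
    """Right answers minus a guessing penalty; IDK responses are treated as omissions."""
    if len(responses) != len(key):
        raise ValueError("responses and answer key must have the same length")
    if n_options < 2:
        raise ValueError("need at least two answer options")
    right = wrong = 0
    for given, correct in zip(responses, key):
        if given == IDK:
            continue  # omitted items are neither rewarded nor penalised
        if given == correct:
            right += 1
        else:
            wrong += 1
    return right - wrong / (n_options - 1)


def knowledge_gain(pre, post, key, n_options=2):
    """Post-search minus pre-search formula score on the same items."""
    return formula_score(post, key, n_options) - formula_score(pre, key, n_options)


if __name__ == "__main__":
    key = ["TRUE", "FALSE", "TRUE", "FALSE"]
    pre = ["TRUE", "TRUE", IDK, IDK]           # 1 right, 1 wrong, 2 omitted
    post = ["TRUE", "FALSE", "TRUE", "TRUE"]   # 3 right, 1 wrong
    print(formula_score(pre, key), formula_score(post, key), knowledge_gain(pre, post, key))
```

On TRUE/FALSE items, a respondent guessing at random expects as many wrong answers as right ones, so the expected formula score of pure guessing is zero. Answering I DON’T KNOW leaves the score unchanged.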

3.5 Limitations of Current Search Systems in Fostering Learning

3.5.1 Longitudinal studies

Learning is a longitudinal process, occurring gradually over time (Sections 2.2 and 2.3). Information researchers have therefore examined participants’ search behaviour over time, albeit in only a few longitudinal studies, such as those by Kelly (2006a, 2006b), Kuhlthau (2004), Vakkari (2001a), White et al. (2009), and Wildemuth (2004).

Wildemuth (2004) examined the search behaviour of medical students in microbiology. Students were observed at three points in time (at the beginning of the course, at the end of the course, and six months after the course), under the assumption that domain expertise changes during a semester. Some search strategies, most notably the gradual narrowing of results through iterative query modification, remained the same throughout the observation period. Other strategies varied over time as individuals gained domain knowledge: novices were less efficient in selecting concepts to include in a search and less accurate in their tactics for modifying searches. Pennanen & Vakkari (2003) and Vakkari (2000, 2001a, 2001b) also examined students at multiple points in time as they developed their thesis proposals. One important change in behaviour was the use of a more varied and more specific vocabulary as students learned more about their research topic. Weber et al. (2019) examined a large sample of German students from all academic fields in a two-wave study and found that students who were further along in their studies showed more advanced search behaviour (e.g., using more English queries and accessing academic databases more frequently). Advanced search behaviour predicted better university grades. Weber et al. (2018) provide mixed evidence on the long-term effects of interventions that promote search skills: some of their participants reverted to their previous habits two weeks after the study and therefore exhibited only short-term changes in their information-seeking behaviour.

Overall, results regarding the promotion of users’ search and evaluation skills are encouraging, but there is a clear need for more longitudinal studies. The general body of search-as-learning literature examines the learner in the short term, typically over the course of a single lab session (Kelly et al., 2009; Zlatkin-Troitschanskaia et al., 2021). The trend is similar in other Human-Computer Interaction (HCI) research venues: a meta-analysis of 1014 user studies reported at the ACM CHI 2020 conference revealed that more than 85% of the studies observed participants for a day or less. To this day, “longitudinal studies are the exception rather than the norm” (Koeman, 2020). “An over-reliance on short studies risks inaccurate findings, potentially resulting in prematurely embracing or disregarding new concepts” (Koeman, 2020).

3.5.2 Supporting sensemaking and reflection

As we saw in Section 2.3, learning is sensemaking. Yet modern search systems are still far from supporting sensemaking and learning; at best, they are good locators of information. Rieh et al. (2016) argue that modern search systems should support sensemaking by offering more interactive functions, such as tagging for annotation or tracking individuals’ search histories, so that learners can return to a particular learning point. In addition, a system could provide new features that allow learners to reflect upon their own learning process and search outcomes, thus facilitating the development of critical thinking skills.

It’s easy to be impressed by the scientific and engineering feats that have produced web search engines. They are, unquestionably, one of the most impactful and disruptive information technologies of our time. However, it’s critical to remember their many limitations: they do not help us know what we want to know; they do not help us choose the right words to find it; they do not help us know if what we’ve found is relevant or true; and they do not help us make sense of it. All they do is quickly retrieve what other people on the internet have shared. While this is a great feat, all of the content on the internet is far from everything we know, and quite often a poor substitute for expertise.

— Ko (2021)

(emphasis our own)

3.6 Summary

In this second chapter of the background literature review, we discussed (i) how searchers interact with the three stages (interfaces) of modern information retrieval systems: query formulation, search results evaluation, and content page evaluation; (ii) how expertise and working memory influence overall search behaviour; (iii) how learning or knowledge gain during search has been assessed in recent search-as-learning literature; and (iv) the limitations of current search systems in fostering learning, including gaps in the literature on long-term search behaviour and learning outcomes, and the lack of support for sensemaking.

We saw that while a plethora of studies have investigated searchers’ behaviour in the short term, merely a handful have observed the same participants for more than a day, and, to the best of the author’s knowledge, most of these were conducted over a decade ago. Thus, while the short-term influence of searching on learning is well documented, we know little about its longer-term effects. Furthermore, we have gaps in our knowledge of (i) how practices like articulation and externalization, and user attributes like metacognition, motivation, and self-regulation, moderate the searching-as-learning process; (ii) how these moderator variables change over time; and (iii) what these phenomena collectively entail for the design of future learning-centric IR systems. In the next chapter, we use these gaps in knowledge to inform our research questions and hypotheses.

References

Abualsaud, M., & Smucker, M. D. (2019). Patterns of search result examination: Query to first action. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1833–1842. https://doi.org/10.1145/3357384.3358041

Arguello, J., & Choi, B. (2019). The effects of working memory, perceptual speed, and inhibition in aggregated search. ACM Transactions on Information Systems, 37(3). https://doi.org/10.1145/3322128

Aula, A., Majaranta, P., & Räihä, K.-J. (2005). Eye-tracking reveals the personal styles for search result evaluation. In M. F. Costabile & F. Paternò (Eds.), Human-computer interaction - INTERACT 2005 (pp. 1058–1061). Springer Berlin Heidelberg.

Bailey, E., & Kelly, D. (2011). Is amount of effort a better predictor of search success than use of specific search tactics? Proceedings of the American Society for Information Science and Technology, 48(1), 1–10.

Balatsoukas, P., & Ruthven, I. (2010). The use of relevance criteria during predictive judgment: An eye tracking approach. Proceedings of the American Society for Information Science and Technology, 47(1), 1–10. https://doi.org/10.1002/meet.14504701145

Balatsoukas, P., & Ruthven, I. (2012). An eye-tracking approach to the analysis of relevance judgments on the Web: The case of Google search engine. Journal of the American Society for Information Science and Technology, 63(9), 1728–1746. https://doi.org/10.1002/asi.22707

Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of Documentation.

Beymer, D., Orton, P. Z., & Russell, D. M. (2007). An eye tracking study of how pictures influence online reading. IFIP Conference on Human-Computer Interaction, 456–460.

Bhattacharya, N., & Gwizdka, J. (2018). Relating eye-tracking measures with changes in knowledge on search tasks. Symposium on Eye Tracking Research & Applications (ETRA).

Bhattacharya, N., & Gwizdka, J. (2019b). Measuring learning during search: Differences in interactions, eye-gaze, and semantic similarity to expert knowledge. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, 63–71.

Bhattacharya, N., & Gwizdka, J. (2019a). Measuring learning during search: Differences in interactions, eye-gaze, and semantic similarity to expert knowledge. CHIIR’19.

Bilal, D., & Gwizdka, J. (2016). Children’s Eye-fixations on Google Search Results. Proceedings of the 79th ASIS&T Annual Meeting, 79, 89:1–89:6. https://doi.org/10.1002/pra2.2016.14505301089

Boldi, P., Bonchi, F., Castillo, C., & Vigna, S. (2009). From "dango" to "Japanese cakes": Query reformulation models and patterns. 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, 1, 183–190.

Borlund, P. (2013). Interactive Information Retrieval: An Introduction. Journal of Information Science Theory and Practice, 1(3), 12–32. https://doi.org/10.1633/JISTAP.2013.1.3.2

Broder, A. (2002). A taxonomy of web search. SIGIR Forum, 36(2), 3–10. https://doi.org/10.1145/792550.792552

Brookes, B. C. (1980). The foundations of information science. Part I. Philosophical aspects. Journal of Information Science, 2(3-4), 125–133.

Buscher, G., Cutrell, E., & Morris, M. R. (2009). What Do You See When You’re Surfing? Using Eye Tracking to Predict Salient Regions of Web Pages. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 10.

Buscher, G., Dumais, S. T., & Cutrell, E. (2010). The good, the bad, and the random: An eye-tracking study of ad quality in web search. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 42–49. https://doi.org/10.1145/1835449.1835459

Chen, Y., Zhao, Y., & Wang, Z. (2020). Understanding online health information consumers’ search as a learning process. Library Hi Tech.

Cole, L., MacFarlane, A., & Makri, S. (2020). More than words: The impact of memory on how undergraduates with dyslexia interact with information. Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 353–357. https://doi.org/10.1145/3343413.3378005

Cole, M. J., Gwizdka, J., Liu, C., Belkin, N. J., & Zhang, X. (2013). Inferring user knowledge level from eye movement patterns. Information Processing & Management, 49(5), 1075–1091.

Cutrell, E., & Guan, Z. (2007). What are you looking for? An eye-tracking study of information usage in web search. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 407–416. https://doi.org/10.1145/1240624.1240690

Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18(1), 193–222.

Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64, 135–168.

Djamasbi, S., Hall-Phillips, A., & Yang, R. (Rachel). (2013). Search Results Pages and Competition for Attention Theory: An Exploratory Eye-Tracking Study. In S. Yamamoto (Ed.), Human Interface and the Management of Information. Information and Interaction Design (pp. 576–583). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-39209-2_64

Dumais, S. T., Buscher, G., & Cutrell, E. (2010). Individual differences in gaze patterns for web search. Proceedings of the Third Symposium on Information Interaction in Context, 185–194. https://doi.org/10.1145/1840784.1840812

Egusa, Y., Saito, H., Takaku, M., Terai, H., Miwa, M., & Kando, N. (2010). Using a Concept Map to Evaluate Exploratory Search. Proceedings of the Third Symposium on Information Interaction in Context, 175–184. https://doi.org/10.1145/1840784.1840810

Egusa, Y., Takaku, M., & Saito, H. (2014a). How Concept Maps Change if a User Does Search or Not? Proceedings of the 5th Information Interaction in Context Symposium, 68–75. https://doi.org/10.1145/2637002.2637012

Egusa, Y., Takaku, M., & Saito, H. (2014b). How to evaluate searching as learning. Searching as Learning Workshop (IIiX 2014 Workshop). http://www.diigubc.ca/IIIXSAL/program.html

Egusa, Y., Takaku, M., & Saito, H. (2017). Evaluating Complex Interactive Searches Using Concept Maps. SCST@CHIIR, 15–17.

Eickhoff, C., Dungs, S., & Tran, V. (2015). An eye-tracking study of query reformulation. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 13–22. https://doi.org/10.1145/2766462.2767703

Gadiraju, U., Yu, R., Dietze, S., & Holtz, P. (2018). Analyzing knowledge gain of users in informational search sessions on the web. Conference on Human Information Interaction & Retrieval (CHIIR).

Ghosh, S., Rath, M., & Shah, C. (2018). Searching as learning: Exploring search behavior and learning outcomes in learning-related tasks. Conference on Human Information Interaction & Retrieval (CHIIR).

Goldberg, J. H., Stimson, M. J., Lewenstein, M., Scott, N., & Wichansky, A. M. (2002). Eye tracking in web search tasks: Design implications. Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, 51–58.

González-Ibáñez, R., Esparza-Villamán, A., Vargas-Godoy, J. C., & Shah, C. (2019). A comparison of unimodal and multimodal models for implicit detection of relevance in interactive IR. Journal of the Association for Information Science and Technology, 0(0). https://doi.org/10.1002/asi.24202

Gossen, T., Höbel, J., & Nürnberger, A. (2014). A comparative study about children’s and adults’ perception of targeted web search engines. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1821–1824. https://doi.org/10.1145/2556288.2557031

Granka, L. A., Joachims, T., & Gay, G. (2004). Eye-tracking analysis of user behavior in WWW search. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 478–479. https://doi.org/10.1145/1008992.1009079

Groner, R., Walder, F., & Groner, M. (1984). Looking at faces: Local and global aspects of scanpaths. In Advances in psychology (Vol. 22, pp. 523–533). Elsevier.

Guan, Z., & Cutrell, E. (2007). An eye tracking study of the effect of target rank on web search. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 417–420. https://doi.org/10.1145/1240624.1240691

Gwizdka, J. (2013). Effects of working memory capacity on users’ search effort. Proceedings of the International Conference on Multimedia, Interaction, Design and Innovation, 11:1–11:8. https://doi.org/10.1145/2500342.2500358

Gwizdka, J. (2014). Characterizing Relevance with Eye-tracking Measures. Proceedings of the 5th Information Interaction in Context Symposium, 58–67. https://doi.org/10.1145/2637002.2637011

Gwizdka, J. (2017). I Can and So I Search More: Effects Of Memory Span On Search Behavior. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, 341–344. https://doi.org/10.1145/3020165.3022148

Gwizdka, J. (2018). Inferring Web Page Relevance Using Pupillometry and Single Channel EEG. In F. D. Davis, R. Riedl, J. vom Brocke, P.-M. Léger, & A. B. Randolph (Eds.), Information Systems and Neuroscience (pp. 175–183). Springer International Publishing. https://doi.org/10.1007/978-3-319-67431-5_20

Gwizdka, J., & Bilal, D. (2017). Analysis of Children’s Queries and Click Behavior on Ranked Results and Their Thought Processes in Google Search. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, 377–380. https://doi.org/10.1145/3020165.3022157

Gwizdka, J., & Zhang, Y. (2015a). Differences in Eye-Tracking Measures Between Visits and Revisits to Relevant and Irrelevant Web Pages. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 811–814. https://doi.org/10.1145/2766462.2767795

Gwizdka, J., & Zhang, Y. (2015b). Towards Inferring Web Page Relevance: An Eye-Tracking Study. Proceedings of iConference’2015, 5. https://www.ideals.illinois.edu/handle/2142/73709

Halttunen, K., & Järvelin, K. (2005). Assessing learning outcomes in two information retrieval learning environments. Information Processing & Management, 41(4), 949–972. https://doi.org/10.1016/j.ipm.2004.02.004

Hofmann, K., Mitra, B., Radlinski, F., & Shokouhi, M. (2014). An eye-tracking study of user interactions with query auto completion. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 549–558. https://doi.org/10.1145/2661829.2661922

Huang, X., & Soergel, D. (2013). Relevance: An improved framework for explicating the notion. Journal of the American Society for Information Science and Technology, 64(1), 18–35. https://doi.org/10.1002/asi.22811

Jiang, J., He, D., & Allan, J. (2014). Searching, browsing, and clicking in a search session: Changes in user behavior by task and over time. Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, 607–616. https://doi.org/10.1145/2600428.2609633

Josephson, S., & Holmes, M. E. (2002). Visual attention to repeated internet images: Testing the scanpath theory on the world wide web. Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, 43–49.

Kanniainen, L., Kiili, C., Tolvanen, A., Aro, M., Anmarkrud, Ø., & Leppänen, P. H. T. (2021). Assessing reading and online research comprehension: Do difficulties in attention and executive function matter? Learning and Individual Differences, 87, 101985. https://doi.org/10.1016/j.lindif.2021.101985

Kelly, D. (2006a). Measuring online information seeking context, Part 1: Background and method. Journal of the American Society for Information Science and Technology, 57(13), 1729–1739. https://doi.org/10.1002/asi.20483

Kelly, D. (2006b). Measuring online information seeking context, Part 2: Findings and discussion. Journal of the American Society for Information Science and Technology, 57(14), 1862–1874. https://doi.org/10.1002/asi.20484

Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users. Foundations and Trends in Information Retrieval, 3(1–2), 1–224.

Kelly, D., Dumais, S., & Pedersen, J. O. (2009). Evaluation challenges and directions for information-seeking support systems. IEEE Computer, 42(3).

Ko, A. J. (2021). Seeking information. In Foundations of Information. https://faculty.washington.edu/ajko/books/foundations-of-information/#/seeking

Koeman, L. (2020). HCI/UX research: What methods do we use? Lisa Koeman [Blog]. https://lisakoeman.nl/blog/hci-ux-research-what-methods-do-we-use/

Kruikemeier, S., Lecheler, S., & Boyer, M. M. (2018). Learning from news on different media platforms: An eye-tracking experiment. Political Communication, 35(1), 75–96.

Kuhlthau, C. C. (2004). Seeking meaning: A process approach to library and information services (Vol. 2). Libraries Unlimited Westport, CT.

Leacock, C., & Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database (pp. 265–283). MIT Press.

Leu, D. J., Forzani, E., Rhoads, C., Maykel, C., Kennedy, C., & Timbrell, N. (2015). The New Literacies of Online Research and Comprehension: Rethinking the Reading Achievement Gap. Reading Research Quarterly, 50(1), 37–59. https://doi.org/10.1002/rrq.85

Li, Y., & Belkin, N. J. (2008). A faceted approach to conceptualizing tasks in information seeking. Information Processing & Management, 44(6), 1822–1837.

Ling, C., Steichen, B., & Choulos, A. G. (2018). A comparative user study of interactive multilingual search interfaces. Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, 211–220. https://doi.org/10.1145/3176349.3176383

Liu, C., Gwizdka, J., Liu, J., Xu, T., & Belkin, N. J. (2010). Analysis and evaluation of query reformulations in different task types. Proceedings of the American Society for Information Science and Technology, 47(1), 1–9.

Liu, Z., Liu, Y., Zhou, K., Zhang, M., & Ma, S. (2015). Influence of vertical result in web search examination. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 193–202. https://doi.org/10.1145/2766462.2767714

Lorigo, L., Haridasan, M., Brynjarsdóttir, H., Xia, L., Joachims, T., Gay, G., Granka, L., Pellacini, F., & Pan, B. (2008). Eye tracking and online search: Lessons learned and challenges ahead. Journal of the American Society for Information Science and Technology, 59(7), 1041–1052. https://doi.org/10.1002/asi.20794

Lorigo, L., Pan, B., Hembrooke, H., Joachims, T., Granka, L., & Gay, G. (2006). The influence of task and gender on search and evaluation behavior using Google. Information Processing & Management, 42(4), 1123–1131.

Mao, J., Liu, Y., Kando, N., Zhang, M., & Ma, S. (2018). How does domain expertise affect users’ search interaction and outcome in exploratory search? ACM Transactions on Information Systems, 36.

Marchionini, G. (1995). Information Seeking in Electronic Environments. Cambridge University Press.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.

O’Brien, H. L., Kampen, A., Cole, A. W., & Brennan, K. (2020). The role of domain knowledge in search as learning. Conference on Human Information Interaction and Retrieval (CHIIR).

Palani, S., Fourney, A., Williams, S., Larson, K., Spiridonova, I., & Morris, M. R. (2020). An eye tracking study of web search by people with and without dyslexia. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 729–738. https://doi.org/10.1145/3397271.3401103

Pan, B., Hembrooke, H. A., Gay, G. K., Granka, L. A., Feusner, M. K., & Newman, J. K. (2004). The determinants of web page viewing behavior: An eye-tracking study. Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, 147–154.

Pennanen, M., & Vakkari, P. (2003). Students’ conceptual structure, search process, and outcome while preparing a research proposal: A longitudinal case study. Journal of the American Society for Information Science and Technology, 54(8), 759–770.

Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/gather browsing communicates the topic structure of a very large text collection. Conference on Human Factors in Computing Systems (CHI’96).

Qvarfordt, P., Golovchinsky, G., Dunnigan, T., & Agapie, E. (2013). Looking ahead: Query preview in exploratory search. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 243–252. https://doi.org/10.1145/2484028.2484084

Rieh, S. Y., Collins-Thompson, K., Hansen, P., & Lee, H.-J. (2016). Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science, 42(1), 19–34. https://doi.org/10.1177/0165551515615841

Rieh, S. Y., Kim, Y.-M., & Markey, K. (2012). Amount of invested mental effort (AIME) in online searching. Information Processing & Management, 48(6), 1136–1150.

Roy, N., Moraes, F., & Hauff, C. (2020). Exploring users’ learning gains within search sessions. Conference on Human Information Interaction and Retrieval (CHIIR).

Roy, N., Torre, M. V., Gadiraju, U., Maxwell, D., & Hauff, C. (2021). Note the highlight: Incorporating active reading tools in a search as learning environment. Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, 229–238.

Saracevic, T. (1975). Relevance: A review of and a framework for the thinking on the notion in information science. Journal of the American Society for Information Science, 26(6), 321–343.

Saracevic, T. (2007a). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13), 1915–1933. https://doi.org/10.1002/asi.20682

Saracevic, T. (2007b). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. Journal of the American Society for Information Science and Technology, 58(13), 2126–2144.

Saracevic, T. (2016). The Notion of Relevance in Information Science: Everybody knows what relevance is. But, what is it really? Synthesis Lectures on Information Concepts, Retrieval, and Services.

Sawyer, R. K. (2005). The Cambridge handbook of the learning sciences. Cambridge University Press.

Scharinger, C., Kammerer, Y., & Gerjets, P. (2016). Fixation-Related EEG Frequency Band Power Analysis: A Promising Neuro-Cognitive Methodology to Evaluate the Matching-Quality of Web Search Results? HCI International 2016 Posters’ Extended Abstracts, 245–250. https://doi.org/10.1007/978-3-319-40548-3_41

Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138.

Slanzi, G., Balazs, J. A., & Velásquez, J. D. (2017). Combining eye tracking, pupil dilation and EEG analysis for predicting web users click intention. Information Fusion, 35, 51–57. https://doi.org/10.1016/j.inffus.2016.09.003

Smith, C. L., Gwizdka, J., & Feild, H. (2016). Exploring the use of query auto completion: Search behavior and query entry profiles. Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, 101–110. https://doi.org/10.1145/2854946.2854975

Smith, C. L., & Kantor, P. B. (2008). User adaptation: Good results from poor systems. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 147–154.

Spink, A. (1997). Study of interactive feedback during mediated information retrieval. Journal of the American Society for Information Science.

Syed, R., Collins-Thompson, K., Bennett, P. N., Teng, M., Williams, S., Tay, D. W. W., & Iqbal, S. (2020). Improving learning outcomes with gaze tracking and automatic question generation. The Web Conference (WWW).

Urgo, K., & Arguello, J. (2022). Learning assessments in search-as-learning: A survey of prior work and opportunities for future research. Information Processing & Management, 59(2), 102821.

Vakkari, P. (2000). Cognition and changes of search terms and tactics during task performance: A longitudinal case study. In Content-based multimedia information access-volume 1 (pp. 894–907).

Vakkari, P. (2001a). Changes in search tactics and relevance judgements when preparing a research proposal: A summary of the findings of a longitudinal study. Information Retrieval, 4(3), 295–310.

Vakkari, P. (2001b). A theory of the task-based information retrieval process: A summary and generalisation of a longitudinal study. Journal of Documentation, 57(1), 44–60. https://doi.org/10.1108/EUM0000000007075

Vakkari, P. (2016). Searching as learning: A systematization based on literature. Journal of Information Science, 42(1), 7–18. https://doi.org/10.1177/0165551515615833

Villa, R., & Halvey, M. (2013). Is relevance hard work? Evaluating the effort of making relevant assessments. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 765–768.

Wang, Y., Yin, D., Jie, L., Wang, P., Yamada, M., Chang, Y., & Mei, Q. (2018). Optimizing whole-page presentation for web search. ACM Trans. Web, 12(3). https://doi.org/10.1145/3204461

Weber, H., Becker, D., & Hillmert, S. (2019). Information-seeking behaviour and academic success in higher education: Which search strategies matter for grade differences among university students and how does this relevance differ by field of study? Higher Education, 77(4), 657–678. https://doi.org/10.1007/s10734-018-0296-4

Weber, H., Hillmert, S., & Rott, K. J. (2018). Can digital information literacy among undergraduates be improved? Evidence from an experimental study. Teaching in Higher Education, 23(8), 909–926. https://doi.org/10.1080/13562517.2018.1449740

White, R. (2016a). Interactions with search systems. Cambridge University Press.

White, R., Dumais, S., & Teevan, J. (2009). Characterizing the influence of domain expertise on web search behavior. Proceedings of the Second ACM International Conference on Web Search and Data Mining - WSDM ’09, 132. https://doi.org/10.1145/1498759.1498819

Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic formulation. Journal of the American Society for Information Science and Technology, 55(3), 246–258. https://doi.org/10.1002/asi.10367

Wilson, M. J., & Wilson, M. L. (2013). A comparison of techniques for measuring sensemaking and learning within participant-generated summaries. Journal of the American Society for Information Science and Technology, 64(2), 291–306.

Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249–270.

Xu, L., Zhou, X., & Gadiraju, U. (2020). How does team composition affect knowledge gain of users in collaborative web search? Conference on Hypertext and Social Media (HT).

Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., & Dietze, S. (2018). Predicting User Knowledge Gain in Informational Search Sessions. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 75–84. https://doi.org/10.1145/3209978.3210064

Zhang, P., & Soergel, D. (2016). Process patterns and conceptual changes in knowledge representations during information seeking and sensemaking: A qualitative user study. Journal of Information Science, 42(1), 59–78.

Zhang, X., Cole, M., & Belkin, N. (2011). Predicting Users’ Domain Knowledge from Search Behaviors. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1225–1226. https://doi.org/10.1145/2009916.2010131

Zlatkin-Troitschanskaia, O., Hartig, J., Goldhammer, F., & Krstev, J. (2021). Students’ online information use and learning progress in higher education: A critical literature review. Studies in Higher Education, 1–26. https://doi.org/10.1080/03075079.2021.1953336
