Table of Contents

List of Tables xi

List of Figures xiii

CHAPTER I: INTRODUCTION 1

1.0 Rationale of the Study 1

1.1 Objectives of the Study 3

1.2 Hypotheses 3

1.3 Method 4

1.4 Organization of the Study 5

CHAPTER II: DOCUMENT RETRIEVAL SYSTEMS 7

2.0 Introduction 7

2.1 Overview of a Document Retrieval System 7

2.2 Documents Database 9

2.3 Indexing Documents 10

2.4 Query Formulation Process 11

2.5 Formal Query 12

2.6 The User Interface 12

2.7 Retrieval Rules 14

2.7.1 The Use of Clustering in Document Retrieval Systems 14

2.7.2 Review of Retrieval Rules 16

2.8 Measures of Retrieval Effectiveness 19

2.9 Relevance Feedback Concepts 22

2.10 Summary 27

CHAPTER III: FAILURE ANALYSIS IN DOCUMENT RETRIEVAL SYSTEMS: A CRITICAL REVIEW OF STUDIES 28

3.1 Analysis of Search Failures 28

3.2 Methods of Analyzing Search Failures 29

3.2.1 Analysis of Search Failures Utilizing Retrieval Effectiveness Measures 29

3.2.2 Analysis of Search Failures Utilizing User Satisfaction Measure 34

3.2.3 Analysis of Search Failures Utilizing Transaction Logs 38

3.2.4 Analysis of Search Failures Utilizing the Critical Incident Technique 42

3.2.5 Summary 45

3.3 Review of Studies Analyzing Search Failures 45

3.3.1 Studies Utilizing Precision and Recall Measures 46

3.3.1.1 The Cranfield Studies 46

3.3.1.2 Lancaster's MEDLARS Studies 50

3.3.1.3 Blair and Maron's Full-Text Retrieval System Study 53

3.3.1.4 Markey and Demeyer's Dewey Decimal Classification Online Project 54

3.3.2 Studies Utilizing User Satisfaction Measures 56

3.3.3 Studies Utilizing Transaction Logs 58

3.3.4 Studies Utilizing the Critical Incident Technique 62

3.3.5 Other Search Failure Studies 63

3.3.6 Related Studies 66

3.4 Conclusion 68

CHAPTER IV: SEARCH FAILURES IN ONLINE CATALOGS: A CONCEPTUAL MODEL 71

4.0 Introduction 71

4.1 Searching and Retrieval Process 71

4.2 Search Failures in Online Catalogs: A Conceptual Model 72

4.3 Failures Caused by Faulty Query Formulation 75

4.4 Failures Caused by User Interfaces and Mechanical Failures 77

4.4.1 Failures Caused by Menu-Driven and Touch-Screen User Interfaces 77

4.4.2 Failures Caused by Command Language Interfaces 77

4.4.2.1 Failures Caused by Parsing Process 78

4.4.2.1.1 Boolean Searching 79

4.4.3 Failures Caused by Natural Language Query Interfaces 80

4.4.4 Failures Caused by Mechanical Errors 82

4.5 Retrieval Rules 84

4.6 Ineffective Retrieval Results 92

4.6.1 Zero Retrievals 92

4.6.2 Collection Failures 93

4.6.3 Information Overload 94

4.6.4 Retrieving Too Little Information 97

4.6.5 False Drops 97

4.6.6 Failures Caused by Indexing Practices and Vocabulary Mismatch 98

4.7 Summary 100

CHAPTER V: THE EXPERIMENT 102

5.0 Introduction 102

5.1 The Experiment 102

5.2 The Experimental Environment 103

5.2.1 The System 103

5.2.2 Test Collection 110

5.2.3 Subjects 112

5.2.4 Queries 112

5.3 Preparation for the Experiment 113

5.3.1 Preparation of Instructions for Users 113

5.3.2 Preparation of the Data Gathering Tools 114

5.3.3 Recruitment of Users to Participate in the Experiment 116

5.4 Data Gathering 117

5.5 Data Analysis and Evaluation Methodology 119

5.5.1 Quantitative Analysis and Evaluation 120

5.5.1.1 Analysis of Transaction Logs 120

5.5.1.2 Calculating Precision and Recall Ratios 121

5.5.1.3 Analysis of Questionnaire Forms and Critical Incident Report Forms 128

5.5.2 Qualitative Analysis and Evaluation 130

5.6 Summary 132

CHAPTER VI: FINDINGS 133

6.0 Introduction 133

6.1 Users 133

6.2 Description and Analysis of Data Obtained From Transaction Logs 135

6.2.1 Description and Analysis of Searches and Sessions 135

6.2.2 Description and Analysis of Search Statements 139

6.2.3 Analysis of Search Outcomes 143

6.3 Description and Analysis of Data Obtained From Questionnaires 146

6.4 Description and Analysis of Data Obtained From Critical Incident Reports 151

6.5 Descriptive and Comparative Analysis of Data Gathered Through All Three Data Collection Methods 153

6.6 Multiple Linear Regression Analysis Results 164

6.7 Summary 170

CHAPTER VII: ANALYSIS OF RETRIEVAL PERFORMANCE IN CHESHIRE 174

7.0 Introduction 174

7.1 Determining Retrieval Performance 174

7.2 Retrieval Performance in CHESHIRE 177

7.2.1 Analysis of Causes of Search Failures in CHESHIRE 177

7.2.1.1 Analysis of Collection Failures 179

7.2.1.2 Analysis of the Causes of User Interface Problems 180

7.2.1.3 Analysis of Failures Caused by Search Statements 181

7.2.1.4 Analysis of the Causes of Known-Item Search Failures 182

7.2.1.5 Analysis of the Causes of Cluster Failures 183

7.2.1.6 Analysis of Search Failures Caused by the Library of Congress Subject Headings 187

7.2.1.7 Analysis of Search Failures Caused by CHESHIRE's Stemming Algorithm 189

7.2.1.8 Analysis of Search Failures with No Apparent Cause 190

7.2.1.9 Analysis of Search Failures Caused by Specific Queries 191

7.2.1.10 Analysis of Search Failures Caused by Imprecise Cluster Selection 192

7.2.1.11 Search Failures Caused by Telecommunication Problems 192

7.2.1.12 Analysis of Failures Caused by Users' Unfamiliarity with the Scope of the CHESHIRE Database 192

7.2.1.13 Analysis of Search Failure Caused by False Drops 193

7.2.1.14 Analysis of Search Failure Caused by Call Number Search 194

7.2.2 Analysis of Zero Retrievals 194

7.2.3 Discussion on Search Failures 197

7.2.4 Search Effectiveness in CHESHIRE 205

7.3 Summary 212

CHAPTER VIII: CONCLUSION 217

8.0 Summary 217

8.1 Conclusions 217

8.2 Further Research 222

BIBLIOGRAPHY 223

APPENDICES 236

Appendix A: Background Information About CHESHIRE and Guidelines for CHESHIRE Searches 237

Appendix B: Access to CHESHIRE: An Experimental Online Catalog 241

Appendix C: Transaction Log Record Format 266

Appendix D: Questionnaire 269

Appendix E: Critical Incident Report Form for Effective Searches 272

Appendix F: Critical Incident Report Form for Ineffective Searches 274

Appendix G: Invitation Letter Sent to MLIS Students 276

Appendix H: Invitation Letter Sent to Ph.D. Students 279

Appendix I: Queries Submitted to CHESHIRE 282

Appendix J: Retrieval Performance in CHESHIRE 292