SMU Research Data Repository (RDR)

Data and code for "Discoverability beyond the library: Search engine optimization (case study)"

Posted on 2022-09-05, 08:56. Authored by Danping DONG, Aaron TAY

This record includes the replication data and code supporting the results of the published book chapter "Discoverability beyond the library: Search engine optimization (case study)" [link to be added]. The case study compares the discoverability of two hosted institutional repository solutions, Digital Commons and Figshare, using a randomized controlled experiment. Two randomly selected groups of journal articles were deposited and made open access in institutional repositories hosted on Digital Commons and Figshare, respectively. Download counts were collected over 7 months to measure and compare the open access discoverability and search engine visibility of the two platforms.
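As an illustration of how download counts from two randomized groups could be compared, the sketch below runs a two-sided permutation test on the difference in mean downloads. This is not necessarily the method used in the notebook or the book chapter; the function name `permutation_test`, its parameters, and the sample numbers are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean download counts.

    Hypothetical helper for illustration; not taken from the published notebook.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random, mirroring the random assignment
        # of articles to the two repositories.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up totals for illustration only; these are NOT values from the dataset.
ink_totals = [12, 7, 30, 4, 15, 9]
rdr_totals = [3, 5, 2, 8, 1, 6]
observed, p_value = permutation_test(ink_totals, rdr_totals)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A permutation test is used here because it relies only on the random assignment of articles to groups and makes no distributional assumption about download counts, which are typically skewed.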


This readme file was generated on 2022-07-04 by Dong Danping

Author Contact

Name: Dong Danping
ORCID: 0000-0002-2229-6709
Institution: Singapore Management University
Email:

Name: Aaron Tay
ORCID: 0000-0003-0159-013X
Institution: Singapore Management University
Email:

Date of data collection: 2021-04-01 to 2021-10-01


Licenses/restrictions placed on the data: CC-BY-4.0 License
Links to publications that cite or use the data: [to be updated]
Links to other publicly accessible locations of the data:

Recommended citation for this dataset: Dong, D., & Tay, A. (2022). Data and code to compare the discoverability of Digital Commons and Figshare.


File List

2021-11_DataAnalysis_DCvsFig.ipynb: the Jupyter Notebook containing notes and scripts for the statistical analysis in the case study.

Data file: cleaned data for analysis, containing download statistics from April to October 2021 for both InK (Digital Commons) and RDR (Figshare).

Relationship between files: The Jupyter Notebook and data file should be placed in the same folder for the code to run.
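The same-folder requirement can be checked before running the analysis. The sketch below is a minimal example, assuming the notebook is started from its own folder so that the current working directory is where the data file is expected; the helper `locate_data_file` and the file name `example_data.csv` are hypothetical and do not come from this record.

```python
import tempfile
from pathlib import Path


def locate_data_file(filename, folder=None):
    """Return the path to `filename` in `folder` (default: current working directory).

    Hypothetical helper: raises FileNotFoundError with a hint if the file is absent.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = base / filename
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{filename!r} not found in {base}; place it in the same folder as the notebook."
        )
    return candidate


# Demonstration in a temporary folder with a placeholder file name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_data.csv").write_text("Identifier,IR,Title\n")
    found_name = locate_data_file("example_data.csv", folder=tmp).name
    try:
        locate_data_file("missing.csv", folder=tmp)
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(found_name, missing_raises)
```

In a notebook, calling `locate_data_file` with the actual data file name and no `folder` argument resolves against the current working directory.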

Data Dictionary


Number of variables: 15
Number of cases/rows: 92

Variable List
Identifier: unique ID for each record; also the URL to access the record. (Note: Figshare records were unpublished after the study and are no longer accessible.)
IR: name of the IR. InK is on Digital Commons and RDR is on Figshare.
Title: title of the deposited journal article.
Columns D-J: monthly download counts, excluding bot downloads, from April to October 2021.
Total: sum of columns D-J; total download count during the study period.
AugToOct: sum of columns H-J; download count from August to October 2021.
GS_avail: whether the record can be found in Google Scholar.
uniq_PDF: whether the record provides the only PDF in Google Scholar.
primary: whether the record is displayed as the primary record in Google Scholar.
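The two derived columns can be recomputed from the monthly counts as a sanity check. The sketch below is one way to do this with the standard library, assuming a CSV export with the column order described above. The header names for columns D-J, the sample rows, and the choice to count blank (missing) cells as 0 are all assumptions for illustration.

```python
import csv
import io

# Positions follow the data dictionary: columns A-C are Identifier, IR, Title;
# columns D-J (indices 3-9) hold monthly downloads for April-October 2021.
MONTH_COLS = slice(3, 10)        # columns D-J
AUG_TO_OCT_COLS = slice(7, 10)   # columns H-J
TOTAL_IDX, AUG_TO_OCT_IDX = 10, 11


def to_count(cell):
    """Parse a download count; a blank cell is the missing-data code.

    Assumption: missing counts contribute 0 to the sums.
    """
    cell = cell.strip()
    return int(cell) if cell else 0


def derived_columns_match(row):
    """Return True if Total and AugToOct agree with the monthly columns."""
    total = sum(to_count(c) for c in row[MONTH_COLS])
    aug_to_oct = sum(to_count(c) for c in row[AUG_TO_OCT_COLS])
    return (total == to_count(row[TOTAL_IDX])
            and aug_to_oct == to_count(row[AUG_TO_OCT_IDX]))


# Made-up sample rows (NOT from the dataset); month header names are hypothetical.
SAMPLE = """Identifier,IR,Title,Apr,May,Jun,Jul,Aug,Sep,Oct,Total,AugToOct
https://example.org/1,InK,Example article A,1,2,3,4,5,6,7,28,18
https://example.org/2,RDR,Example article B,0,,1,0,2,0,3,6,5
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip the header row
results = [derived_columns_match(row) for row in reader]
print(results)
```

Indexing by position rather than by header name keeps the check tied to the column letters used in the variable list.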

Missing data codes: blank


Methods are described in 2021-11_DataAnalysis_DCvsFig.ipynb and in the published book chapter [link to be added].



Confidential or personally identifiable information

  • I confirm that the uploaded data has no confidential or personally identifiable information.
