Setting up a repository

1.  Issues to be considered

A number of basic issues must be resolved in preparing a business case and a proposal for an institutional repository (IR). These include deciding whether to ‘go it alone’ or collaborate with other institutions, what software is to be used, which unit in the institution will manage the repository, what will be included (scope), and how the repository will be funded. Standards need to be identified and a decision made on the level to which the organisation can comply with these (a resource-dependent decision). Consideration also needs to be given to the Research Libraries Group’s (2005) checklist of attributes of a trusted digital repository.

Models

There has been some argument in the literature about which model serves the academic community better. One option is a repository focused on material in a specific discipline, containing resources submitted by researchers across a wide range of institutions. The other is the IR, to which all members of an academic institution are expected to contribute, on the argument that as the concept of an IR matures and rates of deposit increase, greater access to knowledge will be achieved. Peters (2002) provides a good overview of the various options, and discusses the value of repositories maintained by consortia as an alternative model. He concludes that for the immediate future we are likely to have a ‘hodgepodge’ of types of digital repositories: individual, discipline-based, institutional, consortial and national.

Ware (2004) in the PALS report and Jones et al. (2006, Ch.1) note the early successes achieved by repositories in physics (arXiv), computer science (Networked Computer Science Technical Reference Library), economics (EconPapers) and cognitive science, and more notably in the PubMed Central model. They also note that these successes have not been followed universally by repositories in other subject areas. DSpace and other software commonly in use (such as EPrints and Fedora) allow disciplinary and inter-disciplinary communities and special topics to be given separate categories and treated as separate archives. At the same time, the use of the OAI-PMH protocol across repositories allows for integrated searching on a specific topic. This increases the value of individual repositories to researchers, who do not need any knowledge of the content of individual repositories in order to search them.
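As a rough illustration of the cross-repository searching that OAI-PMH makes possible, the sketch below builds a ListRecords request URL and extracts Dublin Core titles from a response. The base URL, the function names and the sample XML are hypothetical placeholders. Only the verb, the metadataPrefix and set parameters, and the XML namespaces come from the OAI-PMH 2.0 specification. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set (e.g. a community).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return every dc:title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A harvester would send the same request to each registered repository and merge the results, which is why the researcher does not need to know what any single repository holds.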

Ware (2004) in the PALS report summarises progress in establishing institutional repositories up to December 2003 in terms of numbers of items deposited in repositories that are registered OAI sites (that is, allow their metadata to be harvested and are compliant with the OAI-PMH protocol). These figures are updated by later reports, such as Lomangino (2006), but show that growth, although increasing, has been slower than predicted. Allen (2005) provides UK figures.

Allen’s (2005) report on research into interdisciplinary differences in attitudes towards deposit in IRs, undertaken for the UK Arts and Humanities Research Council, contains a useful summary of many of the historical arguments about the cost of scholarly publishing, and the claim that publishers ‘resell’ to educational institutions the research that their own staff and tax-payer funded research institutions have created. This is a balanced summary, with space given to the publishers’ perspective. The important role of peer review, difficulties in quality assurance with repositories dependent on self-archiving, and the ongoing relevance of these to the creation and continued success of digital repositories are discussed in some depth. However, Allen’s (2005) findings, based on some of the most extensive research carried out in the UK on the subject (an analysis of the contents of 25 UK repositories and a national survey of arts and humanities scholars), show that there is great variety in the scope of the IRs and in the type of content they contain, that several are small and little utilised, and that the contents are dominated by science and technology deposits. His survey data show that humanities scholars have low awareness of repositories and their value to the research community, but perceive the value of repositories to lie with the reader rather than with the scholar depositing. The humanities scholars also have ongoing concerns about repositories, including concerns with peer review processes, plagiarism, and intellectual property ownership.

2.  Building a business case

In order to secure funding for a new development in any organisation, there is an expectation that the project sponsors will be able to make a ‘business case’ for their project, outlining the costs and benefits to the organisation and the reasons for its adoption. A business case normally focuses on a problem (in this case, the rising cost of scholarly publishing to which the open access movement is offering a solution, the need to raise the impact of the institution’s research output if others are doing so, or the need to stay current with developments in other institutions) and outlines a costed solution. It is a key part of the strategic planning and decision-making process. The business case should be written in terms of the organisation’s own strategic goals, using its own methods of assessing costs and benefits.

There is useful advice on developing a business case on a number of general web sites (such as the US Government Computing Network site and the Ethos Toolkit business case site). Several libraries have shared their experience online as well, including the Robert Gordon University in Scotland.

Ideally, the business case should include key elements that will need to be addressed (for example software choice and technology support, scoping policy, overall management, preservation plan, promotion within the institution, and predictions on uptake) as well as general justifications. Costs should be as specific as possible and realistic (a case that underestimates the true cost is likely to cause considerable problems in the future if the estimated cost is accepted as a final budget).

3.  Attributes of a trusted digital repository - a goal to aim for

The concept of a trusted digital repository, defined by Research Libraries Group (2005) as “one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future”, is critical to gaining the trust of the academic community and its acceptance of the concept (an issue discussed in the Advocacy section).

The attributes defined by the RLG-OCLC team of experts (Research Libraries Group, 2005) include:

  • Compliance with the reference model for an Open Archival Information System (OAIS)
  • Administrative responsibility
  • Organizational viability
  • Financial sustainability
  • Technological and procedural suitability
  • System security
  • Procedural accountability

The report contains helpful comments and advice on each of these attributes and the level at which the criteria should be applied (e.g. management level, collection level, item/record level).

4.  Defining the scope

One of the major tasks in establishing a repository is that of defining scope - what will be included in the repository and how these decisions will be made. Allen (2005) noted the great variety of types of items in the 25 UK repositories he studied, a situation that can lead to loss of trust or reputation, and make it more difficult to persuade contributors of the value of the repository.

Research Libraries Group (2005) also notes that “the sheer variety of digital materials and the role that they play in the collection make development and application of collections policies very challenging. The existence or lack of a physical equivalent influences decisions about whether and how the digital resource is preserved”.

The rules on scope must be clear, including whether the repository will contain all or some of:

  • Theses
  • Articles (pre-prints, post-prints, or both)
  • Research reports funded by the institution or funded by Government
  • Items not already on the Internet
  • Historical items being digitised for preservation or access
  • Audio-visual media

Overall, decisions on scope are closely linked to quality control both at the initial stages and as a part of ongoing maintenance of the repository.

A related issue concerns whether articles and conference papers should be deposited and retained only following peer review and acceptance, or at the time of submission, and whether they should be updated if changes were required or made before publication. Probets and Jenkins (2006) list a range of attributes of items for deposit that must be spelled out in an effective policy, including:

  • type of document,
  • status (peer-reviewed, accepted, published etc.),
  • format,
  • who may contribute (only employees, co-researchers, or affiliates), and
  • related output (presentations, workshops, work in the same series).
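
The policy attributes listed above lend themselves to a simple machine-checkable form. The following is a minimal sketch, assuming a hypothetical policy: the class names, field names and all example values are illustrative and do not come from Probets and Jenkins (2006). ‘Related output’ is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositPolicy:
    """Hypothetical policy: the values each attribute may take."""
    document_types: frozenset
    statuses: frozenset
    formats: frozenset
    contributor_roles: frozenset


@dataclass(frozen=True)
class Item:
    """Hypothetical description of one item offered for deposit."""
    document_type: str
    status: str
    file_format: str
    contributor_role: str


def policy_violations(item, policy):
    """Return the names of the attributes that fall outside the policy."""
    checks = [
        ("type of document", item.document_type, policy.document_types),
        ("status", item.status, policy.statuses),
        ("format", item.file_format, policy.formats),
        ("who may contribute", item.contributor_role, policy.contributor_roles),
    ]
    return [name for name, value, allowed in checks if value not in allowed]


# Example values are assumptions chosen only to exercise the check.
EXAMPLE_POLICY = DepositPolicy(
    document_types=frozenset({"thesis", "article", "conference paper"}),
    statuses=frozenset({"peer-reviewed", "accepted", "published"}),
    formats=frozenset({"pdf"}),
    contributor_roles=frozenset({"employee", "co-researcher", "affiliate"}),
)

draft = Item("article", "submitted", "docx", "employee")
print(policy_violations(draft, EXAMPLE_POLICY))
```

An empty list means the item is within scope. Otherwise the list names the attributes that would need to be raised with the depositor.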

SPARC (2002) details why an IR should include post-prints, grey literature and, in particular, pre-prints (as well as theses), as long as this material is carefully handled and properly managed. The SPARC position paper, Crow (2002), also encourages a broader view of scope, including work in progress, grant applications and reports, as well as student reports (not confined to research degrees), classroom and teaching materials, computer programs, audio-visual material, creative works, and institutional documents and reports. Crow (2002) suggests that each institution will want to make its own decision on these matters and should do so within the framework of the over-riding principle of the changing nature of scholarly publishing and open access. Where the scope is this broad, careful attention to assignment of categories, content management systems, and version control will be required. This view is also supported by Lynch (2003), who argues that repositories should “reflect campus life, symposia, performances, lectures.”

Genoni (2004) discusses the questions of scope in some detail, highlighting some of the difficult decisions to be made, but arguing that these should be made by individual institutions and should not be mandated by international standards. He raises important questions about how libraries will assert quality control if not through the peer review process, the duplication of material in several repositories (in possibly different versions), how to prioritise content (what content is more valued and should be targeted), and how to manage ‘grey literature.’

The AuseAccess wiki (pronounced ozzieaccess) has a model collection policy for an IR that provides a good starting point, although institutions may want to vary from it for their own reasons. It lists a number of criteria for deposited items to support the quality assurance necessary to maintain a trusted repository (it mentions ‘DEST reportable’ publications; TEC PBRF reportable standards might be substituted for these). The Queensland University of Technology policy provides additional details and specifies that articles must be at the “post-peer review stage”, that is, post-prints. Their policy also accepts non-peer-reviewed material once accepted for publication (conference contributions, book chapters, etc.). Material must be added to the repository in the same categories as used for reporting to DEST. PBRF categories may be more useful here for New Zealand repositories, as long as a clear distinction is made between peer-reviewed and non-peer-reviewed, and between published and not-yet-published items. The University of Melbourne policy is similar, and also spells out the range of contributors whose work may be deposited.

5.  Costing the project

There is useful material in Jones, Andrew, and MacColl (2006, Ch.2) about the costs involved in setting up and maintaining an IR, based on their experience at the University of Edinburgh. They and others make the point that staff can be redeployed from other parts of the library, as long as they have appropriate skills, and that not all costs incurred are ‘new costs’. They also warn that it would be naive to assume savings can be sought in reduced periodical subscriptions. At the very least, the length of time it takes a repository to build a critical mass of content, and the partial coverage each has of even their own institutional output would suggest that such a reduction in subscriptions is a very long way off.

Ware (2004) in the PALS report provides costs from the DSpace development at MIT. Estimates from both these sources suggest that initial costs may start at around 40,000–50,000 pounds (NZ$100,000–130,000), rising to as high as NZ$250,000–350,000 at maturity (these figures will increase over time with inflation).
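
The conversion implied by the figures above, and the effect of inflation on them, can be checked with simple arithmetic. In the sketch below the 3% annual inflation rate and the five-year horizon are illustrative assumptions, not figures from either source.

```python
# Initial cost range quoted above, in pounds and in NZ dollars.
GBP_RANGE = (40_000, 50_000)
NZD_RANGE = (100_000, 130_000)

# Exchange rates implied by pairing the ends of the two ranges.
implied_rates = [nzd / gbp for gbp, nzd in zip(GBP_RANGE, NZD_RANGE)]


def inflate(amount, annual_rate, years):
    """Compound an amount forward by a fixed annual inflation rate."""
    return amount * (1 + annual_rate) ** years


# Assumed rate (3%) and horizon (5 years), for illustration only.
projected = [round(inflate(nzd, 0.03, 5)) for nzd in NZD_RANGE]

print(implied_rates)
print(projected)
```

The quoted ranges imply a rate of roughly NZ$2.50 to NZ$2.60 to the pound. A business case prepared today would need to substitute a current exchange rate and the institution’s own inflation assumptions.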

References for Implementation Considerations

Allen, J. (2005). Interdisciplinary Differences in Attitudes Towards Deposit in Institutional Repositories. http://eprints.rclis.org/archive/00005180/

Crow, R. (2002). The Case for Institutional Repositories: A SPARC Position Paper. http://www.arl.org/sparc/bm~doc/ir_final_release_102.pdf

Genoni, P. (2004). Content in Institutional Repositories: A Collection Management Issue. Library Management. 25(6/7), 300–306.

Jones, Andrew & MacColl. (2006). The Institutional Repository. Oxford, Chandos Publishing.

Lomangino, K. (2006). Institutional Repositories: Their Emergence and Impact on Scholarly Publishing. http://www.sheridanpress.com/assets/pdf/inst_repositories.pdf

Lynch, C.A. (2003). Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. Portal: Libraries and the Academy. 3(2), 327–336.

Peters, T. A. (2002). Digital Repositories: Individual, Discipline-Based, Institutional, Consortial, or National? The Journal of Academic Librarianship. 28(6), 414–17.

Probets, S. & Jenkins, C. (2006). Documentation for Institutional Repositories. Learned Publishing. 19(1), 57–71. http://hdl.handle.net/2134/782

Research Libraries Group. (2005). Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC). National Archives and Records Administration. Maryland. Retrieved 1 May 2008 from http://www.crl.edu/PDF/trac.pdf

Scholarly Publishing and Academic Resources Coalition. (2002). SPARC Institutional Repository Checklist & Resource Guide. http://www.arl.org/sparc/bm~doc/IR_Guide_&_Checklist_v1.pdf

Ware, M. (2004). Pathfinder Research on Web-Based Repositories: Final Report. Publisher and Library / Learning Solutions. http://www.palsgroup.org.uk/palsweb/palsweb.nsf/pubframe

Further reading

Ethos. http://ethostoolkit.rgu.ac.uk/?page_id=70

Robert Gordon University in Scotland. http://www2.rgu.ac.uk/library/etds/business.html

US Government Computing Network site. http://www.gcn.com/print/23_23/26951-1.html

This wiki is licensed under a Creative Commons Attribution-Share Alike 2.5 License.
Page last modified on 01 March 2009, at 02:29 PM