
Project Prism

Funded by the Digital Libraries Initiative, Phase 2

[a breakdown of information integrity image]

Virtual Remote Control:
Preservation Risk Management for Web Resources
Nancy Y. McGovern, ECURE 2002

1

The Project

2

The Team

Anne R. Kenney
Nancy Y. McGovern
Peter Botticelli
Richard Entlich
William R. Kehoe
Carl Lagoze
Sandra Payette

3

Preservation Risk Management

4

The Research Agenda

See "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell’s Project Prism,"
by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette,
in D-Lib Magazine, January 2002

http://www.dlib.org/dlib/january02/kenney/01kenney.html

5

The Approach

  1. Process
  2. Identification
  3. Analysis
  4. Appraisal
  5. Strategy
  6. Detection
  7. Response

6

Process

Adapt the Risk Management Model stages:

Typical Risk Management               Prism Risk Management

1. Risk identification                1. Data gathering and characterization
2. Risk classification                2. Simple risk declaration and detection
3. Risk assessment                    3. Contextualized risk declaration and detection
4. Risk analysis                      4. Automated preservation policy enforcement
5. Risk management implementation

7

Identification

Establish boundary; Characterize content:

example: parse the URL

[defining parts of a uniform resource locator]
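A minimal sketch of what "parse the URL" could look like in practice, using Python's standard `urllib.parse`. The function name `characterize_url` and the choice of output fields are assumptions made for illustration; the slides do not specify which URL parts Prism records.

```python
from urllib.parse import urlsplit


def characterize_url(url):
    """Split a URL into parts that hint at a page's provenance.

    Hypothetical helper: the field names are illustrative only.
    """
    parts = urlsplit(url)
    segments = [seg for seg in parts.path.split("/") if seg]
    # Treat the last segment as a file unless the path ends in a slash.
    has_file = bool(segments) and not parts.path.endswith("/")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not name a port
        "directories": segments[:-1] if has_file else segments,
        "file": segments[-1] if has_file else None,
        "query": parts.query,
    }


info = characterize_url("http://www.dlib.org/dlib/january02/kenney/01kenney.html")
print(info["host"], info["directories"], info["file"])
```

The host and directory path give a first indication of who publishes a page and where it sits within the site, which is the kind of boundary information this stage is after.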

8

Analysis

Define risks associated with:

9

Contextual Layers

[diagram of the external and internal environments of a website]

10

Page-level Monitoring

  • Formatting: TIDY
  • Standards compliance
  • Document structure
  • Metadata:
    • HTTP headers
    • HTML headers
  • Changes
    • Content
    • Location
  • Links
    • Out-link structure
    • In-link structure
    • Intra-site
    • Hub
    • Volatility
  • Page provenance
    • URL parsing
  • Log analysis
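As one possible illustration of the "Changes" items above (content and location), the sketch below compares two stored snapshots of a page. The `Snapshot` record, the `detect_changes` function, and the change labels are all hypothetical; the sketch assumes snapshots have already been fetched and does no network access.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One stored observation of a page (hypothetical record layout)."""
    url: str        # final URL the page was retrieved from
    status: int     # HTTP status code of the response
    headers: dict   # HTTP response headers
    body: bytes     # raw page content


def fingerprint(body):
    """Return a SHA-256 digest used to compare page content."""
    return hashlib.sha256(body).hexdigest()


def detect_changes(old, new):
    """List the kinds of change between two snapshots of the same resource."""
    changes = []
    if fingerprint(old.body) != fingerprint(new.body):
        changes.append("content")
    # A different final URL or a redirect status suggests the page has moved.
    if new.url != old.url or new.status in (301, 302, 307, 308):
        changes.append("location")
    if old.headers.get("Last-Modified") != new.headers.get("Last-Modified"):
        changes.append("http-headers")
    return changes


before = Snapshot("http://example.org/a.html", 200, {"Last-Modified": "x"}, b"hello")
after = Snapshot("http://example.org/b.html", 200, {"Last-Modified": "x"}, b"hello!")
print(detect_changes(before, after))
```

A byte-level hash flags any difference, including trivial ones such as embedded timestamps; a real monitor would probably normalize the page (for example with a tool like Tidy) before comparing.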

11

Site-level Monitoring

12

Appraisal

Enable portfolio management:

Hypothetical appraisal of a Web resource:
Scope: highly relevant
Value: high value, not essential; numerous links to page
Relationship: secondary archives; informal agreement
Maintenance: key indicators of good management
Redundancy: captured by more than one archive
Risk response: very responsive to risk notifications
Capture: complex structure; cyclical updates; formats
Size: medium-sized; 3-level crawl

13

Portfolio Management

[graph of trust vs. control]

14

Strategy

Develop an organization-specific program:

  Low Trust High Trust
Low Control
  • no agreement; monitor and as-is metadata capture; no risk notification
  • informal agreement; monitor and metadata capture with permission; minimal risk notification
  • formal secondary archive agreement; as-is capture; minimal risk notification
  • no formal agreement, but consortial arrangement with third party; monitor and as-is capture; low risk notification
  • formal secondary archive agreement; monitor and capture; risk detection/response
High Control
  • informal agreement; actively monitor and capture; full risk detection with recommended response
  • no agreement; actively monitor and capture; risk detection with minimal notification
  • formal archival responsibility, e.g., government mandate; actively monitor and capture; enforced risk response/mitigation
  • archival responsibility for the organization’s web site; actively monitor and capture; full risk detection and enforced response

15

Detection

Monitor change; initiate response:

16

Detection (cont.)

Monitor change; initiate response:

Correlate to program-defined response levels
Identify appropriate risk/response scenario(s)
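The correlation step might be sketched as a simple lookup: each detected change gets a severity, and the program caps how strong a response it is permitted to make. The level names loosely echo the notification options on the Strategy slide, but the ordering, the severity scores, and the `select_response` function are assumptions for illustration, not part of Prism.

```python
# Response levels ordered from weakest to strongest (assumed ordering).
RESPONSE_LEVELS = [
    "no notification",
    "minimal notification",
    "recommended response",
    "enforced response",
]

# Hypothetical severity score for each kind of detected change.
SEVERITY = {"http-headers": 1, "content": 2, "location": 3}


def select_response(changes, program_max_level):
    """Pick a response level for detected changes.

    program_max_level is an index into RESPONSE_LEVELS: the strongest
    response this program's agreements allow for the resource.
    """
    if not changes:
        return RESPONSE_LEVELS[0]
    # Unknown change types default to the lowest nonzero severity.
    severity = max(SEVERITY.get(change, 1) for change in changes)
    return RESPONSE_LEVELS[min(severity, program_max_level)]


print(select_response(["content", "location"], program_max_level=3))
print(select_response(["location"], program_max_level=1))
```

Capping by `program_max_level` reflects the idea that a low-control relationship can only notify, however serious the detected risk, while a high-control one can enforce a response.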

17

Response

Develop a toolkit:

Inventory and evaluate existing tools
Assess functionality for Prism stages
Adopt/adapt existing tools
Develop new tools
Apply to appropriate contextual layers
Integrate tools into customizable toolkit

18

Types of Tools

19

Future Directions

20