Clifford Lynch Prenote Presentation
Arizona State University
March 2, 2005
Thank you Sherrie for that very kind introduction and let me before doing anything else add my welcome on behalf of the Coalition for Networked Information. CNI, as Rob indicated, has been a sponsor and more importantly in my mind an advocate of this sort of work that this conference has been addressing since its inception. For those of you who are familiar with CNI's work and in particular CNI's current program initiatives I think you should find it quite easy to identify a series of convergences between the sort of issues that are coming up here and work that the Coalition is doing and issues around the management of institutional content, stewardship, e-science, and e-research and those sorts of things.
That takes me to what I want to address today, and I will reassure you before I get started that I do understand that this is an after-dinner speech, and personally at least I have very strong views about how after dinner speeches should be handled. I believe that they should be reasonably brief, and that they should operate at a fairly high level and basically give you a few things to sleep on. So that's what I'm going to try and do. I have sat through after dinner speeches that did things like go through people's latest book, chapter by chapter, when there are really a lot of chapters, and we won't do that sort of thing here.
I really want to talk about two inter-related phenomena that I see happening. A sort of a set of tidal wave phenomena and a set of convergences that are taking place sometimes under the pounding of these tidal waves, sometimes for other reasons.
Probably one of the hardest things for us to deal with collectively as we plan our institutional strategies is understanding that we're making a transition from a time when digital things were sort of special niche things to when non-digital things are going to be the special niche things. How many times can you think back to where you've seen some organization that has been around for a while that assures you it's addressing the digital and it has some special projects office off to the side that does digital stuff, you know with three people in it and a little specially earmarked money to deal with this digital thing. We're going to need to sort of flip that around, and the psychological displacement from that, the disciplinary and organizational displacements from that are really quite severe.
The first and I think ultimately most important tidal wave that I see coming, which is going to change an awful lot for higher education, is this move to what we sometimes call e-science, or e-research or e-scholarship. Essentially a sort of an explicit recognition that the practice of scholarship in many, many fields now is shifting and is now integrally involved in computation, in simulation, in large scale data collection and observational data sets, the control of very sophisticated data acquisition systems. That data, software, visualization and observation now are every bit as important and as significant as traditional monographs and journal articles. In the US we talk about this kind of funny sometimes. What we do is instead of talking about e-scholarship or e-research or e-science as is commonplace for example in Europe, we talk about cyberinfrastructure, and I'm sure you've heard various discussions about the report that Dan Atkins' committee wrote for the National Science Foundation about cyberinfrastructure to support science and engineering research and education in the United States. Some of you are probably also aware of the commission that the American Council of Learned Societies has set up with funding from the Mellon Foundation, that John Unsworth is chairing. This is looking at cyberinfrastructure for the humanities and the social sciences with a particular focus within the social sciences on the non-quantitative social sciences. In other words these sorts of social sciences that organizations like the National Science Foundation typically do not fund very much if at all. There you see talk of cyberinfrastructure, a focus on the tools and systems and people skills and other things to support these changes, rather than talking about the changes in scholarly practice as a starting point. It doesn't really matter which way you choose to go at it; the two are integrally related.
But I find that talking about e-research is probably a more direct way of communicating with the broadest number of people about this.
Now, I want to underscore that while some of this has its roots in the sciences, and if you look at sort of the level of deployment of these new practices it's probably higher statistically in the sciences, these developments are not limited to the sciences. We are seeing some of the most innovative work coming out of the humanities, out of places like history. We also need to recognize that it's not even just the classic trinity of sciences, social sciences and humanities. The arts are at play here too, and these technologies give us very powerful ways to capture performance and document it, for example, and to make that documentation part of a permanent scholarly record. They also allow people to do performance on a distributed basis, which is something almost entirely new.
Basically what this says to me is that we are going to need to gear up to manage a tremendous flood of digital content: digital research data, software, things that are not part of the scholarly literature as we have historically conceived it, but which are indeed an integral part of the practices of scholarly communication as we go into the twenty-first century. We're going to need organizations to deal with this, we're gonna need strategies to deal with this, and I will say a number of things about that in just a couple of minutes. Before doing that though I want to make one other set of points, which is, concurrently with this move to digital, digitally enabled research and scholarship, we are seeing another phenomenon, or at least I think I'm seeing another phenomenon start to emerge, and one way we might characterize that is, research and the documentation of research is getting more formal. Another way we might characterize it is the expectations about public access to research and research results, particularly those that are paid for by public funds; these expectations are growing week by week almost. I would point you at the current open access debates, but I would point you at more things than just the current debates about open access, and I'd note parenthetically that the place where we're seeing the big public policy pressure, interestingly enough, about open access is right where you'd expect it: the life sciences, the health sciences, biomedicine. The government is a massive funder of that and it's very hard for people to understand why they shouldn't be able to get immediate access to the results of the research that they are paying for with their tax dollars, especially if it may help them make what are in some cases literally life and death decisions. That's a very, very compelling political argument.
But it goes deeper than that. We are starting to see our science funding agencies demanding data management and dissemination plans for large grants. I think you're going to see a lot more of that. We are starting to see disciplinary norms emerge in many disciplines that basically say that when you publish that article, you are expected to have deposited a set of supporting data in some form of public repository or database, and the details vary a great deal from discipline to discipline and type of research to type of research. But there's a much greater emphasis on reproducibility, on other people being able to inspect and run your data, and I would just note that this gets very tricky because data and programs are profoundly intertwined. When you start talking about doing data analysis for many kinds of observational data, it's almost meaningless to talk about that without also talking about the availability of software, of various kinds of calibration and error correction built into that software, so the software and data mix is getting very substantial.
Now having said that I can point you then to two major sources of this enormous tidal wave of digital content, which we're going to need to step up and deal with. The changing practices of scholarship and scholarly communication, and the expectations of science funders and scholarly funders more broadly, and of the public which indeed underwrites the funding for many of these funding agencies. What are we doing about this? What kind of strategies do we have in place to deal with it? I think this is a critical question, which begins to raise issues about how we are, on our campuses in our institutions of higher education, going to configure various sorts of historic interests to support these new kinds of scholarship and scholarly practice. I'll just give you a couple of examples of things that are going on that are very important.
The first, believe it or not, is just understanding what faculty are doing and what faculty need. You will hear tomorrow morning from Sarah Pritchard, who has done one of the few good studies that we've got that begin to give us insight into what's going on out there. Into how practices are changing out there in offices, and in labs, especially when we move away from these really large-scale projects, which have enough money to bring in lots of professional staff and go about their own business. You can find a few other people who are looking at various aspects of this now, I'd commend to you for example the work of Susan Gibbons and her colleagues at the University of Rochester as another good example. I think you'll find in the Dan Atkins report that I mentioned and in the forthcoming report from the American Council of Learned Societies Commission on Cyberinfrastructure and the Humanities, some useful insights on what's going on here.
All the data is not in, but I will speculate because that's one of the things you can do after dinner. I would speculate that if you look around at on the one hand what's happened to academic computing on many of our campuses, and on the other hand the kind of insights that people like Sarah are capturing into faculty needs and behavior, you may be getting a window into an opportunity to fundamentally realign some of our support services and stewardship services on our campuses, and the choices we make there are going to be very crucial going forward if I'm right.
If you look at academic computing, in many cases it's retreated into getting site licenses and good deals on workstations and helping people set up personal computers and running the campus network, and not too much else. A lot of the more discipline oriented computing has moved into other sorts of specialty niches, which often aren't funded centrally. You'll find things like social science computing labs, which are actually funny organizations because they'll teach you about certain kinds of software like statistical analysis software or maybe GIS software, but they'll also teach you about census data sets and ICPSR and other kinds of resources for social scientists. You'll see Geographic Information Systems is another sort of niche area. You will find hidden and parked in various departments, people who know about bioinformatics or astronomical data management or things like that, but you no longer see this very centrally. You see needs on the part of faculty and graduate students to deal with something that's more than computing; it's about information management or informatics or something like that. The sorts of skills that are implicated here are in general not the kind of things that academic computing or broader campus computing organizations excel at. It uses tools that those organizations are very good with, but it requires a set of expertises that are more common in disciplinary informatics, in library and information science, in archives, in records management, in knowledge management and a whole complex of fields like that.
One of our challenges perhaps is how to realign our organizations to deliver these kinds of services and to understand how much of it is going to be aggregated at a campus level, how much of it is going to be at a disciplinary level and parked in specific schools or departments, and presumably heavily funded by those schools and departments as opposed to sort of central funds, and you can readily see a number of the tensions and the implications that are implicit in this kind of a world.
The second set of issues is about content stewardship and dissemination. Who's going to do it? Where is the locus of responsibility gonna sit? Now on the one hand you have some fields stepping up to disciplinary strategies; you can find this in some areas of the life sciences for example, particularly around bioinformatics and molecular biology. This is helped by the fact that at least in the United States we have the marvelous institution of the National Library of Medicine, which acts as sort of a powerful central clearinghouse for a good deal of this work.
In a lot of other fields though there is no disciplinary funding, there is no disciplinary structure. What we're seeing instead is a reliance on institutional activities, to the extent that institutions are beginning to step up and do this. Many of you have probably heard about services that are being deployed at many institutions called institutional repositories. These are services that take material that is part of the intellectual life of the campus, disseminate it, manage it and preserve it on behalf of its creators and the institution. A very powerful idea; an essential safety net.
Now, let me tell you some of the bad news about institutional repositories though, and this goes back to this business of shifts. Institutional repositories, I believe, are going to come to represent a pretty significant investment. That investment is probably going to come basically out of core operating funds, namely for library and archival functions. Perhaps for some other research support functions, because the preservation and dissemination of scholarship is a pretty core function of our institutions. But something tells me unfortunately that on most of our campuses the provost and the president and the chief financial officer aren't sitting around saying "Oh yeah, e-science stuff is really a significant shift and since it's a shift what we're gonna do is triple the library's budget so they've got a whole pile of new money to address this." Instead what's going to have to happen is one of these navigation and partial reallocations away from traditional publishing venues and over to dealing with this broader world, and it's going to be a very painful, delicate, tricky transformation I think to manage. It is one that institutions are starting to grapple with.
I would invite you to go and have a look at the Association of Research Libraries website. CNI and ARL co-sponsored a workshop symposium in I guess it was around November of last year, looking specifically at these issues of what happens on campus when you deal with e-science and e-research. What happens when you start thinking in terms of support structures like institutional repositories? How do you build the alliances necessary to make that happen? Not only do we have by the way this cross hatch of disciplinary and institutional activity, the campus and the discipline, which by the way have to mesh smoothly, creating an enormous challenge for technologies, standards and policy because we want things to migrate smoothly back and forth between institutional and disciplinary venues. But we also need some conversations on the national level about this, also internationally, but since there's not a lot of international funding the national level conversations about roles and responsibilities are very critical.
You'll hear later in this conference about some of the conversations that have been happening at a national level in Canada, and I know that the Canadian research libraries and the Canadian government have been thinking harder about these issues, and there's been some very good leadership there. In the United Kingdom you have seen ongoing investment from the Joint Information Systems Committee and the funding councils that support scientific and scholarly research. I would point you to the founding in, well the official launch was last November, of an operation based at the University of Edinburgh called the Digital Curation Centre, jointly funded by JISC and the funding councils to serve as a center of expertise for the extensive e-science programs in the UK. In the Netherlands you'll find a national repository strategy that's been deployed by SURF, which is an organization a bit like JISC in the UK and a bit like the NSF here. They have put an institutional repository into every one of the universities in the Netherlands, and they've got a backup strategy for them that links into the national library. Now that's a little easier to do in the Netherlands; if I recall correctly there are either eleven or thirteen research universities, and they're all centrally funded. This provides a sort of a different locus of decision-making than we have in the States. Nonetheless, we really do need to get serious about these conversations in the US.
We have seen NSF set up within CISE, the Computer and Information Science and Engineering Division, a subdivision, and I'm probably getting my NSF hierarchy wrong. I believe what I really meant to say is the CISE Directorate has got a division that is dealing with shared cyberinfrastructure for NSF. So we are starting to see this, but there's so much more that needs to happen. I would also direct your attention to a report that I believe is due out in March, at least for public comment, out of the National Science Board, that is dealing with issues around policy for long-term preservation of scientific data sets. This is the result of a fairly large study that the National Science Board has been conducting, and for those of you who aren't familiar with the National Science Board, it is sort of the policy making body that among other things oversees the NSF in a very real sense. So one can hope that this report will lead to some serious discussion.
Let me point at one other set of conversations that I think are very critical about e-science and e-research. That's the nature of stewardship.
We throw around three words: stewardship, preservation and curation. None of these are particularly well defined. Curation in particular seems to have a very broad spectrum of meanings all the way from being treated essentially as a synonym for preservation by some people, to a disciplinary activity where we speak now of curating knowledgebases or databases that reflect sort of the best collective understanding of researchers in a specific subfield of science. Where curation really is starting to take on a sizeable, if you will, editorial function. We need to get our language straight here, and I think there are again some very critical issues about understanding the spectrum and what's done in disciplines and what's done in support services. This seems to be a set of questions that is very squarely on the agenda of the UK Digital Curation Centre for example, but it's one that I think we need to talk about at considerable length, and which will help us to get a better understanding of what's common infrastructure and what's really best left to the disciplines to do individually, following the specific unique traditions and practices of each discipline.
So that's the set of things I want to say about the tidal wave of e-science, e-research and e-scholarship data. I want to just make a few other related kinds of comments about convergence of organizations and tidal waves. One I want to comment on is about records. Now when we talk about records management, of course we seem to be talking about at least three or four different things. There is a flavor of records management that's highly administrative and deals with retention schedules and dealing with meeting legal requirements and covering yourself against various kinds of legal issues. There's another stream which deals with documenting the records of research, documenting the intellectual life of the campus, where, yes, sometimes we park it in records management, but it really in a sense is more like archives or more like e-research or something like that. I think these are gonna get pulled apart.
It's a commonplace now to talk about electronic records overtaking print records. That you know in a few years when you go to most records management kinds of archival operations you should find the office of print and other special materials, and everybody else will be going electronic. I think what's actually going to happen there though is a lot more complicated, and I think we need to be open to really breaking up and restructuring some of our organizational silos. One of the things that's likely to happen with administrative electronic records is that, with fairly rare exceptions, they're not going to be sent off to records central. They'll live inside of the systems that create and manage and provide access to them. That's their natural home. And dealing with that will be part of the specs for the system; there will be an administrative role for records management in determining retention schedules for some of this, but these things will live very much inside the context of operational systems. You can see many, many examples of this sort of thing happening in corporate systems, and you can see it starting to happen in government systems, to the extent that they're really seriously starting to grapple with the flood of electronic records.
There's another set of records which are permanent, which aren't so much about legal things as about history and organizational continuity and documenting scholarship and intellectual life. Those are probably going to be the kind of electronic content, and I'm uneasy about whether the records word is a good word, which are going to perhaps dominate the attention of archives and libraries going forward as they try and deal with this electronic record flood. I think that there are some very funny things going on there. Just look at the uneasy fit in our institutions between the archive, records management and the library. Particularly when you think of the library as playing a leadership role in dealing with the e-research tidal wave, and let's be realistic, you could make an argument that "well the e-research tidal wave is more of an archives kind of problem than a library kind of problem". Just compare, let's be realistic here, compare the level of resources available to your university library and your university archive. Look at the level of content asset management systems that are already deployed. Look at where the expertise mostly resides, and I think it's fairly clear that supporting e-scholarship is going to ride mostly on the libraries because they're the ones with the capability, the resources and the flexibility to deal with it. Archives are just too small...
This takes me to another convergence point that I just want to kind of underscore, and that's one about systems. It's very easy, it's very nice if you are an underfunded archive or an underfunded university museum, that's another popular group in this area, to sit around dreaming of the perfect system. If somehow you and similar organizations to yours at other institutions could bring this into being. If you had enough money to create a marketplace where people would produce these. This doesn't seem to be happening. It really doesn't. There's just not enough money to see the development of highly specialized systems for some of these small areas. I think one of the challenges that we're gonna have is to come up with converged systems, probably basing very heavily off of the digital collections management stuff that is going into place in our libraries, primarily to support our university museums, our university archives, to the extent that it's necessary, perhaps our university records management systems. I think that it's high time to really get serious about thinking about how we can do this kind of convergence, how we can think about making incremental investments that leverage investments already in place and fairly robust systems that are already in place, in order to bring in a much larger and very needy and very significant set of new content assets that need to be preserved and disseminated and made available as digital surrogates. So that's the last piece of convergence that I want to underscore for you. It's not just about reshuffling our organizations, it's about reshuffling the systems infrastructure that sits underneath that set of organizations.
It's a commonplace today to talk, sometimes a little glibly, but ultimately accurately about the trinity of cultural memory organizations, our libraries, our archives and our museums, and about the fact that these three are converging in various ways. That it's getting harder and harder in the digital world to really tell a museum from a library from an archive; at least if they are taking aggressive advantage of the capabilities and power and potential of the networked information environment. I think that when we talk about those three classes of institutions: libraries, museums and archives, we're leaving off some things, and I believe, I'm coming to believe more and more, that some of the most crucial conversations we need to have in the next few years are about where various other activities fit within that sort of classic trinity or in relationship to it. Specifically I would point you to scientific and scholarly data management and informatics. I would point you to public broadcasting and a complex of issues around there, which is a whole other long discussion that I don't want to have tonight but I want to sort of tag it as another allied area, and records management.
All of those areas are at least in part components of our cultural memory organization base and in other parts perhaps they're not, and figuring out how they relate, how they perhaps converge along with libraries, archives and museums into some sort of future vision of digital cultural memory, digital management and dissemination of our scholarly and intellectual record, I think is one of the biggest challenges before us. I think that if you look through the presentations you're going to hear in the next day and a half you'll find many of these threads emerging. If you look at other activities going on in your institutions, and beyond your institutions in your professional and scholarly spheres, I think you will again find many threads which lead you into these questions and I would urge you all to join me, and join Rob and CNI and ARL and many other interested organizations in trying to sort this out because I think this really is one of the central questions we have in shaping the future of higher education, scholarship and cultural memory going forward. Thanks.