Societal Implications of Computer-based Information Retrieval and Dissemination (s)


How many of us have heard that computer-based searching is merely a faster way to accomplish something that could just as well be done manually? I wonder how many of us have taken time to consider this frequently voiced misconception. Let me assert that popular acceptance and use of information retrieval services is to some extent limited by this very notion. What is frequently not realized is that the process of access (providing a means whereby a person can interact with a machine to selectively verify and identify information of interest from a large mass of stored information) is a sophisticated process which goes well beyond the simple look-up capability of manual systems.

This afternoon I would like to examine the notion of access and to illustrate some of the more profound implications of information retrieval technology. To provide perspective, let me explore a few of the milestones which have permitted the development of the technology, examine some of the current trends in the industry, and conclude with some observations on what the future holds for those involved with information retrieval.

The Challenge

A letter I received in December of last year from Professor Peter d'Errico of the University of Massachusetts at Amherst, Department of Legal Studies, very nicely sets the stage:

"Having used Dialog in a teaching framework for about two years, I am struck by how few academics are aware of or understand online searching. This in itself probably does not surprise you; what may surprise you is that I do not think that Dialog is fully aware of the academic potential of online searching either. I have come to the conclusion that the deepest significance for scholarship of [online searching] has been missed. In particular, there is virtually no awareness that computer database searching is not simply a matter of doing on a machine what one could do in books. The real potential of online searching in an academic context is the way in which it facilitates innovative and especially interdisciplinary research.

"What I would call the misconception of online searching... is an important factor constricting the growth of the information market; e.g., its expense is not justified when we have access to the same information in the library.

"I would encourage some exploration of the special kind of 'benefit' that search requestors can realize when they begin to understand how online searching facilitates the crossing of academic disciplinary boundaries and the transcending of divisions and categories in the arrangement of material within disciplines. Online searching enables


a scholar to find information in and across fields, either without familiarity of the traditional taxonomy of an area or in contradistinction to traditional arrangements of knowledge."

It is interesting to me that someone outside the field sums up so well the problem we face in persuading those untrained and unfamiliar with the process of the utility and potential benefits to be derived from online searching. Keeping Professor d'Errico's observations in mind, let us go behind the scenes to examine some of the milestones associated with information retrieval.

o c B r r THUE T FEPSr

The services we take so much for granted today have developed over a 30-year span. As far back as the late 1950's, Western Reserve built one of the first machines designed specifically for information retrieval. People came from near and far to witness the wonders of a machine that could automatically search encoded metallurgical literature (Fig. 1). The Flexowriter on the left reads the library from punched paper tape. The young woman has programmed up to a maximum of ten search questions which are matched against the encoded library records simultaneously. This machine was a predecessor to general purpose computers, which were just beginning to be put in use at about this same time.


The first applications of the general purpose computer dealt almost exclusively with numerical processing; i.e., scientific and accounting applications. As it was realized that computers were generalized symbol manipulators, the processing of text became more widespread. Mortimer Taube and Mel Day were pioneers in the definition of multi-purpose databases which were used to generate printed publications as well as to serve as the basis for computer searching. The design structures used in these databases set the standard which is used in most databases today.

The NASA database of Scientific and Technical Report literature was one of the first databases so designed. A key to this design was the identity tagging of fields within the records as to field type (e.g., title, author, descriptor, etc.). Many machine-readable files developed solely for publishing contained only photocomposition codes which indicated typesetting conventions (margins, fonts, character sizes and the like). Such files are difficult to process for more than the simplest retrieval because of their lack of field specification.

NASA had also developed a tape-fed retrieval system which utilized an IBM 1401 computer to sequentially search the NASA database of 20*2,000 citations. It operated in much the same way as the Western Reserve Search Selector machine. As a so-called batch search system, search time was a direct function of the file size and it was not possible to modify one's search request once the process started.


Total elapsed time to process a batch of searches through the NASA system of the day was approximately 22 hours. If a search was specified too broadly (e.g., "welding"), the requestor might walk away with a box full of output. If a search was too narrow (e.g., arc welding of aluminum in an oxygen-free environment), one might wind up with zero hits. Turnaround time from search to search was between two days and two weeks. Even though inconvenient, there was a large demand for searches from NASA scientists and contractors.

Dialog Development.

These conditions led a small group of us in Lockheed to begin considering if there were not a better way to do searching. In 1964 we persuaded Lockheed to form the Information Systems Laboratory in Palo Alto. The equipment consisted of an IBM 360/30 computer (Fig. 2) together with a datacell (Fig. 3) with 400 megabytes of storage, two small disc drives, and a telecommunications interface. The machine contained 3CK bytes of central storage, less than even the smallest of personal computers today. With an objective of overcoming the problems associated with batch searching, we established the following design criteria for the system which was to be developed:

•The language was to be command driven
•Alphabetically near terms to any input term were to be displayable together with posting counts to assist in formulating a search
•The result of any search statement could be used within the definition of any successive search (recursion)
•Output options were to include visual display, local typing, and offline printing
•The system was to be usable by librarians and end-users alike
•The system was to be called "Dialog"

Recognizing that independent-research-funded projects at Lockheed needed early proof-of-concept for continued investment, we submitted an unsolicited proposal to NASA for a demonstration contract utilizing Dialog on the then-existing NASA database. The wide acceptance of the resulting implementation led to a major competitive request for proposal (RFP) from NASA for development of the NASA/RECON retrieval system, which we bid based on Dialog, and won. Over the next three years we developed systems for the European Space Research Organization (ESRO, now known as the European Space Agency or ESA), the Atomic Energy Commission (AEC), and others. In the late 1960's we entered into our first services-only contract with the U.S. Office of Education with the ERIC database. From this point forward, Dialog operated as a service as opposed to a systems development organization.
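Two of the design criteria listed above, the display of alphabetically near terms with posting counts and the reuse of earlier search results in later statements, can be illustrated with a small sketch. This is not Dialog's command language or its implementation; the sample postings, the function `near_terms`, and the class `Session` are all hypothetical names invented for the illustration.

```python
import bisect

# Hypothetical postings: index term -> set of record numbers (illustrative data only).
POSTINGS = {
    "weld": {4},
    "welded": {2, 4},
    "welder": {7},
    "welding": {1, 2, 3, 5},
    "aluminum": {1, 5, 6},
    "steel": {2, 3},
}
TERMS = sorted(POSTINGS)  # alphabetical list of index terms


def near_terms(term, window=2):
    """Return alphabetically neighbouring index terms with their posting counts."""
    i = bisect.bisect_left(TERMS, term)
    lo, hi = max(0, i - window), min(len(TERMS), i + window + 1)
    return [(t, len(POSTINGS[t])) for t in TERMS[lo:hi]]


class Session:
    """Keeps numbered result sets so that later statements can reuse earlier ones."""

    def __init__(self):
        self.sets = []

    def select(self, term):
        """Create a new numbered set from a single index term."""
        return self._store(set(POSTINGS.get(term, set())))

    def combine(self, a, op, b):
        """Create a new numbered set from two earlier sets (1-based set numbers)."""
        left, right = self.sets[a - 1], self.sets[b - 1]
        result = {"and": left & right, "or": left | right, "not": left - right}[op]
        return self._store(result)

    def _store(self, records):
        self.sets.append(records)
        return len(self.sets)  # the new set number


neighbours = near_terms("welding")   # terms adjacent to "welding" in the index
session = Session()
s1 = session.select("welding")       # set 1
s2 = session.select("aluminum")      # set 2
s3 = session.combine(s1, "and", s2)  # set 3 is defined in terms of sets 1 and 2
```

The point of the numbered sets is that a searcher can narrow or broaden a result step by step instead of resubmitting the whole request, which is what the batch systems described earlier could not offer.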


In 1971 we set up as a profit center and began plans to launch a commercial retrieval service. Our activities were spurred on by a survey we received from Carlos Cuadra indicating that System Development Corporation (SDC) also had plans for such a service. A key point here was that we launched the commercial service from a foundation of profitable government contracts, which by this time had been expanded to include the databases of the National Technical Information Service (NTIS) and the National Agricultural Library (NAL). By the mid-'70's database publishing by computer had found its way into the professional societies, making possible the addition of still other databases. Each database tended to attract those for whom the database was a primary source, but equally important was the experience that the primary users branched out and used the other databases available on the system. They had even at this early stage discovered cross-disciplinary searching. Fig. 4 shows the buildup of databases to present. A curve of sales revenues (not shown) largely follows this same pattern.

Figures 5 and 6 show a recent configuration of computer and storage facilities at Dialog. We have roughly 420 gigabytes (giga/thousand million characters) of storage and operate with four very large scale IBM compatible computers. Although our storage doubles approximately every three years, the storage densities have increased to such a point that we have not been required to expand our facility space. Furthermore, as can be seen in Fig. 7, both storage and computing costs have decreased to a point that we can continue to add databases with little increase in cost. It can be seen that both computer and storage costs halve about every 10 years according to our experience.

Key Technologies - Historical

The retrieval services we use today depend on several key technologies of the past which are shown in Fig. 8.
Boolean operators "and," "or," and "not" lend themselves particularly well to the slicing and subsetting of databases that goes on in searching. It still amazes me that one can work through a database of several millions of records and define a subset in a matter of seconds that contains the particular pattern of words and phrases one desires. The inverted file (which can be thought of as a massive back-of-the-book index which keys back not to page but rather to field and word position within field) makes such searching possible. If the computer had to sequentially scan through the master (or sequential) file, such searching would take many times as long. It should be noted, however, that the inverted file is not without its cost. It is at least the size of the linear or master file, and it must be reloaded with each update (because of the sorting necessary to resequence the terms). At Dialog, for example, we must reload approximately 220 gigabytes per month just to maintain the inverted files. It also takes about 300 times as much processing for a Boolean search as for a simple lookup.
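The relationship between the master file, the inverted file, and the Boolean operators can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of how Dialog's files are actually laid out: the three sample records, the field names, and the helper names `build_inverted_file` and `records_with` are invented for the example.

```python
from collections import defaultdict

# Hypothetical three-record "master file"; the field names are illustrative only.
MASTER_FILE = {
    1: {"title": "Arc welding of aluminum", "descriptor": "welding aluminum"},
    2: {"title": "Welding of steel pipe", "descriptor": "welding steel"},
    3: {"title": "Corrosion of aluminum alloys", "descriptor": "corrosion aluminum"},
}


def build_inverted_file(master):
    """Map each term to postings of (record id, field, word position within field)."""
    inverted = defaultdict(list)
    for rec_id, fields in master.items():
        for field, text in fields.items():
            for pos, word in enumerate(text.lower().split()):
                inverted[word].append((rec_id, field, pos))
    return inverted


def records_with(inverted, term):
    """Look a term up in the index and return the set of record ids containing it."""
    return {rec_id for rec_id, _field, _pos in inverted.get(term.lower(), [])}


INVERTED = build_inverted_file(MASTER_FILE)
welding = records_with(INVERTED, "welding")
aluminum = records_with(INVERTED, "aluminum")

both = welding & aluminum                   # welding AND aluminum
either = welding | aluminum                 # welding OR aluminum
welding_not_aluminum = welding - aluminum   # welding NOT aluminum
```

Each Boolean operation works on the sets of record numbers taken from the index, so the master file is never scanned during the search. The price is the one noted above: the index has to be rebuilt, in sorted order, whenever records are added.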



Third generation computing technology, which was introduced by the IBM 360 series of computers, provided the facilities necessary for the design of today's information retrieval systems. This technology includes random-access storage, telecommunications capability, and interactive, multi-user programming. Random access is essential to interactive retrieval; telecommunications allows the cost-effective concentration of a collection of databases at a central site; multi-user programming allows the sharing of the enormous costs of such a service by a number of users. Dialog, for example, has invested some $33-40 million in hardware and facilities which each user shares with each search. One customer neatly described Dialog as his PC's favorite peripheral. It's interesting to be able to tie into a $40 million peripheral for a few dollars whenever it's needed.

Packet-switched telecommunication networks, introduced by Tymnet in the early 1970's, were essential to the development of both national and international service networks.


CURRENT TRENDS

Figure 8 in the right hand column identifies present and future technologies which are affecting information systems at present. By reviewing the trends in each of these component areas we should be able to get a good idea of what the future holds for information retrieval.

Personal computers. A most interesting current trend in personal computers is that toward becoming integrated workstations. It takes only a little imagination to visualize the personal computer workstation that will incorporate voice-mail, facsimile, connection to local area networks as well as remote facilities, with optical discs for mass storage of frequently used, relatively static information, and automatic connection to information services for unanticipated needs and to receive various current awareness services.

Storage. Although CD-ROM has much of the current attention in this area, the storage of the full text of source documents for facsimile transmission on demand to personal computers shows great promise in the near future.

Software developments. Hypertext, by providing links to related material from the contents of documents, seems to provide an interesting browsing tool, something largely overlooked in most retrieval systems of today. Equally interesting, though, is the flurry of effort which is developing within commercial houses in an area which can generally be thought of as similarity searching. Although the concept is not new, it has until recently been pursued largely within the academic domains of people such as Gerald Salton and Karen Sparck Jones. Thinking Machines Corp. has developed a configuration of parallel processors (called the Connection Machine) which allows very rapid sequential searching of documents. Dow Jones just paid $[illegible] million for two such machines. Examples of other companies working in this area are Third Eye with its Elexir system, and Computer Power Pty. Ltd. with its Status/IQ system.


Integrated Services Digital Network (ISDN). If one listens to ATT, there will soon (within the next 3-5 years) be a new communications protocol known as ISDN. Whereas today we communicate at various speeds from 300-2400 baud (bits per second) analog, ISDN defines a standard of 64,000 baud digital. The general acceptance of this new standard by organizations has exciting implications for information services. High-resolution facsimile can be transmitted at under 10 seconds per page. Gone will be the modem (computers like to talk digital) and noisy lines. Gone will be several telephone lines into an office for different services (such as voice, facsimile, data, etc.). One line will carry all of these services simultaneously. It will not be necessary to connect to remote services through a telephone; your workstation/PC will seem to be continuously connected, just as teletypes and facsimile machines are today. In short, ISDN is an application designer's dream come true.

Full text (source documents). Whereas early information services stored and retrieved codes or citations to identify retrieved documents, as the cost of storage decreased abstracts and more recently the full text of documents were included as searchable and deliverable items. This is a trend which will continue. Dialog currently stores the complete text of approximately 400 periodicals online, with a continuing high priority for adding more. It is safe to assume that within the next 5-10 years nearly all interesting journals should be fully available online.

We are living at a marvelous time in history with respect to knowledge access. We can reasonably anticipate that any literate person can easily and conveniently become familiar with society's collected knowledge in any area that piques his or her professional or personal


Although a majority of Dialog users are intermediaries and information specialists, an analysis of 1987 signups shows that end-users constitute by far and away a majority of new customers. Of Dialog's current 86,000 customers, approximately 90 percent are information specialists. Of our 1987 signups, however, approximately 72 percent are end-users from a variety of professions. Usage by information specialists continues to grow, which suggests that end-users' usage is not being gained at the expense of a lessening demand for specialist services.

International usage. Although Europe/UK and Japan account for the greatest non-USA usage of Dialog services (some 70 percent), Figure 9 shows that the highest growth is coming from the so-called "other" category. This category contains many developing nations, and growth tends to follow the availability of affordable telecommunications services in those countries. Dialog currently serves some [number illegible] countries worldwide.


IMPLICATIONS AND CHALLENGES

Let us return to Professor d'Errico's letter mentioned earlier and his point that there is virtually no awareness that computer database searching is not simply a matter of doing on a machine what one could do in books. Figure 10 illustrates two dimensions of value-added that can be discerned in various approaches to information retrieval. The retrieval specification dimension determines richness and flexibility in the request; content describes the richness and completeness of the output. A manual search in a card catalog or reference publication, for example, typically allows a single key (word or phrase) specification and provides a group of citations (sometimes with abstracts) as the result. The user must be familiar with and use the indexing words or categories used by the indexer in creating the index. A machine search with a system like Dialog, however, allows the user to input patterns of words of his or her own selection to describe the search topic, and to retrieve in almost all cases at least abstracts and in many cases the full text of the article. Moreover, the user can review the abstracts and modify the search request with additional words or concepts found in desired results in a recursive sense. A manual search normally covers only one subject area at a time, limiting the potential of cross-discipline serendipity. In Dialog one can specify up to 20 databases, within a field or across fields, all of which are searched at once.

My intent here is not to sell this audience on the benefits of machine retrieval, but rather to address some of the public misunderstandings which limit the wider use of this technology. For the experienced user, the benefits are obvious; for the inexperienced user the power of computer searching is difficult to conceive other than by analogy with other familiar processes.
Just as the need to teach reading was recognized in education in the eighteenth century, we need to identify the teaching of machine searching as a core requirement today. To the extent that students can gain familiarity with these techniques in a classroom setting, they are likely to incorporate them into their general skill set and carry them through to later life. Such skills can have an enormous impact on society through curiosity satisfaction and stimulation, more efficient research, and a generally higher quality of knowledge accumulation and processing. The printed word is of limited use if the resultant publications cannot be identified and provided in an effective and timely fashion.



Samuel Johnson, the 18th century dictionary writer, sums it up nicely: Knowledge is of two kinds: we know a subject on our own, or we know where we can find information on it. Unfortunately there often seems to be a low priority given to systems and institutions providing the second type of knowledge, as illustrated in the following excerpt from a recent San Jose Mercury article:

"As Mayor Tom McEnery's $130 million convention center, high-tech museum and arena go forward, it's worth remembering the words seen on a sign at the Pearl Avenue branch of the San Jose public library: 'Due to the city budget crisis, we are in need of the following supplies: Scotch tape, pencils, white-out and staples.'"

