
An Interview with Gary Price
By Vessela Ensberg (Associate Director, Data Architecture, University of California Davis Library) and Peter Brantley (Director, Online Strategy, University of California Davis Library)
Transcript edited using Claude 3 Opus, with revisions by Vessela Ensberg, Peter Brantley, and Gary Price.
Vessela Ensberg: Gary, can you talk about how a user can trust the model they’re using and what creates that trust or causes a lack of trust in it?
Gary Price: When it comes to trust in AI, a key point is that words matter. Our community has a huge role in defining things properly. Should we focus this discussion primarily on GPT models or AI as a whole? People may be confusing the definitions.
VE: Do you think there’s a difference in terms of what is trusted, whether it’s AI or GPT?
GP: It depends on how you’re interacting with it. With a self-driving car, for example, you have to trust the car designers, software developers, and their third parties to auto-drive the car and get you to the right place. You have little control as a user.
VE: For this interview, let’s define AI as generative AI that has been released to the general public, not specialized applications confined to particular companies.
GP: Trust comes from experience, use, reputation, and many other variables. If I’m writing a paper using GPT for research, do I trust OpenAI, Anthropic, or Perplexity to get it right without extensive review? At this point, I don’t.
VE: What would make you trust it more?
GP: Time. Research. People who use and research AI tools over time see improvement and growth in the product and in the service. The example I use is very simple: Would I trust ChatGPT to research facts and produce a slide deck with charts and graphs of the results for my boss to present?
I have run into any number of situations where a simple, well-documented fact is presented incorrectly. Then, if I press the AI with a possible correction, it will respond, “Your answer X is true.” The precise words in the prompt matter!
VE: Do you have certain benchmarks or points you run through to determine how much you trust the output?
GP: One of the biggest things, which is not readily available in many situations, is the provenance and transparency of the training data. We don’t know where any of this is coming from. As a researcher, I can’t tell my boss where the data is sourced from. That can lead to a slippery slope in terms of output quality.
It is important to remember that I am thinking this way because of my education, training, and experience, but a lot of people don’t. People take whatever they get, a concept called “satisficing” that gained traction in our field around 2005-2010. In many cases, good enough is often good enough, but even then, if the sources could be referenced in a bibliography or a webliography, a reviewer could see whether they come from reputable places.
I was just reading an article about the growing amount of AI content out there: AI generating article after article after article. Within seconds, you can create a lot of junk. With the growing number of AI-generated content farms, a GPT model could be sucking in a lot of misinformation or disinformation. If I’m researching with GPT and have no idea where facts are coming from, it makes replication, which is already often poor, even more challenging.
Over time, if you understand how a model is put together and what’s going into it, the trust level may increase.
Peter Brantley: So transparency into model inputs would foster some greater sense of trust or understanding, but I also assume you wouldn’t argue that transparency is sufficient for trust? Simply knowing what training data are used doesn’t necessarily invoke adequate trust levels?
GP: Absolutely, it’s just one of many variables that can help build trust; it is not any guarantee of trust. Also, most major GPT providers today are large companies. We need to consider the business implications of putting trust in them. We know they have distinct profit motives and values. That’s another variable in the trust equation. It also leads to questions of trust regarding how they are using this data.
VE: You touched on citations and provenance for how responses are derived. What about citing the model used? Does a citation of a large language model response allow for reproducibility — and does it need to?
GP: Even if you know all the data going into the model, I don’t think that guarantees a reproducible and reliable result over time for users. There are too many variables — not just the underlying data but how prompts are worded. There are so many things that can affect reproducibility.
PB: Models are constantly being modified by changes in system prompts as well as in user prompts, and by various tweaks, potentially to parameterization. Technically, guaranteeing reproducibility may not be feasible at present. Given the need for reproducibility in research, how do we understand working productively with these models? What does that look like?
GP: The way for librarians and publishers to encourage trust and reproducibility is to have a hand in developing these tools and models. I don’t see any other way to have a say in how these things are developed. The growth of RAG (Retrieval-Augmented Generation) and RAFT (Retrieval-Augmented Fine-Tuning) will also be ripe for librarian involvement. There’s also an opportunity now for developing public models, or ones for specific needs and user groups. This is the only way we can have the control we want to have. The issues that our community had with Google are minuscule compared to the issues we have here.
VE: You are touching on a lot of topics here. Let’s break them down and talk about Google. What did we learn, and how is it different now?
GP: One of the things I learned from Google was the value of being first in the technology cycle. Now when people are thinking about GPT, they are not thinking about Google anymore; they are thinking about ChatGPT and OpenAI. Google is a major player with Gemini and its funding of Anthropic, but for the average user, “ChatGPT” has become a verb.
With Google, you type in your keywords. At least with Google you are given a source. The quality of the source, the currency of the source — that is another matter. Of course this is, at least at the moment, beginning to change: AI summaries are coming to Google results pages, and you’ll need an extra click to get to “traditional” Google results (or so we’ve been told). With LLMs you are getting a nicely organized result that is ready to go; you can paste it in. People don’t have a lot of time; they want what they want when they want it. ChatGPT outputs, for most purposes, are ready to go. As LLMs develop, we are seeing more web citations being supplied with these new services, but given all that we know about the manipulation of web results, this is a slippery slope.
VE: Is it ready to go, or is it tricking you into thinking it is ready to go because it sounds so smooth?
GP: No, you’re right, and I should be clear. It’s tricking you into thinking that it’s ready to go. I don’t know if most people realize, given how it works, that it is tricking them. Do people know enough to verify any of those things? This makes me think about how we are educating people about information and digital media literacy. This education is not where it should be. You don’t know what you don’t know. And to verify, you sometimes need to know what you don’t know.
VE: So how do you become a literate user?
GP: The first thing is to know about all of its problems, and to be aware of alternatives, whether it be another GPT model, or whether it be Google, or whether it be Semantic Scholar, or whether it be one of the hundreds of databases that UC Davis, for example, licenses.
You can tell me if I am right or wrong, but most people are starting with Google Scholar or perhaps even just with Google. Yet if they knew one or two of the databases that you’re spending a lot of money for, they could get a better, more precise answer in a shorter amount of time. Tom Mann, who worked at the Library of Congress, talked about the principle of least effort in his book, A Guide to Library Research Methods.
GPT takes that to a whole new level because after just typing in a few words, it might be tricking you into believing that you’re done. Yet, people can’t use what they don’t know about. The first thing I would do as an information professional is be aware of the alternatives. If I need to hammer in a nail, I’m not going to use a screwdriver. I think most database resources are probably unknown to a lot of people who would benefit from them, but they are hearing about Google. They are hearing about ChatGPT. So that’s what they use. Even if you have access to one extended model via a license through your company or institution, I strongly recommend using other LLMs, at least from time to time, to compare responses. Poe, a freemium service, makes this easy and fast.
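As a minimal illustration of the cross-checking Price recommends, the sketch below sends the same question to two providers and prints both answers for a human reviewer to compare. The specific client libraries (openai and anthropic Python SDKs) and the model names are assumptions chosen for illustration; Poe or any other multi-model front end would serve the same purpose.

```python
# A minimal sketch of cross-checking the same prompt against two LLM providers.
# Assumes the `openai` and `anthropic` Python packages are installed and that
# OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment.
# The model names below are illustrative placeholders.
from openai import OpenAI
import anthropic

PROMPT = "Who wrote 'A Guide to Library Research Methods', and when was it first published?"

openai_client = OpenAI()
openai_reply = openai_client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": PROMPT}],
)

anthropic_client = anthropic.Anthropic()
anthropic_reply = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=512,
    messages=[{"role": "user", "content": PROMPT}],
)

# Print both answers side by side so a reviewer can compare them and, ideally,
# verify the underlying facts against a primary source.
print("OpenAI:\n", openai_reply.choices[0].message.content)
print("\nAnthropic:\n", anthropic_reply.content[0].text)
```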
VE: How much does the ease of use factor in here? How much is it the fault of the user defaulting to the easier tool, and how much is it the fault of information resource companies not making their products user friendly?
GP: Google made it so easy. It’s that one box; you type in something, and you always get something. I think it’s the exact same thing when you’re interfacing with ChatGPT. There’s the box you type something into, and you don’t even have to click on any links to verify or to look at it. You get something back in a nice paragraph. I would agree with you. Ease of use is a big part of this.
Google led to improved ease of use among traditional database providers. But to use them at an advanced level, there is a learning curve. Maybe putting a GPT layer over them will make it easier. I don’t know of any way of getting around the learning curve, if there’s a need for one. When you get to the core of what motivates many people to use this technology, it’s saving time and effort. Asking people to take time to learn, often on their own, how to best use the technology and to build even a basic level of understanding can be a challenge.
VE: Based on what you just said, what are the most important roles for information professionals today and in the next few years?
GP: Number one: educator. Another huge role is to try to get a seat at the table and share your views or the community’s views. That’s easier said than done. Similarly, a huge opportunity that will only grow with time is being a curator. Everybody in our community believes that it’s garbage in, garbage out; good stuff in, good stuff out. One curator role is helping to build these large language models with good data, with quality data from quality sources. Another is knowing the right sources to help select the databases and the PDFs, and not only text materials but also audio and video, that the large language model will query via RAG at the time of need. A third role is building curated GPTs for instruction for a specific class, for a specific faculty member, for a specific Ph.D. student. Not only building them once, but helping to keep these GPT models as current as possible, reviewing them, interacting with the patrons on a regular basis, and finding out if they are getting what they need. The curator has a huge opportunity here.
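To make the curation-plus-RAG idea concrete, here is a minimal retrieval sketch over a small, librarian-curated set of passages. It uses scikit-learn’s TF-IDF vectorizer for retrieval and assembles a source-labeled prompt; the passages, source labels, and the step of handing the prompt to whatever model an institution licenses are illustrative assumptions, not a description of any particular product.

```python
# A minimal retrieval-augmented prompting sketch over a curated corpus.
# Assumes scikit-learn is installed; the passages and source labels are
# invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A librarian-curated corpus: each entry pairs a passage with its provenance.
curated_corpus = [
    ("Open access mandates at UC apply to scholarly articles...", "UC OA Policy, 2013"),
    ("The principle of least effort predicts that searchers favor...", "Mann, A Guide to Library Research Methods"),
    ("Retrieval-augmented generation grounds model output in retrieved text...", "Course handout, LIS 201"),
]

def retrieve(query: str, k: int = 2):
    """Return the k passages most similar to the query, with their sources."""
    texts = [text for text, _ in curated_corpus]
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(texts)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(zip(scores, curated_corpus), key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in ranked[:k]]

def build_prompt(query: str) -> str:
    """Assemble a prompt that cites each retrieved passage's source label."""
    context = "\n".join(f"[{source}] {text}" for text, source in retrieve(query))
    return (
        "Answer using only the sources below and cite them by label.\n"
        f"{context}\n\nQuestion: {query}"
    )

print(build_prompt("What does the principle of least effort say about searchers?"))
# The assembled prompt would then be sent to whichever model the institution licenses.
```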
VE: Building on that, what are the concepts and tools and ideas that we need to develop for fact-checking? Do we need a trusted clearinghouse? What are your thoughts on that?
GP: Our community would benefit from a nonprofit entity that would take what I’m doing and expand it. There’s just too much information. There’s too much data. You need to be looking at how these tools are being used in other areas beyond libraries and publishing, in other communities. Don’t always read what you want to read; read beyond it to get a broader view and stimulate more ideas.
VE: I’m hearing two things here. One is learning from other fields and adopting their practices. The second, I think, is libraries being keepers of a small, curated body of knowledge for critical applications.
GP: Picture an Information Clearinghouse in which a person tests how different GPT models provide different results and finds which one might work best for a given need. Information technology can play a big role in giving people the information they need, both directly related to their field and from a tangential field that could spark new ideas and new thoughts, in a format and at a level they can absorb.
VE: What is a good model to use to teach AI literacy? There are standards for information literacy, and data literacy. What is different about artificial intelligence literacy?
GP: The volume of material; the rapid change. The privacy aspect needs to be part of AI literacy.
You need the same thing that makes you a safe driver of a car. You don’t need a PhD-level understanding; you need the equivalent of driver’s ed: a basic understanding of how a car works, identifying road signs, and operating the vehicle safely around other people and other cars. There needs to be an “AI driver’s ed”: how to operate AI, what to expect from it. Constant updating. Awareness of where it works well and of its limitations. Awareness of the business implications, given that most of the development is done by three large companies. And customize your examples to the audience.
PB: Generative AI systems make associations between bits of information and, more broadly, between fields of knowledge that humans might not otherwise have associated. The machine, deriving insights into natural systems, acquired knowledge that would otherwise have been opaque to the traditional trajectory of human research. Examples range from AlphaFold’s predictions of protein structures to the surprise of machine-driven moves in a Go game.
GP: If an AI model makes a scientific discovery by identifying new connections, doesn’t that still need to be experimentally verified and replicated to be considered valid knowledge? The AI can generate hypotheses and insights, but other methods are needed to confirm accuracy and legitimacy.
I think there’s an opportunity for librarians to do fact checking on model responses. Given the volume, it would be challenging. It would be interesting to take different questions from different people over time, and have those results rated for quality. Still, if you’re not getting a source, it becomes very, very difficult to do verification. You would need to find the fact that needs verification on a website or in a journal.
PB: If people are trusting social media or generative AI because the curve of truth, reproducibility, or validity in any set of responses is opaque to them, then they don’t know when and where to exercise the time and judgment to validate. And that’s a hard thing to ask of users, generally.
GP: This is why it needs to be taught or discussed.
VE: To summarize the key themes here: don’t trust, verify. And maybe have AI itself write this article from our conversation?
GP: Yes, and I’d be curious to see the results of having the transcript fed through different GPT models to generate an article.