Research Integrity: A Market Overview
By Samantha Green (Head of Content Marketing, Morressier) <samantha.green@morressier.com> and Sami Benchekroun (CEO, Morressier) <sami.benchekroun@morressier.com>
Introduction: The Unique Challenge of Research Integrity
Issues of research integrity are ultimately issues of trust. Scientists need their work to be trusted in order to advance their careers and to ensure that work has a broader societal impact. Publishers need the work they curate to be trusted in order to maintain their brand and continue pursuing revenue and partnerships. Universities and institutions, many of which now have Research Integrity Offices, also need to safeguard their reputation through the integrity of their researchers. Journal editors and reviewers work to maintain the impact of their publications and do their best to ensure each piece of research upholds the highest standards, with the help of best practices and guidelines from organizations like the Committee on Publication Ethics (COPE). Institutions, editorial teams, technology organizations, publishers, and more all play a role in improving research integrity, and those roles come under ever more pressure as the landscape of scholarly publishing changes faster and faster.
On top of the complexities of the market and the different incentives facing individual researchers, publishers, institutions, and technology organizations, we need to understand the root drivers of research misconduct in order to build effective solutions.
Research integrity issues are exacerbated when the system is under strain, as it is now: each year brings a vast increase in submissions and a dwindling pool of reviewers, who often serve as one of the few checks and balances in the publishing process. Strains on the system lead to embarrassing retractions or slower publishing times, which diminish the impact and reputation of science in the public eye. Calls for change in publishing workflows and peer review come from all corners. The system does not work for reviewers, who spend huge amounts of time reviewing work with little credit or recognition. It does not work for editors or publishers, who invest resources in managing complicated processes that may be manual or poorly integrated into the rest of their workflows. And it does not work for authors, who are under pressure to publish and have to wait for slow processes that prevent them from sharing their work efficiently. Increasingly, it also does not work for funders, who want the discoveries they’ve invested in shared the moment they are finished.
The Urgent Need To Scale
Research output has grown significantly year over year, scaling up to an incredible degree during the COVID-19 pandemic in particular. As a result, the publishing industry needs to be able to address this increase in volume and identify research fraud and misconduct at scale.
Before exploring the role of technology in addressing research integrity, we must establish what has led to our currently strained publishing resources. Research misconduct happens when a system is under pressure: resources are limited, and there is a constant call to do things faster. Researchers face immense pressure to publish in order to advance their careers and build their personal reputations. That pressure leaves them vulnerable to paper mills and predatory journals, and perhaps more likely to cut corners by engaging in misconduct themselves, or simply to make mistakes because there is little time to perfect a paper. Within publishing and peer review workflows, there are issues of scale and pressure to review more papers, faster, in an article-based economy. When there is less time to review each paper, there is less time to evaluate and identify mistakes or issues that would make a piece of research unsuitable for publication.
The entanglement of business drivers, in which more papers mean more profit, and reputational drivers, in which publishing papers advances an individual’s career, creates a perfect storm in which individuals and institutions might be tempted to sacrifice accuracy and quality for the sake of scale.
The issue of scale is where technology can have a true impact on scholarly publishing. Building modern, lightweight tools that add a protective layer of screening for research misconduct will allow publishers to focus on curating the highest quality research at a significantly higher volume, with the confidence that their tech stack is supporting their mission to improve research integrity.
The Role of Technology
Technology vendors have a powerful role to play in driving research integrity and supporting publishers’ efforts to detect and prevent common forms of fraud. Technology is uniquely positioned to detect many forms of research misconduct, including:
1) Plagiarism — Plagiarism can take many forms, from taking someone else’s ideas to copying their words without proper attribution. Software can efficiently scan submitted papers against massive databases of web pages and published works to identify passages with too much similarity (a simple version of this kind of similarity screening is sketched after this list).
2) Data fabrication — Whether intentional or accidental, the misrepresentation of data is the most common type of research fraud. Data fabrication can be uncovered with algorithms that analyze the datasets in submitted works and apply statistical models that check for errors and inconsistencies (one such consistency check is also sketched after this list).
3) Analysis errors — Similar to data fabrication, errors can occur in the analysis or results sections of a published article. Identifying those anomalies requires a series of automated tests for inconsistencies in statistical analysis.
4) Ethical violations — Ethical violations within a paper can include the manipulation of citations or an overabundance of self-citation. Software can review studies, their methods, and their citations against ethical guidelines.
5) Irreproducible results — The reproducibility crisis is well documented in certain fields of science, in which experiments cannot be replicated or repeated with the same conclusions. Technology that promotes sharing data, code, and more to enable post-publication review can identify reproducibility issues.
6) AI-generated papers — This emerging form of research misconduct occurs when language models are used to draft or complete article submissions. Identifying such language requires additional machine learning models trained to recognize generated text, checking for features like repetition and unusual phrasing.
7) Conflicts of interest — Industry databases manage disclosures for authors and reviewers, and can collect and flag inappropriate affiliations or unknown conflicts of interest, especially those that can be challenging to self-report in blind or double-blind reviews.
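To make the similarity screening in item 1 concrete, the sketch below compares a submission against a small corpus using word n-gram shingles and Jaccard overlap. It is a minimal illustration under stated assumptions: the corpus, threshold, and function names are hypothetical, and commercial tools match against far larger indexes with more sophisticated techniques.

```python
# Minimal similarity screen: word n-gram "shingles" plus Jaccard overlap.
# The corpus and threshold are hypothetical; real services index millions of
# published works and use more robust matching.
from typing import Dict, List, Set, Tuple

def shingles(text: str, n: int = 5) -> Set[str]:
    """Break a text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Proportion of shingles the two texts share."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def flag_overlaps(submission: str, corpus: Dict[str, str],
                  threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Return corpus items whose overlap with the submission exceeds the threshold."""
    sub = shingles(submission)
    scored = [(doc_id, round(jaccard(sub, shingles(text)), 2))
              for doc_id, text in corpus.items()]
    return sorted([item for item in scored if item[1] >= threshold],
                  key=lambda item: item[1], reverse=True)
```

A match above the threshold is not proof of plagiarism; it simply routes the pair of documents to a human editor for closer inspection.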
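For the statistical screening described in items 2 and 3, one well-documented consistency check is the GRIM test, which asks whether a mean reported for integer-valued data (such as Likert ratings) is mathematically achievable given the sample size. The sketch below is only one of the many tests an automated screen might run; the reported values in the example are hypothetical.

```python
# GRIM-style consistency check: for integer-valued measures, the reported mean
# times the sample size must be close to a whole number.

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """True if a mean reported to `decimals` places is achievable from n integer scores."""
    achievable_sum = round(reported_mean * n)    # closest whole-number sum of raw scores
    achievable_mean = achievable_sum / n
    tolerance = 0.5 / 10 ** decimals             # rounding granularity of the reported mean
    return abs(achievable_mean - reported_mean) <= tolerance

# Hypothetical reported values: a mean of 4.37 from 21 integer ratings is impossible.
for mean, n in [(4.37, 21), (4.38, 21)]:
    status = "consistent" if grim_consistent(mean, n) else "inconsistent"
    print(f"mean={mean}, n={n}: {status}")
```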
When an issue is as complex as research integrity, an “all hands on deck” approach is the most effective. By nature, scholarly publishing is a collaborative ecosystem, and technology vendors have a critical part to play. The unique approach that technology organizations have toward building tools, products, and solutions brings a valuable diversity of thought to the research ecosystem.
The current landscape of publishing technology is often insular, built in-house, and highly customized. However, building effective, lightweight technological solutions is not part of the central mission of most publishing companies, and research integrity issues have been exacerbated by inflexible software that cannot adapt to identify emerging forms of fraud or expand to accommodate the growth in research output. Publishers are unmatched in their ability to curate research and share it with the world, but to operate at the cutting edge of technology, they will need to partner with technology organizations.
Today, technology organizations approach product development with high levels of adaptability. Publishing workflows and infrastructure are complex, often intercut with manual processes, and untangling a workflow to embed research integrity checks is difficult without a technology partner that can supply modular solutions that integrate easily. The iterative development of technology and software brings that flexibility to the publishing world.
Working with a technology supplier to improve research integrity can also ensure a positive, modern, and efficient user experience (UX). The importance of UX cannot be overstated: UX and human-centered design are a critical part of any technology today, and they give publishers a precise understanding of what happens at each stage of a workflow, data that can be used to make improvements both for the user and for publishing efficiency. We must also remember that publishing technology does not exist in a vacuum. Every other piece of technology researchers engage with in their personal and professional lives provides a very different user experience than much of today’s publishing technology. For the next generation of researchers in particular, maintaining the status quo will not be acceptable for much longer.
An Integration Strategy
One of the most powerful aspects of the various types of research integrity tools is their ability to integrate with one another, and with traditional publishing infrastructure. Today, publishing infrastructure is already very complex, and it can be incredibly challenging to manage a large migration to new software. Technology suppliers have the ability to enhance existing publishing workflows, leading to a quicker fix for urgent research integrity issues.
Integrations allow the industry to collaborate across organizations and create a technology ecosystem around publishing that can quickly and effectively adapt to meet the needs of publishers and, most importantly, researchers themselves. The interoperability of various modules, each expertly designed to solve one problem, leads to faster evolution and a collaborative technology landscape. Building a technology solution to improve research integrity means bringing together many of those interoperable modules to meet the unique needs of each publishing community, so that end-to-end solutions gain an extra layer of checks and balances for things like plagiarism or conflicts of interest.
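As a rough illustration of what such interoperable modules might look like in practice, the sketch below defines a shared interface that every integrity check implements, so a publisher can compose whichever modules fit its workflow. The class and field names are illustrative assumptions, not drawn from any particular product.

```python
# Each integrity check is an independent module behind the same small interface,
# so checks can be added, removed, or swapped without rebuilding the workflow.
from dataclasses import dataclass, field
from typing import List, Protocol

@dataclass
class Submission:
    manuscript_id: str
    text: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Flag:
    check: str       # which module raised the flag
    severity: str    # e.g. "info", "warning", "critical"
    detail: str

class IntegrityCheck(Protocol):
    name: str
    def run(self, submission: Submission) -> List[Flag]: ...

class SimilarityCheck:
    name = "similarity"
    def run(self, submission: Submission) -> List[Flag]:
        # placeholder: call a similarity service and translate scores into flags
        return []

class StatisticsCheck:
    name = "statistics"
    def run(self, submission: Submission) -> List[Flag]:
        # placeholder: re-run reported statistics and flag inconsistencies
        return []

def screen(submission: Submission, checks: List[IntegrityCheck]) -> List[Flag]:
    """Run every configured module and collect the flags for an editor to review."""
    return [flag for check in checks for flag in check.run(submission)]

flags = screen(Submission("ms-001", "..."), [SimilarityCheck(), StatisticsCheck()])
```

Because each module is independent, a new check, such as an AI-text detector, can be slotted in without re-platforming the rest of the pipeline.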
Addressing Emerging Technologies
The agility of technology suppliers and their ability to create modular, interoperable solutions make them critically important to the publishing world in building a proactive approach to emerging technologies. Artificial intelligence and machine learning are perhaps the biggest players in today’s emerging technology landscape.
• AI-generated content — With the rise of ChatGPT and numerous other AI services, the need to identify and regulate AI-generated content is top of mind for many publishers. Although AI-written text is highly sophisticated, it can carry stylistic signatures that differentiate it from the personal touch of human-written papers. Further, the use of AI in writing threatens the originality of scientific papers, because AI tools can introduce tortured phrases and recycled content, undermining ethical standards. AI-generated papers are not the original ideas of the author, and we can use detection software, much like plagiarism checkers, to analyze the text on a macro level: fighting AI with AI. Much has been written about how to regulate this phenomenon at the editorial level, but first we must be able to reliably identify when content is generated by AI.
• AI-enhanced peer review workflows — Using AI as an additional layer of checks and balances in peer review is a critical counterpoint to the aspects of research fraud that machine learning has made easier. At the beginning of the process, algorithms can evaluate the reviewer pool and automatically assign reviewers to new submissions, based on criteria ranging from how much reviewing work an individual already has in progress to their areas of expertise (a simple matching heuristic along these lines is sketched after this list). These algorithms can also automatically assess conflicts of interest. Additionally, AI integrity checks can detect plagiarism, data fabrication, or statistical errors. Peer review is a stressed system, with too few reviewers trying to shoulder the burden of too many papers, but many of the manual steps in the process, such as checking submissions for completeness or formatting, can be offloaded to smart technologies. This shift would make the process easier and more efficient for the humans at the heart of it.
• AI-supported trend analysis and taxonomies — One of the unique values of artificial intelligence is its ability to synthesize vast amounts of data. Exploring the impact of machine learning at the discipline level opens up new opportunities to identify trends, chart the course of science, and even anticipate where it is headed. AI tools that analyze publishing trends and forecast where they will go could support editorial decision-making, and even the launch of strategic new journals. Publishers could find gaps in their programs and fill them more proactively than ever before. The same trend analysis could be applied to the development of more nuanced and advanced taxonomies that draw connections between disciplines and pieces of research with ease. These taxonomies need not be static or built from the top down; instead, they would be a reflection of the global body of research.
• Knowledge translation by language models — For many fields of research, a wide range of communities would benefit from the latest findings. Language models could not only make translation easier for global audiences, they could also render the complex, highly specialized jargon of a scientific paper as a summary anyone can understand. Today, producing a plain-language summary is often secondary to the original article: it might be mandated by a journal to improve the impact of scientific discovery, but it also represents an extra step of work for an already overburdened author. Language models could further analyze scientific articles, synthesize the conclusions or discoveries most relevant for policy, industry, or the public, and create audience-specific summaries that allow science communication to scale at a rapid and consistent rate (a sketch of this kind of audience-specific prompting also follows below).
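As a sketch of the reviewer-matching heuristic described in the second bullet above, the example below scores candidates by topic overlap, penalizes current workload, and excludes obvious conflicts of interest such as shared affiliations. The data structures and weights are illustrative assumptions, not a production algorithm.

```python
# Hypothetical reviewer matching: reward expertise overlap, penalize workload,
# and drop reviewers who share an affiliation with any author.
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Reviewer:
    name: str
    expertise: Set[str]       # topic keywords
    active_reviews: int       # reviews already in progress
    affiliations: Set[str]

@dataclass
class Manuscript:
    topics: Set[str]
    author_affiliations: Set[str]

def score(reviewer: Reviewer, ms: Manuscript, workload_penalty: float = 0.5) -> float:
    """Topic overlap minus a penalty for each review the candidate is already handling."""
    return len(reviewer.expertise & ms.topics) - workload_penalty * reviewer.active_reviews

def rank_reviewers(pool: List[Reviewer], ms: Manuscript) -> List[Reviewer]:
    """Remove conflicted candidates, then sort the rest by score, best first."""
    eligible = [r for r in pool if not (r.affiliations & ms.author_affiliations)]
    return sorted(eligible, key=lambda r: score(r, ms), reverse=True)
```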
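And as a sketch of audience-specific summarization, the example below builds a prompt for whichever language model a publisher has access to. The audience guidance, prompt wording, and the text-in, text-out model interface are all assumptions for illustration, not a recommendation of any particular model or vendor.

```python
# Audience-specific plain-language summaries: the same article, different prompts.
# `model` is any text-in, text-out callable the publisher already uses.
from typing import Callable

AUDIENCE_GUIDANCE = {
    "public": "Use plain language, avoid jargon, and explain why the finding matters in daily life.",
    "policy": "Highlight implications for regulation and public programs, with the key caveats.",
    "industry": "Focus on practical applications and how the result compares with current practice.",
}

def summarize(article_text: str, audience: str, model: Callable[[str], str]) -> str:
    """Build an audience-specific prompt and hand it to the configured language model."""
    prompt = (
        f"Summarize the following research article for a {audience} audience "
        f"in under 200 words. {AUDIENCE_GUIDANCE[audience]}\n\n{article_text}"
    )
    return model(prompt)

# Usage: summarize(article_text, "public", my_model_call)
```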
The applications of artificial intelligence in publishing are limited only by the human imagination. It is a polarizing issue that publishing needs to be prepared to address, and technology organizations are perfectly positioned to guide publishers between harnessing AI’s potential and safeguarding against its threats.
Conclusion
Technology organizations give scientific publishing the opportunity to address research integrity at scale. To truly improve research integrity, the publishing tech stack needs transformation, with the help of a collaborative and competitive ecosystem of technology suppliers. Partnering with technology organizations to improve infrastructure and publishing workflows will allow publishers, editors, and individual researchers to focus on the cultural pressures that threaten research integrity. Only through strategic partnership between organizations with diverse expertise can research integrity truly be restored.