[Figure excerpt: EBP process steps — 2. Searches, appraises and synthesises the literature; 3. If literature is lacking, conduct research. EBP, evidence-based practice.]
All 19 models and frameworks included a process for asking questions. Most focused on identifying problems that needed to be addressed on an organisational or hospital level. Five used the PICO (population, intervention, comparator, outcome) format to ask specific questions related to patient care. 19–25
The models and frameworks gave only basic instructions on acquiring literature, such as ‘conduct systematic search’ or ‘acquire resource’. 20 Four recommended sources of previously generated evidence, such as guidelines and systematic reviews. 6 21 22 26 Although most models and frameworks did not provide specifics, some suggested this work be done through EBP mentors/experts. 20 21 25 27 Seven models included qualitative evidence as part of the evidence to be used, 6 19 21 24 27–29 while only four considered patient preferences and values as evidence. 21 22 24 27 Six models recommended that internal data be used in acquiring information. 17 20–22 24 27
The models and frameworks varied greatly in the level of instruction provided for assessing the best evidence. All provided a general overview of assessing and grading the evidence. Four recommended this work be done by EBP mentors and experts. 20 25 27 30 Seven models developed specific tools for assessing the levels of evidence. 6 17 21 22 24 25 27
The application of evidence also varied greatly across the different models and frameworks. Seven models recommended pilot programmes to implement change. 6 21–25 31 Five recommended the use of EBP mentors and experts to assist in the implementation of evidence and quality improvement. 20 24 25 27 Thirteen models and frameworks discussed patient values and preferences, 6 17–19 21–27 31 32 but only seven incorporated this topic into the model or framework, 21–27 and only five included tools and instructions. 21–25 Twelve of the 20 models discussed using clinical skill, but specifics of how this was incorporated were lacking. 6 17–19 21–27 31
Evaluation varied among the models and frameworks, but most involved using implementation outcome measures to determine the project’s success. Five models and frameworks provided tools and in-depth instruction for evaluation. 21 22 24–26 Monash Partners Learning Health Systems provided detailed instruction on using internal institutional data to determine the success of application. 26 This framework uses internal and external data, along with evidence, in decision making as a benchmark for successful implementation.
EBP models and frameworks provide a process for transforming evidence into clinical practice and allow organisations to determine readiness and willingness for change in a complex hospital system. 12 The large number of models and frameworks, however, complicates selection by making it unclear which tool is best for a given healthcare organisation. This review examined many models and frameworks and assessed their characteristics and gaps to help healthcare organisations determine the right tool for themselves. This review identified 19 EBP models and frameworks that included the five main steps of EBP as described by Sackett. 5 The results showed that the themes of the models and frameworks are as diverse as the models and frameworks themselves. Some are well developed and widely used, with supporting validation and updates. 21 22 24 27 One such model, the Iowa EBP model, has received over 3900 requests for permission to use it and has been updated since its initial development and publication. 24 Other models provided tools and contextual instruction, such as the Johns Hopkins model, which includes a large number of supporting tools for developing PICOs, instructions for grading literature and project implementation. 17 21 22 24 27 By contrast, the ACE Star model and An Evidence Implementation Model for Public Health Systems provide only a high-level overview and general instructions compared with other models and frameworks. 19 29 33
A consistent finding in research on clinicians’ experience with EBP is the lack of expertise needed to assess the literature. 24 34 35 The models and frameworks reviewed demonstrated that the user must possess the knowledge and related skills for this step in the process. The models and frameworks varied greatly in the level of instruction for assessing the evidence. Most provided a general overview of assessing and grading the evidence, though a few recommended that this work be done by EBP mentors and experts. 20 25 27 ARCC, JBI and Johns Hopkins provided robust tools and resources that would require administrative time and financial support. 21 22 27 Some models and frameworks offered vital resources or pointed to other resources for assessing evidence, 24 but most did not. While a few used mentors and experts to assist with assessing the literature, the majority did not address this persistent issue.
Sackett’s five-step model included another important consideration when implementing EBP: patient values and preferences. One criticism of EBP is that it ignores patient values and preferences. 36 Over half of the models and frameworks reported the need to include patient values and preferences, but the tools, instruction or resources for including them were limited. The ARCC model integrates patient preferences and values, but it is up to the EBP mentor to accomplish this task. 37 There are many tools for assessing evidence, but few models and frameworks provide this level of guidance for incorporating patient preferences and values. The inclusion of patient and family values and preferences can be misunderstood, insincere and even tokenistic, but without it the chance of successful EBP implementation is reduced. 38 39
Similar to other well-designed scoping reviews, the strengths of this review include a rigorous search conducted by a skilled librarian, literature evaluation by more than one person, and the use of an established methodological framework (PRISMA-ScR). 14 15 Additionally, using the five steps of EBP as a point of alignment allows for a more comprehensive breakdown and provides established reference points for the reviewed models and frameworks. While scoping reviews have been completed on implementation science and knowledge translation models and frameworks, to our knowledge, this is the first scoping review of EBP models and frameworks. 13 14 Limitations of the study include that well-developed models and frameworks may have been excluded for not including all five steps. 40 For example, the Promoting Action on Research Implementation in Health Services (PARIHS) framework is a well-developed and validated implementation framework but does not include all five steps of an EBP model. 40 Also, some models and frameworks have been studied and validated over many years; it was beyond the scope of this review to measure their quality based on these other validation studies.
Healthcare organisations can support EBP by choosing a model or framework that best suits their environment and providing clear guidance for implementing the best evidence. Some organisations may find the best fit with the ARCC and the Clinical Scholars Model because of the emphasis on mentors or the Johns Hopkins model for its tools for grading the level of evidence. 21 25 27 In contrast, other organisations may find the Iowa model useful with its feedback loops throughout its process. 24
Another implication of this study is the opportunity to better define and develop robust tools for patient and family values and preferences within EBP models and frameworks. Patient experiences are complex and require thorough exploration so that they are not overlooked, as is often the case. 39 41 The utilisation of EBP models and frameworks provides an opportunity to explore this area and provide the resources and understanding that are often lacking. 38 Models such as the Iowa model, JBI and Johns Hopkins developed tools, albeit varying ones, to incorporate patient and family values and preferences, but the majority of the models and frameworks did not. 21 22 24 An opportunity exists to create broad tools that incorporate patient and family values and preferences into EBP to the same extent that many models and frameworks have developed tools for literature assessment and implementation. 21–25
Future research should consider appraising the quality and use of the different EBP models and frameworks to determine success. Additionally, greater clarification on what is considered patient and family values and preferences and how they can be integrated into the different models and frameworks is needed.
This scoping review of 19 models and frameworks shows considerable variation in how the EBP models and frameworks integrate the five steps of EBP. Most of the included models and frameworks provided only a narrow description of the steps needed to assess and implement EBP, while a few provided robust instruction and tools. The reviewed models and frameworks provided diverse instructions on the best way to use EBP. However, patient values and preferences need to be better integrated into EBP models. Also, the issue of the EBP expertise needed to assess evidence must be considered when selecting a model or framework.
Acknowledgments.
We thank Keri Swaggart for completing the database searches and the Medical Writing Center at Children's Mercy Kansas City for editing this manuscript.
Contributors: All authors have read and approved the final manuscript. JD conceptualised the study design, screened the articles for eligibility, extracted data from included studies and contributed to the writing and revision of the manuscript. LM-L conceptualised the study design, provided critical feedback on the manuscript and revised the manuscript. AM screened the articles for eligibility, extracted data from the studies, provided critical feedback on the manuscript and revised the manuscript. JD is the guarantor of this work.
Funding: The article processing charges related to the publication of this article were supported by The University of Kansas (KU) One University Open Access Author Fund sponsored jointly by the KU Provost, KU Vice Chancellor for Research, and KUMC Vice Chancellor for Research and managed jointly by the Libraries at the Medical Center and KU - Lawrence
Disclaimer: No funding agencies had input into the content of this manuscript.
Competing interests: None declared.
Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Ethics statements: Patient consent for publication: not applicable.
In the fast-growing geriatric population, we are confronted with both osteoporosis, which makes fixation of fractures more and more challenging, and several comorbidities, which are most likely to cause postoperative complications. Several models of shared care for these patients have been described, and the goal of our systematic literature search was to point out the differences between the individual models. A systematic electronic database search was performed, identifying articles that evaluate elderly hip fracture patients in a multidisciplinary approach, including at least a geriatrician and an orthopedic surgeon, and focused on in-hospital treatment. The different investigations were categorized into four groups defined by the type of intervention. The main outcome parameters were pooled across the studies and weighted by sample size. Out of 656 potentially relevant citations, 21 could be extracted and categorized into four groups. Regarding the main outcome parameters, the group with integrated care showed the lowest in-hospital mortality rate (1.14%), the lowest length of stay (7.39 days), and the lowest mean time to surgery (1.43 days). No clear statement could be made for the medical complication rates and the activities of daily living due to their inhomogeneity when comparing the models. The review of these investigations cannot tell us the best model, but there is a trend toward more recent models using an integrated approach. Integrated care summarizes all the positive features reported in the various investigations, such as integration of a geriatrician in the trauma unit, having a multidisciplinary team, prioritizing the geriatric fracture patients, and developing guidelines for the patients' treatment. Each hospital implementing a special model for geriatric hip fracture patients should collect detailed data about the patients, process of care, and outcomes to be able to participate in audit processes and avoid peerlessness.
2008, Australasian Physics & Engineering Sciences in Medicine
Christopher Khoo
This multi-author volume brings together 30 contributors under an international editorship. The four editors include three biomedical engineers (one from Queen Mary's College, University of London and two from the Eindhoven University of Technology) and the medical director of l'Arche Rehabilitation Centre in France, who is the only clinical author. The final chapter on tissue repair strategies, which is of most surgical interest, focuses on ‘biochemical stimulation of the wound bed to improve wound healing. Important new therapeutics in this category are reviewed such as: (i) exogenous application of growth factors; (ii) tissue-engineered skin grafts; and (iii) gene therapy.’ The two authors from the Eindhoven University of Technology are (according to an internet search) not medically qualified, so understandably their viewpoints do not set out the full range of current reconstructive surgical options or ‘future perspectives’ in clinical surgery. The publishers have adopte...
Indian Journal of Plastic Surgery
Karoon Agrawal
ABSTRACTPressure ulcer in an otherwise sick patient is a matter of concern for the care givers as well as the medical personnel. A lot has been done to understand the disease process. So much so that USA and European countries have established advisory panels in their respective continents. Since the establishment of these organizations, the understanding of the pressure ulcer has improved significantly. The authors feel that the well documented and well publicized definition of pressure ulcer is somewhat lacking in the correct description of the disease process. Hence, a modified definition has been presented. This disease is here to stay. In the process of managing these ulcers the basic pathology needs to be understood well. Pressure ischemia is the main reason behind the occurrence of ulceration. Different extrinsic and intrinsic factors have been described in detail with review of literature. There are a large number of risk factors causing ulceration. The risk assessment scale...
Aleksandra Kotlińska-Lemieszek
This paper presents a modern concept of conservative treatment of pressure ulcers in the moist environment. Dressing types, their characteristics and a system of “colour” wound classification are presented.
Ernane Reis
Pressure Ulcers: Etiology, Treatment and Prevention. Anu Singhal, MD, Resident, Metrohealth Medical Centre, Cleveland, OH, USA; Ernane D. Reis, MD, Assistant Professor, Department of Surgery, The Mount Sinai Medical Center, New York, NY, USA; Morris D. Kerstein, MD, Chief of Staff, V.A. Medical & Regional Office Center, Wilmington, Delaware; Professor of Surgery, Jefferson Medical College, Philadelphia, PA, USA. SKIN DISEASE. Frequently found on the sacrum, pressure ulcers develop due to prolonged periods of unrelieved pressure on soft tissues, but can occur anywhere there is pressure, including trochanters and especially heels. In the bedridden patient, constant pressure causes ischemia and necrosis of subcutaneous tissues and skin. Most patients are elderly, immobile and have neurologic impairments, often associated with inability to sense pain and discomfort and/or incontinence. Sacral ulcers can be treated with debridement, dressings and skin grafts. However, preventive efforts—including...
Systematic Reviews, volume 13, Article number: 158 (2024)
Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose.
LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review.
The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1–5 to 1–10) had a considerable impact on the performance.
LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
Systematic literature reviews (SLRs) summarize knowledge about a specific topic and are an essential ingredient for evidence-based medicine. Performing an SLR involves a lot of effort, as it requires researchers to identify, filter, and analyze substantial quantities of literature. Typically, the most relevant out of thousands of publications need to be identified for the topic and key information needs to be extracted for the synthesis. Some estimates indicate that systematic reviews typically take several months to complete [ 1 , 2 ], which is why the latest evidence may not always be taken into consideration.
Title and abstract screening forms a considerable part of the systematic reviewing workload. In this step, which typically follows defining a search strategy and precedes the full-text screening of a smaller number of search results, researchers determine whether a certain publication is relevant for inclusion in the systematic review based on title and abstract. Automating title and abstract screening has the potential to save time and thereby accelerate the translation of evidence into practice. It may also make the reviewing methodology more consistent and reproducible. Thus, the automation or semi-automation of this part of the reviewing workflow has been of longstanding interest [ 3 , 4 , 5 ].
Several approaches have been developed that use machine learning (ML) to automate or semi-automate screening [ 1 , 6 ]. For example, systematic review software applications such as Covidence [ 7 ] and EPPI-Reviewer [ 8 ] (which use the same algorithm) offer ML-assisted ranking algorithms that aim to show the most relevant publications for the search criteria higher in the reviewing order to speed up the manual review process. Elicit [ 9 ] is a standalone literature discovery tool that also offers an ML-assisted literature search facility. Furthermore, several dedicated tools have been developed to specifically automate title and abstract screening [ 1 , 10 ]. Examples include Rayyan [ 11 ], DistillerSR [ 12 ], Abstrackr [ 13 ], RobotAnalyst [ 14 ], and ASReview [ 5 ]. These tools typically work via different technical strategies drawn from ML and topic modeling to enable the system to learn how similar new articles are to a core set of identified ‘good’ results for the topic. These approaches have been found to lead to a considerable reduction in the time taken to complete systematic reviews [ 15 ].
Most of these systems require some sort of pre-selection or specific training for the larger corpus of publications to be analyzed (e.g., identification of some “relevant” publications by a human so that the algorithm can select similar papers) and are thus not fully automated.
Furthermore, dedicated models are required that are built for the specific purpose together with appropriate training data. Fully automated systems that achieve high levels of performance and can be flexibly applied to various topics have not yet been realized.
Large language models (LLMs) are an approach to natural language processing in which very large-scale neural networks are trained on vast amounts of textual data to generate sequences of words in response to input text. These capable models are then subject to different strategies for additional training to improve their performance on a wide range of tasks. Recent technological advancements in model size, architecture, and training strategies have led to general-purpose dialog LLMs achieving and exceeding state-of-the-art performance on many benchmark tasks including medical question answering [ 16 ] and text summarization [ 17 ].
Recent progress in the development of LLMs led to very capable models. While models developed by private companies such as GPT-3/GPT-3.5/GPT-4 from OpenAI [ 18 ] or PaLM and Gemini from Google [ 19 , 20 ] are among the most powerful LLMs currently available, openly available models are actively being developed by different stakeholders and in some cases achieve performances not far from the state of the art [ 21 ].
LLMs have shown remarkable capabilities in a variety of subjects and tasks that would require a profound understanding of text and knowledge for a human to perform. Among others, LLMs can be used for classification [ 22 ], information extraction [ 23 ], and knowledge access [ 24 ]. Furthermore, they can be flexibly adapted via prompt engineering techniques [ 25 ] and parameter settings to behave in a desired way. At the same time, considerable problems with the usage of LLMs such as “hallucinations” of models [ 26 ], inherent biases [ 27 , 28 ], and weak alignment with human evaluation [ 29 ] have been described. Therefore, even though the text output generated by LLMs is based on objective statistical calculations, the text output itself is not necessarily factual and correct and furthermore incorporates subjectivity based on the training data. This implies that an LLM-based evaluation system has some fundamental limitations a priori. However, using LLMs for evaluating scientific publications is a novel and interesting approach that may be helpful in creating fully automated and still flexible systems for screening and evaluating scientific literature.
To investigate whether and how well openly available LLMs can be used for evaluating the relevance of publications as part of an automated title and abstract screening system, we conducted a study to evaluate the performance of such an approach in the biomedical domain with modern openly available LLMs.
We designed an approach for evaluating the relevance of publications based on title and abstract using an LLM. This approach is based on the following strategy:
An instruction prompt to evaluate the relevance of a scientific publication for inclusion into an SLR is given to an LLM.
The prompt includes the title and abstract of the publication and the criteria that are considered relevant.
The prompt furthermore includes the request to return just a number as an answer, which corresponds to the relevance of the publication on a Likert scale (“not relevant” to “highly relevant”).
The prompt for each publication is created in a structured and automated way.
A numeric threshold may be defined which separates relevant publications from irrelevant publications (corresponding to the definition of a classifier).
The prompts are created in the following way:
Prompt = [Instruction] + [Title of publication] + [Abstract of publication] + [Relevant Criteria]
(“+” is not part of the final prompt but indicates the concatenation of the text strings.)
[Instruction] is the text string describing the general instruction for the LLM to evaluate the publication. The LLM is asked to evaluate the relevance of a publication for an SLR on a numeric scale (low relevance to high relevance) based on the title and abstract of the publication and based on defined relevant criteria.
[Title of publication] is the text string “Title:” together with the title of the publication.
[Abstract of publication] is the text string “, Abstract:” together with the abstract of the publication.
[Relevant Criteria] is the text that describes the criteria to evaluate the relevance of a publication. The relevant criteria are defined beforehand by the researchers depending on the topic to determine which publications are relevant. The [Relevant Criteria] text string remains unchanged for all the publications that should be checked for relevance.
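As a minimal sketch of this assembly step (in Python; the function and variable names are illustrative assumptions, not taken from the study's script), the prompt could be built as follows:

```python
def build_prompt(instruction: str, title: str, abstract: str, criteria: str) -> str:
    """Merge the four text strings into one structured prompt.

    The field labels ("Title:", ", Abstract:") follow the description above;
    everything else in this sketch is an assumption for illustration.
    """
    return f"{instruction} Title: {title}, Abstract: {abstract} {criteria}"
```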
The answer from the LLM usually consists of just a digit on a numeric scale (e.g., 1–5). However, variations are acceptable if the answer can unambiguously be assigned to one of the possible scores on the Likert scale (e.g., the answer “The relevance of the publication is 3.” can unambiguously be assigned to the score 3). This assignment of answers to a score can be automated with a string-search command, that is, a simple regular expression searching for a positive integer number, which is then extracted from the text string.
A request is sent to the LLM for each publication in the corpus. In cases for which an LLM provided an invalid (unprocessable) response for a publication, that response was excluded from the direct downstream analysis. It was determined for how many publications invalid responses were given and how many of these publications would have been relevant.
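A hedged sketch of the parsing step and the per-publication loop is shown below, reusing build_prompt from the sketch above; query_llm is a placeholder for whatever call sends a prompt to the chosen model and returns its text answer, and the publication fields are assumed names, not those of the published code:

```python
import re
from typing import Optional

def extract_score(answer: str) -> Optional[int]:
    """Extract the first positive integer from the model's answer.

    Returns None for invalid (unprocessable) responses, which are then
    excluded from the downstream analysis, as described above.
    """
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None

def screen_corpus(publications, instruction, criteria, query_llm):
    """Score every publication in the corpus and collect invalid responses."""
    scores, invalid = {}, []
    for pub in publications:  # each pub is assumed to be a dict with id, title and abstract
        prompt = build_prompt(instruction, pub["title"], pub["abstract"], criteria)
        score = extract_score(query_llm(prompt))
        if score is None:
            invalid.append(pub["id"])
        else:
            scores[pub["id"]] = score
    return scores, invalid
```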
A schematic illustration of the approach is shown in Fig. 1 . An example of a prompt is provided in Supplementary material 1: Appendix 1.
Schematic illustration of the LLM-based approach for evaluating the relevance of a scientific publication. In this example, a 1–5 scale and a 3+ classifier are used
A Python script was created to automate the process and to apply it to a data set with a collection of different publications.
With the publications being sorted into different relevance groups, a threshold can be defined, which is used by a classifier to separate relevant from irrelevant publications. For example, a 3+ classifier would classify publications with a score of ≥ 3 as relevant, and publications with a score < 3 as irrelevant.
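Such a threshold classifier reduces to a single comparison; the following illustrative snippet shows a 3+ classifier applied to a dictionary of scores (names again assumed, not from the original script):

```python
def classify(scores: dict, threshold: int = 3) -> dict:
    """Label each publication as relevant (True) if its score is >= threshold.

    threshold=3 corresponds to the 3+ classifier described in the text;
    raising or lowering it trades sensitivity against specificity.
    """
    return {pub_id: score >= threshold for pub_id, score in scores.items()}
```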
The performance of the approach was tested with different LLMs, data sets and settings as described in the following:
A variety of different models were tested. To investigate the approach with different LLMs (that are also diverse regarding design and training data), the following four models were used in the experiments:
FlanT5-XXL (FlanT5) is an LLM developed by Google Research. It is a variant of the T5 (text-to-text) model that utilizes a unified text-to-text framework, allowing it to perform a wide range of NLP tasks with the same model architecture, loss function, and hyperparameters. FlanT5 was enhanced through fine-tuning on over a thousand additional tasks and support for more languages. It is primarily used for research in various areas of natural language processing, such as reasoning and question-answering [ 30 , 31 ].
OpenHermes-2.5-neural-chat-7b-v3-1-7B (OHNC) [ 32 ] is a powerful open-source LLM, which was merged from the two models OpenHermes 2.5 Mistral 7B [ 33 ] and Neural-Chat (neural-chat-7b-v3-1) [ 34 ]. Despite having only 7 billion parameters it performs better than some larger models on various benchmarks.
Mixtral-8x7B-Instruct v0.1 (Mixtral) is a pretrained generative Sparse Mixture of Experts LLM developed by Mistral AI [ 35 , 36 ]. It was reported to outperform powerful models like gpt-3.5-turbo, Claude-2.1, Gemini Pro, and Llama 2 70B-chat on human benchmarks.
Platypus2-70B-Instruct (Platypus 2) is a powerful language model with 70 billion parameters [ 37 ]. The model itself is a merge of the models Platypus2-70B and SOLAR-0-70b-16bit (previously published as LLaMa-2-70b-instruct-v2) [ 38 ].
A list of several data sets for SLRs is provided to the public by the research group of the ASReview tool [ 39 ]. The list contains data sets on a variety of different biomedical subjects of previously published SLRs. For testing the LLM approach on an individual data set, the [Relevant Criteria] string for each data set was created based on the description in the publication of the corresponding SLR. We tested the approach on a total of ten published data sets covering different biomedical topics (Table 1 , Supplementary material 2: Appendix 2).
To test the approach also in a prospective setting on a not previously published review, we created a data set for a new, hypothetical SLR, for which title and abstract screening should be performed.
The use case was an SLR on “Clinical Decision Support System (CDSS) tools for physicians in radiation oncology”. A CDSS is an information technology system developed to support clinical decision-making. This general definition may include diagnostic tools, knowledge bases, prognostic models, or patient decision aids [ 50 ]. We decided that the hypothetical SLR should be only about software-based systems to be used by clinicians for decision-making purposes in radiation oncology. We defined the following criteria for the [Relevant Criteria] text of the provided prompt:
Only inclusion of original articles, exclusion of review articles.
Publications examining one or several clinical decision-support systems relevant to radiation therapy.
Decision-support systems are software-based.
Exclusion of systems intended for support of non-clinicians (e.g., patient decision aids).
Publications about models (e.g., prognostic models) should only be included if the model is intended to support clinical decision-making as part of a software application, which may resemble a clinical decision support system.
The following query was used for searching relevant publications on PubMed: “(clinical decision support system) AND (radiotherapy OR radiation therapy)”.
Titles and abstracts of all publications found with the query were collected. A human-based title and abstract screening was performed to obtain the ground truth data set. Two researchers (FD and NC) independently labeled the publications as relevant/not relevant based on the title and abstract and based on the [Relevant criteria] string. The task was to label those publications relevant that may be of interest and should be analyzed as full text, while all other publications should be labeled irrelevant. After labeling all publications, some of the publications were deemed relevant only by one of the two researchers. To obtain a final decision, a third researcher (PMP) independently did the labeling for the undecided cases.
The aim was to create a human-based data set purely representing the process of title and abstract screening without further information or analysis.
A manual title and abstract screening was conducted on 521 publications identified in the search, with 36 publications being identified as relevant and labeled accordingly in the data set. This data set was named “CDSS_RO”. It should be noted that this data set is qualitatively different from the 10 published data sets, as not only the publications that may be finally included in an SLR are labeled as relevant, but all publications that should be analyzed in full text based on title and abstract. The file is provided at https://github.com/med-data-tools/title-abstract-screening-ai.
Standard parameters.
The LLM-based title and abstract screening as described above requires the definition of some parameters. The standard settings for the approach were the following:
[Instruction] string: We used the following standard [Instruction] string:
“On a scale from 1 (very low probability) to X (very high probability), how would you rate the relevance of the following scientific publication to be included in a systematic literature review based on the relevant criteria and based on title and abstract?”
Range of scale: defines the range of the Likert scale mentioned in the [Instruction] string (marked as X in the standard string above). For the standard settings, a value of 5 was used.
Model parameters of the LLMs were defined in the source code. To obtain reproducible results, the parameters were set so that the models became deterministic. For example, the temperature value is a parameter that defines how much variation a model's response should have; values greater than 0 add a random element to the output, which should be avoided for the reproducibility of the LLM-based title and abstract screening.
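As an illustration of deterministic generation settings, the sketch below uses the Hugging Face transformers library with greedy decoding (do_sample=False); it is not the study's source code, a smaller FlanT5 checkpoint is substituted for the XXL variant used in the experiments, and running the full-size models requires substantial GPU resources:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative: a small FlanT5 checkpoint stands in for the much larger
# google/flan-t5-xxl model used in the study.
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def answer_prompt(prompt: str) -> str:
    """Generate a short, deterministic answer (greedy decoding, no sampling)."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```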
The behavior of an LLM is highly dependent on the provided prompt. Adequate adaptation of the prompt may be used to improve the performance of an LLM for certain tasks [ 25 ]. To investigate what impact a slightly adapted version of the Instruction prompt would have on the results, we added the string “(Note: Give a low score if not all criteria are fulfilled. Give only a high score if all or almost all criteria are fulfilled.)” in the instruction prompt as additional instruction and examined the impact on the performance. Furthermore, the range of the scale was changed from 1–5 to 1–10 in some experiments to investigate what impact this would have on the performance.
The performance of the approach, depending on models and threshold, was determined by calculating the sensitivity (= recall), specificity, accuracy, precision, and F1-score of the system, based on the numbers of correctly and incorrectly included/excluded publications for each data set.
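For reference, these metrics follow directly from the confusion-matrix counts; a minimal sketch (not the study's own evaluation code, and assuming non-zero denominators):

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute evaluation metrics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)              # recall: share of relevant publications included
    specificity = tn / (tn + fp)              # share of irrelevant publications excluded
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}
```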
The LLM-based title and abstract screening was compared to another, recently published approach for fully automated title and abstract screening. This approach, developed by Natukunda et al., uses an unsupervised Latent Dirichlet Allocation-based topic model for screening [ 51 ]. Unlike the LLM-based approach, it does not require an additional [Relevant Criteria] string but instead uses defined search keywords to determine which publications are relevant. The approach was used to screen the ten published data sets as well as the CDSS_RO data set. To obtain the required keywords, we processed the text of the search terms used by splitting combined text into individual words and removing stop words, duplicates, and punctuation (as described in the original publication of Natukunda et al.).
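A possible sketch of this keyword preparation is given below; the stop-word set is an illustrative subset, and the exact tokenisation of the original publication is not reproduced:

```python
import string

STOP_WORDS = {"and", "or", "not", "the", "of", "for", "in", "a", "an"}  # illustrative subset

def search_terms_to_keywords(search_terms: str) -> list:
    """Split combined search text into words; drop stop words, duplicates and punctuation."""
    cleaned = search_terms.translate(str.maketrans("", "", string.punctuation)).lower()
    keywords = []
    for word in cleaned.split():
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords

# Example with the PubMed query used for the CDSS_RO data set:
# search_terms_to_keywords("(clinical decision support system) AND (radiotherapy OR radiation therapy)")
# -> ['clinical', 'decision', 'support', 'system', 'radiotherapy', 'radiation', 'therapy']
```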
The LLM-based screening with a Likert scale of 1–5 provided clear results for evaluating the relevance of a publication in the majority of cases. Out of the total of 44,055 publications among the 10 published data sets, valid and unambiguously assignable answers were given for 44,055 publications (100%) by the FlanT5 model, for 44,052 publications (99.993%) by the OHNC model, for 44,026 publications (99.93%) by the Mixtral model and for 44,054 publications (99.998%) by the Platypus 2 model. The few publications for which an invalid answer was given were excluded from further analysis. None of the excluded publications was relevant. The distribution of scores given was different between the different models. For example, the OHNC model ranked the majority of publications with a score of 3 (47.2%) or 4 (34.2%), while the FlanT5 model ranked almost all publications with a score of either 4 (68.1%) or 2 (31.7%). For all models, the group of publications labeled as relevant in the data sets was ranked with higher scores compared to the overall group of publications (mean score of 3.89 compared to 3.38 for FlanT5, 3.86 compared to 3.14 for OHNC, 4.16 compared to 2.12 for Mixtral and 3.80 compared to 2.92 for Platypus 2). An overview is provided in Fig. 2 .
Distribution of scores given by the different models
Based on the scores given, classifiers that label publications with a score greater than or equal to “X” as relevant have higher sensitivity and lower specificity with decreasing threshold (decreasing “X”).
Classifiers with a threshold of ≥ 3 (3+ classifiers) were further analyzed, as these classifiers were considered to correctly identify the vast majority of relevant publications (high sensitivity) without including too many irrelevant publications (sufficient specificity). The 3+ classifiers had a sensitivity/specificity of 94.8%/31.8% for the FlanT5 model, of 97.6%/19.1% for the OHNC model, of 81.9%/75.2% for the Mixtral model, and of 97.2%/38.3% for the Platypus 2 model on all ten published data sets. The performance of the classifiers was quite different depending on the data set used (Fig. 3). Detailed results on the individual data sets are presented in Supplementary material 3: Appendix 3.
Sensitivity and specificity of the 3+ classifiers on different data sets using different models. Each data point represents the results of one of the data sets
The highest specificity at 100% sensitivity was seen for the Mixtral model on the data set Wolters_2018 with all 19 relevant publications being scored with 3–5, while 4410 of 5019 irrelevant publications were scored with 1 or 2 (specificity of 87.87%). The lowest sensitivity was observed with the Mixtral model on the dataset Jeyaraman_2021 with 23.96% sensitivity at 94.63% specificity.
On the newly created manually labeled data set, the 3+ classifiers had 100% sensitivity for all four models, with specificity ranging from 4.54% to 62.47%. The results of the LLM-based title and abstract screening, dependent on the threshold of the classifiers, are presented as receiver operating characteristics (ROC) curves in Fig. 4 as well as in Supplementary material 3: Appendix 3.
Receiver operating characteristics (ROC) curves of the LLM-based title and abstract screening for the different models on the CDSS_RO data set
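Such curves can be computed directly from the Likert scores and the human labels, for example with scikit-learn; the data below are made up purely for illustration:

```python
from sklearn.metrics import auc, roc_curve

# y_true: human labels (1 = relevant, 0 = irrelevant); y_score: LLM Likert scores (1-5)
y_true = [1, 0, 0, 1, 0, 0, 1, 0]   # illustrative example data
y_score = [4, 2, 3, 5, 1, 2, 3, 4]

# Each threshold returned by roc_curve corresponds to one "X+" classifier.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC: {auc(fpr, tpr):.2f}")
```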
Several runs of the Python script with different settings (adapted [Instruction] string and/or range of scale 1–10 instead of 1–5) were performed, which led to different results. Minor adaptation of the [Instruction] string with an additional demand to focus on the mentioned criteria had a different impact on the performance of the classifiers depending on the LLM used. While the sensitivity of the 3+ classifiers remained at 100% for all four models, the specificity was lower for the OHNC model (2.89% vs. 4.54%), the Mixtral model (56.29% vs. 62.47%) and the Platypus 2 model (15.88% vs. 24.74%), while it was higher for the FlanT5 model (25.15% vs. 12.58%).
Changing the range of the scale from 1–5 to 1–10 and using a 6+ classifier instead of a 3+ classifier led to a lower sensitivity for the OHNC model (97.22% vs. 100%), while increasing the specificity (13.49% vs. 4.54%). For the other models, the sensitivity remained at 100%, with higher specificity for the Platypus 2 model (51.34% vs. 24.74%) and the FlanT5 model (50.52% vs. 12.58%). The specificity was unchanged for the Mixtral model at 62.47%, which was the highest value among all combinations at 100% sensitivity. No combination of the settings for range of scale and with/without prompt adaptation was superior across all models. An overview of the results is provided in Fig. 5.
Performance of the classifiers depending on adaptation of the prompt and on the range of scale
The screening approach developed by Natukunda et al. achieved an overall sensitivity of 52.75% at 56.39% specificity on the ten published data sets. As for the LLM-based screening, the performance of this approach was dependent on the data set analyzed. The lowest sensitivity was observed for the Jeyaraman_2021 data set (1.04%), while the highest sensitivity was observed for the Wolters_2018 data set (100%). Compared to the 3+ classifier with the Mixtral model, the LLM-based approach had higher sensitivity on 9 data sets and equal sensitivity on 1 data set, while it had higher specificity on 6 data sets and lower specificity on 4 data sets.
On the CDSS_RO data set, the approach of Natukunda et al. achieved 94.44% sensitivity (lower than all four LLMs) at 39.59% specificity (lower than the Mixtral model and higher than the FlanT5, OHNC, and Platypus 2 models). Further data on the comparison is provided in Supplementary material 4: Appendix 4.
We developed and elaborated a flexible approach to use LLMs for automated title and abstract screening that has shown some promising results on a variety of biomedical topics. Such an approach could potentially be used to automatically pre-screen the relevance of publications based on title and abstract. While the results are far from perfect, using LLMs for evaluating the relevance of publications could potentially be helpful (e.g., as a pre-processing step) when performing an SLR. Furthermore, the approach is widely applicable without the development of custom tools or training custom models.
A variety of different ML and AI tools have been developed to assist researchers in performing SLRs [ 5 , 10 , 52 , 53 ]. Fully automated systems (like the LLM-based approach presented in our study) still fail to differentiate relevant from irrelevant publications near the level of human evaluation [ 51 , 54 ].
A well-functioning fully automated title and abstract screening system that could be used on different subjects in the biomedical domain and possibly also in other scientific areas would be very valuable. While human-based screening is the current gold standard, it has considerable drawbacks. From a methodological point of view, one major problem of human-based literature evaluation, including title and abstract screening, is the subjectivity of the process [ 55 ]. Evaluating the publications (based on title and abstract) is dependent on the experience and individual judgments of the person doing the screening. To overcome this issue, SLRs of high quality require multiple independent researchers to do the evaluation, with specific criteria for inclusion/exclusion defined beforehand [ 56 ]. Nevertheless, subjectivity remains an unresolved issue, which also limits the reproducibility of results. From a practical point of view, another major problem is the considerable workload that must be performed by humans, especially if thousands of publications need to be assessed, which is multiplied by the need to have multiple reviewers and to discuss disagreements. The challenge of workload is not just a matter of inconvenience, as SLRs on subjects that require tens of thousands of publications to be searched may simply not be feasible for small research teams, or may already be outdated after the time it takes to do the screening and analyze the results.
While fully automated screening approaches may also be affected by subjectivity (since the training data of models is itself generated by processes which are affected by subjectivity), the results would at least be more reproducible, and automation can be applied at scale in order to overcome the problem of practicability.
While current fully automated systems cannot replace humans in title and abstract screening, they may nevertheless be helpful. Such systems are already being used in systematic reviews and most likely their usage will continue to grow [ 57 ].
Ideally, a fully automated system should not miss a single relevant publication (100% sensitivity) while minimizing as far as possible the number of irrelevant publications included. This would allow confident exclusion of some of the retrieved search results which is a big asset to reducing time taken in manual screening.
By creating structured prompts with clear instructions, an LLM can feasibly be used for evaluating the relevance of a scientific publication. In comparison to some other solutions, the LLM-based screening may have some advantages. On the one hand, the flexible nature of the approach allows adaptation to a specific subject. Depending on the question, different prompts for relevant criteria and instructions can be used to address the individual research question. On the other hand, the approach can create reproducible results, given a fixed model, parameters, prompting strategy, and defined threshold. At the same time, it is scalable to process large numbers of publications. As we have seen, such an approach is feasible with a performance similar to or even better than other current solutions such as the approach of Natukunda et al. However, it should be noted that the performance varied considerably depending on which of the 10 + 1 data sets were used.
While we investigated LLMs for evaluating the relevance of publications and in particular for title and abstract screening, it is being discussed how these models may be used for a variety of tasks in literature analysis [ 58 , 59 ]. For example, Wang et al. obtained promising results when investigating if ChatGPT may be used for writing Boolean Queries for SLRs [ 60 ]. Aydin et al., also using ChatGPT, employed the LLM to write an entire Literature Review about Digital Twins in Healthcare [ 61 ].
Guo et al. recently performed a study using the OpenAI API with gpt-3.5 and gpt-4 to create a classifier for clinical reviews [ 62 ]. They observed promising results when comparing the performance of the classifier against human-based screening with a sensitivity of 76% at 91% specificity on six different review papers. In contrast to our approach, they used a Boolean classifier instead of a Likert scale. Another approach was developed by Akinseloyin et al., who used ChatGPT to create a method for citation screening by ranking the relevance of publications using a question-answering framework [ 63 ].
The question may arise as to what the purpose of using a Likert scale instead of a direct binary classifier is (especially since some models only rarely use some of the score values; see, e.g., FlanT5 in Fig. 2). The rationale for using the Likert scale arose out of some preliminary, unsystematic explorations we conducted using different models and ranges of scale (including binary). We realized that using a Likert scale has some advantages, as it sorts the publications into several groups depending on the estimated relevance. This also allows flexible adjustment of the threshold (which may also be useful if the user wants to focus either on sensitivity or on specificity).
However, there seem to be several feasible approaches and frameworks to use LLMs for the screening of publications.
It should be noted that an LLM-based approach for evaluating the relevance of publications might just as well be used for a variety of different classification tasks in literature analysis. For example, one may adapt the [Instruction] prompt, asking the LLM not to evaluate the relevance of a publication on a Likert scale but to classify it into several groups such as “original article”, “trial”, “letter to the editor”, etc. From this point of view, the title and abstract screening is just a special use case of LLM-based classification.
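A purely hypothetical adapted instruction of this kind (not a prompt used in the study) might read:

```python
# Hypothetical adapted [Instruction] string for document-type classification
instruction = (
    "Based on the title and abstract, classify the following scientific publication "
    "into exactly one of these categories: original article, trial, review, "
    "letter to the editor. Answer with the category name only."
)
```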
The capabilities of LLMs and other AI models will continue to evolve, which will increase the performance of fully automated systems. As we have seen, the results are highly dependent on the LLM used for the approach. In any case, there may still be substantial room for improvement and optimization and it currently is unclear what LLM-based approach with which prompts, models, and settings yields the best results over a large variety of data sets.
Furthermore, LLMs may not only be used for the screening of titles and abstracts but for the analysis of full-text documents. The newest generation of language and multimodal models may process whole articles or potentially also image data from publications [ 64 , 65 ]. Beyond that, LLM-based evaluation of scientific data and publications may only be one of several options for AI assistance in literature analysis. Future systems may combine different ML and AI approaches for optimal automated processing of literature and scientific data.
Even though the LLM-based screening presented in our work shows some promising results, it also has some drawbacks and limitations. While the open framework with adaptable prompts makes the approach flexible, the performance of the approach is highly dependent on the used model, the input parameters/settings, and the data set analyzed. If a slightly different instruction or another scale (1–10 instead of 1–5) is used, this can have a considerable impact on the performance. The classifiers analyzed in our study failed to consistently identify relevant publications at 100% sensitivity without considerably impairing the specificity. In academic research, the bar for automated screening tools needs to be very high, as ideally not a single relevant publication should be missed. The LLM-based title and abstract screening requires the definition of clear criteria for inclusion/exclusion. For research questions with less clear relevance criteria, LLMs may not be that useful for the evaluation. This may potentially be one reason, why the performance of the approach was quite different in our study depending on the data set analyzed. Overall, there are still many open questions, and it is unclear if and how high levels of performance can be consistently guaranteed so that such a system can be relied on. It is interesting that the Mixtral model, even though it seemed to have the highest level of performance on average, performed poorly with low sensitivity on one data set (Fig. 3 ). Further research is needed to investigate the requirements for good performance of the LLMs in evaluating scientific literature.
Another limitation of the approach in its current form is a considerable demand for resources regarding calculation power and hardware equipment. Answering thousands of long text prompts with modern, multi-billion-parameter LLMs requires sufficient IT infrastructure and calculation power to perform. The issue of resource demand is especially relevant if many thousand publications are evaluated and if very complex models are used.
On a more fundamental level, there are some general issues regarding the use of LLMs for literature studies. LLMs calculate the probability for a sequence of words based on their training data, which derives from past observations and knowledge. They can thereby inherit unwanted features and biases (such as ethnic or gender biases) [ 29 , 66 ]. In a recent study by Koo et al., it was shown that the cognitive biases and preferences of LLMs are not the same as those of humans, as a low correlation between ratings given by LLMs and humans was observed [ 67 ]. The authors therefore stated that LLMs are currently not suitable as fair and reliable automatic evaluators. Considering this, using LLMs for evaluating and processing scientific publications may be seen as a problematic and questionable undertaking. However, the biases present in language models affect different tasks differently, and it remains to be seen how they might differentially affect different screening tasks in the literature review [ 28 ].
Nevertheless, it is most likely that LLMs and other AI solutions will be increasingly used in conducting and evaluating scientific research [ 68 ]. While this certainly will provide a lot of chances and opportunities, it is also potentially concerning. The amount and proportion of text being written by AI models is increasing. This includes not only public text on the Internet but also scientific literature and publications [ 69 , 70 ]. The fact that ChatGPT has been chosen as one of the top researchers of the year 2023 by Nature and has frequently been listed as co-author, shows how immediate the impact of the development has already been [ 71 ]. At the same time, most LLMs are trained on large amounts of text provided on the Internet. The idea that in the future LLMs might be used to evaluate publications written with the help of LLMs that may themselves be trained on data created by LLMs may lead to disturbing negative feedback loops which decrease the quality of the results over time [ 72 ]. Such a development could actually undermine academia and evidence-based science [ 73 ], also due to the known fact that LLMs tend to “hallucinate”, meaning that a model may generate text with illusory statements not based on correct data [ 26 ]. It is important to be aware that LLMs are not directly coupled to evidence and that there is no restriction preventing a model from generating incorrect statements. As part of a screening tool assigning just a score value to the relevance of a publication, this may be a mere factor impairing the performance of the system – yet for LLM-based analysis in general this is a major problem.
The majority of studies published so far on using LLMs for publication screening have used the currently most powerful models operated by private companies, most notably the ChatGPT models GPT-3.5 and GPT-4 developed by OpenAI [ 18 , 74 ]. Using models that are owned and controlled by private companies and that may change over time introduces additional major problems for publication screening, such as a lack of reproducibility. Therefore, after initial experiments with such models, we decided to use openly available models for our study.
Our study has some limitations. While we present a strategy for using LLMs to evaluate the relevance of publications for an SLR, our work does not provide a comprehensive analysis of all possible capabilities and limitations. Even though we achieved promising results on ten published data sets and a newly created one, the generalizability of the results may be limited, as it is not clear how the approach would perform on other subjects within the biomedical domain or in other domains. A more comprehensive understanding would require thorough testing with many more data sets on different topics, which is beyond the scope of this work. Testing the screening approach on retrospective data sets is also inherently problematic. While good performance on retrospective data should hopefully indicate good performance when the approach is used prospectively on a new topic, this does not have to be the case [ 75 ]. Indeed, naively assuming that a classifier tested on retrospective data will perform equally well on a new research question is problematic, since a new research question is by definition new and unfamiliar and therefore will not be represented in previously tested data sets.
Furthermore, models that are trained on vast amounts of scientific literature may even have been trained on some of the publications or reviews that are used in the retrospective benchmarking of an LLM-based classifier, which obviously creates a considerable bias. To objectively assess how well an LLM-based solution can evaluate scientific publications for new research questions, large, curated, and independent prospective data sets on many different topics would be needed, which will be very challenging to create. It is interesting that the LLM-based title and abstract screening in our study would also have performed well on our new hypothetical SLR on CDSS in radiation therapy, but this alone is too limited a data basis from which to draw general conclusions. Therefore, it currently cannot be reliably known in which situations such an LLM-based evaluation may succeed or fail.
Regarding the ten published data sets, the results also need to be interpreted with caution. These data sets may not truly represent the isolated task of title and abstract screening. For example, in the Appenzeller-Herzog_2020 data set, only the 26 publications that were finally included (not only after title and abstract screening but also after further analysis) were labeled as relevant [ 40 ]. While these publications ideally should be correctly identified by an AI classifier, there may be other publications in the data set that cannot be excluded solely on the basis of title and abstract. Furthermore, we had to retrospectively define the [Relevant Criteria] string based on the text in the publication of the SLR. This is obviously a suboptimal way to define inclusion and exclusion criteria, as the defined string may not completely align with the criteria intended by the researchers of the SLR.
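For illustration only, a hypothetical prompt template of this kind is sketched below; it is not the prompt used in our study (the actual sample prompt is provided in Supplementary Appendix 1), and the example [Relevant Criteria] string is invented.

RELEVANT_CRITERIA = (
    "Controlled studies on the comparative effectiveness of therapies "
    "for Wilson disease"  # example string only, not one used in the study
)

PROMPT_TEMPLATE = (
    "You are screening publications for a systematic literature review.\n"
    "Relevance criteria: {criteria}\n\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n\n"
    "On a scale from 1 (clearly irrelevant) to 5 (clearly relevant), "
    "how relevant is this publication to the criteria? Answer with a single number."
)

def build_prompt(title: str, abstract: str, criteria: str = RELEVANT_CRITERIA) -> str:
    # Insert the criteria string and the title/abstract into the fixed instruction.
    return PROMPT_TEMPLATE.format(criteria=criteria, title=title, abstract=abstract)

print(build_prompt("Zinc therapy in Wilson disease", "We compared ..."))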
We also want to emphasize that the comparison with the approach of Natukunda et al. needs to be interpreted with caution, since the two approaches are not based on exactly the same prerequisites: the LLM-based approach requires a [Relevant Criteria] string, while the approach of Natukunda et al. requires defined keywords.
While overall our work shows that LLM-based title and abstract screening is possible and yields promising results on the analyzed data sets, our study cannot fully answer the question of how well LLMs would perform if used for new research. Even more importantly, we cannot answer the question of to what extent LLMs should be used for conducting literature reviews and for doing research.
Large language models can be used to evaluate the relevance of publications for SLRs. We were able to implement a flexible, cross-domain system with promising results on different biomedical subjects. With continuing progress in the fields of LLMs and AI, fully automated computer systems may assist researchers in performing SLRs and other forms of scientific knowledge synthesis. However, it remains unclear how well such systems will perform when used prospectively and what implications this will have for the conduct of SLRs.
All data generated and analyzed during this study are either included in this published article (and its supplementary information files) or publicly available on the Internet. The Python script as well as the CDSS_RO data set are available at https://github.com/med-data-tools/title-abstract-screening-ai . The ten published data sets analyzed in our study are available in the GitHub repository of the research group of the ASReview tool [ 39 ].
AI: Artificial intelligence
API: Application programming interface
CDSS: Clinical decision support system
FlanT5: FlanT5-XXL model
GPT: Generative pre-trained transformer
Mixtral: Mixtral-8x7B-Instruct-v0.1 model
ML: Machine learning
LLM: Large language model
OpenHermes: OpenHermes-2.5-neural-chat-7b-v3-1-7B model
Platypus2: Platypus2-70B-Instruct model
ROC: Receiver operating characteristic
Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022;144:22–42.
Clark J, Scott AM, Glasziou P. Not all systematic reviews can be completed in 2 weeks—But many can be (and should be). J Clin Epidemiol. 2020;126:163.
Clark J, Glasziou P, Del Mar C, Bannach-Brown A, Stehlik P, Scott AM. A full systematic review was completed in 2 weeks using automation tools: a case study. J Clin Epidemiol. 2020;121:81–90.
Pham B, Jovanovic J, Bagheri E, Antony J, Ashoor H, Nguyen TT, et al. Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow. Syst Rev. 2021;10(1):156.
van de Schoot R, de Bruin J, Schram R, Zahedi P, de Boer J, Weijdema F, et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell. 2021;3(2):125–33.
Hamel C, Hersi M, Kelly SE, Tricco AC, Straus S, Wells G, et al. Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses. BMC Med Res Methodol. 2021;21(1):285.
Covidence [Internet]. [cited 2024 Jan 14]. Available from: www.covidence.org .
Machine learning functionality in EPPI-Reviewer [Internet]. [cited 2024 Jan 14]. Available from: https://eppi.ioe.ac.uk/CMS/Portals/35/machine_learning_in_eppi-reviewer_v_7_web_version.pdf .
Elicit [Internet]. [cited 2024 Jan 14]. Available from: https://elicit.org/ .
Harrison H, Griffin SJ, Kuhn I, Usher-Smith JA. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med Res Methodol. 2020;20(1):7.
Rayyan [Internet]. [cited 2024 Jan 14]. Available from: https://www.rayyan.ai/ .
DistillerSR [Internet]. [cited 2024 Jan 14]. Available from: https://www.distillersr.com/products/distillersr-systematic-review-software .
Abstrackr [Internet]. [cited 2024 Jan 14]. Available from: http://abstrackr.cebm.brown.edu/account/login .
RobotAnalyst [Internet]. [cited 2024 Jan 14]. Available from: http://www.nactem.ac.uk/robotanalyst/ .
Clark J, McFarlane C, Cleo G, Ishikawa Ramos C, Marshall S. The impact of systematic review automation tools on methodological quality and time taken to complete systematic review Tasks: Case Study. JMIR Med Educ. 2021;7(2): e24418.
Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023 [cited 2024 Jan 14]; Available from: https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309 .
Tang L, Sun Z, Idnay B, Nestor JG, Soroush A, Elias PA, et al. Evaluating large language models on medical evidence summarization [Internet]. medRxiv; 2023 [cited 2024 Jan 14]. Available from: https://doi.org/10.1101/2023.04.22.23288967 .
OpenAI: GPT3-apps [Internet]. [cited 2024 Jan 14]. Available from: https://openai.com/blog/gpt-3-apps .
Google: PaLM [Internet]. [cited 2024 Jan 14]. Available from: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html .
Google: Gemini [Internet]. [cited 2024 Jan 14]. Available from: https://deepmind.google/technologies/gemini/#hands-on .
Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A Survey of Large Language Models. 2023 [cited 2024 Jan 14]; Available from: https://arxiv.org/abs/2303.18223 .
McNichols H, Zhang M, Lan A. Algebra error classification with large language models [Internet]. arXiv; 2023 [cited 2023 May 25]. Available from: http://arxiv.org/abs/2305.06163 .
Wadhwa S, Amir S, Wallace BC. Revisiting relation extraction in the era of large language models [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2305.05003 .
Trajanoska M, Stojanov R, Trajanov D. Enhancing knowledge graph construction using large language models [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2305.04676 .
Reynolds L, McDonell K. Prompt programming for large language models: beyond the few-shot paradigm [Internet]. arXiv; 2021 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2102.07350 .
Guerreiro NM, Alves D, Waldendorf J, Haddow B, Birch A, Colombo P, et al. Hallucinations in Large Multilingual Translation Models [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2303.16104 .
Zack T, Lehman E, Suzgun M, Rodriguez JA, Celi LA, Gichoya J, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digital Health. 2024;6(1):e12-22.
Hastings J. Preventing harm from non-conscious bias in medical generative AI. Lancet Digital Health. 2024;6(1):e2-3.
Digutsch J, Kosinski M. Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans. Sci Rep. 2023;13(1):5035.
Huggingface: FlanT5-XXL [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/google/flan-t5-xxl .
Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, et al. Scaling Instruction-Finetuned Language Models [Internet]. arXiv; 2022 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2210.11416 .
Huggingface: OpenHermes-2.5-neural-chat-7b-v3–1–7B [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B .
Huggingface: OpenHermes-2.5-Mistral-7B [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B .
Huggingface: neural-chat-7b-v3–1 [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/Intel/neural-chat-7b-v3-1 .
Huggingface: Mixtral-8x7B-Instruct-v0.1 [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 .
Jiang AQ, Sablayrolles A, Roux A, Mensch A, Savary B, Bamford C, et al. Mixtral of Experts [Internet]. [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2401.04088 .
Huggingface: Platypus2–70B-Instruct [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/garage-bAInd/Platypus2-70B-instruct .
Huggingface: SOLAR-0–70b-16bit [Internet]. [cited 2024 Jan 14]. Available from: https://huggingface.co/upstage/SOLAR-0-70b-16bit#updates .
Systematic Review Datasets: ASReview [Internet]. [cited 2024 Jan 14]. Available from: https://github.com/asreview/systematic-review-datasets .
Appenzeller-Herzog C, Mathes T, Heeres MLS, Weiss KH, Houwen RHJ, Ewald H. Comparative effectiveness of common therapies for Wilson disease: a systematic review and meta-analysis of controlled studies. Liver Int. 2019;39(11):2136–52.
Bos D, Wolters FJ, Darweesh SKL, Vernooij MW, De Wolf F, Ikram MA, et al. Cerebral small vessel disease and the risk of dementia: a systematic review and meta-analysis of population-based evidence. Alzheimer’s & Dementia. 2018;14(11):1482–92.
Donners AAMT, Rademaker CMA, Bevers LAH, Huitema ADR, Schutgens REG, Egberts TCG, et al. Pharmacokinetics and associated efficacy of emicizumab in humans: a systematic review. Clin Pharmacokinet. 2021;60(11):1395–406.
Jeyaraman M, Muthu S, Ganie PA. Does the source of mesenchymal stem cell have an effect in the management of osteoarthritis of the knee? Meta-analysis of randomized controlled trials. CARTILAGE. 2021 Dec;13(1_suppl):1532S-1547S.
Leenaars C, Stafleu F, De Jong D, Van Berlo M, Geurts T, Coenen-de Roo T, et al. A systematic review comparing experimental design of animal and human methotrexate efficacy studies for rheumatoid arthritis: lessons for the translational value of animal studies. Animals. 2020;10(6):1047.
Meijboom RW, Gardarsdottir H, Egberts TCG, Giezen TJ. Patients retransitioning from biosimilar TNFα inhibitor to the corresponding originator after initial transitioning to the biosimilar: a systematic review. BioDrugs. 2022;36(1):27–39.
Muthu S, Ramakrishnan E. Fragility analysis of statistically significant outcomes of randomized control trials in spine surgery: a systematic review. Spine. 2021;46(3):198–208.
Oud M, Arntz A, Hermens ML, Verhoef R, Kendall T. Specialized psychotherapies for adults with borderline personality disorder: a systematic review and meta-analysis. Aust N Z J Psychiatry. 2018;52(10):949–61.
Van De Schoot R, Sijbrandij M, Depaoli S, Winter SD, Olff M, Van Loey NE. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar Behav Res. 2018;53(2):267–91.
Wolters FJ, Segufa RA, Darweesh SKL, Bos D, Ikram MA, Sabayan B, et al. Coronary heart disease, heart failure, and the risk of dementia: A systematic review and meta-analysis. Alzheimer’s Dementia. 2018;14(11):1493–504.
Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. npj Digit Med. 2020 Feb 6;3(1):17.
Natukunda A, Muchene LK. Unsupervised title and abstract screening for systematic review: a retrospective case-study using topic modelling methodology. Syst Rev. 2023;12(1):1.
Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8(1):163.
Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010;11(1):55.
Li D, Wang Z, Wang L, Sohn S, Shen F, Murad MH, et al. A text-mining framework for supporting systematic reviews. Am J Inf Manag. 2016;1(1):1–9.
de Almeida CPB, de Goulart BNG. How to avoid bias in systematic reviews of observational studies. Rev CEFAC. 2017;19(4):551–5.
Siddaway AP, Wood AM, Hedges LV. How to do a systematic review: a best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annu Rev Psychol. 2019;70(1):747–70.
Santos ÁOD, Da Silva ES, Couto LM, Reis GVL, Belo VS. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform. 2023;142: 104389.
Haman M, Školník M. Using ChatGPT to conduct a literature review. Account Res. 2023;6:1–3.
Liu R, Shah NB. ReviewerGPT? An exploratory study on using large language models for paper reviewing [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2306.00622
Wang S, Scells H, Koopman B, Zuccon G. Can ChatGPT write a good boolean query for systematic review literature search? [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2302.03495 .
Aydın Ö, Karaarslan E. OpenAI ChatGPT generated literature review: digital twin in healthcare. SSRN Journal [Internet]. 2022 [cited 2024 Jan 14]; Available from: https://www.ssrn.com/abstract=4308687 .
Guo E, Gupta M, Deng J, Park YJ, Paget M, Naugler C. Automated paper screening for clinical reviews using large language models [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2305.00844 .
Akinseloyin O, Jiang X, Palade V. A novel question-answering framework for automated citation screening using large language models [Internet]. medRxiv; 2023 [cited 2024 Jan 14]. Available from: https://doi.org/10.1101/2023.12.17.23300102 .
Koh JY, Salakhutdinov R, Fried D. Grounding language models to images for multimodal inputs and outputs. 2023 [cited 2024 Jan 14]; Available from: https://arxiv.org/abs/2301.13823 .
Wang L, Lyu C, Ji T, Zhang Z, Yu D, Shi S, et al. Document-level machine translation with large language models [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2304.02210 .
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners [Internet]. arXiv; 2020 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2005.14165 .
Koo R, Lee M, Raheja V, Park JI, Kim ZM, Kang D. Benchmarking cognitive biases in large language models as evaluators [Internet]. arXiv; 2023 [cited 2024 Jan 14]. Available from: http://arxiv.org/abs/2309.17012 .
Editorial —Artificial Intelligence language models in scientific writing. EPL. 2023 Jul 1;143(2):20000.
Grimaldi G, Ehrler B. AI et al.: machines are about to change scientific publishing forever. ACS Energy Lett. 2023;8(1):878–80.
Grillo R. The rising tide of artificial intelligence in scientific journals: a profound shift in research landscape. Eur J Ther. 2023;29(3):686–8.
nature: ChatGPT and science: the AI system was a force in 2023 — for good and bad [Internet]. [cited 2024 Jan 14]. Available from: https://www.nature.com/articles/d41586-023-03930-6 .
Chiang CH, Lee H yi. Can large language models be an alternative to human evaluations? 2023 [cited 2024 Jan 6]; Available from: https://arxiv.org/abs/2305.01937 .
Erler A. Publish with AUTOGEN or perish? Some pitfalls to avoid in the pursuit of academic enhancement via personalized large language models. Am J Bioeth. 2023;23(10):94–6.
OpenAI: ChatGPT [Internet]. [cited 2024 Jan 14]. Available from: https://openai.com/blog/chatgpt .
Gates A, Gates M, Sebastianski M, Guitard S, Elliott SA, Hartling L. The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr’s relevance predictions in systematic and rapid reviews. BMC Med Res Methodol. 2020;20(1):139.
Not applicable.
Authors and affiliations.
Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland
Fabio Dennstädt & Paul Martin Putora
Institute for Computer Science, University of Würzburg, Würzburg, Germany
Johannes Zink
Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
Fabio Dennstädt, Paul Martin Putora & Nikola Cihoric
Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
Janna Hastings
School of Medicine, University of St. Gallen, St. Gallen, Switzerland
Swiss Institute of Bioinformatics, Lausanne, Switzerland
All authors contributed to designing the concept and methodology of the presented approach of LLM-based evaluation of the relevance of a publication to an SLR. The Python script was created by FD and JZ. The experiments were conducted by FD and JH. All authors contributed to writing and revising the manuscript. All authors have read and approved the final version of the manuscript.
Correspondence to Fabio Dennstädt .
Ethics approval and consent to participate.
Consent for publication.
Competing interests.
NC is a technical lead for the SmartOncology© project and medical advisor for Wemedoo AG, Steinhausen AG, Switzerland. The authors declare that they have no other competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material 1: Appendix 1: Sample prompt.
Supplementary material 2: Appendix 2: Relevant criteria of published datasets.
Supplementary material 3: Appendix 3: Performance of models on data sets.
Supplementary material 4: Appendix 4: Comparison with other approach.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Dennstädt, F., Zink, J., Putora, P.M. et al. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev 13 , 158 (2024). https://doi.org/10.1186/s13643-024-02575-4
Received : 17 June 2023
Accepted : 30 May 2024
Published : 15 June 2024
DOI : https://doi.org/10.1186/s13643-024-02575-4
Global warming, caused by greenhouse gas emissions, is a major challenge for all human societies. To ensure that ambitious carbon neutrality and sustainable economic development goals are met, regional human activities and their impacts on carbon emissions must be studied. Guizhou Province is a typical karst area in China that predominantly uses fossil fuels. In this study, a backpropagation (BP) neural network and an extreme learning machine (ELM) model, which are advantageous for nonlinear processing, were used to predict carbon emissions from 2020 to 2040 in Guizhou Province. The carbon emissions were calculated using conversion and inventory compilation methods with energy consumption data, and the results showed an "S"-shaped growth trend. Twelve influencing factors were selected, and the five with the largest correlations were screened out using a grey correlation analysis method. A prediction model for carbon emissions from Guizhou Province was established. The prediction performance of a whale optimization algorithm (WOA)-ELM model was found to be higher than that of the BP neural network and ELM models. Baseline, high-speed, and low-carbon scenarios were analyzed, and the size and timing of peak carbon emissions in Guizhou Province from 2020 to 2040 were predicted using the WOA-ELM model.
Citation: Lian D, Yang SQ, Yang W, Zhang M, Ran WR (2024) Carbon peaking prediction scenarios based on different neural network models: A case study of Guizhou Province. PLoS ONE 19(6): e0296596. https://doi.org/10.1371/journal.pone.0296596
Editor: Salim Heddam, University 20 Août 1955, Skikda, Algeria
Received: December 19, 2023; Accepted: May 13, 2024; Published: June 25, 2024
Copyright: © 2024 Lian et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This research was supported by Concealed Ore Deposit Exploration and Innovation Team of Guizhou Colleges and Universities (Guizhou Education and Cooperation Talent Team [2015]56), Provincial Key Discipline of Geological Resources and Geological Engineering of Guizhou Province (ZDXK[2018]001), Huang Danian Resources of National colleges and universities Teachers' Team of Exploration Engineering (Teacher Letter [2018] No. 1), Geological Resources and Geological Engineering Talent Base of Guizhou Province (RCJD2018-3), Key Laboratory of Karst Engineering Geology and Hidden Mineral Resources of Guizhou Province (Qianjiaohe KY [2018] No. 486Guizhou Institute of Technology Rural Revitalization Soft Science Project(2022xczx10), Education and Teaching Reform Research Project of Guizhou Institute of Technology (JGZD202107,2022TDFJG01).The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Global warming is a major issue for all countries, and one of the main strategies to mitigate it is the reduction of greenhouse gas emissions. In particular, the Fifth Assessment Report of the United Nations Intergovernmental Panel on Climate Change (IPCC) stated that climate warming is mainly caused by the burning of large amounts of fossil fuels during human activities and that alleviating global warming has become an unavoidable responsibility of all countries. The increase in global average temperature caused by excessive CO2 emissions seriously threatens the living space of human beings and sustainable development.
Global climate change is closely related to sustainable development goals worldwide, and consequently, many governments have taken measures to actively address this issue. China has put forward the ambitious goal of "achieving peak carbon by 2030 and carbon neutrality by 2060." To help achieve this, it has increased research and development of new energy technologies that can reduce the proportion of fossil energy use, thereby achieving a profound change in the energy consumption structure. To protect the ecological environment, water, wind, and tidal energy resources should be utilized alongside "pollution and carbon reduction" activities. Guidance through typical demonstrations should be improved to fully mobilize the enthusiasm of local governments, departments, industries, and enterprises and thus establish good working patterns.
In China, Guizhou Province is a major karst area whose energy consumption comes predominantly from coal, oil, and other primary energy sources. In recent years, accelerated urbanization and rapid economic growth have increased its dependence on traditional energy sources, and consequently, its total energy consumption is considered high. Furthermore, in Guizhou Province, the dual impetus of urbanization and economic development has led to increased energy consumption and, consequently, annually increasing carbon emissions. Energy saving and emission reduction strategies for Guizhou Province will ultimately affect low-carbon economic development for China as a whole. To achieve peak carbon emissions in China by 2030, efficient emission reduction policies must be implemented, and the factors affecting carbon emissions must be studied and monitored using effective scientific methods to enable accurate predictions. In this study, different neural network models were used to predict peak carbon emissions for Guizhou Province, which will help to reduce carbon emissions and meet China's "3060" goal ( Fig 1 ).
Existing research on carbon emissions has predominantly focused on national and industrial levels, while regional-level research has been limited. This study focuses on Guizhou Province, and the total carbon emissions were calculated using relevant data on energy consumption and carbon emissions. A literature review was conducted to identify the factors influencing carbon emissions, machine learning technology was used to identify highly correlated factors, and future trends for carbon emissions were predicted. The future carbon emissions were then analyzed using the predominant factors and carbon peak data. The results of this study will help guide future theoretical research on carbon emissions at the regional level ( Fig 1 ).
National carbon emission reduction work is required in all regions of a country, and regions with different levels of industrial development should implement differentiated policies. Guizhou Province is an old industrial base in China that has predominantly used fossil fuel energy, and its greenhouse gas emissions, both historically and at present, are generally high. Consequently, meeting China's carbon peak schedule will be difficult for Guizhou Province. A scientifically accurate calculation method for carbon emissions is thus required and will provide significant support for the implementation of energy conservation and emission reduction strategies. Based on an optimized neural network model, this study predicts and analyzes the peak carbon emissions of Guizhou Province over the next 20 years, which helps to identify existing problems and gaps and to guide the direction of energy development.
Research on the factors influencing carbon emissions has focused on the contributions of different factors and has predominantly selected economic indicators such as population, economy, and energy intensity to construct a relevant index system. Ang et al. [ 1 ] analyzed the changes in carbon dioxide produced per unit of electricity globally and considered the import and export of each country, the fuel structure of power generation, and emission factors as the main influencing factors; their study revealed that an improvement in power generation efficiency is the main reason for reduced CO2 emissions. Rustemoglu et al. [ 2 ] studied carbon dioxide levels in Brazil and Russia from 1992 to 2012 and identified the factors affecting carbon emissions as falling into employment, economy, and carbon emission intensity categories. The results showed that Brazil's carbon dioxide emissions were not decoupled from its economic development, while Russia's carbon dioxide emissions were greatly reduced with an increase in energy intensity. Lin et al. [ 3 ] used the input-output method to analyze the carbon emissions of the national food industry and divided industrial carbon emissions into four main factors: pollution factor, total output, energy intensity, and energy structure. Roinioti et al. [ 4 ] considered the development level of the national economy, energy consumption intensity, fuel consumption capacity, and carbon emission intensity as the main factors affecting carbon dioxide. Kim et al. [ 5 ] decomposed the factors affecting carbon emissions into production scale and production intensity and analyzed the contributions of growth in energy consumption in various sub-industries. Roman et al. [ 6 ] focused on Colombia and used the IDA-LMDI model to decompose the factors influencing CO2 emissions into five aspects, including energy intensity, wealth value, fossil fuel substitution, and renewable energy development, to explore the contributions of different CO2 increments.
A carbon emissions impact factor index system was derived in China, predominantly from the perspectives of energy structure, population growth, economics, and other factors. Ying et al. [ 7 ] considered China's steel industry as the main research object, and their results showed that energy intensity has a greater impact on carbon emissions, but the role of the consumption structure was not as expected. Dewey et al. [ 8 ] studied the influencing factors of carbon dioxide generated by indirect consumption in the daily lives of Chinese residents and found that socioeconomic level is the main driving factor affecting CO2 emissions for urban and rural residents, and that the contributions of their different structures and consumption proportions are inversely related to CO2 emissions. Jingyan et al. [ 9 ] used regression analysis to study the carbon emissions of the thermal power industry in Guangdong Province, China. Ting et al. [ 10 ] used the LMDI method to conclude that economic growth and carbon dioxide levels change in the same direction; that is, the faster the economic growth, the more obvious the effect of promoting carbon dioxide emissions, whereas the effective utilization and conversion of energy can reduce carbon dioxide emissions. Xiaoming et al. [ 11 ] used an LMDI decomposition model to study the carbon emissions of 30 provinces and cities in China from 2004 to 2014 and explored the contributions of the important factors by dividing the time period with 2009 as the demarcation point. The results showed that the growth of the gross national product had the greatest impact on national carbon emissions, while the contributions of other factors were weak. Ying et al. [ 12 ] used energy intensity, economic development, and population size as influencing factors to study carbon emissions. Xiaoyong et al. [ 13 ] used high-resolution spatial data to place carbonate chemical weathering carbon sinks, silicate chemical weathering carbon sinks, vegetation-soil ecosystem carbon sinks, and energy carbon emissions on a spatial grid. Subsequently, a carbon neutrality index model was established to reveal the contributions of terrestrial ecosystem carbon sinks to carbon neutrality, and the results were compared with those of other countries from horizontal and vertical perspectives. The results provide new ideas for the measurement of carbon neutralization capacity and important reference values and data for the systematic determination of global carbon neutralization capacity. The model developed by Xiaoyong et al. is highly recognized in academia.
Previous studies have predicted the carbon emissions of different countries or industries using logistic regression models, the STIRPAT model, and scenario analysis. Ouedraogo et al. [ 14 ] used the LEAP framework to model and project energy demand and associated emissions under alternative strategies in Africa from 2010 to 2040. Lin et al. [ 15 ] conducted a survey of China's manufacturing industry using the STIRPAT model and found that macroeconomic growth factors could determine the carbon dioxide emissions of the industry, while the effects of the fuel utilization and urbanization rates have significant regional heterogeneity. Wang et al. [ 16 ] constructed the STIRPAT model to investigate the factors influencing carbon emissions in Xinjiang from 1952 to 2012, and the results identified differences in the impacts of various factors in different historical periods. Prior to 1978, population size expansion was the main factor causing an increase in carbon emissions. From 1978 to 2000, economic growth and population size were the main factors driving increased carbon emissions, and after 2000, the main factors were increased economic development and fixed asset investment. Kachoee et al. [ 17 ] used the LEAP model to predict carbon dioxide emissions relating to Iran's power sector over the next 30 years and concluded that economic growth is the main influencing factor. Emodi et al. [ 18 ] used the LEAP model to study climate change in relation to Australia's power sector and found that reducing expenditure on environmental protection and resource conservation would produce economic benefits ( Fig 2 ). Liu [ 19 – 22 ] designed four plane-scale models of steel oblique beam structures and conducted quasi-static tests under cyclic loading, which clarified the yield mechanism, failure mode, hysteresis energy consumption, stiffness degradation, equivalent viscous damping coefficient, and lateral deformation performance of oblique beam structures and provided a technical basis for performance-based seismic design of such structures [ 23 ]. Jun-song Jia [ 24 ] took Henan Province of China as a study area and computed the ecological footprint (EF) and ecological carrying capacity (EC) for 1949–2006. Based on the computed results, the simulation process of the ARIMA model and the fitting and forecasting results were explained in detail. The final results demonstrated that the ARIMA model could be used effectively in the simulation and prediction of the EF, and the predicted EF could help decision-makers develop better planning for regional ecological balance and a sustainable future.
Neural network models are widely used in various fields [ 25 – 27 ]. Representative neural network models include BP neural networks, radial basis function networks, Hopfield models, GMDH networks, adaptive resonance theory, Boltzmann machines, and CPN models. Lapedes et al. used neural networks for economic forecasting in 1987, and Chunjuan et al. [ 27 ] applied neural networks to predict typhoons, debris flows, and geological subsidence. Fan et al. [ 28 ] established a POS-BP neural network model to predict the total carbon emissions and intensities of 30 provinces, municipalities, and autonomous regions in China. Ying et al. [ 29 ] compared the advantages of a neural network with those of other traditional prediction methods and used a BP neural network model combined with a terminal information collection system and Web Service technology to design an intelligent system for urban road-occupying parking, proving the feasibility of the management system using actual data. Xiaowei et al. [ 30 ] predicted the prices of stock investments by combining a neural network model with principal component analysis and multiple linear regression. Xiaolong et al. [ 31 ] studied the problem of gas outbursts in tunnels using a BP neural network; the results were good and verified that the predicted and real values are consistent. Xiaocheng et al. [ 32 ] also used a BP neural network to predict air pollutant concentrations. The original BP neural network was used to calculate the system error of all samples using successive iterations and batch processing, which improved the execution efficiency.
In summary, the influencing factors affecting carbon emissions are predominantly considered to be population, economy, and energy structure, and these are used to establish a carbon emission index system applied in subsequent prediction research. Carbon emission forecasting research has predominantly used traditional econometric methods such as the logistic regression model, the STIRPAT model, and scenario analysis, whereas neural network models have achieved satisfactory results when used for economic forecasting. A review of the carbon emission literature shows that most recent studies have focused on the national or industrial levels, few studies have used machine learning algorithms for peak carbon emission predictions, and the neural network models used have been relatively limited in variety. Training a single neural network prediction model is time-consuming and can easily fall into a local optimum. This study therefore combined neural network models with carbon emission research and optimized different algorithms to improve carbon emission predictions for Guizhou Province.
Carbon emissions from Guizhou Province are calculated using an inventory method based on energy consumption data for the province. Referring to the relevant literature and combining it with the actual development of Guizhou Province, 12 factors affecting carbon emissions were selected to establish a characteristic subset. By introducing the grey correlation analysis method, the indicators with a higher degree of influence were selected and applied in the follow-up prediction research. Finally, the carbon emissions from 2020 to 2040 in Guizhou Province were predicted under three different development scenarios by establishing a prediction model based on the WOA-ELM.
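A plausible form of the estimation formula, assuming the conventional standard-coal inventory calculation in which each energy consumption B_i is multiplied by a standard coal conversion coefficient θ_i and a carbon emission coefficient f_i (these coefficient symbols are assumed here; cf. Table 1), is

C = \sum_{i=1}^{n} B_i \times \theta_i \times f_i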
where B i represents the consumption of the i-th energy source and n represents the number of energy types. In this study, n = 9, covering the consumption of coal, coke, crude oil, gasoline, kerosene, diesel, natural gas, and electricity. The standard coal conversion coefficients and carbon emission coefficients for each energy source in Guizhou Province are listed in Table 1 .
From 2000 to 2012, the total carbon emissions from Guizhou Province increased ( Table 2 ), while from 2012 to 2016 they showed a decreasing trend. The decrease is related to the construction of an ecological civilization in Guizhou Province and the practice of the principle that lucid waters and lush mountains are invaluable assets. In 2020, the total carbon emissions of Guizhou Province were 22237 ( Table 2 ), approximately twice the 2002 level. With social and economic development, carbon emissions in Guizhou Province are expected to show a steady upward trend in the future. However, owing to the inhibiting effect of policies such as carbon emission reduction measures, the establishment of a carbon trading market, and an increase in the proportion of new energy applications in Guizhou Province, the growth rate of carbon emissions will gradually decrease year on year.
The literature on carbon emission influencing factors shows that most scholars select the macroeconomy, industrial structure, energy consumption, and scientific and technological development. Based on the actual social and economic development of Guizhou Province, this study comprehensively considered scientific, systematic, and authentic principles of index selection. Total population, urbanization rate, household consumption level, per capita GDP, energy intensity, carbon emission intensity, foreign direct investment, energy structure, the proportions of the primary, secondary, and tertiary industries, and total energy consumption were selected and qualitatively analyzed. A description of each factor is as follows:
The selected indicators are shown in Table 3 .
Total carbon emissions from Guizhou Province between 2000 and 2020 were calculated as basic data, and correlations with the population, economy, and energy data were determined. The data were obtained from the Statistical Yearbook of Guizhou Province 2000–2020.
To facilitate subsequent modeling and data representation, the variable names for the 12 factors affecting carbon emissions were redefined as X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, and X12. The total carbon emissions (Y) from Guizhou Province were set as the reference sequence, and the 12 influencing factors were set as the comparison sequences. The original data were normalized to eliminate dimensional influences, and the results are shown in Table 4 .
The maximum and minimum absolute differences in the matrix were calculated, and the resolution coefficient was set to 0.5 to obtain the correlation coefficient table. The correlation coefficients of each comparison sequence were averaged over all time points to obtain the correlation degrees, which were then ranked. The results are summarized in Table 5 .
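A minimal sketch of this grey relational analysis (an assumed Python implementation, not the computation code used in this study; the toy numbers are illustrative) is given below.

import numpy as np

def grey_relational_degrees(reference, factors, rho=0.5):
    """reference: array of shape (T,); factors: array of shape (k, T)."""
    reference = np.asarray(reference, dtype=float)
    factors = np.asarray(factors, dtype=float)
    # Normalize each series by its initial value to remove dimensional effects
    # (other normalizations, e.g. min-max scaling, are also common).
    ref = reference / reference[0]
    fac = factors / factors[:, :1]
    diff = np.abs(fac - ref)                # absolute differences to the reference, shape (k, T)
    d_min, d_max = diff.min(), diff.max()   # two-level minimum and maximum differences
    xi = (d_min + rho * d_max) / (diff + rho * d_max)   # correlation coefficients, rho = 0.5
    return xi.mean(axis=1)                  # grey relational degree of each factor

# Toy example with a reference sequence Y and three comparison factors.
Y = np.array([100, 110, 125, 140, 160])
X = np.array([
    [50, 56, 63, 70, 80],      # moves closely with Y -> high degree
    [10, 12, 11, 13, 12],      # weakly related
    [200, 190, 180, 175, 170]  # opposite trend
])
print(grey_relational_degrees(Y, X))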
The closer the correlation degree is to 1, the stronger the correlation with carbon emissions in Guizhou Province. The top five correlation degrees were those of X12, X1, X2, X4, and X3, corresponding to total energy consumption, total population, urbanization rate, per capita gross domestic product (GDP), and residents' consumption level. The correlation degrees of these main factors were all greater than 0.75. The correlation degrees for X7, X11, X10, X9, X8, and X5 were all between 0.5 and 0.7, indicating medium correlation. The correlation coefficient between energy intensity and carbon emissions in Guizhou Province was low, at 0.487.
The relevant data regarding energy consumption in Guizhou Province were collected, and the carbon emission data for 2000 to 2020 were determined. The results show that the total carbon emissions of Guizhou Province are closely related to economic development and relevant policies. Drawing on the existing literature and the data from Guizhou Province, this study initially set 12 indicators, including total population, urbanization rate, residents' consumption level, and per capita GDP, as the factors influencing carbon emissions and explained the reasons for their selection in detail. Total energy consumption, total population, urbanization rate, per capita GDP, and residents' consumption level all showed a high level of correlation with carbon emissions in Guizhou Province, and these five strongly correlated factors can be used as input variables in the prediction model to improve the accuracy of carbon emission prediction for Guizhou Province.
Prediction model design.
Establishment of a BP neural network model.
A BP neural network is composed of input, hidden, and output layers. When establishing a BP neural network, setting too many or too few hidden layer nodes will affect the results; too many hidden layer nodes are prone to overfitting and increase the training time, while too few hidden layer nodes reduce the accuracy of the data fitting. The number of hidden layer nodes must therefore be determined based on the data characteristics. In this study, the number of hidden layer nodes [ 38 ] was determined by trial and error. The specific settings for each layer and node are as follows:
The activation function introduces a nonlinear relationship into the neurons through mapping. To better represent the nonlinear relationship of a function, an appropriate type of activation function must be selected. The hyperbolic tangent function is a common activation function that maps values in (−∞, +∞) into (−1, 1), keeping the variable within the largest possible threshold range and thereby better preserving the nonlinear variation of the function. The transfer function of the hidden layer nodes was therefore chosen as the tangent S-type transfer function, tansig. The input and output values of the linear transfer function purelin can assume any value; to facilitate comparison with the sample values, purelin was selected as the transfer function of the output layer. The learning rate refers to the speed at which the BP neural network accumulates information over time, and different learning rate settings affect the training time and the training effect of the model. A larger learning rate makes training relatively fast; however, it causes large fluctuations in the later period, so that the model cannot converge. A smaller learning rate can make the simulation results more accurate, but it significantly increases the training time. In general studies, the learning rate γ is set between 0 and 1. In this paper, through continuous debugging and comparison of the training effect during the training process, a learning rate of γ = 0.1 was selected. The required training accuracy was 0.001, and the maximum number of training epochs was 500.
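An approximate sketch of such a BP configuration, using scikit-learn as a stand-in (the tanh activation corresponds to the tansig hidden transfer function and the default linear output to purelin; the number of hidden nodes and the toy training data are assumptions, not the study's settings), could look as follows.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: five input factors per year (e.g. X12, X1, X2, X4, X3)
# and the corresponding total carbon emissions Y for 2000-2014.
rng = np.random.default_rng(0)
X_train = rng.random((15, 5))
y_train = X_train.sum(axis=1) + 0.1 * rng.random(15)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)   # normalize inputs before training

bp_model = MLPRegressor(
    hidden_layer_sizes=(8,),   # hidden node count chosen by trial and error (assumed value)
    activation="tanh",         # analogue of the tansig transfer function
    solver="sgd",
    learning_rate_init=0.1,    # learning rate gamma = 0.1 as described above
    max_iter=500,              # maximum number of training epochs
    tol=1e-3,                  # training accuracy requirement of 0.001
    random_state=0,
)
bp_model.fit(X_scaled, y_train)
print(bp_model.predict(X_scaled[:3]))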
The algorithm can be divided into three steps. The first step determines the number of neurons in the hidden layer and randomly generates the connection weights between the input layer and the hidden layer as well as the neuron biases of the hidden layer in the network model. The second step determines the activation function of the neurons in the hidden layer by selecting an infinitely differentiable function and calculates the output matrix H of the hidden layer. The third step calculates the output layer weights analytically as β* = H⁺T, where H⁺ is the Moore–Penrose generalized inverse of H and T is the target output matrix.
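A minimal ELM sketch following these three steps (an assumed implementation, not the code used in this study; the toy data stand in for the Guizhou training set) is shown below.

import numpy as np

class ELM:
    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Infinitely differentiable activation applied to the hidden layer.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        n_features = X.shape[1]
        # Step 1: randomly generate input weights W and hidden-layer biases b.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Step 2: compute the hidden-layer output matrix H.
        H = self._hidden(X)
        # Step 3: solve the output weights analytically, beta = pinv(H) @ T.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy usage with random data in place of the 2000-2014 training samples.
rng = np.random.default_rng(1)
X = rng.random((15, 5))
T = X @ np.array([1.0, 0.5, 2.0, 0.2, 1.5])
model = ELM(n_hidden=10).fit(X, T)
print(np.abs(model.predict(X) - T).mean())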
The BP algorithm is a common learning algorithm used in various fields. However, existing problems restrict its development. During the training process, the initial weights and thresholds are randomly generated, and consequently, the generalization ability cannot be guaranteed. The WOA is then used to optimize the initial parameters of the BP neural network to obtain a more stable WOA-BP neural network.
The steps involved in the WOA optimization of the BP neural network are as follows:
Based on the whale optimization algorithm and the structure of the extreme learning machine described above, a WOA-ELM combined forecasting model was established. In the WOA, the optimal position of the humpback whale corresponds to the optimized ELM parameter values, and the WOA iterations are used to determine the optimal input weights w i and biases b i of the ELM, which improves the prediction accuracy of the model.
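A simplified sketch of the WOA search loop is given below (assumed implementation: the fitness function here is a placeholder sphere function, whereas in the WOA-ELM model it would train an ELM with the candidate input weights and biases and return the resulting prediction error).

import numpy as np

def woa_minimize(fitness, dim, n_whales=20, n_iter=100, lb=-1.0, ub=1.0, seed=0):
    rng = np.random.default_rng(seed)
    whales = rng.uniform(lb, ub, size=(n_whales, dim))
    scores = np.array([fitness(w) for w in whales])
    best = whales[scores.argmin()].copy()
    best_score = scores.min()

    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                      # a decreases linearly from 2 to 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1):           # encircling the best solution found so far
                    whales[i] = best - A * np.abs(C * best - whales[i])
                else:                               # exploration around a randomly chosen whale
                    rand = whales[rng.integers(n_whales)]
                    whales[i] = rand - A * np.abs(C * rand - whales[i])
            else:                                   # spiral bubble-net update
                l = rng.uniform(-1, 1, dim)
                whales[i] = np.abs(best - whales[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            whales[i] = np.clip(whales[i], lb, ub)
            score = fitness(whales[i])
            if score < best_score:
                best, best_score = whales[i].copy(), score
    return best, best_score

# Placeholder fitness; for WOA-ELM it would return, e.g., the training RMSE of an ELM
# whose input weights and biases are taken from the candidate vector.
best, err = woa_minimize(lambda w: float(np.sum(w ** 2)), dim=10)
print(err)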
It is worth noting that in the ELM, the input data are transformed by the hidden layer and the output layer then produces the result; this is forward propagation, that is, information flows from the input layer to the output layer. In a classical BP network, backpropagation is the key optimization technique: when the output does not meet expectations, the difference between the actual and expected outputs is used to calculate how much the weights of each layer need to be adjusted, and the network is optimized accordingly. The ELM, by contrast, is a single-hidden-layer feedforward network in which the randomly generated hidden-layer weights and biases are not adjusted iteratively and the output weights are obtained analytically. This is precisely why, in this study, the WOA is used to search for suitable hidden-layer weights and biases, so that the network can better adapt to the training data.
There is no authoritative agency in China that directly provides carbon dioxide emission data; consequently, this study used a compromise method in which the emission data for each year were drawn from different database collections and averaged. The calculation of carbon emissions and the collection of relevant data vary depending on the subject of the study. Considering the characteristics of carbon emissions in Guizhou Province and the difficulty of acquiring data, this study used an estimation method.
The carbon emission prediction results in Table 7 show that the trend of the predicted carbon emission values for Guizhou Province was generally consistent with that of the real values; however, the differences between the values are large, and the prediction is not sufficiently stable. The differences between the predicted and real values in 2018, 2019, and 2020 were small, with relative errors below 2.5%, which met the expectations for prediction accuracy. However, there were large differences between the predicted and actual values in 2015, 2016, and 2017, and the expected prediction performance was not achieved. This is predominantly because the initial weights and thresholds of the BP neural network are determined randomly, making it difficult to achieve a good fit during model training; this results in large fluctuations in the prediction results and an inability to achieve a good prediction effect [ 39 – 42 ].
An extreme learning machine model was used to predict carbon emissions for Guizhou Province, using the data from 2000 to 2014 as the training set and the data from 2015 to 2020 as the test set. The training samples were prepared in the same way as for the BP neural network. The relative error was reported to two decimal places and the absolute error to one decimal place, and the errors between the actual and predicted values were compared.
The prediction results in Table 8 show that the model fits the carbon emissions of Guizhou Province well and approximates their relationship with the influencing factors. However, the predicted results are unstable. Although the predicted values for most years were close to the actual values, the carbon emissions predicted for 2017 differed significantly from the actual value. In the forecast results, the absolute error between the forecast and real values in 2019 was 8.8, and the forecast value for 2019 was the closest to the actual carbon emissions of Guizhou Province. The average relative error on the test set was 0.43%. Compared with the BP neural network, the carbon emission prediction model for Guizhou Province based on the extreme learning machine has higher accuracy and a stronger ability to approximate the nonlinear relationships in the samples, but the random initialization of the hidden-layer parameters and the resulting β also affect the accuracy of the model to a certain extent and require further optimization.
The whale optimization algorithm, which has a global search ability, was used to optimize the initial weights and thresholds of the BP neural network to improve its prediction accuracy. The training and test sets were the same as those used in the BP neural network model.
When setting the initial weights and thresholds of the neural network, a set of randomly generated initial values was selected, because there is no relevant setting principle. The BP neural network can automatically learn the mapping relationship between the input and output, generate initial parameters randomly, and continuously modify the weights and thresholds of the network through error backpropagation; however, randomly selected initial weights and thresholds are usually inversely related to the convergence speed of neural network training; that is, the larger the values, the slower the convergence. In this case, the final training results easily fall into a local optimum, and it is difficult to obtain ideal calculation and prediction results. As shown in Table 9 , the relative error of the BP neural network after optimization is significantly reduced, to no more than 1.5%, and the degree of fit to the carbon emissions of Guizhou Province is significantly higher than before optimization. The carbon emission prediction for 2017 was the closest to the actual value, with a relative error of 0.16%, and the prediction results for the other years were relatively stable. The accuracy and stability of the prediction using the WOA-BP neural network were significantly improved [ 43 , 44 ].
The training samples selected in this section were the same as those used for the extreme learning machine model setting. The input and output variable data from 2000 to 2014 were used as the training set ( Table 10 ), and the prediction years were from 2015 to 2020. The WOA-ELM model was established [ 45 – 50 ].
The results show that, after multiple training runs and WOA optimization, the ELM model fits the carbon emissions of Guizhou Province from 2015 to 2020 better. The relative error between the predicted and real values is between 0% and 0.05%. The relative and absolute errors are small for all but a few years, and the prediction accuracy is significantly higher than that of the ELM model alone. The effectiveness of the WOA-optimized ELM scheme was thus verified.
In this study, four prediction models were established and tested on the carbon emission data for Guizhou Province. To evaluate their prediction performance, three indicators were used for comparison: the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE). The MAE reflects the average magnitude of the prediction error, the MAPE expresses that error relative to the actual values, and the RMSE reflects the deviation between the predicted and true values.
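For reference, the three indicators are standard and can be computed with a short helper such as the one below (an illustrative sketch; the function name is arbitrary and MAPE is expressed in percent).

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """Return MAE, MAPE (in %), and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse
```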
The results showed that both the BP neural network and the extreme learning machine model predict the carbon emission data for Guizhou Province with high accuracy (Table 11). Compared with the ELM model, however, the BP neural network model is less robust and more affected by randomness, and its convergence speed is slightly lower. Among the four prediction models, the WOA-ELM model had the highest prediction accuracy, followed by the model based on the extreme learning machine, while the BP neural network model performed worst. The comparison therefore indicates that the WOA-ELM model has the best prediction performance and reaches the expected level of accuracy, so it is used below to predict the peak carbon emissions of Guizhou Province.
https://doi.org/10.1371/journal.pone.0296596.t011
Construction of carbon emission scenarios: scenario settings.
Scenario analysis refers to the quantitative analysis of past and present situations, combined with qualitative assumptions about the factors that will shape the future, in order to infer possible future situations. The purpose of scenario construction is not to predict the future exactly; its greatest practical value lies in comprehensive analysis. Two premises apply when using scenario analysis: the impact factors must be quantifiable, and the future indicators must be predictable [33–35, 51, 52].
This section uses scenario analysis to set the impact factors of carbon emissions under different development scenarios, which facilitates the prediction of carbon emission levels in Guizhou Province from 2020 to 2040. First, the baseline, high-speed, and low-carbon scenarios are defined, corresponding to medium and high growth for the indicators with positive regression coefficients. Then, based on the strategic policies for economic and energy development in Guizhou Province, the current state of economic and social development and the trend of the energy structure were clarified, and the future values of total population, urbanization rate, residents' consumption level, total energy consumption, and per capita GDP under the different development scenarios were set in combination with the relevant policies and energy targets. Finally, the future evolution of carbon emissions in Guizhou Province was predicted (Table 12).
https://doi.org/10.1371/journal.pone.0296596.t012
Benchmark scenario. The benchmark scenario is a continuation of the existing pattern of economic and energy development in Guizhou Province; under the current development model, each factor is set to its most likely value. As a large industrial province, Guizhou has a complete industrial infrastructure that is expected to continue to thrive, and its economic and industrial structures will continue to follow the national calls for transformation and upgrading. The energy consumption structure of Guizhou Province is shaped by industrial development and will remain dominated by fossil energy consumption; however, the share of fossil energy will continue to decline as new energy technologies develop.
High-speed scenario. In the high-speed scenario, the total population, urbanization rate, per capita GDP, household consumption level, and total energy consumption are set according to a pattern of rapid development and change. With rapid population growth, accelerated urbanization, vigorous economic and social development, the rapid growth of new industries, and a dominant position for the information industry, new energy will be applied more widely across industries and energy utilization efficiency will improve significantly.
Low-carbon scenario. The total population, urbanization rate, per capita GDP, and total energy consumption develop at lower rates than in the baseline scenario.
1) Population setting. With economic and social development, the total population will continue to expand in the short term, but in the long term the population growth rate will decline. Analysis of the trend in the total population of Guizhou Province shows that it gradually decreased from 2010 to 2020 and that the natural growth rate of the population was negative. In 2020 the population of Guizhou Province was 38.57 million, while the Population Development Plan of Guizhou Province proposes a permanent population of 50 million by 2030, corresponding to an average annual growth rate of 0.44%. Based on the population development plan of Guizhou Province and the population growth of recent years, this study sets the annual rate of change at 0.7% in the baseline mode, 1% in the high-speed mode, and 0.5% in the low-carbon mode.
2) Setting the urbanization rate. Urbanization is advancing continuously in Guizhou Province, reaching a rate of 50.26% in 2018, 2.58 times the 1995 level. The average growth rate over the past five years was 1.12%, and over the past ten years 1.33%. International experience shows that Britain and the United States lead the global urbanization process at roughly 80%, while other developed countries are at roughly 70%. Compared with the average level of urbanization in China, the urbanization process in Guizhou Province is relatively fast. Drawing on the experience of developed countries, this study sets the annual rate of change to 1% in the benchmark mode, 1.25% in the high-speed mode, and 0.7% in the low-carbon mode.
3) Setting of per capita GDP. The per capita GDP of Guizhou Province grew continuously from 2000 to 2020, from about $330 per person in 2000 to about $7,000 per person in 2020. In recent years, infrastructure development has boosted economic growth in Guizhou Province, and the growth rate of per capita GDP has risen rapidly; in 2016, per capita GDP increased significantly. According to the 13th Five-Year Plan for National Economic and Social Development of Guizhou Province, the average annual growth rate of regional GDP reached 6.6%, and the scope for further declines in the per capita GDP growth rate will gradually shrink after the 13th Five-Year Plan period. This study sets the annual rate of change to 6.5% in the benchmark mode, 7% in the high-speed mode, and 6% in the low-carbon mode [36–38, 53, 54].
4) Residents' consumption levels. From 2000 to 2020, the average annual growth rate of residents' consumption levels in Guizhou Province was 8.0%. The 13th Five-Year Plan of Guizhou Province proposes releasing residents' consumption potential, creating consumption demand, and further enhancing consumption capacity. The annual rate of change is set to 8% in the baseline mode and 9% in the high-speed mode.
5) Total energy consumption. From 2000 to 2020, the total energy consumption of Guizhou Province increased slightly overall. From 2002 to 2012 it showed a rapid upward trend; after 2012 it declined, from 23,526 tonnes of standard coal/10,000 CHY in 2012 to 22,321 tonnes of standard coal/10,000 CHY in 2018. In line with the energy-saving and emission-reduction requirements of the 13th Five-Year Plan of Guizhou Province, this paper sets the annual rate of change to −1.5% in the baseline mode, −2% in the high-speed mode, and −2.5% in the low-carbon mode (Table 13).
https://doi.org/10.1371/journal.pone.0296596.t013
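To make the scenario construction concrete, the sketch below compounds each driver forward from a 2020 base year at the annual rates of change listed above. It is illustrative only: the base values are placeholders rather than the paper's data, the low-carbon rate for residents' consumption is not stated in the text and is therefore a placeholder, and compound (multiplicative) growth is an assumption.

```python
import numpy as np
import pandas as pd

DRIVERS = ["population", "urbanization_rate", "consumption_level",
           "gdp_per_capita", "energy_consumption"]

# Hypothetical 2020 base values (placeholders, not the paper's data),
# in the same order as DRIVERS.
base_2020 = np.array([3857.0, 53.0, 20000.0, 46000.0, 22000.0])

# Annual rates of change from the scenario settings above; the low-carbon
# consumption-level rate is a placeholder because the text does not state it.
rates = {
    "baseline":   np.array([0.007, 0.0100, 0.08, 0.065, -0.015]),
    "high_speed": np.array([0.010, 0.0125, 0.09, 0.070, -0.020]),
    "low_carbon": np.array([0.005, 0.0070, 0.08, 0.060, -0.025]),
}

years = np.arange(2020, 2041)

def project_drivers(base, annual_rate, years):
    """Compound each driver forward from the base year at its annual rate."""
    steps = (years - years[0])[:, None]
    return base * (1.0 + annual_rate) ** steps

scenario_inputs = {
    name: pd.DataFrame(project_drivers(base_2020, r, years),
                       index=years, columns=DRIVERS)
    for name, r in rates.items()
}
```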
The fitted WOA-ELM model was used to predict the carbon emissions of Guizhou Province from 2020 to 2040 under the three scenarios. The predicted results are listed in Table 14.
https://doi.org/10.1371/journal.pone.0296596.t014
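A small sketch of this last step, under the same caveats as above: `woa_elm_predict` and `scaler` stand in for the fitted WOA-ELM model and whatever input scaling was used, and `scenario_inputs` is the driver table from the previous sketch.

```python
import numpy as np

def peak_from_scenario(drivers, predict, scaler=None):
    """Predict yearly emissions for one scenario and return (peak year, peak value)."""
    X = scaler.transform(drivers.values) if scaler is not None else drivers.values
    emissions = np.asarray(predict(X)).ravel()   # one predicted value per year
    i = int(np.argmax(emissions))
    return int(drivers.index[i]), float(emissions[i])

# for name, df in scenario_inputs.items():
#     year, value = peak_from_scenario(df, predict=woa_elm_predict)
#     print(name, year, value)
```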
Because carbon emissions are affected differently by population, urbanization rate, residents' consumption level, per capita GDP, and total energy consumption, the timing and size of the carbon peak in Guizhou Province change with the parameter settings, and total carbon emissions change accordingly. In the baseline scenario, the carbon emissions of Guizhou Province are estimated to peak at about 260 million tonnes in 2038; in the high-speed scenario they peak at about 300 million tonnes in 2036; and in the low-carbon scenario they peak at about 210 million tonnes in 2033. The baseline scenario therefore indicates that Guizhou Province will not reach its carbon peak by 2030 as scheduled, and the peak will most likely be delayed to 2038. In the high-speed scenario, the peak occurs two years earlier than in the baseline scenario. Comparing the baseline and low-carbon scenarios, the peak year under the low-carbon scenario is five years earlier than under the baseline scenario; although this is still three years after China's 2030 peak target, it is acceptable given that the development level of Guizhou Province lags behind that of developed cities such as Beijing and Shanghai, and the peak value is about 50 million tonnes lower than in the baseline scenario. Comparing the low-carbon and high-speed scenarios, the peak in the low-carbon scenario arrives three years earlier and is about 90 million tonnes lower. Overall, the predictions show that Guizhou Province cannot achieve the 2030 carbon peak goal under the baseline or high-speed development scenarios, whereas under the low-carbon scenario the peak comes earlier and the peak value is lower.
Exploring the main factors affecting carbon emissions in Guizhou Province is crucial for China to achieve its carbon peaking and carbon neutrality goals. Accurate carbon emission predictions are also of great significance for governments, helping them formulate relevant policies and drive innovation in energy-saving and emission-reduction technologies. In this study, a candidate set of factors affecting carbon emissions was constructed from the existing literature and real-world data for Guizhou Province. Appropriate input variables were selected using grey correlation analysis; BP neural network and ELM models were then established, the WOA was used to optimize both models, and the performance of the prediction models was compared and analysed through simulation. Finally, three scenarios were established to predict the carbon emissions of Guizhou Province from 2020 to 2040. The following conclusions were drawn from the analysis.
The results show that the total carbon emissions of Guizhou Province in 2020 were approximately 22,237 ×10⁴ tonnes (about 222 million tonnes), roughly twice the total carbon emissions of the province in 2000. With continued socioeconomic development, the growth rate of total carbon emissions in Guizhou Province gradually decreases, and the overall trend follows an S-shaped curve. Combining the data for Guizhou Province with previous studies, 12 influencing factors were initially selected. According to their degrees of correlation, population and total energy consumption have the greatest impact on carbon emissions in Guizhou Province, and the total population, urbanization rate, residents' consumption level, per capita GDP, and total energy consumption were selected as the input variables of the prediction model.
BP neural network, ELM, WOA-BP, and WOA-ELM models were established to predict the carbon emissions of Guizhou Province. Comparing the mean absolute error, mean absolute percentage error, and root mean square error of the four models, the WOA-ELM model was found to be the most accurate, with an MAE of 101. The model based on the extreme learning machine had the second-highest prediction accuracy, with an MAE of 224.46, a MAPE of 0.43%, and an RMSE of 328.62, while the BP neural network model performed worst. Three scenarios were constructed: baseline, high-speed, and low-carbon. The carbon emissions of Guizhou Province over the next 20 years were then obtained by feeding the scenario parameters into the fitted model. The results show that, under the baseline scenario, the carbon peak of Guizhou Province will occur in 2038, with a peak value of 26,243.61 ×10⁴ tonnes; under the high-speed scenario, the peak occurs in 2036, with a value of 30,251.27 ×10⁴ tonnes; and under the low-carbon scenario, the peak occurs in 2033, with a value of 21,294.98 ×10⁴ tonnes. Under the baseline scenario, Guizhou Province cannot achieve China's 2030 peak target, and the low-carbon scenario comes closest to the carbon peak target of the three, indicating that external policy intervention in Guizhou Province is necessary [55, 56].
According to the results of the grey correlation analysis above, population has an important impact on carbon emissions in Guizhou Province and must be considered in the province's energy conservation and emission reduction work. The increasing demand for energy in daily life and production activities significantly drives the growth of carbon emissions. Controlling the population of Guizhou Province and encouraging citizens to choose green travel and energy-saving, environmentally friendly lifestyles will have far-reaching effects on the province's carbon emissions. The government should encourage people to save electricity, dispose of waste household appliances appropriately, and increase investment in the research and development of energy-saving alternatives. It should also enrich urban public transport, promote the construction of public transport facilities, and make the development of new energy vehicles more convenient.
Analysis of the differences in carbon emissions under the three development modes in Guizhou Province shows that the low-carbon mode reaches the carbon peak earliest, followed by the high-speed and benchmark modes, and that its peak carbon dioxide emissions are the smallest of the three. Overall, population, economic development, and energy consumption influence one another; to reach the carbon peak in Guizhou Province in a timely manner, normal economic growth should be maintained while measures are taken to control the rise in the urbanization rate, reduce energy consumption, and optimize the energy structure. For example, coal accounts for a high proportion of energy consumption in Guizhou Province, and coal combustion increases carbon dioxide emissions. Efforts should therefore be made to reduce coal consumption, increase the utilization and conversion efficiency of coal, increase investment in the research and development of new energy, broaden the popularization of new energy, improve the construction of related supporting facilities, and put the full use of new energy on the agenda. The development of water conservancy, hydropower, and photovoltaic projects should be emphasised to increase the proportion of clean energy such as hydropower.
https://doi.org/10.1371/journal.pone.0296596.s001