A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems

Management Decision

ISSN : 0025-1747

Article publication date: 7 December 2020

Issue publication date: 2 February 2022

Purpose

The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.

Design/methodology/approach

A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized through analyses of the publications, authors and content.

Findings

From the SLR, 576 publications were identified and analyzed. The research area seems to show the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis, which could help new and current authors identify researchers with common interests in the field.

Research limitations/implications

The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select different platforms.

Originality/value

To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.

Keywords

  • Data analytics
  • Machine learning
  • Healthcare systems
  • Systematic literature review

Salazar-Reyna, R. , Gonzalez-Aleu, F. , Granda-Gutierrez, E.M.A. , Diaz-Ramirez, J. , Garza-Reyes, J.A. and Kumar, A. (2022), "A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems", Management Decision , Vol. 60 No. 2, pp. 300-319. https://doi.org/10.1108/MD-01-2020-0035


Copyright © 2020, Emerald Publishing Limited


How to Write a Literature Review | Guide, Examples, & Templates

Published on January 2, 2023 by Shona McCombes. Revised on September 11, 2023.

What is a literature review? A literature review is a survey of scholarly sources on a specific topic. It provides an overview of current knowledge, allowing you to identify relevant theories, methods, and gaps in the existing research that you can later apply to your paper, thesis, or dissertation topic.

There are five key steps to writing a literature review:

  • Search for relevant literature
  • Evaluate sources
  • Identify themes, debates, and gaps
  • Outline the structure
  • Write your literature review

A good literature review doesn’t just summarize sources—it analyzes, synthesizes, and critically evaluates them to give a clear picture of the state of knowledge on the subject.

When you write a thesis , dissertation , or research paper , you will likely have to conduct a literature review to situate your research within existing knowledge. The literature review gives you a chance to:

  • Demonstrate your familiarity with the topic and its scholarly context
  • Develop a theoretical framework and methodology for your research
  • Position your work in relation to other researchers and theorists
  • Show how your research addresses a gap or contributes to a debate
  • Evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Writing literature reviews is a particularly important skill if you want to apply for graduate school or pursue a career in research. We’ve written a step-by-step guide that you can follow below.


Examples of literature reviews

Writing literature reviews can be quite challenging! A good starting point could be to look at some examples, depending on what kind of literature review you’d like to write.

  • Example literature review #1: “Why Do People Migrate? A Review of the Theoretical Literature” ( Theoretical literature review about the development of economic migration theory from the 1950s to today.)
  • Example literature review #2: “Literature review as a research methodology: An overview and guidelines” ( Methodological literature review about interdisciplinary knowledge acquisition and production.)
  • Example literature review #3: “The Use of Technology in English Language Learning: A Literature Review” ( Thematic literature review about the effects of technology on language acquisition.)
  • Example literature review #4: “Learners’ Listening Comprehension Difficulties in English Language Learning: A Literature Review” ( Chronological literature review about how the concept of listening skills has changed over time.)

You can also check out our templates with literature review examples and sample outlines.

Step 1 – Search for relevant literature

Before you begin searching for literature, you need a clearly defined topic.

If you are writing the literature review section of a dissertation or research paper, you will search for literature related to your research problem and questions .

Make a list of keywords

Start by creating a list of keywords related to your research question. Include each of the key concepts or variables you’re interested in, and list any synonyms and related terms. You can add to this list as you discover new keywords in the process of your literature search.

For example, if your research question concerns social media and body image among Generation Z, your keyword list might include:

  • Social media, Facebook, Instagram, Twitter, Snapchat, TikTok
  • Body image, self-perception, self-esteem, mental health
  • Generation Z, teenagers, adolescents, youth

Search for relevant sources

Use your keywords to begin searching for sources. Some useful databases to search for journals and articles include:

  • Your university’s library catalogue
  • Google Scholar
  • Project Muse (humanities and social sciences)
  • Medline (life sciences and biomedicine)
  • EconLit (economics)
  • Inspec (physics, engineering and computer science)

You can also use Boolean operators to help narrow down your search.
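As a rough illustration (not part of the original guide), the short Python sketch below builds such a Boolean query from the example keyword groups above; the exact operator syntax accepted varies by database.

```python
# Minimal sketch: turn grouped keywords into a Boolean search string.
# The keyword groups mirror the example list above; database syntax may differ.
keyword_groups = [
    ["social media", "Instagram", "Snapchat", "TikTok"],
    ["body image", "self-perception", "self-esteem"],
    ["Generation Z", "teenagers", "adolescents"],
]

def build_query(groups):
    """OR the synonyms within each group, then AND the groups together."""
    blocks = []
    for group in groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        blocks.append(f"({quoted})")
    return " AND ".join(blocks)

print(build_query(keyword_groups))
# ("social media" OR "Instagram" OR ...) AND ("body image" OR ...) AND ("Generation Z" OR ...)
```

Pasting a string like this into a database’s advanced search keeps the key concepts linked while still catching synonyms of each one.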

Make sure to read the abstract to find out whether an article is relevant to your question. When you find a useful book or article, you can check the bibliography to find other relevant sources.

Step 2 – Evaluate and select sources

You likely won’t be able to read absolutely everything that has been written on your topic, so it will be necessary to evaluate which sources are most relevant to your research question.

For each publication, ask yourself:

  • What question or problem is the author addressing?
  • What are the key concepts and how are they defined?
  • What are the key theories, models, and methods?
  • Does the research use established frameworks or take an innovative approach?
  • What are the results and conclusions of the study?
  • How does the publication relate to other literature in the field? Does it confirm, add to, or challenge established knowledge?
  • What are the strengths and weaknesses of the research?

Make sure the sources you use are credible, and that you read any landmark studies and major theories in your field of research.

You can use our template to summarize and evaluate sources you’re thinking about using.

Take notes and cite your sources

As you read, you should also begin the writing process. Take notes that you can later incorporate into the text of your literature review.

It is important to keep track of your sources with citations to avoid plagiarism . It can be helpful to make an annotated bibliography , where you compile full citation information and write a paragraph of summary and analysis for each source. This helps you remember what you read and saves time later in the process.


Step 3 – Identify themes, debates, and gaps

To begin organizing your literature review’s argument and structure, be sure you understand the connections and relationships between the sources you’ve read. Based on your reading and notes, you can look for:

  • Trends and patterns (in theory, method or results): do certain approaches become more or less popular over time?
  • Themes: what questions or concepts recur across the literature?
  • Debates, conflicts and contradictions: where do sources disagree?
  • Pivotal publications: are there any influential theories or studies that changed the direction of the field?
  • Gaps: what is missing from the literature? Are there weaknesses that need to be addressed?

This step will help you work out the structure of your literature review and (if applicable) show how your own research will contribute to existing knowledge.

For example, in the social media and body image literature you might note that:

  • Most research has focused on young women.
  • There is an increasing interest in the visual aspects of social media.
  • But there is still a lack of robust research on highly visual platforms like Instagram and Snapchat—this is a gap that you could address in your own research.

Step 4 – Outline your literature review’s structure

There are various approaches to organizing the body of a literature review. Depending on the length of your literature review, you can combine several of these strategies (for example, your overall structure might be thematic, but each theme is discussed chronologically).

Chronological

The simplest approach is to trace the development of the topic over time. However, if you choose this strategy, be careful to avoid simply listing and summarizing sources in order.

Try to analyze patterns, turning points and key debates that have shaped the direction of the field. Give your interpretation of how and why certain developments occurred.

Thematic

If you have found some recurring central themes, you can organize your literature review into subsections that address different aspects of the topic.

For example, if you are reviewing literature about inequalities in migrant health outcomes, key themes might include healthcare policy, language barriers, cultural attitudes, legal status, and economic access.

Methodological

If you draw your sources from different disciplines or fields that use a variety of research methods , you might want to compare the results and conclusions that emerge from different approaches. For example:

  • Look at what results have emerged in qualitative versus quantitative research
  • Discuss how the topic has been approached by empirical versus theoretical scholarship
  • Divide the literature into sociological, historical, and cultural sources

Theoretical

A literature review is often the foundation for a theoretical framework . You can use it to discuss various theories, models, and definitions of key concepts.

You might argue for the relevance of a specific theoretical approach, or combine various theoretical concepts to create a framework for your research.

Step 5 – Write your literature review

Like any other academic text, your literature review should have an introduction, a main body, and a conclusion. What you include in each depends on the objective of your literature review.

The introduction should clearly establish the focus and purpose of the literature review.

Depending on the length of your literature review, you might want to divide the body into subsections. You can use a subheading for each theme, time period, or methodological approach.

As you write, you can follow these tips:

  • Summarize and synthesize: give an overview of the main points of each source and combine them into a coherent whole
  • Analyze and interpret: don’t just paraphrase other researchers — add your own interpretations where possible, discussing the significance of findings in relation to the literature as a whole
  • Critically evaluate: mention the strengths and weaknesses of your sources
  • Write in well-structured paragraphs: use transition words and topic sentences to draw connections, comparisons and contrasts

In the conclusion, you should summarize the key findings you have taken from the literature and emphasize their significance.

When you’ve finished writing and revising your literature review, don’t forget to proofread thoroughly before submitting.

This article has been adapted into lecture slides that you can use to teach your students about writing a literature review.

Scribbr slides are free to use, customize, and distribute for educational purposes.


Frequently asked questions

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation , or research paper , in order to situate your work in relation to existing knowledge.

There are several reasons to conduct a literature review at the beginning of a research project:

  • To familiarize yourself with the current state of knowledge on your topic
  • To ensure that you’re not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

The literature review usually comes near the beginning of your thesis or dissertation . After the introduction , it grounds your research in a scholarly field and leads directly to your theoretical framework or methodology .

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

Cite this Scribbr article


McCombes, S. (2023, September 11). How to Write a Literature Review | Guide, Examples, & Templates. Scribbr. Retrieved July 16, 2024, from https://www.scribbr.com/dissertation/literature-review/


Data science pedagogical tools and practices: A systematic literature review

  • Published: 24 August 2023
  • Volume 29, pages 8179–8201 (2024)

  • Bahar Memarian (ORCID: orcid.org/0000-0003-0671-3127)
  • Tenzin Doleck


Abstract

The development of data science curricula has gained attention in academia and industry. Yet, less is known about the pedagogical practices and tools employed in data science education. Through a systematic literature review, we summarize prior pedagogical practices and tools used in data science initiatives at the higher education level. Following the Technological Pedagogical Content Knowledge (TPACK) framework, we aim to characterize the technological and pedagogical knowledge quality of the reviewed studies, as we find the content presented to be diverse and incomparable. TPACK is a widely established framework for teaching with information and communication technology, yet it is seldom used for the analysis of data science pedagogy. To make this framework more structured, we list the tools employed in each reviewed study to summarize technological knowledge quality. We further examine whether each study follows the needs of Cognitive Apprenticeship theory to summarize its pedagogical knowledge quality. Of the 23 reviewed studies, 14 met the needs of Cognitive Apprenticeship theory: they include hands-on experiences, promote students' active learning, encourage seeking guidance from the instructor as a coach, introduce students to the real-world industry demands placed on data and data scientists, and provide meaningful learning resources and feedback across various stages of their data science initiatives. While each study presents at least one tool to teach data science, we found assessing the technological knowledge of data science initiatives to be difficult, because the studies fall short of explaining how students come to learn the operation of tools and become proficient in using them throughout a course or program. Our review highlights implications for practices and tools used in data science pedagogy for future research.


Data availability

Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.


Acknowledgements

This study was funded by the Canada Research Chair Program and the Canada Foundation for Innovation.

Author information

Authors and affiliations

Faculty of Education, Simon Fraser University, Vancouver, British Columbia, Canada

Bahar Memarian & Tenzin Doleck


Corresponding author

Correspondence to Bahar Memarian.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Memarian, B., Doleck, T. Data science pedagogical tools and practices: A systematic literature review. Educ Inf Technol 29 , 8179–8201 (2024). https://doi.org/10.1007/s10639-023-12102-y


Received: 19 April 2023

Accepted: 01 August 2023

Published: 24 August 2023

Issue Date: May 2024

DOI: https://doi.org/10.1007/s10639-023-12102-y


Keywords

  • Data analytics
  • Artificial intelligence
  • Higher education


Data science ethical considerations: a systematic literature review and proposed project framework



Index Terms

  • Applied computing → Life and medical sciences
  • Security and privacy → Human and societal aspects of security and privacy
  • Social and professional topics → Computing / technology policy → Privacy policies
  • Social and professional topics → Professional topics → Computing profession → Codes of ethics


Information

Published in: Kluwer Academic Publishers, United States

Author tags

  • Data science
  • Code of conduct
  • Research-article



Current approaches for executing big data science projects—a systematic literature review

Abstract

There is an increasing number of big data science projects aiming to create value for organizations by improving decision making, streamlining costs or enhancing business processes. However, many of these projects fail to deliver the expected value. It has been observed that a key reason many data science projects do not succeed is not technical in nature but rather lies in the process aspect of the project. The lack of established and mature methodologies for executing data science projects has been frequently noted as a reason for these project failures. To help move the field forward, this study presents a systematic review of research focused on the adoption of big data science process frameworks. The goal of the review was to identify (1) the key themes, with respect to current research on how teams execute data science projects, (2) the most common approaches regarding how data science projects are organized, managed and coordinated, (3) the activities involved in a data science project life cycle, and (4) the implications for future research in this field. In short, the review identified 68 primary studies thematically classified in six categories. Two of the themes (workflow and agility) accounted for approximately 80% of the identified studies. The findings regarding workflow approaches consist mainly of adaptations to CRISP-DM (vs entirely new proposed methodologies). With respect to agile approaches, most of the studies only explored the conceptual benefits of using an agile approach in a data science project (vs actually evaluating an agile framework being used in a data science context). Hence, one finding from this research is that future research should explore how to best achieve the theorized benefits of agility. Another finding is the need to explore how to efficiently combine workflow and agile frameworks within a data science context to achieve a more comprehensive approach for project execution.

Introduction

There is an increasing use of big data science across a range of organizations. This means that there is a growing number of big data science projects conducted by organizations. These projects aim to create value by improving decision making, streamlining costs or enhancing business processes.

However, many of these projects fail to deliver the expected value ( Martinez, Viles & Olaizola, 2021 ). For example, VentureBeats (2019) noted that 87% of data science projects never make it into production, and a NewVantage survey ( NewVantage Partners, 2019 ) reported that for 77% of businesses, the adoption of big data and artificial intelligence (AI) initiatives is a big challenge. A systematic review of the grey and scientific literature found 21 cases of failed big data projects reported over the last decade ( Reggio & Astesiano, 2020 ). This is due, at least in part, to the fact that data science teams generally suffer from immature processes, often relying on trial-and-error and ad hoc processes ( Bhardwaj et al., 2015 ; Gao, Koronios & Selle, 2015 ; Saltz & Shamshurin, 2015 ). In short, big data science projects often do not leverage well-defined process methodologies ( Martinez, Viles & Olaizola, 2021 ; Saltz & Hotz, 2020 ). To further emphasize this point, in a survey of data scientists from both industry and not-for-profit organizations, 82% of the respondents did not follow an explicit process methodology for developing data science projects, and equally important, 85% of the respondents stated that using an improved and more consistent process would produce more effective data science projects ( Saltz et al., 2018 ).

While a literature review in 2016 did not identify any research focused on improving data science team processes ( Saltz & Shamshurin, 2016 ), more recently there has been an increase in studies specifically focused on how to organize and manage big data science projects in a more efficient manner ( e.g . Martinez, Viles & Olaizola, 2021 ; Saltz & Hotz, 2020 ).

With this in mind, this paper presents a systematic review of research focused on the adoption of big data science process frameworks. The purpose is to present an overview of research works and findings, as well as implications for research and practice. This is necessary to identify (1) the key themes, with respect to current research on how teams execute data science projects, (2) the most common approaches regarding how data science projects are organized, managed and coordinated, (3) the activities involved in a data science project life cycle, and (4) the implications for future research in this field.

The rest of the paper is organized as follows: the “Background and Related Work” section provides information on big data process frameworks and the key challenges with respect to teams executing big data science projects. In the “Survey Methodology” section, the adopted research methodology is discussed, while the “Results” section presents the findings of the study. The insights from this SLR as well as implications for future research and limitations of the study are highlighted in the “Discussion” section. The “Conclusions” section concludes the paper.

Background and Related Work

It has been frequently noted that project management (PM) is a key challenge for successfully executing data science projects. In other words, a key reason many data science projects fail is not technical in nature, but rather, the process aspect of the project ( Ponsard et al., 2017 ). Furthermore, Espinosa & Armour (2016) argue that task coordination is a major challenge for data projects. Likewise, Chen, Kazman & Haziyev (2016) conclude that coordination among business analysts, data scientists, system designers, development and operations is a major obstacle that compromises big data science initiatives. Angée et al. (2018) summarized the challenge by noting that it is important to use an appropriate process methodology, but which, if any, process is the most appropriate is not easy to know.

The importance of using a well-defined process framework

This data science process challenge, in terms of knowing what process framework to use for data science projects, is important because it has been observed that big data science projects are non-trivial and require well-defined processes ( Angée et al., 2018 ). Furthermore, using a process model or methodology results in higher-quality outcomes and avoids numerous problems, decreasing the risk of failure in data analytics projects ( Mariscal, Marbán & Fernández, 2010 ). Example problems that occur when a team does not use a process model include the team being slow to share information, delivering the wrong results and, in general, working inefficiently ( Gao, Koronios & Selle, 2015 ; Chen et al., 2017 ).

The most common framework: CRISP-DM

The CRoss-Industry Standard Process for Data Mining (CRISP-DM) ( Chapman et al., 2000 ), along with Knowledge Discovery in Databases (KDD) ( Fayyad, Piatetky-Shapiro & Smyth, 1996 ), both of which were created in the 1990s, are considered ‘canonical’ methodologies for most data mining and data science processes and methodologies ( Martinez-Plumed et al., 2019 ; Mariscal, Marbán & Fernández, 2010 ). The evolution of those methodologies can be traced forward to more recent methodologies such as the Refined Data Mining Process ( Mariscal, Marbán & Fernández, 2010 ), IBM’s Foundational Methodology for Data Science ( Rollins, 2015 ) and Microsoft’s Team Data Science Process ( Microsoft, 2020 ).

However, recent surveys show that when data science teams do use a process, CRISP-DM has consistently been the most commonly used framework and the de facto standard for analytics, data mining and data science projects ( Martinez-Plumed et al., 2019 ; Saltz & Hotz, 2020 ). In fact, according to many opinion polls, CRISP-DM is the only process framework that is typically known by data science teams ( Saltz, n.d. ), with roughly half the respondents reporting that they use some version of CRISP-DM.

CRISP-DM organizes a project into six phases:

  • Business understanding—includes identification of business objectives and data mining goals
  • Data understanding—involves data collection, exploration and validation
  • Data preparation—involves data cleaning, transformation and integration
  • Modelling—includes selecting a modelling technique and creating and assessing models
  • Evaluation—evaluates the results against business objectives
  • Deployment—includes planning for deployment, monitoring and maintenance

CRISP-DM allows some high-level iteration between the steps ( Gao, Koronios & Selle, 2015 ). Typically, when a project uses CRISP-DM, the project moves from one phase (such as data understanding) to the next phase ( e.g ., data preparation). However, as the team deems appropriate, the team can go back to a previous phase. In a sense, one can think of CRISP-DM as a waterfall model for data mining ( Gao, Koronios & Selle, 2015 ).
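Purely as an illustration of this phased, loop-back behaviour (the code below and its needs_rework callback are hypothetical, not part of the CRISP-DM standard), a minimal Python sketch might look like this:

```python
# Minimal sketch of CRISP-DM's six phases with the optional loop-back described
# above. The needs_rework callback is a hypothetical stand-in for team judgment.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]

def run_crisp_dm(needs_rework):
    i = 0
    while i < len(PHASES):
        phase = PHASES[i]
        print(f"Working on: {phase}")
        if i > 0 and needs_rework(phase):
            i -= 1  # loop back to the previous phase, as the team deems appropriate
        else:
            i += 1  # otherwise move on to the next phase

# With no rework, the phases run strictly in order (the "waterfall" reading above).
run_crisp_dm(lambda phase: False)
```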

While CRISP-DM is popular, and CRISP-DM’s phased based approach is helpful to describe what the team should do, there are some limitations with the framework. For example, the framework provides little guidance on how to know when to loop back to a previous phase, iterate on the current phase, or move to the next phase. In addition, CRISP-DM does not contemplate the need for operational support after deployment.

The stated need for more research

Given that many data science teams do not use a well-defined process and that others use CRISP-DM with known challenges, it is not surprising that there has been a consistent calling for more research with respect to data science team process. For example, in Cao’s discussion of Data Science challenges and future directions ( Cao & Fayyad, 2017 ), it was noted that one of the key challenges in analyzing data includes developing methodologies for data science teams. Gupte (2018) similarly noted that the best approach to execute data science projects must be studied. However, even with this noted challenge on data science process, there is a well-accepted view that not enough has been written about the solutions to tackle these problems ( Martinez, Viles & Olaizola, 2021 ).

Is there still a need for more research?

This lack of research on data science process frameworks was certainly true 6 years ago, when the need for concise, thorough and validated information regarding the ways data science projects are organized, managed and coordinated was noted ( Saltz, 2015 ). This need was further clarified when, in a literature review of big data science process research, no papers were found that focused on improving a data science team’s process or overall project management ( Ransbotham, David & Prentice, 2015 ). This was also consistent with the view that most big data science research has focused on the technical capabilities required for data science and has overlooked the topic of managing data science projects ( Saltz & Shamshurin, 2016 ).

This context leads to the following research questions:

RQ1: Has research in this domain increased recently?

RQ2: What are the most common approaches regarding how data science projects are organized, managed and coordinated?

RQ3: What are the phases or activities in a data science project life cycle?

Survey Methodology

While there are many approaches to a literature review, one approach, which is followed in this research, is to combine quantitative and qualitative analysis to provide deeper insights ( Joseph et al., 2007 ). Furthermore, the systematic literature review conducted in this study leveraged the guidelines for performing SLRs suggested by Kitchenham & Charters (2007) and the data were collected in a similar manner as described in Saltz & Dewar (2019) . Hence, the SLR process consisted of three phases: planning, conducting and reporting the review. The subsections below present the outcomes of the first two phases, while the results of the review are reported in the next section.

Planning the review

In general, systematic reviews address the need to summarize and present the existing information about some phenomenon in a thorough and unbiased manner ( Kitchenham & Charters, 2007 ). As previously noted, the need for concise, thorough and validated information regarding the ways data science projects are organized, managed and coordinated is justified by the lack of established and mature methodologies for executing data science projects. This has led to our previously defined research questions, which are the drivers for how we structured our research.

The study search space comprises the following five online sources: ACM Digital Library, IEEEXplore, Scopus, ScienceDirect and Google Scholar. In addition to online sources, the search space might be enriched with reference lists from relevant primary studies and review articles ( Kitchenham & Charters, 2007 ). Specifically, the papers that cite the study providing justification for the present research ( Saltz, 2015 ) and the previous SLR on the subject ( Saltz & Shamshurin, 2016 ) are added to the study search space.

The search terms combined two groups of concepts:

  • Data science related terms: (“data science” OR “big data” OR “machine learning”).
  • Project execution related terms: (“process methodology” OR “team process” OR “team coordination” OR “project management”).

To determine whether a paper should be included in our analysis, the following selection criteria are defined.

Inclusion criteria:

  • Papers that fully or partly include a description of the organization, management or coordination of big data science projects.
  • Papers that suggest specific approaches for executing big data science projects.
  • Papers that were published after 2015.

Exclusion criteria:

  • Papers that are not written in English.
  • Papers that did not focus on data science team process, but rather focused on using data analytics to improve overall project management processes.
  • Papers that had no form of peer review ( e.g . blogs).
  • Papers with irrelevant document type such as posters, conference summaries, etc .

Our exclusion of papers that discussed the use of analytics for overall project management considerations was driven by our desire to focus this research on understanding the specific attributes of data science projects, and how different frameworks were, or were not, applicable in the context of a data science project. This does not imply that data science has no role in helping to improve overall project management approaches. In fact, data science can and should add to the field of general project management, but we view this analysis as beyond the scope of our research.

Step 1: Title and abstract screen—Initially, after the relevant papers from the search space are identified according to the study search strategy, the selection criteria will be applied considering only the titles and the abstracts of the papers. This step is to be executed by the two authors over different sets of identified papers.

Step 2: Full text screen—The full text of the candidate papers will then be reviewed by the two authors independently to identify the final set of primary studies to be included for further data analysis.
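A minimal sketch of how the selection criteria above could be applied during the title and abstract screen is shown below. It is only illustrative: the study applied these criteria manually (by the two authors), and the record fields and example entries are hypothetical labels a reviewer might assign.

```python
# Sketch: applying the inclusion/exclusion criteria above to screened records.
# Field names (year, language, peer_reviewed, doc_type, focus) are hypothetical
# labels a reviewer might assign while reading a title and abstract.
EXCLUDED_DOC_TYPES = {"poster", "conference summary"}

def passes_screening(record):
    return (
        record["year"] > 2015                        # published after 2015
        and record["language"] == "English"          # written in English
        and record["peer_reviewed"]                  # some form of peer review
        and record["doc_type"] not in EXCLUDED_DOC_TYPES
        and record["focus"] == "data science team process"
    )

# Hypothetical records for illustration only.
records = [
    {"year": 2019, "language": "English", "peer_reviewed": True,
     "doc_type": "article", "focus": "data science team process"},
    {"year": 2018, "language": "English", "peer_reviewed": True,
     "doc_type": "article", "focus": "analytics for general project management"},
]

print([passes_screening(r) for r in records])  # [True, False]
```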

The approach for data extraction and synthesis followed in our study is based on the content analysis suggested in Elo & Kyngäs (2008) , Hsieh & Shannon (2005) . After exploring the key concepts used within each of the primary studies, general research themes are to be identified and further analysis of the data with respect to the study research questions is to be performed in both qualitative and quantitative manner.

Conducting the review

The SLR procedure was performed at the beginning of May, 2021. Because of the differences in running the searches over the online sources included in our search space, the identification of research and the first step of the selection procedure for Google Scholar were executed independently from the other digital libraries.

Search 1, the “data science” search: “data science” AND (“process methodology” OR “team process” OR “team coordination” OR “project management”).

Search 2, the “machine learning” search: “machine learning” AND (“process methodology” OR “team process” OR “team coordination” OR “project management”).

Search 3, the “big data” search: “big data” AND (“process methodology” OR “team process” OR “team coordination” OR “project management”).

Since the number of papers returned by the searches was very large, a sampling approach was applied: only the first 220 papers in each result set were included for further analysis. The first step of the selection procedure was executed for the unique papers in each of the sets, and 48 papers were selected as candidates for primary studies. Table 1 shows the exact number of papers returned after running the searches and after the first step of the selection procedure for Google Scholar.

Table 1: Number of retrieved and candidate papers per Google Scholar search.

Search string | Retrieved papers | Candidate papers
“data science” search string | 9,200 (first 220 used) | 37
“machine learning” search string | 17,800 (first 220 used) | 1
“big data” search string | 17,600 (first 220 used) | 10

Executing the initial search strings over the digital libraries resulted in a vast number of papers ( e.g ., over 1,500 papers for the IEEE Xplore full text search). Motivated by the results of the executed searches in Google Scholar, an optimization of the search terms was introduced. Since the ratio of candidate to retrieved papers for the “machine learning” Google Scholar search string was very low and only one paper was selected after the first step of the selection procedure, we removed the term “machine learning” from the initial “Data science related terms” search phrase. The final search string used for the identification of studies from the digital libraries was: (“data science” OR “big data”) AND (“process methodology” OR “team process” OR “team coordination” OR “project management”).
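For readers who want to reproduce the search programmatically, the final string can be assembled from the two term groups defined in the planning phase. The following Python sketch is illustrative only; it was not part of the study's tooling.

```python
# A minimal sketch showing how the final digital-library search string
# could be assembled from the two term groups.
data_science_terms = ['"data science"', '"big data"']
execution_terms = ['"process methodology"', '"team process"',
                   '"team coordination"', '"project management"']

def build_query(group_a, group_b):
    """Join each group with OR and combine the two groups with AND."""
    return f"({' OR '.join(group_a)}) AND ({' OR '.join(group_b)})"

print(build_query(data_science_terms, execution_terms))
# ("data science" OR "big data") AND ("process methodology" OR "team process"
#  OR "team coordination" OR "project management")
```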

ACM Digital Library—full text search.

IEEEXplore—metadata-based and full text searches.

Scopus—metadata-based search.

ScienceDirect—metadata-based search.

When executing the searches, appropriate filters helping to meet the inclusion and exclusion criteria were applied for each of the sources where available. We used Mendeley as a reference management tool to help us organize the retrieved papers and to automate the removal of duplicates. A total of 1,944 papers were returned by the searches, of which 1,697 were unique. After executing the title and abstract screen, 98 papers were selected as candidates for primary studies. The exact numbers of retrieved and candidate papers are presented in Table 2 . The numbers shown in the table include papers duplicated across the digital libraries.

Table 2: Number of retrieved and candidate papers per digital library search.

Digital library search | Retrieved papers | Candidate papers
Scopus: Metadata | 327 | 52
ACM: Full text | 330 | 18
IEEE: All metadata | 197 | 24
IEEE: Full text | 1,066 | 36
ScienceDirect: Metadata | 24 | 5
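The duplicate removal mentioned above (handled with Mendeley in the study) can be approximated with a simple normalized-title comparison. The sketch below is an assumption for illustration, not the procedure actually used.

```python
# Illustrative only: deduplicate merged search results by normalized title.
import re

def normalize(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """records: iterable of dicts with at least a 'title' key."""
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

In practice, reference managers typically also compare DOIs and author lists, but a title-based key is often enough for a first pass.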

The relevant studies search space comprised the papers that cite the two studies which provide the justification and relevant background for our research, namely ( Saltz, 2015 ) and ( Saltz & Shamshurin, 2016 ). A total of 159 papers were found to cite the two papers. After filtering the papers by screening the titles and abstracts, 64 of those papers were selected as candidate primary studies.

A consolidated list of all the candidate papers which were selected in the previous step of the selection procedure was created. The list included 120 unique papers. After performing the next step of the selection procedure (full text review), 68 papers were selected. These papers comprised the list of primary studies that were further analyzed to provide the answers to our research questions. The steps of the SLR procedure that led to the identification of the primary studies for our study are presented in Fig. 1 .


Figure 1: Steps of the SLR procedure for identification of primary studies.

Following the guidelines by Cruzes & Dybå (2011) , thematic analysis and synthesis was applied during data extraction and synthesis. We used the integrated approach ( Cruzes & Dybå, 2011 ), which employs both inductive and deductive code development, for retrieving the research themes related to the execution of data science projects as well as for defining the categories of workflow approaches and the themes for agile adoption presented in the following section.

Results

This section presents the findings of the SLR with regard to the three research questions defined in the planning phase.

Research activity in this domain (RQ1)

As shown in Fig. 2 , there has been an increase in the number of articles published over time. Note that the review was done in May 2021, so 2021 was on pace to have more papers than any other year ( i.e ., over the full year, 2021 was on pace to have 18+ papers). Furthermore, it is likely that 2020 saw a reduction due to COVID.


Figure 2: Number of papers per year.
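The “on pace” estimate is a simple linear extrapolation from the papers observed by the time of the search (early May 2021, roughly four complete months of the year). The count used below is an assumed value for illustration, not a figure reported in the paper.

```python
# Back-of-the-envelope extrapolation behind the "18+ papers" remark.
papers_observed = 6         # assumed count of 2021 papers found by early May
months_elapsed = 4          # January through April
projected_full_year = papers_observed * 12 / months_elapsed
print(projected_full_year)  # 18.0
```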

We also explored publishing outlets. Specifically, Fig. 3 shows the number of papers for each publisher. IEEE was the most frequent publisher, with 31 (46%) papers, due in part to a yearly IEEE workshop on this domain, which started in 2015. The next highest publisher was ACM, with nine papers (13%).


Figure 3: Number of papers for each publisher.

Approaches for executing data science projects (RQ2)

Table 3 provides an overview of the six themes identified, with respect to the approaches for defining and using a data science process framework, together with the number of primary studies per theme; the relevant primary studies for each theme are discussed in the text below. While the six themes that we identified in our SLR are all relevant to project execution, there was a wide range in the number of papers published for the different themes. The ratio of publications across the different themes provides a high-level view of current research efforts regarding the execution of data science projects.

Table 3: Research themes and the number of primary studies per theme.

Theme | Total number
Workflows | 27
Agility | 26
Process adoption | 6
General PM | 4
Tools | 5
Reviews | 7

Table 4: Categories of workflow papers and the reference workflows they build on.

Category | Reference workflows
New | N/A; CRISP-DM; KDD and CRISP-DM
Standard | CRISP-DM
Specialization | CRISP-DM; KDD
Extension | CRISP-DM; KDD; other
Enrichment | CRISP-DM; other

Table 5: Agility themes, the type of the corresponding papers, and the number of papers per theme.

Theme | Type | Total number
Conceptual Benefits of Agility | Conceptual | 15 (58%)
Challenges in Scrum | Case Study | 5 (19%)
Scrum is used | Case Study | 2 (7%)
Conceptual Benefits of Scrum | Conceptual | 2 (7%)
Conceptual Benefits of Lean | Conceptual | 1 (4%)
Challenges in Kanban | Case Study | 1 (4%)

Below we provide a description for each of the themes, with an expanded focus on the two most popular themes (workflows and agility).

Workflows papers explored how data science projects were organized with respect to the phases, steps, activities and tasks of the execution process ( e.g ., CRISP-DM’s project phases). There were 27 papers in this theme, which is about 40% of the total number of primary studies. Workflow approaches are discussed in our second research question and a detailed overview of the relevant studies will be provided in the following section.

Agility papers described the adoption of agile approaches and considered specific aspects of project execution such as the need for iterations or how teams should coordinate and collaborate. The high number of papers categorized in the Agility theme (26 out of 68) might be due to the successful adoption of agile methodologies in various software development projects. The theme will be covered in the next section since agile adoption is also relevant to our second research question. Seven papers explored both the workflows and agility themes.

Process adoption papers discussed the key factors as well as the challenges for a data science team to adopt a new process. Specifically, the papers that discussed process adoption considered questions such as acceptance factors ( Saltz, 2017 , 2018 ; Saltz & Hotz, 2021 ), project success factors ( Soukaina et al., 2019 ), the application of software engineering practices in the data science context ( Saltz & Shamshurin, 2017 ), and whether deep learning would impact a data science team's process adoption ( Shamshurin & Saltz, 2019a ).

General PM papers discussed general project management challenges. These papers did not focus on addressing any data science unique characteristics, but rather, general management challenges such as the team’s process maturity ( Saltz & Shamshurin, 2015 ), the need for collaboration ( Mao et al., 2019 ), the organizational needs and challenges when executing projects ( Ramesh & Ramakrishna, 2018 ) and training of human resources ( Mullarkey et al., 2019 ).

Tools-focused papers described new tools that could improve a data science team's productivity. Five papers explored how different tools, both custom and commercial, could be used to support various aspects of the execution of data science projects. The tools explored focused on communication and collaboration ( Marin, 2019 ; Wang et al., 2019 ), Continuous Integration/Continuous Development ( Chen et al., 2020 ), the maintainability of a data science project ( Saltz et al., 2020 ) and a tool to improve the coordination of the data science team ( Crowston et al., 2021 ).

Reviews were papers that reported on an SLR for a specific topic related to data science project execution or papers that reported on an industry survey. An SLR aiming to identify the benefits and challenges of applying CRISP-DM in research studies is presented in Schröer, Kruse & Gómez (2021) . How different data mining methodologies are adapted in practice is investigated in Plotnikova, Dumas & Milani (2020) . That literature review covered 207 peer-reviewed and ‘grey’ publications and identified four adaptation patterns and two recurrent purposes for adaptation. Another SLR focused on experience reports and explored the adoption of agile software development methods in data science projects ( Krasteva & Ilieva, 2020 ). An extensive critical review of 19 data science methodologies is presented in Martinez, Viles & Olaizola (2021) . The paper also proposed principles of an integral methodology for data science, which should include three foundation stones: project, team and data & information management. Professionals with different roles across multiple organizations were surveyed in Saltz et al. (2018) about the methodology they used in their data science projects and whether an improved project management process would benefit their results. The two papers that formed the core of our search space of related papers, ( Saltz, 2015 ) and ( Saltz & Shamshurin, 2016 ), were also included in the Reviews thematic category.

Workflow approaches

Workflows that adapt a standard framework fall into one of three sub-categories (a brief illustrative sketch follows this list):

Specialization—adjustments to standard workflows, made to better suit a particular big data technology or a specific domain.

Extension—addition of new steps, tasks or activities to extend standard workflow phases.

Enrichment—extension of the scope of a standard workflow to provide more comprehensive coverage of the project execution activities.
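To make the extension category concrete, the following sketch represents a standard workflow as an ordered list of CRISP-DM phases and inserts an additional step. The inserted phase name is hypothetical (loosely inspired by the data-value steps discussed later in this section) and is not a proposal from any cited study.

```python
# Illustrative sketch: a standard workflow as an ordered phase list, and an
# "extension" as the insertion of an additional step. The new phase name
# is hypothetical.
CRISP_DM = ["Business understanding", "Data understanding", "Data preparation",
            "Modeling", "Evaluation", "Deployment"]

def extend(workflow, new_phase, after):
    """Return a copy of the workflow with new_phase inserted after `after`."""
    extended = list(workflow)
    extended.insert(extended.index(after) + 1, new_phase)
    return extended

extended_workflow = extend(CRISP_DM, "Data value identification",
                           "Business understanding")
```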

An overview of the workflow categories and their respective reference workflows is presented in Table 4 . Most of the workflows use a standard framework as a reference point for the specification of both new and adapted workflows. As seen in Table 4 , CRISP-DM provides the basis for the majority of the workflow papers. Below we explore each of these categories in more depth.

New workflows

While the workflow proposed in Grady (2016) makes use of CRISP-DM activities, a new workflow with four phases, five stages and more than 15 activities was designed to accommodate big data technologies and data science activities. Providing a more focused technology perspective, Amershi et al. (2019) propose a nine-stage workflow for integrating machine learning into application and platform development. Uniting the advantages of experimentation and iterative working with a greater understanding of user requirements, a novel approach for data projects is proposed in Ahmed, Dannhauser & Philip (2019) . The suggested workflow consists of three stages and seven steps and integrates the principles of the Lean Start-up method and design thinking with CRISP-DM activities. The workflows in Dutta & Bose (2015) and Shah, Gochtovtt & Baldini (2019) are designed and used in companies, and integrate a strategic perspective with planning, management and implementation.

Standard workflows

Three of the primary studies reported on using CRISP-DM in student projects and compared and contrasted the adoption of different methodologies ( e.g . CRISP-DM, Scrum and Kanban) for executing data science projects.

Workflow specializations

The specialization category is the smallest of the three adaptation sub-categories. Two of the workflows in this category were based on CRISP-DM and were specialized for sequence analysis ( Kalgotra & Sharda, 2016 ) or anomaly detection ( Schwenzfeier & Gruhn, 2018 ). In addition, a revised KDD procedure model for time-series data was proposed in Vernickel et al. (2019) .

Workflow extensions

An extension to CRISP-DM for knowledge discovery on social networks was specified as a seven-stage workflow that can be applied in different domains intersecting with social network platforms ( Asamoah & Sharda, 2019 ). While this workflow extended CRISP-DM for big data, the workflows in Ponsard, Touzani & Majchrowski (2017) and Qadadeh & Abdallah (2020) added additional workflow steps focused on the identification of data value and business objectives. An extension to KDD for public healthcare was proposed in Silva, Saraee & Saraee (2019) . The suggested workflow implies user-friendly techniques and tools to help healthcare professionals use data science in their daily work. By performing an SLR of recent developments in KD process models, Baijens & Helms (2019) propose relevant adjustments to the steps and tasks of the Refined Data Mining Process ( Mariscal, Marbán & Fernández, 2010 ). IBM's Analytics Solutions Unified Method for Data Mining/predictive analytics (ASUM-DM) is extended in Angée et al. (2018) for a specific use case in the banking sector with a focus on big data analytics, prototyping and evaluation. A software engineering lifecycle process for big data projects is proposed in Lin & Huang (2017) as an extension to the ISO/IEC standard 15288:2008.

Workflow enrichments

There were several papers that extended CRISP-DM in different dimensions. The studies in Kolyshkina & Simoff (2019) and Fahse, Huber & van Giffen (2021) addressed two important aspects of ML solutions—interpretability and bias, respectively. They suggested new activities and methods integrated into CRISP-DM steps for satisfying a desired interpretability level and for bias prevention and mitigation. A novel approach for custom workflow creation from a flexible and comprehensive Data Science Trajectory map of activities was suggested in Martinez-Plumed et al. (2019) . The approach is designed to address the diversity of data science projects and their exploratory nature. The workflow presented in Kordon (2020) proposes improvements to CRISP-DM in several areas—maintenance and support, knowledge acquisition and project management. Scheduling, roles and tools are integrated with CRISP-DM in a methodology presented in Costa & Aparicio (2020) . Checkpoints and synchronization are used in the Analytics Governance Framework proposed in Yamada & Peran (2017) to facilitate communication and coordination between the client and the data science team. Collaboration is the primary focus in Zhang, Muller & Wang (2020) , in which a basic workflow is extended with collaborative practices, roles and tools.

Agile approaches

As shown in Table 5 , there were 26 papers that focused on the need for agility within data science projects. Only 31% of the papers actually reported on teams using an agile approach. The rest of the papers, 69% (18 of the 26 papers), were conceptual in nature. These conceptual papers explained why an agile framework should be helpful for a data science project but provided no examples demonstrating that the framework actually helped a data science team.

Specifically, the vast majority of the papers (15 papers), explored the potential benefits of agility for data science projects. These papers were labeled general agility papers since they did not explicitly support any specific agile approach, but rather, noted the benefits teams should get by adopting an agile framework. The expected benefits of agility typically focused on the need for multiple iterations to support the exploratory nature of data science projects, especially since the outcomes are uncertain. This would allow teams to adjust their future plans based on the results of their current iteration.

Two papers discussed the potential benefits of Scrum. However, five papers reported on the difficulties teams encountered when they actually tried to use Scrum. Often, issues arose due to the challenge of accurately estimating how long a task would take to complete. This issue of task estimation impacted the team's ability to determine which work items could fit into a sprint. Two other papers reported on the use of Scrum within data science teams, but neither paper described in depth how the team used Scrum, nor whether there were any benefits or issues due to their use of Scrum.

Finally, one paper discussed the conceptual benefits of using a lean approach and a different paper reported on the challenges of using Kanban (which can be thought of as supporting both agility and lean principles). That paper explored the need for a process master role, similar to the Scrum Master role in Scrum.

Combined approaches

The seven papers that covered both the workflow and agility themes presented a more comprehensive methodology for project execution. Several proposed new frameworks ( Grady, Payne & Parker, 2017 ; Ponsard, Touzani & Majchrowski, 2017 ; Ponsard et al., 2017 ; Ahmed, Dannhauser & Philip, 2019 ). All of the newly proposed frameworks defined a new workflow (typically based on CRISP-DM) and also suggested that the project proceed in iterations and focus on creating a minimal viable product (MVP). However, there was no consensus on whether the iterations should be time-boxed or capability-based. Furthermore, there was no consensus on how to integrate the data science life cycle into each iteration. In fact, two papers did not explicitly address this question ( Ponsard, Touzani & Majchrowski, 2017 ; Ponsard et al., 2017 ), another article implied that something should be done for each phase in each sprint ( Grady, Payne & Parker, 2017 ), and yet another article suggested that some iterations might focus on a specific phase while other iterations might cover more than one phase ( Ahmed, Dannhauser & Philip, 2019 ).

Three articles analyzed existing frameworks, including both workflow and agile frameworks ( Saltz, Shamshurin & Crowston, 2017 ; Saltz, Heckman & Shamshurin, 2017 ; Shah, Gochtovtt & Baldini, 2019 ). None of these articles explicitly discussed how to integrate workflow frameworks with agile frameworks.

Data science project life cycle activities (RQ3)

Table 6 shows a synthesized overview of the life cycle phases mentioned in the workflow papers, presented above. This table also shows the number (and percentage) of papers that mention a specific data science life cycle phase. One can note that the most common phases are the CRISP-DM phases.

Table 6: Data science life cycle phases mentioned in the workflow papers, with the number (and percentage) of papers mentioning each phase. Phases belonging to the standard CRISP-DM life cycle are marked.

Life cycle phase | Number of papers (%) | CRISP-DM phase
Readiness assessment | 1 (4%) | No
Project organization | 5 (18%) | No
Business understanding | 19 (68%) | Yes
Problem identification | 8 (29%) | No
Data acquisition | 10 (36%) | No
Data understanding | 15 (54%) | Yes
Data preparation | 21 (75%) | Yes
Feature engineering | 4 (14%) | No
Data analysis/Exploration | 9 (32%) | No
Modeling | 25 (89%) | Yes
Model refinement | 2 (7%) | No
Evaluation | 23 (82%) | Yes
Interpret/Explain | 2 (7%) | No
Deployment | 20 (71%) | Yes
Business value | 5 (18%) | No
Monitoring | 2 (7%) | No
Maintenance | 3 (11%) | No

Discussion

This section presents further analysis of the findings of the study, highlighting insights and implications for future research as well as exploring several validity threats.

Insights and implications for future research

The analysis of the information extracted for each primary study provided interesting insights on how data science projects are currently organized, managed and executed. The findings regarding categories of workflows confirm the trend observed in Plotnikova, Dumas & Milani (2020) of a large number of adaptations of workflow frameworks ( vs proposing new methodologies). While CRISP-DM is reported to be the most widely used framework for data science projects ( e.g . Saltz & Hotz, 2020 ), adaptations of CRISP-DM are much more commonly reported in the research literature, which raises the question of whether teams adapt CRISP-DM when they use it within their projects.

Most of the agility papers were conceptual in nature, and many of the other papers reported on issues when using Scrum. Hence, more research is needed to explore how to achieve the theorized benefits of agility, perhaps by adapting Scrum or using a different framework.

Combining workflow approaches with agile frameworks within a data science context is a way to achieve an integral framework for project execution. However, more research is needed on how to combine these two approaches. For example, the analysis presented in Martinez, Viles & Olaizola (2021) of 19 methodologies for data science projects determined that only four of them could be classified as integral according to the criteria defined in the study. Specifying new data science methodologies that cover different aspects of project execution ( e.g . team coordination, data and system engineering, stakeholder collaboration) is a promising direction for future research.

To explore whether the life cycle activities mentioned in the workflow papers have changed over time, we conducted a comparative analysis with a similar SLR in which 23 data mining process models were compared based on their process steps ( Rotondo & Quilligan, 2020 ). As all of the papers from the previous SLR were published prior to 2018, comparing the two SLRs provides a way to see if the usage of different phases has changed over time. It was observed that the use of an exploratory phase (Data Analysis/Exploration) was increasing, while the model interpretation and explanation phase (Interpret/Explain) was decreasing. The latter is perhaps due to these tasks being integrated into the evaluation phase.

Validity threats

Several limitations of the study present potential threats to its validity. One limitation is that the SLR was based on a specific set of search strings. It is possible that a different search string could have identified other relevant articles. Adding an additional search space based on citations of relevant studies helped to mitigate the impact of this potential threat.

Another limitation is that while the authors explored the ACM Digital Library, IEEEXplore, Scopus, ScienceDirect and Google Scholar databases, which index high impact journals and conference papers from IEEE, ACM, SpringerLink and Elsevier, it is possible that some relevant articles from other publication outlets could have been missed. In addition, the grey literature was not analyzed. This literature could have provided additional insights on the adoption of data science approaches in industrial settings. Yet another limitation is that the analysis and synthesis were based on qualitative content analysis and thematic synthesis of the selected articles by the research team. The authors tried to minimize the subjectivity of the researchers' interpretation by cross-checking papers to reduce bias.

Conclusions

This study presents a systematic review of research focused on the adoption of big data science process frameworks. The study shows that research on how data science projects are organized, managed and executed has increased significantly during the last six years. Furthermore, the review identified 68 primary studies and thematically classified these studies into six key themes with respect to current research on how teams execute data science projects (workflows, agility, process adoption, general PM, tools, and reviews). CRISP-DM was the most commonly discussed workflow, and the different adaptation patterns of CRISP-DM (specializations, extensions and enrichments) were the most common approaches for specifying and using adjusted workflows for data science projects.

However, standardized approaches explicitly designed for the data science context were not identified; hence, this is a gap in current research and practice. Similarly, with respect to agile approaches, more research is needed to explore how and if the conceptual benefits of agility noted in many of the identified papers can actually be achieved in practice. In addition, another direction for future research is to explore combining workflow and agile approaches into a more comprehensive framework that covers different aspects of project execution.

The current study can be enhanced and extended in three directions. First, the search space could be expanded by using the snowballing technique ( Wohlin, 2014 ) for identification of relevant articles. Some of the primary studies identified in the current study can be used as seed papers in a future execution of the procedure. Second, conducting a multivocal literature review ( Garousi, Felderer & Mäntylä, 2016 ) that includes grey literature can complement the results of the study by collecting more experience reports and real-world adoptions from industry. Finally, future research could explore whether the process used should vary based on different industries, or whether the appropriate data science process is independent of the specific industry project context.


PLoS Computational Biology, 9(7), July 2013

Ten Simple Rules for Writing a Literature Review

Marco Pautasso

1 Centre for Functional and Evolutionary Ecology (CEFE), CNRS, Montpellier, France

2 Centre for Biodiversity Synthesis and Analysis (CESAB), FRB, Aix-en-Provence, France

Literature reviews are in great demand in most scientific fields. Their need stems from the ever-increasing output of scientific publications [1] . For example, compared to 1991, in 2008 three, eight, and forty times more papers were indexed in Web of Science on malaria, obesity, and biodiversity, respectively [2] . Given such mountains of papers, scientists cannot be expected to examine in detail every single new paper relevant to their interests [3] . Thus, it is both advantageous and necessary to rely on regular summaries of the recent literature. Although recognition for scientists mainly comes from primary research, timely literature reviews can lead to new synthetic insights and are often widely read [4] . For such summaries to be useful, however, they need to be compiled in a professional way [5] .

When starting from scratch, reviewing the literature can require a titanic amount of work. That is why researchers who have spent their career working on a certain research issue are in a perfect position to review that literature. Some graduate schools are now offering courses in reviewing the literature, given that most research students start their project by producing an overview of what has already been done on their research issue [6] . However, it is likely that most scientists have not thought in detail about how to approach and carry out a literature review.

Reviewing the literature requires the ability to juggle multiple tasks, from finding and evaluating relevant material to synthesising information from various sources, from critical thinking to paraphrasing, evaluating, and citation skills [7] . In this contribution, I share ten simple rules I learned working on about 25 literature reviews as a PhD and postdoctoral student. Ideas and insights also come from discussions with coauthors and colleagues, as well as feedback from reviewers and editors.

Rule 1: Define a Topic and Audience

How to choose which topic to review? There are so many issues in contemporary science that you could spend a lifetime of attending conferences and reading the literature just pondering what to review. On the one hand, if you take several years to choose, several other people may have had the same idea in the meantime. On the other hand, only a well-considered topic is likely to lead to a brilliant literature review [8] . The topic must at least be:

  • interesting to you (ideally, you should have come across a series of recent papers related to your line of work that call for a critical summary),
  • an important aspect of the field (so that many readers will be interested in the review and there will be enough material to write it), and
  • a well-defined issue (otherwise you could potentially include thousands of publications, which would make the review unhelpful).

Ideas for potential reviews may come from papers providing lists of key research questions to be answered [9] , but also from serendipitous moments during desultory reading and discussions. In addition to choosing your topic, you should also select a target audience. In many cases, the topic (e.g., web services in computational biology) will automatically define an audience (e.g., computational biologists), but that same topic may also be of interest to neighbouring fields (e.g., computer science, biology, etc.).

Rule 2: Search and Re-search the Literature

After having chosen your topic and audience, start by checking the literature and downloading relevant papers. Five pieces of advice here:

  • keep track of the search items you use (so that your search can be replicated [10] ; a minimal logging sketch follows this list),
  • keep a list of papers whose pdfs you cannot access immediately (so as to retrieve them later with alternative strategies),
  • use a paper management system (e.g., Mendeley, Papers, Qiqqa, Sente),
  • define early in the process some criteria for exclusion of irrelevant papers (these criteria can then be described in the review to help define its scope), and
  • do not just look for research papers in the area you wish to review, but also seek previous reviews.
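As one concrete way to follow the first piece of advice above, executed searches can be logged as they are run. The sketch below assumes a simple CSV log; the file name, fields and example values are illustrative.

```python
# Minimal sketch of a replicable search log (assumed CSV format).
import csv
import datetime

def log_search(path, database, query, n_hits):
    """Append one executed search to a CSV file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), database, query, n_hits])

# Example call (values are illustrative):
log_search("search_log.csv", "Web of Science",
           '"plant disease" AND "climate change"', 152)
```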

The chances are high that someone will already have published a literature review ( Figure 1 ), if not exactly on the issue you are planning to tackle, at least on a related topic. If there are already a few or several reviews of the literature on your issue, my advice is not to give up, but to carry on with your own literature review,

  • discussing in your review the approaches, limitations, and conclusions of past reviews,
  • trying to find a new angle that has not been covered adequately in the previous reviews, and
  • incorporating new material that has inevitably accumulated since their appearance.

Figure 1: The bottom-right situation (many literature reviews but few research papers) is not just a theoretical situation; it applies, for example, to the study of the impacts of climate change on plant diseases, where there appear to be more literature reviews than research studies [33] .

When searching the literature for pertinent papers and reviews, the usual rules apply:

  • be thorough,
  • use different keywords and database sources (e.g., DBLP, Google Scholar, ISI Proceedings, JSTOR Search, Medline, Scopus, Web of Science), and
  • look at who has cited past relevant papers and book chapters.

Rule 3: Take Notes While Reading

If you read the papers first, and only afterwards start writing the review, you will need a very good memory to remember who wrote what, and what your impressions and associations were while reading each single paper. My advice is, while reading, to start writing down interesting pieces of information, insights about how to organize the review, and thoughts on what to write. This way, by the time you have read the literature you selected, you will already have a rough draft of the review.

Of course, this draft will still need much rewriting, restructuring, and rethinking to obtain a text with a coherent argument [11] , but you will have avoided the danger posed by staring at a blank document. Be careful when taking notes to use quotation marks if you are provisionally copying verbatim from the literature. It is advisable then to reformulate such quotes with your own words in the final draft. It is important to be careful in noting the references already at this stage, so as to avoid misattributions. Using referencing software from the very beginning of your endeavour will save you time.

Rule 4: Choose the Type of Review You Wish to Write

After having taken notes while reading the literature, you will have a rough idea of the amount of material available for the review. This is probably a good time to decide whether to go for a mini- or a full review. Some journals are now favouring the publication of rather short reviews focusing on the last few years, with a limit on the number of words and citations. A mini-review is not necessarily a minor review: it may well attract more attention from busy readers, although it will inevitably simplify some issues and leave out some relevant material due to space limitations. A full review will have the advantage of more freedom to cover in detail the complexities of a particular scientific development, but may then be left in the pile of the very important papers “to be read” by readers with little time to spare for major monographs.

There is probably a continuum between mini- and full reviews. The same point applies to the dichotomy of descriptive vs. integrative reviews. While descriptive reviews focus on the methodology, findings, and interpretation of each reviewed study, integrative reviews attempt to find common ideas and concepts from the reviewed material [12] . A similar distinction exists between narrative and systematic reviews: while narrative reviews are qualitative, systematic reviews attempt to test a hypothesis based on the published evidence, which is gathered using a predefined protocol to reduce bias [13] , [14] . When systematic reviews analyse quantitative results in a quantitative way, they become meta-analyses. The choice between different review types will have to be made on a case-by-case basis, depending not just on the nature of the material found and the preferences of the target journal(s), but also on the time available to write the review and the number of coauthors [15] .

Rule 5: Keep the Review Focused, but Make It of Broad Interest

Whether your plan is to write a mini- or a full review, it is good advice to keep it focused [16] , [17] . Including material just for the sake of it can easily lead to reviews that are trying to do too many things at once. The need to keep a review focused can be problematic for interdisciplinary reviews, where the aim is to bridge the gap between fields [18] . If you are writing a review on, for example, how epidemiological approaches are used in modelling the spread of ideas, you may be inclined to include material from both parent fields, epidemiology and the study of cultural diffusion. This may be necessary to some extent, but in this case a focused review would only deal in detail with those studies at the interface between epidemiology and the spread of ideas.

While focus is an important feature of a successful review, this requirement has to be balanced with the need to make the review relevant to a broad audience. This square may be circled by discussing the wider implications of the reviewed topic for other disciplines.

Rule 6: Be Critical and Consistent

Reviewing the literature is not stamp collecting. A good review does not just summarize the literature, but discusses it critically, identifies methodological problems, and points out research gaps [19] . After having read a review of the literature, a reader should have a rough idea of:

  • the major achievements in the reviewed field,
  • the main areas of debate, and
  • the outstanding research questions.

It is challenging to achieve a successful review on all these fronts. A solution can be to involve a set of complementary coauthors: some people are excellent at mapping what has been achieved, some others are very good at identifying dark clouds on the horizon, and some have instead a knack at predicting where solutions are going to come from. If your journal club has exactly this sort of team, then you should definitely write a review of the literature! In addition to critical thinking, a literature review needs consistency, for example in the choice of passive vs. active voice and present vs. past tense.

Rule 7: Find a Logical Structure

Like a well-baked cake, a good review has a number of telling features: it is worth the reader's time, timely, systematic, well written, focused, and critical. It also needs a good structure. With reviews, the usual subdivision of research papers into introduction, methods, results, and discussion does not work or is rarely used. However, a general introduction of the context and, toward the end, a recapitulation of the main points covered and take-home messages make sense also in the case of reviews. For systematic reviews, there is a trend towards including information about how the literature was searched (database, keywords, time limits) [20] .

How can you organize the flow of the main body of the review so that the reader will be drawn into and guided through it? It is generally helpful to draw a conceptual scheme of the review, e.g., with mind-mapping techniques. Such diagrams can help recognize a logical way to order and link the various sections of a review [21] . This is the case not just at the writing stage, but also for readers if the diagram is included in the review as a figure. A careful selection of diagrams and figures relevant to the reviewed topic can be very helpful to structure the text too [22] .

Rule 8: Make Use of Feedback

Reviews of the literature are normally peer-reviewed in the same way as research papers, and rightly so [23] . As a rule, incorporating feedback from reviewers greatly helps improve a review draft. Having read the review with a fresh mind, reviewers may spot inaccuracies, inconsistencies, and ambiguities that had not been noticed by the writers due to rereading the typescript too many times. It is however advisable to reread the draft one more time before submission, as a last-minute correction of typos, leaps, and muddled sentences may enable the reviewers to focus on providing advice on the content rather than the form.

Feedback is vital to writing a good review, and should be sought from a variety of colleagues, so as to obtain a diversity of views on the draft. This may lead in some cases to conflicting views on the merits of the paper, and on how to improve it, but such a situation is better than the absence of feedback. A diversity of feedback perspectives on a literature review can help identify where the consensus view stands in the landscape of the current scientific understanding of an issue [24] .

Rule 9: Include Your Own Relevant Research, but Be Objective

In many cases, reviewers of the literature will have published studies relevant to the review they are writing. This could create a conflict of interest: how can reviewers report objectively on their own work [25] ? Some scientists may be overly enthusiastic about what they have published, and thus risk giving too much importance to their own findings in the review. However, bias could also occur in the other direction: some scientists may be unduly dismissive of their own achievements, so that they will tend to downplay their contribution (if any) to a field when reviewing it.

In general, a review of the literature should neither be a public relations brochure nor an exercise in competitive self-denial. If a reviewer is up to the job of producing a well-organized and methodical review, which flows well and provides a service to the readership, then it should be possible to be objective in reviewing one's own relevant findings. In reviews written by multiple authors, this may be achieved by assigning the review of the results of a coauthor to different coauthors.

Rule 10: Be Up-to-Date, but Do Not Forget Older Studies

Given the progressive acceleration in the publication of scientific papers, today's reviews of the literature need awareness not just of the overall direction and achievements of a field of inquiry, but also of the latest studies, so as not to become out-of-date before they have been published. Ideally, a literature review should not identify as a major research gap an issue that has just been addressed in a series of papers in press (the same applies, of course, to older, overlooked studies (“sleeping beauties” [26] )). This implies that literature reviewers would do well to keep an eye on electronic lists of papers in press, given that it can take months before these appear in scientific databases. Some reviews declare that they have scanned the literature up to a certain point in time, but given that peer review can be a rather lengthy process, a full search for newly appeared literature at the revision stage may be worthwhile. Assessing the contribution of papers that have just appeared is particularly challenging, because there is little perspective with which to gauge their significance and impact on further research and society.

Inevitably, new papers on the reviewed topic (including independently written literature reviews) will appear from all quarters after the review has been published, so that there may soon be the need for an updated review. But this is the nature of science [27] – [32] . I wish everybody good luck with writing a review of the literature.

Acknowledgments

Many thanks to M. Barbosa, K. Dehnen-Schmutz, T. Döring, D. Fontaneto, M. Garbelotto, O. Holdenrieder, M. Jeger, D. Lonsdale, A. MacLeod, P. Mills, M. Moslonka-Lefebvre, G. Stancanelli, P. Weisberg, and X. Xu for insights and discussions, and to P. Bourne, T. Matoni, and D. Smith for helpful comments on a previous draft.

Funding Statement

This work was funded by the French Foundation for Research on Biodiversity (FRB) through its Centre for Synthesis and Analysis of Biodiversity data (CESAB), as part of the NETSEED research project. The funders had no role in the preparation of the manuscript.


Title: Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods for numerical data, aiming to provide a clearer and more structured understanding of the field. Our research began with a comprehensive literature review spanning over a decade, revealing that existing surveys fall short in covering the vast array of causal discovery advancements. We meticulously analyzed over 200 scholarly articles to identify 24 distinct algorithms. This extensive analysis led to the development of a novel taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. Addressing the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of more than 20 causal discovery algorithms on synthetic and real-world datasets. We categorize synthetic datasets based on size, linearity, and noise distribution, employing 5 evaluation metrics, and summarized the top-3 algorithm recommendations for different data scenarios. The recommendations have been validated on 2 real-world datasets. Our results highlight the significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy was developed to assist users in algorithm selection on unknown datasets. The accuracy of estimating metadata is higher than 80%. Based on these insights, we offer professional and practical recommendations to help users choose the most suitable causal discovery methods for their specific dataset needs.
Subjects: Artificial Intelligence (cs.AI)


  • Open access
  • Published: 15 July 2024

Establishing evidence criteria for implementation strategies in the US: a Delphi study for HIV services

  • Virginia R. McKay   ORCID: orcid.org/0000-0002-9299-3294 1 ,
  • Alithia Zamantakis 2 , 3 ,
  • Ana Michaela Pachicano 3 ,
  • James L. Merle 4 ,
  • Morgan R. Purrier 3 ,
  • McKenzie Swan 1 ,
  • Dennis H. Li 3 , 5 , 6 , 7 ,
  • Brian Mustanski 2 , 3 , 5 , 6 ,
  • Justin D. Smith 3 ,
  • Lisa R. Hirschhorn 2 &
  • Nanette Benbow 3 , 5 , 6  

Implementation Science, volume 19, Article number: 50 (2024)


There are no criteria specifically for evaluating the quality of implementation research and recommending to practitioners the implementation strategies most likely to have impact. We describe the development and application of the Best Practices Tool, a set of criteria to evaluate the evidence supporting HIV-specific implementation strategies.

We developed the Best Practices Tool from 2022–2023 in three phases. (1) We developed a draft tool and criteria based on a literature review and key informant interviews. We purposively selected interview participants representing a mix of expertise in HIV service delivery, quality improvement, and implementation science, and recruited them by email. (2) The tool was then informed and revised through two e-Delphi rounds using a survey delivered online through Qualtrics. The first and second round Delphi surveys consisted of 71 and 52 open and close-ended questions, respectively, asking participants to evaluate, confirm, and make suggestions on different aspects of the rubric. After each survey round, data were analyzed and synthesized as appropriate, and the tool and criteria were revised. (3) We then applied the tool to a set of research studies assessing implementation strategies designed to promote the adoption and uptake of evidence-based HIV interventions to assess reliable application of the tool and criteria.

Our initial literature review yielded existing tools for evaluating intervention-level evidence. For a strategy-level tool, additions emerged from interviews, for example, a need to consider the context and specification of strategies. Revisions were made after both Delphi rounds resulting in the confirmation of five evaluation domains – research design, implementation outcomes, limitations and rigor, strategy specification, and equity – and four evidence levels – best, promising, more evidence needed, and harmful. For most domains, criteria were specified at each evidence level. After an initial pilot round to develop an application process and provide training, we achieved 98% reliability when applying the criteria to 18 implementation strategies.
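The structure of the tool described above (five evaluation domains and four evidence levels) can be pictured with a small data sketch. This is purely illustrative and is not the authors' implementation; the example rating values are made up.

```python
# Illustrative only: the Best Practices Tool's domains and evidence levels
# as simple Python constants, with one hypothetical rating.
DOMAINS = ["research design", "implementation outcomes",
           "limitations and rigor", "strategy specification", "equity"]
EVIDENCE_LEVELS = ["best", "promising", "more evidence needed", "harmful"]

# A reviewer's assessment of one implementation strategy could be recorded as
# a mapping from domain to an assigned evidence level (values are made up).
example_rating = {domain: "promising" for domain in DOMAINS}
assert all(level in EVIDENCE_LEVELS for level in example_rating.values())
```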

Conclusions

We developed a tool to evaluate the evidence supporting implementation strategies for HIV services. Although specific to HIV in the US, this tool is adaptable for evaluating strategies in other health areas.


Contributions to the literature

The field of implementation science has not yet established criteria to evaluate the quality of evidence for implementation strategies.

Our Delphi process with experts in implementation science, quality improvement, and HIV services identified criteria that should be used to evaluate evidence for HIV-related implementation research in the US.

Our tool can be applied to cases of HIV-related implementation research to make recommendations to practitioners about the implementation strategies most likely to be effective. The criteria could also be adapted to other health domains.

Introduction

Implementation science is dedicated to improving the uptake and use of evidence-based interventions, practices, and policies to capitalize on scientific knowledge and impact human health. Central to the goals of implementation research is building the evidence for implementation strategies, defined as techniques or change efforts to promote the adoption, implementation, and sustainment of evidence-based interventions (EBIs) [ 1 ]. In a recent review, scholars within the field of implementation science recognized that a more robust research agenda related to implementation strategies is needed to yield the promised benefits of improved EBI implementation for practitioners [ 2 ]. Within this agenda is a call for more research on the effectiveness of implementation strategies. Expanding on this priority, criteria on which to evaluate evidence quality are needed to assess whether the evidence supporting the effectiveness of any given strategy is sufficient. Without criteria on which to evaluate implementation research focusing on strategies, it is difficult to recommend strategies that are likely to be the most valuable for practitioners or to identify strategies that may hold initial promise but would benefit from more robust research. Evidence criteria are also a foundational element of the creation of a compendium of evidence-based implementation strategies, which is a key dissemination approach for delivering evidence to implementers.

At the intervention level, criteria and rubrics are available to synthesize research outcomes and evaluate research quality behind the evidence supporting an intervention and make recommendations about their use, such as Grading of Recommendations Assessment, Development, and Evaluation (GRADE) or that used by the United States Preventative Services Task Force [ 3 , 4 ]. These guidelines often consider different domains of research outcomes and quality, like the health outcomes, the research design, and potential for bias in the outcomes because of the research design. Based on these guides, health institutions, like the Preventative Services Task Force, make recommendations about the best interventions across a wide set of health conditions to assist providers and organizations in making clinical and policy-level decisions. To our knowledge, no equivalent set of criteria for implementation strategies are available. As such, it is difficult to discern the quality of evidence supporting an implementation strategy and whether strategies should be recommended to practitioners to support the implementation of EBIs.

Existing criteria, like GRADE, may serve as a valuable starting point for building criteria applicable to the field of implementation research [ 5 ]. Effectiveness research and associated evaluation criteria, which heavily emphasizes internal validity, considers the highest quality evidence to be from research designs like double-blind randomized control trials. In implementation research, internal validity tends to be more balanced with external validity so that the results are generalizable to target communities. With external validity in mind, implementation research is typically conducted in practice settings and involves assessment of the organizations and providers who will be impacted by the implementation strategy and subsequently the intervention under consideration. As a result, it is often inappropriate, impractical, and/or undesirable to leverage research designs like randomized controlled trials, because it is not possible to blind practitioners to the strategy and/or intervention or randomize at the unit of analysis [ 6 , 7 , 8 ]. These realities make direct application of intervention-level criteria inappropriate—necessitating criteria specific to the field [ 3 ].

HIV and implementation research in the US

We describe our efforts to develop a set of criteria and evaluation process for implementation strategies to address the HIV epidemic in the United States. Improvements in the US HIV epidemic have been modest over the last two decades, with disparities among communities disproportionally affected by HIV increasing [ 9 ]. In an attempt to address HIV incidence, the Centers for Disease Control and Prevention have curated a repository of EBIs to support HIV prevention since the early 2000s and supported dissemination and implementation of a subset of these [ 10 ]. Furthermore, major biomedical advancements, such as pre-exposure prophylaxis (PrEP), have proven to be very effective at preventing HIV. Yet many of these interventions have not been widely implemented with equity to yield their intended benefit. Only an estimated 30% of individuals who would benefit from PrEP receive it, with growing disparities by race, gender, income, citizenship status, and intersectional marginalization [ 11 , 12 , 13 , 14 ]. Uptake and adherence remain suboptimal along the HIV care continuum (i.e., prevention, testing, diagnosis, linkage-to-care, and treatment), indicating, in part, failed implementation and opportunities to develop evidence-informed implementation strategies [ 11 ]. In 2019, the Ending the HIV Epidemic (EHE) Initiative was launched as a coordinated effort among several federal agencies to address HIV-related implementation problems. In alignment with EHE, the National Institutes of Health supported a number of mechanisms and projects to conduct research on implementation strategies [ 15 ]. With the growing mass of HIV-related implementation research has come an equally growing knowledge base of implementation strategies targeting multiple aspects of the HIV care continuum, in a wide scope of settings, evaluating various implementation outcomes [ 16 ].

In an effort to create, synthesize, and disseminate generalizable knowledge, the Implementation Science Coordination Initiative (ISCI) was funded by the National Institutes of Health to provide technical assistance to implementation research funded by the EHE Initiative, coordinate research efforts, synthesize literature through systematic reviews, develop tools to assist researchers, and disseminate research findings to researchers, policymakers, providers, and other stakeholders [ 17 , 18 ]. As part of this effort, we developed a tool to evaluate the quality of evidence for HIV-related implementation strategies and to identify best-practice strategies that can promote effective implementation and uptake of EBIs. The long-term goal of this project is to accumulate, warehouse, and disseminate a collection of effective strategies that HIV practitioners nationwide can use to support the EHE Initiative.

We conducted the project in three phases: 1) a literature review in tandem with key informant interviews to generate initial criteria for our tool; 2) a modified Delphi to evaluate and revise our initial tool and criteria; and 3) a pilot application of our rubric to a set of implementation research studies. Delphi data were collected from March 2022 to June 2023, and piloting occurred in the fall of 2023. Our data collection protocol was reviewed by the Institutional Review Board at Northwestern University and determined to be non-human subjects research. All data collection instruments are included as a supplemental file (Supplemental File A), and data are available in a de-identified format from the first author on reasonable request. Methods and results are reported according to the STROBE reporting guidelines (Supplemental File B).

Key informant interviews and literature review

We first reviewed the scientific and grey literature for existing compilations of criteria for assessing EBIs. Google Scholar was used to search for tools or criteria published in academic journals. To identify tools within the grey literature, we focused on federal institutions that frequently issue evidence-based recommendations, such as the US Preventive Services Task Force, the Centers for Disease Control and Prevention, and the Health Resources and Services Administration. We used this literature to identify commonalities across tools, to review current debates in the philosophy of science as they relate specifically to implementation science, and to construct an interview guide for key informant experts with questions eliciting key differences between implementation research and existing tools. We also used the literature to identify experts whom we then recruited for key informant interviews and our Delphi.

We recruited and interviewed a range of experts, including implementation scientists, HIV providers and implementers, representatives from related fields of public health research (e.g., quality improvement), and public health agency officials. All interviews were scheduled in the Spring of 2022, lasted approximately 30–45 min, and were conducted by either VM or az. Briefly, the three main questions were: (1) Do you think existing criteria apply to implementation research studies? (2) What are essential indications of generalizability in implementation research? (3) What are ways to evaluate strategies with multiple components? Each question included follow-up probes. Interviews were recorded and transcribed via Zoom. Participants were not given an incentive for participation. Two Ph.D.-level researchers with expertise in qualitative and mixed methods research performed an inductive, thematic analysis to explore patterns and categorize responses. Based on these responses, we iteratively developed a preliminary tool and criteria.

Modified Delphi

Identification and recruitment of Delphi participants

We conducted a two-round, asynchronous, modified eDelphi with participants who had similar expertise to our key informants. Participants were recruited using snowball recommendations from those interviewed as key informants. Eligibility criteria included fluency in English and working either in HIV services research or in implementation research in another field that may intersect with HIV, such as mental health, substance misuse, social services, primary care, or women's health. If an invitee was unable to complete the survey, they could recommend an alternative contact. After the first invitation, we sent semiweekly reminder emails for six weeks. A $10 gift card was given to participants for completing the first survey, and a $50 gift card was given for completing the second survey.

Data collection and measures

The surveys were implemented in Qualtrics and piloted with members of the ISCI research team to ensure question clarity. Each survey took participants approximately 45–75 min to complete.

First-round Delphi instrument

This survey consisted of 71 items. Participants were first introduced to the purpose of the project at large, which was to create a tool and set of criteria with which to evaluate HIV-related implementation science, and then to the specific goals of the Delphi, which were to generate consensus about which aspects of the tool were most and least important and to determine whether we had included all the elements that participants felt were necessary. The first portion of the survey gathered demographic and basic information about the participant (e.g., age, race, ethnicity, gender), characteristics of the participant's work (e.g., "I work primarily in… select all areas that apply"), and the participant's experience in implementation research (e.g., "How would you describe your knowledge level of implementation science?").

The second portion of the survey evaluated the proposed domains for the tool (Overall Evidence of Effectiveness, Study Design Quality, Implementation Outcomes, Equity Impact, Strategy Specification, and Bundled Strategies) and their corresponding criteria. Participants were asked to agree or disagree (Yes/No) with adding, dropping, or combining domains; this was followed by an open-ended question asking why they agreed with the addition, dropping, or combination (if applicable). This portion also contained two 5-point Likert-type scales asking participants to rank the domains from most important to least important. The third portion of the survey gathered the participant's opinion on the specific criteria (e.g., effect size and effect direction for implementation outcomes) within each domain. For each domain, the participant was asked whether any criteria needed to be added or dropped (Yes/No), followed by an open-ended question asking why they would like these items added or dropped (if applicable). The participant was then provided a 5-point Likert scale on which they rated each item from "Very unimportant" to "Very important". These questions were repeated for all criteria in all domains.
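To illustrate how responses of this kind can be summarized, the sketch below tallies 5-point Likert ratings for two criteria. This is a minimal illustration rather than the authors' analysis code; the criterion names and responses shown are hypothetical.

```python
# Minimal sketch (not the authors' analysis code): summarizing 5-point Likert
# ratings of criterion importance. Criterion names and responses are hypothetical.
from collections import Counter

LIKERT = {
    "Very unimportant": 1,
    "Unimportant": 2,
    "Neither important nor unimportant": 3,
    "Important": 4,
    "Very important": 5,
}

responses = {
    "Effect size reported": ["Very important", "Important", "Important", "Very important"],
    "Validated measure used": ["Important", "Neither important nor unimportant",
                               "Very important", "Important"],
}

for criterion, answers in responses.items():
    scores = [LIKERT[a] for a in answers]
    pct_important = 100 * sum(s >= 4 for s in scores) / len(scores)
    print(f"{criterion}: mean={sum(scores) / len(scores):.2f}, "
          f"rated important or very important={pct_important:.0f}%, "
          f"counts={dict(Counter(answers))}")
```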

The final portion of the survey introduced the Levels of Evidence (Best Practice Strategy, Promising Strategy, Emerging Strategy, Undetermined Strategy, and Not Recommended Strategy) and their definitions. The participant was asked whether any evidence levels should be added, dropped, or combined (Yes/No), followed by an open-ended question asking why they would like these evidence levels to be added, dropped, or combined (if applicable).

Second-round Delphi instrument

This survey consisted of 52 items. All participants from Round 1 were invited to Round 2. Again, participants were reminded of the overall purpose of the project and the specific goal of the Delphi, which was to confirm changes made to the tool in response to the Round 1 results and to gather feedback. The first portion of the survey collected the same demographic and basic information as in the first round. The second portion consisted of an overview of the updated tool, including definitions of the domains, criteria, and levels of evidence, and asked for feedback on changes made based on the Round 1 results. For example, in the first round of the Delphi survey, participants responded that they would like greater specificity within the criteria of the Study Design domain. In response, we split this domain into two domains for Round 2: "Study Design" and "Study Rigor and Limitations." We presented this change to the participant and asked them to agree or disagree with it (Yes/No); if "No" was selected, an open-response question asked for further explanation. Lastly, we asked respondents to apply the criteria and give an evidence-level rating to a set of fictional cases of implementation research studies, allowing respondents to comment on the application and rating process.

Data analysis and management

Quantitative data were managed and analyzed in Excel. They were analyzed descriptively, primarily as percent agreement or disagreement for domains, evidence levels, and individual criteria within domains. Qualitative data were analyzed in Dedoose software and Excel using a rapid, directed qualitative content analysis approach [ 19 ]. Qualitative data were analyzed by a Ph.D.-level researcher with qualitative research expertise and were intended to confirm or complement the quantitative analyses.
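As a rough illustration of this descriptive approach (the actual analysis was conducted in Excel), the sketch below computes the percentage of "Yes" responses for Yes/No items; the item names and vote counts are hypothetical.

```python
# Minimal sketch (assumed, not the authors' Excel workbook): percent agreement
# for Yes/No Delphi items. Item names and vote counts are hypothetical.
def percent_yes(votes):
    """Percentage of 'Yes' responses among answered (non-missing) votes."""
    answered = [v for v in votes if v in ("Yes", "No")]
    return 100 * sum(v == "Yes" for v in answered) / len(answered) if answered else float("nan")

delphi_items = {
    "A new domain should be added": ["Yes"] * 18 + ["No"] * 25,
    "One or more evidence levels should be dropped or combined": ["Yes"] * 20 + ["No"] * 23,
}

for item, votes in delphi_items.items():
    print(f"{item}: {percent_yes(votes):.0f}% Yes (n={len(votes)})")
```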

Pilot and application to PrEP implementation strategies

To ensure a high-quality review process and consistent application of criteria across the different evidence levels, we piloted and refined the tool with members of ISCI, which consists of a mix of faculty, staff, and students holding degrees at the Bachelor's, Master's, and Ph.D. levels, using a set of implementation strategies designed to promote the uptake of evidence-based HIV services. VRM led two hour-long trainings on how to apply the criteria with four Ph.D.-level members of the ISCI team who were also engaged in systematic reviews of the HIV literature. ISCI team members then applied the criteria to an existing set of eight papers reporting on implementation strategies designed to promote PrEP uptake, coding a rating for each criterion and domain. Studies were selected by VRM to represent the full range of evidence ratings and different points of the HIV care continuum (i.e., PrEP delivery, HIV testing, and retention in care for HIV treatment). We calculated agreement as the simple percentage of identical ratings between two coders out of the total number of criteria, domain ratings, and the overall rating (40 items). Where there was high disagreement, the tool was revised and refined to provide better guidance and instruction on how to apply specific criteria. In a final application of the tool, two coders, a Master's- and a Ph.D.-level member of the ISCI team, applied the criteria to an additional set of 18 implementation strategies designed to improve PrEP uptake and use, identified through an existing systematic review [ 20 ], after a single hour-long training.
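The agreement calculation itself is simple; a minimal sketch is shown below. This is not the authors' code, and the 40 hypothetical ratings are for illustration only.

```python
# Minimal sketch (assumed, not the authors' code): simple percent agreement between
# two coders applying the rubric to one article. Ratings shown are hypothetical.
def percent_agreement(coder_a, coder_b):
    """Share of items (criteria, domain ratings, overall rating) rated identically."""
    assert len(coder_a) == len(coder_b), "Both coders must rate the same items"
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Hypothetical ratings for a 40-item rubric (criteria + domain ratings + overall rating).
coder_a = ["Best"] * 10 + ["Promising"] * 20 + ["Needs more evidence"] * 10
coder_b = ["Best"] * 9 + ["Promising"] * 21 + ["Needs more evidence"] * 10

print(f"Agreement: {percent_agreement(coder_a, coder_b):.0f}%")  # prints approximately 98%
```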

We report the primary results from each stage of our process as well as significant changes to the tool made at each stage.

Literature review and key informant interviews

Our initial literature review yielded several existing rubrics, tools, criteria, and processes for evaluating the evidence supporting a specific intervention, identified primarily in grey literature developed by large institutions responsible for disseminating evidence-based interventions [ 5 , 21 ]. Many had a similar structure, grouping criteria by domain (e.g., aspects of the research design or strength of the outcomes) and assigning different evidence ratings or levels (e.g., low, medium, high evidence strength). For example, the Centers for Disease Control and Prevention National Center for Injury Prevention Guide to the Continuum of Evidence of Effectiveness outlines six domains (i.e., effect, internal validity, type of evidence/research design, independent replication, implementation guidance, and external and ecological validity) and seven evidence levels (harmful, unsupported, undetermined, emerging, promising direction, supported, and well supported), while the US Preventive Services Task Force considers six domains or "factors" when generating an evidence level and uses five evidence levels or "grades" [ 21 , 22 , 23 ]. Our literature review also yielded several articles relevant to generalizing evidence from implementation trials; these highlighted, for example, considerations for whether effects are likely to transfer from one context to another, the balance of internal and external validity, and the need to consider equity in impact [ 21 , 24 , 25 , 26 ].

We conducted a total of 10 interviews representing a mix of expertise in HIV services research, implementation research, and quality improvement research. Informants reflected on potential domains (e.g., elements of the research design) and listed specific ways in which they felt research and evidence quality in implementation research differed from clinical trials. Among the factors highlighted were the need to consider context and the specification of strategies, criteria specific to implementation outcomes, and the equity impact of implementation strategies on the health outcome under consideration. Existing implementation science literature helped support and define these domains, such as Proctor's recommendations for strategy specification, to ensure that strategies are appropriately described, and Proctor's implementation outcomes, to define and describe implementation outcomes [ 1 , 48 ].

Based on these collective results, we modeled our initial tool conceptually by grouping criteria by domain and defining a series of evidence levels, similar to many of the tools and criteria we reviewed. We also worked to integrate current thinking and perspectives on implementation science, evidence, and generalizability into the tool from both the literature and the key informant interviews. Briefly, we structured our initial tool along six domains: overall effectiveness, study design quality, implementation outcomes, equity impact, strategy specification, and bundled strategies. Each domain included a set of criteria. For example, criteria for the implementation outcomes domain included operationalization of implementation outcomes; validity and reliability of the measures used; significance and direction of effect for quantitative outcomes; and reported effects as beneficial, neutral, or harmful. We also developed and defined five evidence levels with associated recommendations: best practice strategy, promising strategy, emerging strategy, undetermined strategy, and not recommended strategy. As an example, promising strategies were described as demonstrating mostly positive outcomes that may need more rigorous examination to ensure they are having the intended effect or are generalizable to a wider context. Practitioners would be advised to use caution when applying a promising strategy in practice and to ensure it is having an outcome similar to that demonstrated in the original research.
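To make the structure of the initial tool concrete, the sketch below shows one possible way to represent it as plain data. The domain and evidence-level names follow the text above; the criteria listed for domains other than implementation outcomes are abbreviated, hypothetical examples rather than the tool's actual wording.

```python
# Illustrative sketch only: one way to represent the initial tool's structure
# (domains with criteria, plus ordered evidence levels) as plain Python data.
# Criteria outside the implementation outcomes domain are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    criteria: list[str]

initial_tool = {
    "domains": [
        Domain("Overall Evidence of Effectiveness", ["Health outcome effect size and direction"]),
        Domain("Study Design Quality", ["Design appropriate to the implementation question"]),
        Domain("Implementation Outcomes", [
            "Operationalization of implementation outcomes",
            "Validity and reliability of measures used",
            "Significance and direction of effect for quantitative outcomes",
            "Effects reported as beneficial, neutral, or harmful",
        ]),
        Domain("Equity Impact", ["Impact on populations experiencing inequities"]),
        Domain("Strategy Specification", ["Strategy described per Proctor's recommendations"]),
        Domain("Bundled Strategies", ["Components of bundled strategies distinguishable"]),
    ],
    "evidence_levels": [  # ordered from strongest to weakest recommendation
        "Best practice strategy",
        "Promising strategy",
        "Emerging strategy",
        "Undetermined strategy",
        "Not recommended strategy",
    ],
}

print([d.name for d in initial_tool["domains"]])
```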

For Delphi Round 1, we recruited from a pool of 68 experts. Two individuals responded stating they were unable to participate, with one suggesting a replacement. Forty-one participants completed the survey and two partially completed it, for a total of 43 participants (63% response rate). For Delphi Round 2, we recruited among the responders from Round 1, with no refusals to participate and no partial responses; 30 participants completed the Round 2 survey (70% response rate). Respondent characteristics for both Delphi rounds are provided in Table  1 . Briefly, about half of respondents in both rounds self-identified as women (55.8% and 50% in Rounds 1 and 2, respectively), with the majority white (83.7%; 80%) and not Hispanic or Latino (86%; 100%). Most respondents worked in academic settings (81.4%; 80%), and most worked in HIV in Round 1 but not in Round 2 (83.7%; 36.7%, respectively). The largest proportion of respondents had 11–20 years of experience in their area of expertise (44.2%; 43.3%, respectively), and about three quarters reported experience leading implementation research projects (76.7%; 73.3%). Both complete and partially complete responses were included in subsequent analyses.
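For clarity, the reported response rates follow directly from these counts, noting that the Round 2 denominator is the 43 Round 1 participants rather than the original pool of 68:

```latex
\[
\text{Round 1: } \frac{43}{68} \approx 0.632 \approx 63\%, \qquad
\text{Round 2: } \frac{30}{43} \approx 0.698 \approx 70\%.
\]
```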

Delphi round 1

Table 2 presents the quantitative outcomes regarding whether participants believed that domains should be added, dropped, or combined. More than half (58%) of participants thought no new domains should be added, while 44% thought domains should be dropped or combined. For the evidence levels, 79% of participants felt that no additional levels were needed, while 47% thought one or more levels could be dropped or combined.

Table 3 summarizes the open-ended responses, with example quotes, for the domains and evidence levels commented on most often. Among respondents who indicated that a domain should be added, most suggested adding specific criteria or wanted greater clarity in how the domains and the criteria within them were defined. For example, regarding the equity domain, respondents desired greater clarity, operationalization, and description of how equity was being considered and evaluated. Of these, four sought greater clarity about equity-related outcomes, and six recommended including equity metrics or different ways of operationalizing equity. Three participants felt equity should be examined in combination with implementation outcomes, and three suggested greater consideration of community partnership development and inclusion of the target population in the development of the strategy or the design of a study. Finally, participants recommended combining promising, emerging, and/or undetermined as levels of evidence and better specifying and operationalizing the levels.

Briefly, we revised the structure of our tool along five domains: study design, implementation outcomes, study rigor and limitations, strategy specification, and equity impact, each with a revised set of criteria. For example, based on the recommended additions to the study design and rigor domain, we split it into two domains: 1) study design and 2) study rigor and limitations. We considered several of the comments on dropping equity but ultimately opted to keep this domain, relax its criteria, and heavily refine its description. Other cross-cutting changes included combining the criteria for bundled strategies and strategy specification. We also combined two of the evidence levels (emerging and undetermined) and revised the definitions, resulting in four levels: best practice, promising practice, needs more evidence, and harmful.

Delphi round 2

For the second round of the Delphi, we asked respondents to confirm the major changes made to the tool based on the first round (Table  2 ) and to evaluate our proposed process for applying the criteria. Most respondents agreed with the changes to the domains and evidence levels, although there remained some commentary on the equity domain. When examining the open-ended responses among those disagreeing with the changes to the equity domain, we grouped responses into those from individuals who did not agree with the domain at all (i.e., a hard no to the revisions) and those who approved of the domain overall but still had additional suggestions (i.e., a soft no with suggested revisions; Table  3 ). Based on these responses, we finalized the domains and made several additional adjustments to the definition of equity, including defining which target populations can be considered in determining whether a strategy has a positive equity impact. Finally, we revised our process for applying the rubric based on the recommendation to rate the criteria across each domain in addition to giving an overall rating. While this increased the time needed for review, it allowed us to provide information on how strategies rate across all domains, enabling researchers and practitioners to compare strategies by domain or select a strategy that is strong in a specific domain, such as equity.

Pilot application to PrEP implementation strategies

To ensure a consistent, high-quality process for applying the criteria to research studies examining implementation strategies, we initially piloted the rubric with eight existing studies of implementation strategies to promote the uptake of evidence-based HIV services, including PrEP, HIV testing, and retention in care [ 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 ]. By the conclusion of the pilot, we achieved 90% agreement in applying the criteria, which resulted in dropping some criteria and clarifying others and their application. Two members of the ISCI team then applied the rubric in a second pilot to a set of 18 implementation strategies, identified through an ongoing systematic review, designed to promote uptake of PrEP, achieving 98% agreement and taking approximately 15–30 min per article.

Among the 18 strategy studies, summarized in Table  4 , one was assigned an overall rating of Best Practice and the remainder were rated as Needs More Evidence. The primary domains in which strategies failed to exceed the Needs More Evidence criteria were Study Design and Study Rigor and Limitations, largely because these studies used only post-implementation assessment, were intended as pilot or feasibility studies, or were conducted at a single site. Given the early state of implementation research related to PrEP in the US, we felt this mix of ratings was appropriate. While the domains that have parallels in other rating systems resulted in relatively low ratings among our studies, we observed a good mix of ratings at the Best, Promising, and Needs More Evidence levels on the domains unique to our tool and to implementation research (i.e., strategy specification and equity), suggesting these domains sufficiently discern among the existing set of studies.

Major changes to the rubric and criteria are summarized in Table  5 . The final domains and evidence levels are provided in Table  6 , and a summary of the criteria by domain at each evidence level is provided in Table  7 . The final tool, with domains, criteria, evidence levels, and application instructions, is available as a supplement (Supplemental file C).

To our knowledge, this is the first set of criteria to evaluate evidence for implementation strategies and to serve as a basis for recommendations to practitioners. Our Best Practice tool was initially informed by existing criteria and interviews, refined through a Delphi, and then piloted with implementation strategies. This process yielded a rating scale (i.e., best, promising, needs more evidence, and harmful) and domains (e.g., study design, implementation outcomes, rigor and limitations) that are common to other tools and rubrics. Yet implementation research's system-level focus required tailoring our rubric for some domains, like study design and outcomes, and developing entirely new domains, specifically strategy specification and equity. To help define the criteria for the domains, we used results from the key informant interviews and existing implementation science literature to ensure appropriateness for the field [ 1 , 6 , 48 ]. As a specific example of tailoring, the criteria for the study design domain consider the realities of where implementation research is conducted and do not require blinding or randomization for strategies to receive the highest rating. While these sources helped provide structure and specific criteria at each of the evidence levels, in conducting the pilot we noted missing information in the studies that sometimes made it difficult to evaluate the research. We recommend using the Standards for Reporting Implementation Studies (StaRI) guidelines as well as Proctor's recommendations for strategy specification when reporting implementation research to help ensure that the details needed to evaluate the research are reported and that potential practitioners can understand what resources and efforts implementation strategies require [ 1 , 49 ].

In addition to being a new resource for implementation science, this is also, to our knowledge, the first set of evidence rating criteria to consider the potential to improve equity in a health issue. Because implementation directly impacts communities, with the potential to improve or exacerbate inequities, including in HIV, experts reiterated that equity was a critical domain to include. However, our work among participants, who primarily identified as white and non-Latino, demonstrated a lack of consensus in the implementation science field about what equity in implementation science means. We encourage continued discussion within the implementation science community that includes diverse perspectives to help foster consensus and bring additional attention to this problem.

For the Best Practices Tool, the criteria within the Equity domain emphasize community engagement in the research process, a research focus on populations experiencing inequities, and equity in outcomes for the Best Practice evidence level rating, with additional, more relaxed criteria for the lower evidence ratings. These criteria encourage attention to and improvement in HIV-related inequities, as many in the field have advocated [ 50 , 51 , 52 ]. However, we recognize that no single implementation strategy (or intervention) is going to adequately address the deeply rooted structural determinants, like racism and homophobia, that keep inequities entrenched. Implementers who are interested in using these strategies may wish to consider additional factors relevant to their specific contexts, such as whether the communities they serve are reflected in the strategies they are considering or whether the strategy responds to the determinants driving inequity in their context. It is our hope that by including equity improvement among the criteria for the highest quality rating, we can bring additional attention to and encourage equity in HIV outcomes in the US.

Our tool and criteria are designed to discern which studies offer the best evidence specific to HIV implementation strategies in the US, a rapidly growing field, rather than to impose an absolute threshold of effectiveness that studies must meet. There are other health areas, such as cancer and global HIV implementation research, in which more studies leverage more rigorous research designs to evaluate implementation strategies [ 53 , 54 ]. If the tool is applied in these areas, it may be appropriate to use more stringent criteria to adequately discern studies with relatively good evidence from those needing additional study. We encourage others who consider using this tool in their area of implementation science to adapt the specific criteria within each domain and at each evidence level to ensure that the tool appropriately discerns among available studies before routine application. Continuing with the example of more rigorous research designs, it may be appropriate to require better replication of results or more diverse settings than we have incorporated into our specific criteria. However, we suggest that the overall structure of the tool, specifically the domains and recommendation levels, could remain the same regardless of the health field. Conversely, we received many suggestions for more stringent criteria that participants felt should be included but that we could not incorporate, because doing so would have resulted in few to no strategies being identified as best practice. US-focused HIV implementation science is still in its adolescence, with many pilots and full-fledged trials underway but not yet published. It is our hope that in the future we will be able to include more stringent criteria within the rubric as the quality of evidence within HIV implementation research improves over time.

There are some notable limitations to the processes used to develop the Best Practice Tool and to the criteria themselves. We used a modified eDelphi approach to develop the rubric and criteria, with some loss to follow-up from the first to the second round of the Delphi, particularly among HIV service providers, which may mean the results are not sufficiently representative of this group. However, we did retain many individuals working in settings where HIV services intersect, like substance misuse, mental health, and social services. Our use of a modified Delphi method did not result in full consensus but rather in an approximation of consensus. In addition, we were not able to elicit opinions about the appropriateness of the tool from front-line HIV service implementers to balance those of the research community. We hope to address this in future iterations of this work.

We envision several future directions for this tool, with implications for both researchers and practitioners, that will advance the goals of ISCI and support the EHE Initiative. Systematic reviews of HIV-related implementation strategies are currently underway through ISCI [ 55 ]. The next phase will entail applying these criteria to implementation strategies identified through those reviews and developing a compendium of strategies. We recognize that a rating and recommendation alone are not sufficient to support uptake, so we also have a complementary dissemination effort underway to provide the information and materials needed for wide adoption and scale-up, which will be available on the ISCI website [ 18 ]. Our criteria and rating system will also benefit researchers conducting HIV implementation research. Through our efforts, we will identify strategies that hold promise but would benefit from additional research and evidence supporting their effectiveness. Researchers can also use these criteria when designing studies of new strategies so that future studies are better positioned to meet them.

For practitioners to fully benefit from research developing and testing implementation strategies targeting HIV services, clear evaluation criteria and recommendations are needed to assess which strategies are most likely to have benefit and impact. We developed domains and criteria appropriate for evaluating the quality of evidence for HIV-related implementation strategies. The resulting rubric provides recommendations to practitioners about the strategies with the best evidence and recommendations to researchers about the strategies for which more evidence is needed. Establishing criteria to evaluate implementation strategies advances implementation science by filling an important gap in HIV implementation research, and the approach can be extended to other areas of implementation science.

Availability of data and materials

The Delphi dataset generated during the current study is available from the corresponding author on reasonable request.

Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recommendations for specifying and reporting. Implement Sci. 2013;8(1):139.


Powell BJ, Fernandez ME, Williams NJ, Aarons GA, Beidas RS, Lewis CC, et al. Enhancing the impact of implementation strategies in healthcare: a research agenda. Front Public Health. 2019;7:3.

Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6.

Sawaya GF, Guirguis-Blake J, LeFevre M, Harris R, Petitti D, for the U.S. Preventive Services Task Force. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med. 2007;147(12):871.


Terracciano L, Brozek J, Compalati E, Schünemann H. GRADE system: new paradigm. Curr Opin Allergy Clin Immunol. 2010;10(4):377–83.

Kilbourne A, Chinman M, Rogal S, Almirall D. Adaptive designs in implementation science and practice: their promise and the need for greater understanding and improved communication. Annu Rev Public Health. 2024;45(1):69.

Lamont T, Barber N, de Pury J, Fulop N, Garfield-Birkbeck S, Lilford R, et al. New approaches to evaluating complex health and care systems. BMJ. 2016;352:i154.

Schliep ME, Alonzo CN, Morris MA. Beyond RCTs: Innovations in research design and methods to advance implementation science. Evid-Based Commun Assess Interv. 2017;11(3–4):82–98.


CDC. The State of the HIV Epidemic in the U.S. | Fact Sheets | Newsroom. 2022. https://www.cdc.gov/nchhstp/newsroom/fact-sheets/hiv/state-of-the-hiv-epidemic-factsheet.html . Accessed 10 Oct 2022.

CDC. Compendium | Intervention Research | Research | HIV. 2022. https://www.cdc.gov/hiv/research/interventionresearch/compendium/index.html . Accessed 10 Oct 2023.

CDC. Volume 28 Number 4| HIV Surveillance | Reports | Resource Library | HIV/AIDS. 2023. https://www.cdc.gov/hiv/library/reports/hiv-surveillance/vol-28-no-4/index.html . Accessed 30 Nov 2023.

Zamantakis A, Li DH, Benbow N, Smith JD, Mustanski B. Determinants of Pre-exposure Prophylaxis (PrEP) implementation in transgender populations: a qualitative scoping review. AIDS Behav. 2023;27(5):1600–18.

Brooks RA, Landrian A, Lazalde G, Galvan FH, Liu H, Chen YT. Predictors of awareness, accessibility and acceptability of Pre-exposure Prophylaxis (PrEP) among English- and Spanish-speaking Latino men who have sex with Men in Los Angeles California. J Immigr Minor Health. 2020;22(4):708–16.

Namara D, Xie H, Miller D, Veloso D, McFarland W. Awareness and uptake of pre-exposure prophylaxis for HIV among low-income, HIV-negative heterosexuals in San Francisco. Int J STD AIDS. 2021;32(8):704–9.


Glenshaw MT, Gaist P, Wilson A, Cregg RC, Holtz TH, Goodenow MM. Role of NIH in the ending the HIV epidemic in the US initiative: research improving practice. JAIDS J Acquir Immune Defic Syndr. 2022;90(S1):S9.

Queiroz A, Mongrella M, Keiser B, Li DH, Benbow N, Mustanski B. Profile of the portfolio of NIH-Funded HIV implementation research projects to inform ending the HIV epidemic strategies. JAIDS J Acquir Immune Defic Syndr. 2022;90(S1):S23.

Mustanski B, Smith JD, Keiser B, Li DH, Benbow N. Supporting the growth of domestic HIV implementation research in the United States through coordination, consultation, and collaboration: how we got here and where we are headed. JAIDS J Acquir Immune Defic Syndr. 2022;90(S1):S1.

HIV Implementation Science Coordination Initiative. https://hivimpsci.northwestern.edu/ . Accessed 21 Oct 2023.

Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88.

Merle JL, Benbow N, Li DH, Zapata JP, Queiroz A, Zamantakis A, McKay V, Keiser B, Villamar JA, Mustanski B, Smith JD. Improving Delivery and Use of HIV Pre-Exposure Prophylaxis in the US: A Systematic Review of Implementation Strategies and Adjunctive Interventions. AIDS Behav. 2024;28(7):2321–39.

Frieden TR, Degutis LC, Mercy JA, Puddy RW, Wilkins N. Understanding Evidence. https://www.cdc.gov/violenceprevention/pdf/understanding_evidence-a.pdf . Accessed 30 Nov 2023.

United States Preventive Services Taskforce. Methods and Processes. https://www.uspreventiveservicestaskforce.org/uspstf/about-uspstf/methods-and-processes/procedure-manual/procedure-manual-section-6-methods-arriving-recommendatio . Accessed 24 April 2024.

United States Preventive Services Taskforce. Grade Definitions. https://www.uspreventiveservicestaskforce.org/uspstf/about-uspstf/methods-and-processes/grade-definitions . Accessed 24 April 2024.

Schloemer T, Schröder-Bäck P. Criteria for evaluating transferability of health interventions: a systematic review and thematic synthesis. Implement Sci. 2018;13(1):88.

Geng EH, Peiris D, Kruk ME. Implementation science: Relevance in the real world without sacrificing rigor. Plos Med. 2017;14(4):e1002288.

Lifsey S, Cash A, Anthony J, Mathis S, Silva S. Building the evidence base for population-level interventions: barriers and opportunities. Health Educ Behav Off Publ Soc Public Health Educ. 2015;42(1 Suppl):133S-140S.

Burns PA, Omondi AA, Monger M, Ward L, Washington R, Sims Gomillia CE, et al. Meet me where i am: an evaluation of an HIV patient navigation intervention to increase uptake of PrEP among black men who have sex with men in the deep south. J Racial Ethn Health Disparities. 2022;9(1):103–16.

Clement ME, Johnston BE, Eagle C, Taylor D, Rosengren AL, Goldstein BA, et al. Advancing the HIV pre-exposure prophylaxis continuum: a collaboration between a public health department and a federally qualified health center in the Southern United States. AIDS Patient Care STDs. 2019;33(8):366–71.

Brant AR, Dhillon P, Hull S, Coleman M, Ye PP, Lotke PS, et al. Integrating HIV pre-exposure prophylaxis into family planning care: a RE-AIM framework evaluation. AIDS Patient Care STDs. 2020;34(6):259–66.

Chen A, Dowdy DW. Clinical effectiveness and cost-effectiveness of HIV pre-exposure prophylaxis in men who have sex with men: risk calculators for real-world decision-making. Plos One. 2014;9(10):e108742.

Cunningham WE, Ford CL, Kinsler JJ, Seiden D, Andrews L, Nakazono T, et al. Effects of a laboratory health information exchange intervention on antiretroviral therapy use, viral suppression and racial/ethnic disparities. J Acquir Immune Defic Syndr 1999. 2017;75(3):290–8.


Havens JP, Scarsi KK, Sayles H, Klepser DG, Swindells S, Bares SH. Acceptability and feasibility of a pharmacist-led human immunodeficiency virus pre-exposure prophylaxis program in the Midwestern United States. Open Forum Infect Dis. 2019;6(10):ofz365.

Horack CL, Newton SL, Vos M, Wolfe BA, Whitaker A. Pre-exposure prophylaxis in a reproductive health setting: a quality improvement project. Health Promot Pract. 2020;21(5):687–9.

Ezeanolue EE, Obiefune MC, Ezeanolue CO, Ehiri JE, Osuji A, Ogidi AG, et al. Effect of a congregation-based intervention on uptake of HIV testing and linkage to care in pregnant women in Nigeria (Baby Shower): a cluster randomised trial. Lancet Glob Health. 2015;3(11):e692-700.

Buchbinder SP, Havlir DV. Getting to zero San Francisco: a collective impact approach. J Acquir Immune Defic Syndr. 2019;82 Suppl 3(Suppl 3):S176-82.

Bunting SR, Saqueton R, Batteson TJ. A guide for designing student-led, interprofessional community education initiatives about HIV risk and pre-exposure prophylaxis. MedEdPORTAL J Teach Learn Resour. 2019;18(15):10818.

Bunting SR, Saqueton R, Batteson TJ. Using a student-led, community-specific training module to increase PrEP uptake amongst at-risk populations: results from an exploratory pilot implementation. AIDS Care. 2020;32(5):546–50.

Coleman M, Hodges A, Henn S, Lambert CC. Integrated pharmacy and PrEP navigation services to support PrEP uptake: a quality improvement project. J Assoc Nurses AIDS Care JANAC. 2020;31(6):685–92.

Gregg E, Linn C, Nace E, Gelberg L, Cowan B, Fulcher JA. Implementation of HIV preexposure prophylaxis in a homeless primary care setting at the Veterans affairs. J Prim Care Community Health. 2020;11:2150132720908370.

Hoth AB, Shafer C, Dillon DB, Mayer R, Walton G, Ohl ME. Iowa TelePrEP: a public-health-partnered telehealth model for human immunodeficiency virus preexposure prophylaxis delivery in a Rural State. Sex Transm Dis. 2019;46(8):507–12.

Khosropour CM, Backus KV, Means AR, Beauchamps L, Johnson K, Golden MR, et al. A pharmacist-led, same-day, HIV pre-exposure prophylaxis initiation program to increase PrEP uptake and decrease time to PrEP initiation. AIDS Patient Care STDs. 2020;34(1):1–6.

Lopez MI, Cocohoba J, Cohen SE, Trainor N, Levy MM, Dong BJ. Implementation of pre-exposure prophylaxis at a community pharmacy through a collaborative practice agreement with San Francisco Department of Public Health. J Am Pharm Assoc JAPhA. 2020;60(1):138–44.

Pathela P, Jamison K, Blank S, Daskalakis D, Hedberg T, Borges C. The HIV Pre-exposure Prophylaxis (PrEP) Cascade at NYC sexual health clinics: navigation is the key to uptake. J Acquir Immune Defic Syndr 1999. 2020;83(4):357–64.

Roth AM, Tran NK, Felsher M, Gadegbeku AB, Piecara B, Fox R, et al. Integrating HIV preexposure prophylaxis with community-based syringe services for women who inject drugs: results from the project SHE demonstration study. J Acquir Immune Defic Syndr. 2021;86(3):e61-70.

Saberi P, Berrean B, Thomas S, Gandhi M, Scott H. A simple Pre-Exposure Prophylaxis (PrEP) optimization intervention for health care providers prescribing PrEP: pilot study. JMIR Form Res. 2018;2(1):e2.

Tung EL, Thomas A, Eichner A, Shalit P. Implementation of a community pharmacy-based pre-exposure prophylaxis service: a novel model for pre-exposure prophylaxis care. Sex Health. 2018;15(6):556–61.

Wood BR, Mann MS, Martinez-Paz N, Unruh KT, Annese M, Spach DH, et al. Project ECHO: telementoring to educate and support prescribing of HIV pre-exposure prophylaxis by community medical providers. Sex Health. 2018;15(6):601–5.

Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, et al. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Adm Policy Ment Health. 2011;38(2):65–76.

Pinnock H, Barwick M, Carpenter CR, Eldridge S, Grandes G, Griffiths CJ, Rycroft-Malone J, Meissner P, Murray E, Patel A, Sheikh A. Standards for Reporting Implementation Studies (StaRI) statement. BMJ. 2017;356.

Brownson RC, Kumanyika SK, Kreuter MW, Haire-Joshu D. Implementation science should give higher priority to health equity. Implement Sci. 2021;16(1):28.

Shelton RC, Adsul P, Oh A, Moise N, Griffith DM. Application of an antiracism lens in the field of implementation science (IS): Recommendations for reframing implementation research with a focus on justice and racial equity. Implement Res Pract. 2021;1(2):26334895211049480.

Baumann AA, Shelton RC, Kumanyika S, Haire-Joshu D. Advancing healthcare equity through dissemination and implementation science. Health Serv Res. 2023;58:327–44.

Neta G, Sanchez MA, Chambers DA, Phillips SM, Leyva B, Cynkin L, et al. Implementation science in cancer prevention and control: a decade of grant funding by the National Cancer Institute and future directions. Implement Sci. 2015;10(1):4.

Hwang S, Birken SA, Melvin CL, Rohweder CL, Smith JD. Designs and methods for implementation research: advancing the mission of the CTSA program. J Clin Transl Sci. 2020;4(3):159–67.

Merle JL, Li D, Keiser B, Zamantakis A, Queiroz A, Gallo CG, et al. Categorising implementation determinants and strategies within the US HIV implementation literature: a systematic review protocol. BMJ Open. 2023;13(3):e070216.

Download references

Acknowledgements

We would like to acknowledge members of the ISCI leadership team and Melissa Mongrella, who developed the survey instruments within REDCap.

This work was supported by an Ending the HIV Epidemic supplement to the Third Coast Center for AIDS Research, an NIH-funded center (P30 AI117943). Author az's time was supported by a training grant from the NIMH (T32MH30325). Author JLM's time was supported by a post-doctoral training grant from the National Library of Medicine (2 T15 LM 007124–26).

Author information

Authors and affiliations

Center for Public Health Systems Science, Brown School, Washington University in St. Louis, St. Louis, MO, USA

Virginia R. McKay & McKenzie Swan

Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

Alithia Zamantakis, Brian Mustanski & Lisa R. Hirschhorn

Institute for Sexual and Gender Minority Health and Wellbeing, Northwestern University, Chicago, IL, USA

Alithia Zamantakis, Ana Michaela Pachicano, Morgan R. Purrier, Dennis H. Li, Brian Mustanski, Justin D. Smith & Nanette Benbow

Department of Population Health Sciences, Division of Health System Innovation and Research, Spencer Fox Eccles School of Medicine, The University of Utah, Salt Lake City, UT, USA

James L. Merle

Center for Prevention Implementation Methodology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

Dennis H. Li, Brian Mustanski & Nanette Benbow

Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

Center for Dissemination and Implementation Science, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

Dennis H. Li


Contributions

All authors contributed to the conceptualization of this project and manuscript. VM and AMP were responsible for drafting the Delphi sections of the manuscript. Az was responsible for the qualitative portions of the manuscript. BM, NB, DL, JM, and JS supported research conceptualization and research design for the project. LH, MS, and MP were responsible for drafting the introduction and discussion portions of the manuscript. All authors reviewed, revised, and provided feedback on later drafts of the manuscript.

Corresponding author

Correspondence to Virginia R. McKay.

Ethics declarations

Ethics approval and consent to participate

The protocols and data collection were determined to be non-human subjects research by Northwestern University’s Institutional Review Board.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

McKay, V.R., Zamantakis, A., Pachicano, A.M. et al. Establishing evidence criteria for implementation strategies in the US: a Delphi study for HIV services. Implementation Sci 19, 50 (2024). https://doi.org/10.1186/s13012-024-01379-3


Received : 21 February 2024

Accepted : 28 June 2024

Published : 15 July 2024

DOI : https://doi.org/10.1186/s13012-024-01379-3


Keywords

  • Evidence-based intervention
  • Implementation science

