
1.4 Scientific Investigations

Created by: CK-12/Adapted by: Christine Miller

“Doing” Science

Science is as much about doing as knowing. Scientists are always trying to learn more and gain a better understanding of the natural world. There are basic methods of gaining knowledge that are common to all of science. At the heart of science is the scientific investigation. A scientific investigation is a systematic approach to answering questions about the physical and natural world. Scientific investigations can be observational — for example, observing a cell under a microscope and recording detailed descriptions. Other scientific investigations are experimental — for example, treating a cell with a drug while recording changes in the behavior of the cell.

The flow chart below shows the typical steps followed in an experimental scientific investigation. The series of steps shown in the flow chart is frequently referred to as the scientific method. Science textbooks often present this simple, linear “recipe” for a scientific investigation. This is an oversimplification of how science is actually done, but it does highlight the basic plan and purpose of an experimental scientific investigation: testing ideas with evidence. Each of the steps in the flow chart is discussed in greater detail below.

Science is actually a complex endeavor that cannot be reduced to a single, linear sequence of steps, like the instructions on a package of cake mix. Real science is nonlinear, iterative (repetitive), creative, unpredictable, and exciting. Scientists often undertake the steps of an investigation in a different sequence, or they repeat the same steps many times as they gain more information and develop new ideas. Scientific investigations often raise new questions as old ones are answered. Successive investigations may address the same questions, but at ever deeper levels. Alternatively, an investigation might lead to an unexpected observation that sparks a new question and takes the research in a completely different direction.

Knowing how scientists “do” science can help you in your everyday life, even if you aren’t a scientist. Some steps of the scientific process — such as asking questions and evaluating evidence — can be applied to answering real-life questions and solving practical problems.

Making Observations

Testing an idea typically begins with observations. An observation is anything that is detected through human senses or with instruments or measuring devices that enhance human senses. We usually think of observations as things we see with our eyes, but we can also make observations with our sense of touch, smell, taste, or hearing. In addition, we can extend and improve our own senses with instruments such as thermometers and microscopes. Other instruments can be used to sense things that human senses cannot detect at all, such as ultraviolet light or radio waves.

Sometimes, chance observations lead to important scientific discoveries. One such observation was made by the Scottish biologist Alexander Fleming (pictured below) in the 1920s. Fleming’s name may sound familiar to you because he is famous for a major discovery. Fleming had been growing a certain type of bacteria on glass plates in his lab when he noticed that one of the plates was contaminated with mold. On closer examination, Fleming observed that the area around the mold was free of bacteria.

Asking Questions

Observations often lead to interesting questions. This is especially true if the observer is thinking like a scientist. Having scientific training and knowledge is also useful. Relevant background knowledge and logical thinking help make sense of observations so the observer can form particularly salient questions. Fleming, for example, wondered whether the mold — or some substance it produced — had killed bacteria on the plate. Fortunately for us, Fleming didn’t just throw out the mold-contaminated plate. Instead, he investigated his question and in so doing, discovered the antibiotic penicillin.

Hypothesis Formation

Typically, the next step in a scientific investigation is to form a hypothesis. A hypothesis is a possible answer to a scientific question. But it isn’t just any answer. A hypothesis must be based on scientific knowledge. In other words, it shouldn’t be at odds with what is already known about the natural world. A hypothesis also must be logical, and it is beneficial if the hypothesis is relatively simple. In addition, to be useful in science, a hypothesis must be testable and falsifiable. In other words, it must be possible to subject the hypothesis to a test that generates evidence for or against it. It must also be possible to make observations that would disprove the hypothesis if it really is false.

For example, Fleming’s hypothesis might have been: “A particular kind of bacteria growing on a plate will die when exposed to a particular kind of mold.” The hypothesis is logical and based directly on observations. The hypothesis is also simple, involving just one type each of mold and bacteria growing on a plate. In addition, hypotheses are subject to “if/then” conditions. Thus, Fleming might have stated, “If a certain type of mold is introduced to a particular kind of bacteria growing on a plate, then the bacteria will die.” This makes the hypothesis easy to test and ensures that it is falsifiable. If the bacteria were to grow in the presence of the mold, it would disprove the hypothesis (assuming the hypothesis is really false).

Hypothesis Testing

Hypothesis testing is at the heart of the scientific method. How would Fleming test his hypothesis? He would gather relevant data as evidence. Evidence is any type of data that may be used to test a hypothesis. Data (singular, datum) are essentially just observations. The observations may be measurements in an experiment or just something the researcher notices. Testing a hypothesis then involves using the data to answer two basic questions:

  • If my hypothesis is true, what would I expect to observe?
  • Does what I actually observe match what I expected to observe?

A hypothesis is supported if the actual observations (data) match the expected observations. A hypothesis is refuted if the actual observations differ from the expected observations.
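The two questions above amount to comparing expected observations with actual ones. As a purely illustrative sketch in Python (the function name and example strings are invented here, not part of the text):

```python
def evaluate_hypothesis(expected: str, observed: str) -> str:
    """Compare what the hypothesis predicts with what was actually seen."""
    if observed == expected:
        return "supported"   # actual observations match the expected ones
    return "refuted"         # actual observations differ from the expected ones

# Fleming's case: the hypothesis predicts bacteria die near the mold.
print(evaluate_hypothesis("bacteria die", "bacteria die"))   # supported
print(evaluate_hypothesis("bacteria die", "bacteria grow"))  # refuted
```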

The scientific method is employed by scientists around the world, but it is not always conducted in the order above. Sometimes, hypotheses are formulated before observations are collected; sometimes observations are made before hypotheses are created. Regardless, it is important that scientists record their procedures carefully, allowing others to reproduce and verify the experimental data and results. After many experiments provide results supporting a hypothesis, the hypothesis becomes a theory. Theories remain theories forever, and are constantly being retested with every experiment and observation. Theories can never become fact or law.

In science, a law is a mathematical relationship that exists between observations under a given set of conditions. There is a fundamental difference between observations of the physical world and explanations of the nature of the physical world: hypotheses and theories are explanations, whereas laws and measurements are observational.

1.4 Summary

  • The scientific method consists of making observations, formulating a hypothesis, testing the hypothesis with new observations, and either forming a new hypothesis if the new observations contradict the old one or continuing to test the hypothesis if the observations agree.
  • A hypothesis is a tentative explanation that can be tested by further observation.
  • A theory is a hypothesis that has been supported with repeated testing.
  • A scientific law is a statement that summarizes the results of many observations.
  • Experimental data must be verified through reproduction by other scientists.
  • Theories must agree with all observations made on the phenomenon under study.
  • Theories are continually tested, forever.

1.4 Review Questions

1.4 Explore More

How simple ideas lead to scientific discoveries, TED-Ed,  2012.


Figure 1.4.1

The Scientific Method (simple), by Thebiologyprimer on Wikimedia Commons, is used under a CC0 1.0 Universal Public Domain Dedication license (https://creativecommons.org/publicdomain/zero/1.0/deed.en).

Figure 1.4.2

Anatomy Bone Bones Check Doctor Examine Film, by rawpixel on Pixabay, used under the Pixabay License (https://pixabay.com/de/service/license/).

Figure 1.4.3

Penicillin Past, Present and Future - the Development and Production of Penicillin, England, 1944, by a Ministry of Information Photo Division photographer. This photograph was scanned and released by the Imperial War Museum on the IWM Non-Commercial Licence. It is now in the public domain (https://en.wikipedia.org/wiki/Public_domain).

TED-Ed. (2012, Mar 13). How simple ideas lead to scientific discoveries. YouTube. https://www.youtube.com/watch?v=F8UFGu2M2gM

Wikipedia contributors. (2020, July 7). Alexander Fleming. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Alexander_Fleming&oldid=966489433

Glossary

  • Scientific investigation: The way in which scientists and researchers use a systematic approach to answer questions about the world around us.
  • Scientific method: Principles and procedures for the systematic pursuit of knowledge involving the recognition and formulation of a problem, the collection of data through observation and experiment, and the formulation and testing of hypotheses.
  • Science: A large body of knowledge and the process by which this knowledge is obtained.
  • Observation: Receiving knowledge of the outside world through our senses, or recording information using scientific tools and instruments.
  • Hypothesis: A testable proposed explanation for a phenomenon.
  • Falsifiability: In the philosophy of science, falsifiability or refutability is the capacity for a statement, theory, or hypothesis to be contradicted by evidence. For example, the statement "All swans are white" is falsifiable because one can observe the existence of black swans.
  • Evidence: The available body of facts or information indicating whether a belief or proposition is true or valid.
  • Data: Facts and statistics collected together for reference or analysis.
  • Theory: An explanation of an aspect of the natural world that can be repeatedly tested and verified in accordance with the scientific method.
  • Scientific law: A statement based on repeated experimental observations that describes some aspect of the world.

Human Biology Copyright © 2020 by Christine Miller is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


Biology LibreTexts

1.1: Scientific Investigation


  • Brad Basehore, Michelle A. Bucks, & Christine M. Mummert
  • Harrisburg Area Community College


What is science and how do we “do” science?

Science is how we gain knowledge about the natural world. Typically, it pertains only to what we can investigate or observe using our senses – or instruments that extend the ability of our senses. As a science, biology concerns itself with understanding the unity and diversity of living things – the 2,300,000 or so described (and millions of undescribed) species with which we share planet Earth.

Ideally, the SCIENTIFIC METHOD is a process that describes how scientists perform investigations, providing a systematic and rational approach to answering questions about the natural world. One goal is to eliminate bias and be as objective as possible in what we study. That said, bias can never be fully removed; the goal is to recognize and minimize it as much as possible. Ideas that can’t be tested, directly observed, or measured in some way should not be subjected to the scientific method. There are certainly other ways to obtain knowledge (cultural, emotional, etc.), but they usually do not qualify as science because they do not follow the axioms of science. An axiom, in a general sense, is a truth that is accepted without proof. It might seem that science does not assume anything, but there are many assumptions that are often ignored, such as the relationship between cause and effect, and that our senses and measurements accurately represent reality.

The goal of today’s lab is to familiarize you with the idealized steps of the scientific method, but it is important to recognize that science rarely proceeds so linearly.

You will use these steps to determine the effects of caffeine and ethanol on the heart rate of a small aquatic organism known by the Latin name Daphnia magna (common name: “water flea”; Fig. 1).


This is an ideal model organism because its body is transparent, allowing its internal organs to be viewed with the help of a dissecting microscope (Fig. 2).


Model organisms are non-human species used in research to investigate biological processes. Information learned in studies of model organisms can often be applied to other species, including humans. We use model organisms to learn about many different processes, including genetics, cellular mechanisms, and growth and development. There are certain characteristics that make a species an ideal model organism. For example, it must be easy to manipulate for study, inexpensive and easy to cultivate, and produce lots of offspring. Some commonly used model organisms include Drosophila melanogaster (fruit fly), Caenorhabditis elegans ( C. elegans , roundworm), and Escherichia coli ( E. coli , bacteria). The best model organism to use for a study depends upon the question being investigated. In this study, Daphnia magna is a good model organism because of its transparent body, which allows for ease of measuring heart rate and gathering data on the effects of caffeine and ethanol.


Steps of the Scientific Method

The scientific method consists of the following steps:

  • Making an observation
  • Asking a question based on that observation
  • Forming a logical AND testable answer to that question (stated in terms of a hypothesis )
  • Designing a controlled experiment to see if the hypothesis is supported or rejected
  • Collecting , analyzing , and interpreting the data generated by the experiment

If the results of an experiment do not support a hypothesis, then another hypothesis must be developed, along with another experiment designed to test it. Ultimately, the results of experimentation are often published in peer-reviewed journals (along with detailed descriptions of the methods used to obtain them) so that other researchers can verify or replicate the experiment and build on that work.

An idealized version of the scientific method is demonstrated in Figure 3. It is considered “idealized” because it is important to note that chance plays an important role in science. Often, the initial observations that result in important discoveries are stumbled upon by accident rather than sought out. Also remember that the scientific method does not apply to observational or discovery science, which is descriptive in nature.



Materials and Supplies:

  • Daphnia magna specimens
  • Compound light microscope
  • Concavity slides
  • Disposable transfer pipettes
  • Test solutions (water and caffeine and ethanol in varying concentrations)
  • Paper towels / Kimwipes
  • Stopwatch / clock (with second hand)
  • Sharpie and plain paper
  • Dissecting probe
  • Diagram of Daphnia magna anatomy

Step 1: Making an Observation

Making and recording observations (often referred to as DESCRIPTIVE SCIENCE ) is the first step in the scientific method. Start by making general observations of the Daphnia in a watch glass.

  • Remove a compound light scope from the storage cabinet as instructed and plug it in.
  • Obtain a concavity slide.
  • Obtain a transfer pipette. Cut off the tip with a pair of scissors. This will prevent the Daphnia from being crushed when forced through a tip that is too narrow.
  • Use the transfer pipette to remove the Daphnia , and some of the water it is in, from the specimen jar and place it on the concavity slide. Make sure that the Daphnia is totally covered by water.
  • Take the Daphnia specimen back to your lab bench and place the slide on the stage of your microscope. Make sure that the 4X objective lens is over the stage.
  • Use the dissecting probe to gently maneuver the Daphnia onto its side so that you can clearly view its heart.
  • View your Daphnia under the microscope. Refer to the anatomy chart and identify the animal’s various parts.
  • Make a sketch of the Daphnia in the circle below. Label the following parts:
  • head region
  • compound eye
  • digestive tract (midgut)
  • thoracic appendages (leg-like structures that function as gills)
  • shell spine
  • Which body parts are moving?
  • Do you see any eggs or young?
  • Once you have found and observed your Daphnia’s heart, count the number of heart beats in one minute. The heartbeat of a healthy specimen is about 2 to 5 beats per second. Because it is so fast, count the heartbeat for 15 seconds and then multiply that number by 4. If necessary, you can keep track of the heart beats by tapping a marker onto a blank sheet of paper and then counting up the number of tap marks.
  • Heart beats per 15 seconds: ____________________________
  • Heart beats per one minute: _____________________________
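The conversion from a 15-second count to beats per minute is just multiplication by 4. As a minimal sketch in Python (the function name and the example count are illustrative, not part of the lab protocol):

```python
def beats_per_minute(beats_in_15_s: int) -> int:
    """Convert a 15-second heartbeat count to beats per minute."""
    return beats_in_15_s * 4

# Example: 75 beats counted in 15 seconds (about 5 beats per second,
# within the healthy 2-5 beats/second range for Daphnia).
print(beats_per_minute(75))  # 300 beats per minute
```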

Steps 2 and 3: Formulating a Question and Stating a Hypothesis

In science, observations often lead to the formulation of questions that generate hypotheses – and associated predictions that are testable. In today’s lab, we are considering the following question: “What is the effect of commonly consumed chemicals on Daphnia heart rate?”

A hypothesis is a testable explanation of a set of observations based on available data. It is a tentative answer to the question you are asking based on knowledge about what you're observing and asking. This knowledge can be pre-existing or information from a published resource. For these reasons, it is NOT correct to say that a hypothesis is an educated guess.

In this lab, you need to formulate several hypotheses about how you believe various test solutions will affect the heart rate of a Daphnia based on your prior knowledge of how these solutions affect humans. After you formulate the hypotheses, you will test predictions based on these hypotheses. Hypotheses can be rephrased as predictions and can be written as “If…, then…” statements.

For example: “If I put Daphnia in ice water, then their heart rates will decrease, since decreasing temperatures slow down the movement of molecules.”

It is important to note that the “If…, then…” statement is not the hypothesis itself; it is a prediction made from the hypothesis.

Formulate a hypothesis to describe what you predict will happen to the Daphnia in each of the following test solutions.

  • Water (from the Daphnia culture jar):
  • Ethyl alcohol (in increasing concentrations: 2%, 4%, 6%, and 8%):
  • Caffeine (in increasing concentrations: 1%, 2%, and 3%):

Step 4: Designing a Controlled Experiment

The next step in the scientific method is to test the predictions based on your hypotheses by designing one or more experiments that allow you to collect the best data to answer your question.

Before doing this, it is necessary to determine the factors (or variables ) you are interested in testing. There are several variables to consider when designing an experiment.

  • An independent variable is the condition or event under study. It is the predetermined condition the investigator sets (and can vary) . Only one independent variable is tested at a time, so that an observed response is attributable to just that variable.
  • A dependent variable is the condition or event that occurs (the data collected) in response to the specified, predetermined, independent variables that are set.
  • Controlled variables are any conditions or events that could potentially affect the outcome of an experiment. Consequently, they must be held constant (controlled) and never varied . In the case of our Daphnia experiment, an example of a controlled variable would be the temperature of the water in which the Daphnia are tested. This variable needs to be controlled because Daphnia hearts beat faster in warm water than they do in cold water.
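As a concrete illustration of the three kinds of variables, one Daphnia trial might be recorded like this in Python (the field names and values are hypothetical, not part of the lab protocol):

```python
# One trial of the Daphnia experiment, with each kind of variable labeled.
trial = {
    # Independent variable: the predetermined condition the investigator sets.
    "caffeine_concentration_pct": 2.0,
    # Dependent variable: the data collected in response to that condition.
    "heart_rate_bpm": 320,
    # Controlled variables: held constant across every trial.
    "water_temp_c": 21.0,
    "solution_volume_ml": 0.5,
}

# Controlled variables must be identical from trial to trial; only the
# independent variable changes between treatments.
print(trial["caffeine_concentration_pct"], trial["heart_rate_bpm"])
```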

In the spaces below, define the variables that will be considered in your experiment today:

  • What will be the independent variables in the Daphnia experiments? List all of them.
  • What will be the dependent variables in the Daphnia experiments? Be specific with your answers.
  • Apart from water temperature, what other variables should be controlled ? List at least 3 controlled variables.

Importance of a Control Group

Most well-planned experiments contain a control group in addition to an experimental group . The experimental group is the group whose experience is manipulated – usually by only one variable at a time. The control group is the group used for a comparison; it serves as a baseline against which the effects of a treatment can be evaluated. A control group should be as much like the experimental group as possible. It should be treated in every way like the experimental group except for one manipulated factor (the independent variable).

Performing the Experiment

You will test the following solutions:

  • Plain water (from the Daphnia culture jar)
  • JUST ONE of the test solutions, at all concentrations listed: either ethyl alcohol (2%, 4%, 6%, and 8%) or caffeine (1%, 2%, 3%, 4%)
  • To test a solution, you will need to remove most of the existing water covering the Daphnia in your concavity slide. “Wick it away” with a Kimwipe at the same time you add your test solution with a transfer pipette. Determine the volume of water needed to fill the concavity slide and cover the Daphnia . Use this same volume of water for each treatment. Make sure to keep the Daphnia submerged in fluid! If your Daphnia dies at any point, you need to re-start the experiment with a new specimen from the culture jar .
  • You will subject the Daphnia to 8 replicate water treatments. Add the first treatment (water from the Daphnia culture jar), wait 1 minute, and count the heartbeats for 15 seconds. Record your data in Table 1 ( Step 5 ) and calculate the number of beats per minute.
  • Add the second treatment (more water from the Daphnia culture jar) by wicking the previous water sample away as described above. Wait 1 minute, then count the heartbeats for 15 seconds. Record your data in Table 1 and calculate the number of beats per minute.
  • Repeat these steps 6 more times. Use your data to calculate an average value for the effects of water on the heart rate of your Daphnia . This part of the experiment is the control for your experiment. It serves as the baseline against which you can compare the results from the Daphnia you subject to the ethyl alcohol or caffeine.
  • Next, test all the other solutions your group has been assigned (either ethyl alcohol or caffeine). Start with the lowest concentration of the test solution and progress to the highest concentration.
  • Note : Be sure to keep all the steps of your experimental protocols exactly the same (add the same volume of test solution, equivalent to the volume of water added in the control treatments). Always wait one minute before counting, and record the heartbeats for 15 seconds (just as performed in the control experiment). Due to time constraints, do only one run (treatment) for each test solution . Record your data in Table 2 (Step 5).
  • When your tests have been completed, use a transfer pipette to move your Daphnia to the recovery beaker (as indicated by the instructor).
  • Wash and dry all the glassware you used and put it back where you found it. Dispose of the used pipettes in the trash. Make sure the lids are placed back on all of your solution bottles. Clean up any mess you may have made and wipe down the lab benches with the paper towels.
  • Compile your class data as directed by the instructor.
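The baseline described above (the average of the 8 water replicates) and the comparison with a treatment run can be computed as in this minimal Python sketch; all the counts here are made-up example numbers, not real data:

```python
# Hypothetical 15-second counts from the 8 water (control) treatments.
control_counts_15s = [72, 70, 75, 73, 71, 74, 72, 73]

# Convert each count to beats per minute, then average to get the baseline.
control_bpm = [count * 4 for count in control_counts_15s]
baseline = sum(control_bpm) / len(control_bpm)

# A single treatment run (say, one caffeine concentration) is then
# compared against that control baseline.
treatment_bpm = 81 * 4
change_pct = (treatment_bpm - baseline) / baseline * 100

print(f"baseline: {baseline:.1f} bpm, treatment: {treatment_bpm} bpm "
      f"({change_pct:+.1f}%)")
```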

Step 5: Collecting, Analyzing, and Interpreting the Data

Substance tested by your group: ______________________________________

* Start with the lowest concentration of the test solution, followed by the next higher concentration (lowest to highest concentrations).

Experimental data and results must be displayed in a clear, logical manner. Tables, charts, and graphs are usually the most effective tools for providing a concise summary of the type of numerical data you collected today.

A graph is a diagram showing the relationship between independent and dependent variables .

When making graphs, the following rules should be observed:

  • The independent variable is usually plotted on the X-axis (horizontal axis) and the dependent variable is plotted on the Y-axis (vertical axis).
  • Each axis should be labeled properly with the name of the variable and the units of measurement.
  • Data intervals must be evenly spaced across the axes, usually beginning with zero and increasing in consistent even increments.
  • All graphs should have a title or caption to describe the information presented. Capitalize the first word in the title and place a period at the end.
  • Line graphs show changes in the quantity of the chosen variable and emphasize the rise and fall of the values over their range.
  • Bar graphs are used for data that represent separate or discontinuous groups or non-numerical categories, thus emphasizing the discrete differences between the groups.
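The graphing rules above can be sketched with Python's matplotlib library. Everything here (data values, output file name) is hypothetical, for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical group results: caffeine concentration vs. Daphnia heart rate.
concentration_pct = [0, 1, 2, 3]       # independent variable -> X-axis
heart_rate_bpm = [290, 305, 322, 340]  # dependent variable   -> Y-axis

fig, ax = plt.subplots()
ax.plot(concentration_pct, heart_rate_bpm, marker="o")  # line graph: continuous change
ax.set_xlabel("Caffeine concentration (%)")             # variable name and units
ax.set_ylabel("Heart rate (beats per minute)")
ax.set_title("Effect of caffeine concentration on Daphnia heart rate.")
fig.savefig("daphnia_heart_rate.png")
```

A bar graph (`ax.bar(...)`) would be used instead if the groups were discrete, non-numerical categories.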

*Refer to the Graphing Grading Rubric at the end of this lab.

Graph Your Results:

Discuss with your group how to design the graph so it best represents your data and ultimately the conclusions you draw.

Use the grid below to graph your group’s results:


Interpret Your Results:

Once you have collected your data and summarized it as a graph, the last step is to analyze and interpret your results. Ultimately, you have reached the stage in the scientific method process where you need to determine whether the hypothesis you initially generated has been supported or refuted (not supported).

Questions for Review

  • What is the difference between a control group and a controlled variable?
  • What are some types of questions science can’t answer?
  • Where would you find an independent variable on a line graph?
  • What are ‘levels of treatment’?
  • Much like citing your sources, making a graph usually follows format guidelines. What is the correct way to format a graph in APA style?

Practical Challenge

  • Give an example of a well written hypothesis and a prediction based on this hypothesis.
  • What kind of data would be appropriate to use for a bar graph?
  • Did the results of your experiments support or refute your hypothesis?

Science and Hypothesis

  • First Online: 13 June 2021


  • Satya Sundar Sethy

In this chapter, we will discuss the significance of a ‘hypothesis’ in a logical inquiry, a scientific investigation, and research work. We will enumerate some of the definitions of ‘hypothesis’. We will elaborate on the nature and scope of the ‘hypothesis’ and the sources to obtain a hypothesis. Further, we will explain the kinds of hypothesis with suitable examples. In the end, we will illustrate methods to verify a hypothesis in a logical inquiry and a scientific investigation.



Author information

Department of Humanities and Social Sciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India

Satya Sundar Sethy


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Sethy, S.S. (2021). Science and Hypothesis. In: Introduction to Logic and Logical Discourse. Springer, Singapore. https://doi.org/10.1007/978-981-16-2689-0_17

Print ISBN: 978-981-16-2688-3. Online ISBN: 978-981-16-2689-0.

  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
  • QuestionPro

survey software icon

  • Solutions Industries Gaming Automotive Sports and events Education Government Travel & Hospitality Financial Services Healthcare Cannabis Technology Use Case NPS+ Communities Audience Contactless surveys Mobile LivePolls Member Experience GDPR Positive People Science 360 Feedback Surveys
  • Resources Blog eBooks Survey Templates Case Studies Training Help center


Research Hypothesis: What It Is, Types + How to Develop?

A research hypothesis proposes a link between variables. Uncover its types and the secrets to creating hypotheses for scientific inquiry.

A research study starts with a question. Researchers worldwide ask questions and create research hypotheses. The effectiveness of research relies on developing a good research hypothesis. Examples of research hypotheses can guide researchers in writing effective ones.

In this blog, we’ll learn what a research hypothesis is, why it’s important in research, and the different types used in science. We’ll also guide you through creating your research hypothesis and discussing ways to test and evaluate it.

What is a Research Hypothesis?

A hypothesis is like a guess or idea that you suggest to check if it’s true. A research hypothesis is a statement that brings up a question and predicts what might happen.

It’s really important in the scientific method and is used in experiments to figure things out. Essentially, it’s an educated guess about how things are connected in the research.

A research hypothesis usually includes pointing out the independent variable (the thing they’re changing or studying) and the dependent variable (the result they’re measuring or watching). It helps plan how to gather and analyze data to see if there’s evidence to support or deny the expected connection between these variables.

Importance of Hypothesis in Research

Hypotheses are really important in research. They help design studies, allow for practical testing, and add to our scientific knowledge. Their main role is to organize research projects, making them purposeful, focused, and valuable to the scientific community. Let’s look at some key reasons why they matter:

  • A research hypothesis helps test theories.

A hypothesis plays a pivotal role in the scientific method by providing a basis for testing existing theories. For example, a hypothesis might test the predictive power of a psychological theory on human behavior.

  • It serves as a great platform for investigation activities.

It serves as a launching pad for investigation activities, which offers researchers a clear starting point. A research hypothesis can explore the relationship between exercise and stress reduction.

  • Hypothesis guides the research work or study.

A well-formulated hypothesis guides the entire research process. It ensures that the study remains focused and purposeful. For instance, a hypothesis about the impact of social media on interpersonal relationships provides clear guidance for a study.

  • Hypothesis sometimes suggests theories.

In some cases, a hypothesis can suggest new theories or modifications to existing ones. For example, a hypothesis testing the effectiveness of a new drug might prompt a reconsideration of current medical theories.

  • It helps in knowing the data needs.

A hypothesis clarifies the data requirements for a study, ensuring that researchers collect the necessary information. For example, a hypothesis about the influence of age on a particular phenomenon would guide the collection of demographic data.

  • The hypothesis explains social phenomena.

Hypotheses are instrumental in explaining complex social phenomena. For instance, a hypothesis might explore the relationship between economic factors and crime rates in a given community.

  • Hypothesis provides a relationship between phenomena for empirical testing.

Hypotheses establish clear relationships between phenomena, paving the way for empirical testing. An example could be a hypothesis exploring the correlation between sleep patterns and academic performance.

  • It helps in knowing the most suitable analysis technique.

A hypothesis guides researchers in selecting the most appropriate analysis techniques for their data. For example, a hypothesis focusing on the effectiveness of a teaching method may lead to the choice of statistical analyses best suited for educational research.

Characteristics of a Good Research Hypothesis

A hypothesis is a specific idea that you can test in a study. It often comes from looking at past research and theories. A good hypothesis usually starts with a research question that you can explore through background research. For it to be effective, consider these key characteristics:

  • Clear and Focused Language: A good hypothesis uses clear and focused language to avoid confusion and ensure everyone understands it.
  • Related to the Research Topic: The hypothesis should directly relate to the research topic, acting as a bridge between the specific question and the broader study.
  • Testable: An effective hypothesis can be tested, meaning its prediction can be checked with real data to support or challenge the proposed relationship.
  • Potential for Exploration: A good hypothesis often comes from a research question that invites further exploration. Doing background research helps find gaps and potential areas to investigate.
  • Includes Variables: The hypothesis should clearly state both the independent and dependent variables, specifying the factors being studied and the expected outcomes.
  • Ethical Considerations: Check if variables can be manipulated without breaking ethical standards. It’s crucial to maintain ethical research practices.
  • Predicts Outcomes: The hypothesis should predict the expected relationship and outcome, acting as a roadmap for the study and guiding data collection and analysis.
  • Simple and Concise: A good hypothesis avoids unnecessary complexity and is simple and concise, expressing the essence of the proposed relationship clearly.
  • Clear and Assumption-Free: The hypothesis should be clear and free from assumptions about the reader’s prior knowledge, ensuring universal understanding.
  • Observable and Testable Results: A strong hypothesis implies research that produces observable and testable results, making sure the study’s outcomes can be effectively measured and analyzed.

When you use these characteristics as a checklist, it can help you create a good research hypothesis. It’ll guide improving and strengthening the hypothesis, identifying any weaknesses, and making necessary changes. Crafting a hypothesis with these features helps you conduct a thorough and insightful research study.

Types of Research Hypotheses

The research hypothesis comes in various types, each serving a specific purpose in guiding the scientific investigation. Knowing the differences will make it easier for you to create your own hypothesis. Here’s an overview of the common types:

01. Null Hypothesis

The null hypothesis states that there is no connection between two considered variables or that two groups are unrelated. As discussed earlier, a hypothesis is an unproven assumption lacking sufficient supporting data. It serves as the statement researchers aim to disprove. It is testable, verifiable, and can be rejected.

For example, if you’re studying the relationship between Project A and Project B, assuming both projects are of equal standard is your null hypothesis. It needs to be specific for your study.

02. Alternative Hypothesis

The alternative hypothesis is the counterpart to the null hypothesis. It states that there is a significant effect or relationship between the variables, and finding evidence for it leads you to reject the null hypothesis.

When you create a null hypothesis, you assume there is no effect or no connection between the variables. The alternative hypothesis claims the opposite.

For instance, if your null hypothesis is "the new training program has no effect on employee productivity," the alternative hypothesis would be "the new training program changes employee productivity."

03. Directional Hypothesis

The directional hypothesis predicts the direction of the relationship between independent and dependent variables. They specify whether the effect will be positive or negative.

If you increase your study hours, you will experience a positive association with your exam scores. This hypothesis suggests that as you increase the independent variable (study hours), there will also be an increase in the dependent variable (exam scores).
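A directional hypothesis like this can be checked informally by computing a correlation coefficient: a positive Pearson r is consistent with the predicted direction. The sketch below uses invented study-hours and exam-score data (purely illustrative, not from any study) and plain Python arithmetic.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical data: hours studied per week vs. exam score
study_hours = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 64, 70, 75]

r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.3f}")  # a positive r agrees with the predicted direction
```

A positive r alone does not establish causation; it only shows that the association runs in the hypothesized direction.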

04. Non-directional Hypothesis

The non-directional hypothesis predicts the existence of a relationship between variables but does not specify the direction of the effect. It suggests that there will be a significant difference or relationship, but it does not predict the nature of that difference.

For example: students who receive the educational intervention will score differently on the test than students who do not. The hypothesis predicts a difference, but it does not say which group will score higher.

05. Simple Hypothesis

A simple hypothesis predicts a relationship between one dependent variable and one independent variable without specifying the nature of that relationship. It’s simple and usually used when we don’t know much about how the two things are connected.

For example, if you adopt effective study habits, you will achieve higher exam scores than those with poor study habits.

06. Complex Hypothesis

A complex hypothesis is an idea that specifies a relationship between multiple independent and dependent variables. It is a more detailed idea than a simple hypothesis.

While a simple view suggests a straightforward cause-and-effect relationship between two things, a complex hypothesis involves many factors and how they’re connected to each other.

For example, when you increase your study time, you tend to achieve higher exam scores. The connection between your study time and exam performance is affected by various factors, including the quality of your sleep, your motivation levels, and the effectiveness of your study techniques.

If you sleep well, stay highly motivated, and use effective study strategies, you may observe a more robust positive correlation between the time you spend studying and your exam scores, unlike those who may lack these factors.

07. Associative Hypothesis

An associative hypothesis proposes a connection between two things without saying that one causes the other. Basically, it suggests that when one thing changes, the other changes too, but it doesn’t claim that one thing is causing the change in the other.

For example, you will likely notice higher exam scores when you increase your study time. You can recognize an association between your study time and exam scores in this scenario.

Your hypothesis acknowledges a relationship between the two variables—your study time and exam scores—without asserting that increased study time directly causes higher exam scores. You need to consider that other factors, like motivation or learning style, could affect the observed association.

08. Causal Hypothesis

A causal hypothesis proposes a cause-and-effect relationship between two variables. It suggests that changes in one variable directly cause changes in another variable.

For example, when you increase your study time, you experience higher exam scores. This hypothesis suggests a direct cause-and-effect relationship, indicating that the more time you spend studying, the higher your exam scores. It assumes that changes in your study time directly influence changes in your exam performance.

09. Empirical Hypothesis

An empirical hypothesis is a statement based on things we can see and measure. It comes from direct observation or experiments and can be tested with real-world evidence. If an experiment supports the hypothesis, the statement becomes more than a guess; it is backed by data, which makes it more reliable.

For example, if you increase the dosage of a certain medication, you might observe a quicker recovery time for patients. Imagine you’re in charge of a clinical trial. In this trial, patients are given varying dosages of the medication, and you measure and compare their recovery times. This allows you to directly see the effects of different dosages on how fast patients recover.

This way, you can create a research hypothesis: “Increasing the dosage of a certain medication will lead to a faster recovery time for patients.”

10. Statistical Hypothesis

A statistical hypothesis is a statement or assumption about a population parameter that is the subject of an investigation. It serves as the basis for statistical analysis and testing. It is often tested using statistical methods to draw inferences about the larger population.

In a hypothesis test, statistical evidence is collected to either reject the null hypothesis in favor of the alternative hypothesis or fail to reject the null hypothesis due to insufficient evidence.

For example, let’s say you’re testing a new medicine. Your hypothesis could be that the medicine doesn’t really help patients get better. So, you collect data and use statistics to see if your guess is right or if the medicine actually makes a difference.

If the data strongly shows that the medicine does help, you say your guess was wrong, and the medicine does make a difference. But if the proof isn’t strong enough, you can stick with your original guess because you didn’t get enough evidence to change your mind.
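This decide-or-stick logic can be made concrete with a small sketch. The pure-Python example below (the numbers are invented for illustration) tests the statistical hypothesis "the coin is fair" after observing 60 heads in 100 flips: it computes an exact two-sided binomial p-value and compares it with a 0.05 significance level.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Observed data (invented for illustration): 60 heads in 100 flips.
n_flips, heads = 100, 60

# Two-sided exact p-value under H0 "the coin is fair": probability of a
# result at least as far from 50 as the observed 60 (i.e. <= 40 or >= 60).
p_value = sum(binom_pmf(k, n_flips) for k in range(n_flips + 1)
              if abs(k - 50) >= 10)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.4f} -> {decision}")
```

Here the p-value comes out just above 0.05, so despite the suggestive result, the evidence is not strong enough to reject the null hypothesis at the conventional significance level.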

How to Develop a Research Hypothesis?

Step 1: Identify your research problem or topic

Define the area of interest or the problem you want to investigate. Make sure it’s clear and well-defined.

Start by asking a question about your chosen topic. Consider the limitations of your research and create a straightforward problem related to your topic. Once you’ve done that, you can develop and test a hypothesis with evidence.

Step 2: Conduct a literature review

Review existing literature related to your research problem. This will help you understand the current state of knowledge in the field, identify gaps, and build a foundation for your hypothesis. Consider the following questions:

  • What existing research has been conducted on your chosen topic?
  • Are there any gaps or unanswered questions in the current literature?
  • How will the existing literature contribute to the foundation of your research?

Step 3: Formulate your research question

Based on your literature review, create a specific and concise research question that addresses your identified problem. Your research question should be clear, focused, and relevant to your field of study.

Step 4: Identify variables

Determine the key variables involved in your research question. Variables are the factors or phenomena that you will study and manipulate to test your hypothesis.

  • Independent Variable: The variable you manipulate or control.
  • Dependent Variable: The variable you measure to observe the effect of the independent variable.

Step 5: State the Null hypothesis

The null hypothesis is a statement that there is no significant difference or effect. It serves as a baseline for comparison with the alternative hypothesis.

Step 6: Select appropriate methods for testing the hypothesis

Choose research methods that align with your study objectives, such as experiments, surveys, or observational studies. The selected methods enable you to test your research hypothesis effectively.

Creating a research hypothesis usually takes more than one try. Expect to make changes as you collect data. It's normal to test and reject a few hypotheses before you find the right answer to your research question.

Testing and Evaluating Hypotheses

Testing hypotheses is a really important part of research. It’s like the practical side of things. Here, real-world evidence will help you determine how different things are connected. Let’s explore the main steps in hypothesis testing:

  • State your research hypothesis.

Before testing, clearly articulate your research hypothesis. This involves framing both a null hypothesis, suggesting no significant effect or relationship, and an alternative hypothesis, proposing the expected outcome.

  • Collect data strategically.

Plan how you will gather information in a way that fits your study. Make sure your data collection method matches the things you’re studying.

Whether through surveys, observations, or experiments, this step demands precision and adherence to the established methodology. The quality of data collected directly influences the credibility of study outcomes.

  • Perform an appropriate statistical test.

Choose a statistical test that aligns with the nature of your data and the hypotheses being tested. Whether it’s a t-test, chi-square test, ANOVA, or regression analysis, selecting the right statistical tool is paramount for accurate and reliable results.

  • Decide if your idea was right or wrong.

Following the statistical analysis, evaluate the results in the context of your null hypothesis. You need to decide if you should reject your null hypothesis or not.

  • Share what you found.

When discussing what you found in your research, be clear and organized. Say whether your idea was supported or not, and talk about what your results mean. Also, mention any limits to your study and suggest ideas for future research.
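The testing steps above can be sketched end-to-end in a few lines of Python. Instead of a t-test, this example uses a permutation test, an assumption-light alternative that only requires shuffling group labels; the two groups of exam scores are invented for illustration.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_perm=10_000, seed=42):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel scores at random under H0
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Invented exam scores for two hypothetical teaching methods
method_a = [78, 82, 85, 88, 90, 79]
method_b = [70, 72, 75, 74, 68, 71]

p = permutation_test(method_a, method_b)
print(f"p = {p:.4f}")
```

With two groups as clearly separated as these, the p-value is tiny, so the null hypothesis of "no difference between the methods" would be rejected.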

The Role of QuestionPro to Develop a Good Research Hypothesis

QuestionPro is a survey and research platform that provides tools for creating, distributing, and analyzing surveys. It plays a crucial role in the research process, especially when you’re in the initial stages of hypothesis development. Here’s how QuestionPro can help you to develop a good research hypothesis:

  • Survey design and data collection: You can use the platform to create targeted questions that help you gather relevant data.
  • Exploratory research: Through surveys and feedback mechanisms on QuestionPro, you can conduct exploratory research to understand the landscape of a particular subject.
  • Literature review and background research: QuestionPro surveys can collect sample population opinions, experiences, and preferences. This data and a thorough literature evaluation can help you generate a well-grounded hypothesis by improving your research knowledge.
  • Identifying variables: Using targeted survey questions, you can identify relevant variables related to their research topic.
  • Testing assumptions: You can use surveys to informally test certain assumptions or hypotheses before formalizing a research hypothesis.
  • Data analysis tools: QuestionPro provides tools for analyzing survey data. You can use these tools to identify the collected data’s patterns, correlations, or trends.
  • Refining your hypotheses: As you collect data through QuestionPro, you can adjust your hypotheses based on the real-world responses you receive.

A research hypothesis is like a guide for researchers in science. It’s a well-thought-out idea that has been thoroughly tested. This idea is crucial as researchers can explore different fields, such as medicine, social sciences, and natural sciences. The research hypothesis links theories to real-world evidence and gives researchers a clear path to explore and make discoveries.

QuestionPro Research Suite is a helpful tool for researchers. It makes creating surveys, collecting data, and analyzing information easy. It supports all kinds of research, from exploring new ideas to forming hypotheses. With a focus on data, it helps researchers do their best work.

Are you interested in learning more about QuestionPro Research Suite? Take advantage of QuestionPro’s free trial to get an initial look at its capabilities and realize the full potential of your research efforts.




What is a scientific hypothesis?

It's the initial building block in the scientific method.



A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess. While an "educated guess" suggests a random prediction based on a person's expertise, developing a hypothesis requires active observation and background research. 

The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).

A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.

A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .

Here are some examples of hypothesis statements:

  • If garlic repels fleas, then a dog that is given garlic every day will not get fleas.
  • If sugar causes cavities, then people who eat a lot of candy may be more prone to cavities.
  • If ultraviolet light can damage the eyes, then maybe this light can cause blindness.

A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book "Conjectures and Refutations."

An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.

Types of scientific hypotheses


In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .

For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."

If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book "Research Methods in Psychology" (BCcampus, 2015).

There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.

Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley . 
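The type I error rate can be made concrete with a quick simulation (a sketch, not from the article): flip a fair coin 100 times per experiment, "reject" the null whenever the head count is 10 or more away from 50, and the long-run false-positive rate comes out near the conventional 5% level.

```python
import random

random.seed(1)

TRIALS = 2000
FLIPS = 100
false_positives = 0
for _ in range(TRIALS):
    # Simulate one experiment under a TRUE null hypothesis: a fair coin.
    heads = sum(random.random() < 0.5 for _ in range(FLIPS))
    # Decision rule: reject H0 when the result is "extreme" (>= 60 or
    # <= 40 heads). Because H0 is actually true, every such rejection
    # is a type I error (false positive).
    if abs(heads - 50) >= 10:
        false_positives += 1

type1_rate = false_positives / TRIALS
print(f"Estimated type I error rate: {type1_rate:.3f}")
```

The estimated rate hovers around 5%, showing that the significance threshold directly sets how often a true null hypothesis gets wrongly rejected.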

A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings confirm the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the hypothesis is supported by the evidence, but it is not proved true for all cases.

Scientific theory vs. scientific hypothesis

The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.

"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts." 

  • Read more about writing a hypothesis, from the American Medical Writers Association.
  • Find out why a hypothesis isn't always necessary in science, from The American Biology Teacher.
  • Learn about null and alternative hypotheses, from Prof. Essa on YouTube .

Encyclopedia Britannica. Scientific Hypothesis. Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis

Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.

California State University, Bakersfield, "Formatting a testable hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm  

Karl Popper, "Conjectures and Refutations," Routledge, 1963.

Price, P., Jhangiani, R., & Chiang, I., "Research Methods of Psychology — 2nd Canadian Edition," BCcampus, 2015.‌

University of Miami, "The Scientific Method" http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf  

William M.K. Trochim, "Research Methods Knowledge Base," https://conjointly.com/kb/hypotheses-explained/  

University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate" https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf  

University of California, Berkeley, "Science at multiple levels" https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19


Microbial Biotechnology, 15(11), November 2022

On the role of hypotheses in science

Harald Brüssow

1 Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven Belgium


Scientific research progresses through a dialectic dialogue between hypothesis building and the experimental testing of these hypotheses. Microbiologists, like biologists in general, can rely on an increasing set of sophisticated experimental methods for hypothesis testing, such that many scientists maintain that progress in biology essentially comes with new experimental tools. While this is certainly true, the importance of hypothesis building in science should not be neglected. Some scientists rely on intuition for hypothesis building. However, there is also a large body of philosophical thinking on hypothesis building whose knowledge may be of use to young scientists. The present essay offers a primer on philosophical thoughts about hypothesis building and illustrates them with two hypotheses that played a major role in the history of science (the parallel axiom and the fifth-element hypothesis). It continues with philosophical concepts of hypotheses as a calculus that fits observations (Copernicus), the need for plausibility (Descartes and Gilbert) and for explicatory power imposing a strong selection on theories (Darwin, James and Dewey). Galilei introduced, and James and Poincaré later justified, the reductionist principle in hypothesis building. Waddington stressed the feed-forward aspect of fruitful hypothesis building, while Poincaré called for a dialogue between experiment and hypothesis and distinguished false, true, fruitful and dangerous hypotheses. Theoretical biology plays a much smaller role than theoretical physics because physical thinking strives for unifying principles across the universe, while biology is confronted with a breathtaking diversity of life forms and their historical development on a single planet. Knowledge of the philosophical foundations of hypothesis building might stimulate more hypothesis-driven experimentation than simple observation-oriented "fishing expeditions" in biological research.

Short abstract

Scientific research progresses through a dialectic dialogue between hypothesis building and the experimental testing of these hypotheses. Microbiologists can rely on an increasing set of sophisticated experimental methods for hypothesis testing, but the importance of hypothesis building in science should not be neglected. This Lilliput offers a primer on philosophical concepts on hypotheses in science.


Philosophy of science and the theory of knowledge (epistemology) are important branches of philosophy. However, philosophy has over the centuries lost the dominant role it enjoyed in antiquity: in the Middle Ages it became the maid of theology (ancilla theologiae), and after the rise of the natural sciences and their technological applications, many practising scientists and the general public doubt whether they need philosophical concepts in their professional and private lives. This is, in the opinion of the writer of this article, an applied microbiologist, shortsighted for several reasons. Philosophers of the 20th century have made important contributions to the theory of knowledge, and many eminent scientists grew interested in philosophical problems. Mathematics, which plays such a prominent role in physics and increasingly also in other branches of science, is a hybrid: to some extent it is the paradigm of an exact science, while its abstract aspects are deeply rooted in philosophical thinking. In the present essay, the focus is on hypotheses and hypothesis building in science; essentially, it is a compilation of what philosophers and scientists have thought about this subject in past and present. The controversy between the mathematical mind and the practical mind is an old one. The philosopher, physicist and mathematician Pascal ( 1623 –1662a) wrote in his Pensées : “Mathematicians who are only mathematicians have exact minds, provided all things are explained to them by means of definitions and axioms; otherwise they are inaccurate. They are only right when the principles are quite clear. And men of intuition cannot have the patience to reach to first principles of things speculative and conceptional, which they have never seen in the world and which are altogether out of the common. The intellect can be strong and narrow, and can be comprehensive and weak.” Hypothesis building is an act of both intuition and exact thinking, and I hope that theoretical knowledge about hypothesis building will also profit young microbiologists.


In the following, I will discuss the importance of hypothesis building for the history of science and the development of knowledge, illustrated by two famous concepts: the parallel axiom in mathematics and the five elements hypothesis in physics.

Euclidean geometry

The prominent role of hypotheses in the development of science becomes clear already in the first science book of Western civilization: Euclid's The Elements , written about 300 BC, starts with a set of statements called Definitions, Postulates and Common Notions that lay out the foundation of geometry (Euclid,  c.323‐c.283 ). This axiomatic approach is very modern, as exemplified by the fact that Euclid's book remained for a long time, after the Bible, the most read book in the Western hemisphere and a backbone of school teaching in mathematics. Euclid's twenty‑three definitions start with sentences such as “1. A point is that which has no part; 2. A line is breadthless length; 3. The extremities of a line are points”; and continue with the definition of angles (“8. A plane angle is the inclination to one another of two lines in a plane which meet one another and do not lie in a straight line”) and that of circles, triangles and quadrilateral figures. For the history of science, the 23rd definition, that of parallels, is particularly interesting: “Parallel straight lines are straight lines which, being in the same plane and being produced indefinitely in both directions, do not meet one another in either direction”. This definition underlies the famous parallel axiom (strictly speaking, formulated by Euclid as his fifth postulate). It is clear that the parallel axiom cannot be the result of experimental observations, but must be a concept created in the mind. Euclid ends with five Common Notions (from “1. Things which are equal to the same thing are also equal to one another” to “5. The whole is greater than the part”). The establishment of a contradiction‑free system for a branch of mathematics, based on a set of axioms from which theorems were deduced, was revolutionary. Hilbert ( 1899 ) gave a sound modern formulation of Euclidean geometry. Hilbert's axiom system contains the notions “point, line and plane” and the concepts of “betweenness, containment and congruence”, leading to five groups of axioms, namely the axioms of Incidence (“Verknüpfung”), of Order (“Anordnung”), of Congruence, of Continuity (“Stetigkeit”) and of Parallels.

Origin of axioms

Philosophers gave various explanations for the origin of the Euclidean hypotheses or axioms. Plato considered geometrical figures as related to ideas (the true things behind the world of appearances). Aristotle considered geometric figures as abstractions of physical bodies. Descartes perceived geometric figures as inborn ideas derived from extended bodies ( res extensa ), while Pascal thought that the axioms of Euclidean geometry were derived from intuition. Kant reasoned that Euclidean geometry represented a priori perceptions of space. Newton considered geometry as part of general mechanics, linked to theories of measurement. Hilbert argued that the axioms of mathematical geometry are neither the result of contemplation (“Anschauung”) nor of psychological origin. For him, axioms were formal propositions (“formale Aussageformen”) characterized by consistency (“Widerspruchsfreiheit”, i.e. absence of contradiction) (Mittelstrass,  1980a ).


Axioms were also defined differently by different philosophers. In Topics , Aristotle calls axioms the assumptions taken up by one partner of a dialogue to initiate a dialectic discussion. Plato states that an axiom needs to be an acceptable or credible proposition which cannot be justified by reference to other statements; yet such a justification is not necessary, because an axiom is an evident statement. In the modern definition, axioms are methodical first sentences in the foundation of a deductive science (Mittelstrass,  1980a ). In Posterior Analytics , Aristotle defines postulates as positions which are at least initially not accepted by the dialogue partners, while hypotheses are accepted for the sake of reasoning. In Euclid's book, postulates are construction methods that assure the existence of the geometric objects. Today, postulates and axioms are used as synonyms, whereas 18th‑century philosophy distinguished them: Lambert defined axioms as descriptive sentences and postulates as prescriptive sentences. According to Kant, mathematical postulates create (synthesize) concepts (Mittelstrass,  1980b ). Definitions, in turn, fix the use of signs; they can be semantic definitions that explain the proper meaning of a sign in common language use (in dictionary style), or syntactic definitions that regulate the use of these signs in formal operations. Nominal definitions explain the words, while real definitions explain the meaning or the nature of the defined object. Definitions are thus essential for the development of a language of science, assuring communication and mutual understanding (Mittelstrass,  1980c ). Finally, hypotheses are also frequently defined as consistent conjectures that are compatible with the available knowledge. The truth of a hypothesis is only supposed in order to explain true observations and facts; the consequences of the hypothetical assumption should explain the observed facts. Normally, descriptive hypotheses precede explanatory hypotheses in the development of scientific thought. Sometimes only tentative concepts are introduced, as working hypotheses, to test whether they have explanatory capacity for the observations (Mittelstrass,  1980d ).

Euclidean geometry is constructed along a logical “if→then” scheme. The “if” clause formulates at the outset the suppositions; the “then” clause formulates the consequences of these axioms, which together provide a system of geometric theorems or insights. The conclusions do not follow immediately from the axioms; otherwise they would be self‑evident. The “if–then” scheme in geometry is, however, not used as in other branches of science, where the consequences deduced from the axioms are checked against reality to see whether they are true, in order to confirm the validity of the hypothesis. The task in mathematics is to determine what can be logically deduced from a given set of axioms so as to build a contradiction‑free system of geometry. Whether this applies to the real world is, in contrast to the situation in the natural sciences, another question, and one that is entirely secondary for mathematics (Syntopicon,  1992 ).

Pascal's rules for hypotheses

In his Scientific Treatises on Geometric Demonstrations , Pascal ( 1623‐1662b ) formulates: “Five rules are absolutely necessary and we cannot dispense with them without an essential defect and frequently even error. Do not leave undefined any terms at all obscure or ambiguous. Use in definitions of terms only words perfectly well known or already explained. Do not fail to ask that each of the necessary principles be granted, however clear and evident it may be. Ask only that perfectly self‐evident things be granted as axioms. Prove all propositions, using for their proof only axioms that are perfectly self‐evident or propositions already demonstrated or granted. Never get caught in the ambiguity of terms by failing to substitute in thought the definitions which restrict or define them. One should accept as true only those things whose contradiction appears to be false. We may then boldly affirm the original statement, however incomprehensible it is.”

Kant's rules on hypotheses

Kant ( 1724–1804 ) wrote that the analysis described in his book The Critique of Pure Reason “has now taught us that all its efforts to extend the bounds of knowledge by means of pure speculation, are utterly fruitless. So much the wider field lies open to hypothesis; as where we cannot know with certainty, we are at liberty to make guesses and to form suppositions. Imagination may be allowed, under the strict surveillance of reason, to invent suppositions; but these must be based on something that is perfectly certain‐ and that is the possibility of the object. Such a supposition is termed a hypothesis. We cannot imagine or invent any object or any property of an object not given in experience and employ it in a hypothesis; otherwise we should be basing our chain of reasoning upon mere chimerical fancies and not upon conception of things. Thus, we have no right to assume of new powers, not existing in nature and consequently we cannot assume that there is any other kind of community among substances than that observable in experience, any kind of presence than that in space and any kind of duration than that in time. The conditions of possible experience are for reason the only conditions of the possibility of things. Otherwise, such conceptions, although not self‐contradictory, are without object and without application. Transcendental hypotheses are therefore inadmissible, and we cannot use the liberty of employing in the absence of physical, hyperphysical grounds of explanation because such hypotheses do not advance reason, but rather stop it in its progress. When the explanation of natural phenomena happens to be difficult, we have constantly at hand a transcendental ground of explanation, which lifts us above the necessity of investigating nature. The next requisite for the admissibility of a hypothesis is its sufficiency. 
That is, it must determine a priori the consequences which are given in experience and which are supposed to follow from the hypothesis itself.” Kant stresses another aspect when dealing with hypotheses: “It is our duty to try to discover new objections, to put weapons in the hands of our opponent, and to grant him the most favorable position. We have nothing to fear from these concessions; on the contrary, we may rather hope that we shall thus make ourselves master of a possession which no one will ever venture to dispute.”

For Kant's analytical and synthetical judgements and Difference between philosophy and mathematics (Kant, Whitehead) , see Appendices  S1 and S2 , respectively.

Poincaré on hypotheses

The mathematician‑philosopher Poincaré ( 1854 –1912a) explored the foundations of mathematics and physics in his book Science and Hypothesis . In the preface to the book, he summarizes the common thinking of scientists at the end of the 19th century: “To the superficial observer scientific truth is unassailable, the logic of science is infallible, and if scientific men sometimes make mistakes, it is because they have not understood the rules of the game. Mathematical truths are derived from a few self‑evident propositions, by a chain of flawless reasoning; they are imposed not only on us, but on Nature itself. This is for the minds of most people the origin of certainty in science.” Poincaré then continues: “but upon more mature reflection the position held by hypothesis was seen; it was recognized that it is as necessary to the experimenter as it is to the mathematician. And then the doubt arose if all these constructions are built on solid foundations.” However, “to doubt everything or to believe everything are two equally convenient solutions: both dispense with the necessity of reflection. Instead, we should examine with the utmost care the role of hypothesis; we shall then recognize not only that it is necessary, but that in most cases it is legitimate. We shall also see that there are several kinds of hypotheses; that some are verifiable and when once confirmed by experiment become truths of great fertility; that others may be useful to us in fixing our ideas; and finally that others are hypotheses only in appearance, and reduce to definitions or to conventions in disguise.” Poincaré argues that “we must seek mathematical thought where it has remained pure‑i.e. in arithmetic, in the proofs of the most elementary theorems. The process is proof by recurrence. We first show that a theorem is true for n = 1; we then show that if it is true for n − 1 it is true for n ; and we conclude that it is true for all integers. The essential characteristic of reasoning by recurrence is that it contains, condensed in a single formula, an infinite number of syllogisms.” A syllogism is a logical argument that applies deductive reasoning to arrive at a conclusion. Poincaré notes “that here is a striking analogy with the usual process of induction. But an essential difference exists. Induction applied to the physical sciences is always uncertain, because it is based on the belief in a general order of the universe, an order which is external to us. Mathematical induction‑ i.e. proof by recurrence‑ is, on the contrary, necessarily imposed on us, because it is only the affirmation of a property of the mind itself. No doubt mathematical recurrent reasoning and physical inductive reasoning are based on different foundations, but they move in parallel lines and in the same direction‑ namely, from the particular to the general.”
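Poincaré's “proof by recurrence” is what is today called mathematical induction. As a sketch in modern notation (the notation is not in the original text), the schema condenses the infinite chain of syllogisms into a single implication:

```latex
% Mathematical induction (Poincaré's proof by recurrence):
% one base case plus one inductive step stand in for infinitely many syllogisms.
\[
  \Bigl( P(1) \;\wedge\; \forall n \ge 2 :\bigl( P(n-1) \Rightarrow P(n) \bigr) \Bigr)
  \;\Longrightarrow\; \forall n \ge 1 : P(n)
\]
```

For example, with $P(n)$ the statement $1 + 2 + \dots + n = n(n+1)/2$, verifying $P(1)$ and the step $P(n-1) \Rightarrow P(n)$ establishes the formula for all integers at once.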

Non‐Euclidian geometry: from Gauss to Lobatschewsky

Mathematics is an abstract science that intrinsically does not require that the structures it describes reflect a physical reality. Paradoxically, mathematics is nevertheless the language of physics: the founder of experimental physics, Galilei, used Euclidean geometry when exploring the laws of free fall. In his 1623 treatise The Assayer , Galilei ( 1564 –1642a) famously stated that the book of Nature is written in the language of mathematics, thus establishing a link between formal concepts in mathematics and the structure of the physical world. Euclid's parallel axiom played a historically prominent role in the connection between mathematical concepts and physical realities. Mathematicians had doubted that the parallel axiom was needed as an axiom and had tried to prove it. In Euclidean geometry, there is a connection between the parallel axiom and the theorem that the angles of a triangle sum to two right angles. It is therefore revealing that the famous mathematician C.F. Gauss investigated experimentally, in the early 19th century, whether this Euclidean theorem applies in nature. He approached the problem by measuring the sum of the angles in a real triangle, using geodetic angle measurements of three geographical elevations in the vicinity of Göttingen, where he was teaching mathematics. He reportedly measured a sum of angles in this triangle that differed from 180°. Gauss had at the same time also developed statistical methods to evaluate the accuracy of measurements, and apparently the deviation of his measured angle sum was still within the interval of Gaussian error propagation. He did not publish the reasoning and results of this experiment because he feared the outcry of colleagues about this unorthodox, even heretical, approach to mathematical reasoning (Carnap,  1891 ‐1970a). However, soon afterwards non‑Euclidean geometries were developed. In the words of Poincaré, “Lobatschewsky assumes at the outset that several parallels may be drawn through a point to a given straight line, and he retains all the other axioms of Euclid. From these hypotheses he deduces a series of theorems between which it is impossible to find any contradiction, and he constructs a geometry as impeccable in its logic as Euclidean geometry. The theorems are very different, however, from those to which we are accustomed, and at first will be found a little disconcerting. For instance, the sum of the angles of a triangle is always less than two right angles, and the difference between that sum and two right angles is proportional to the area of the triangle. Lobatschewsky's propositions have no relation to those of Euclid, but are none the less logically interconnected.” Poincaré continues: “most mathematicians regard Lobatschewsky's geometry as a mere logical curiosity. Some of them have, however, gone further. If several geometries are possible, they say, is it certain that our geometry is true? Experiment no doubt teaches us that the sum of the angles of a triangle is equal to two right angles, but this is because the triangles we deal with are too small” (Poincaré,  1854 ‐1912a); hence the importance of Gauss' geodetic triangulation experiment. Gauss was aware that his three‑hills triangle was far too small, and thought of measurements on triangles formed with stars.

Poincaré vs. Einstein

Lobatschewsky's hyperbolic geometry did not remain the only non‑Euclidean geometry. Riemann developed a geometry without the parallel axiom, while the other Euclidean axioms were maintained, with the exception of the axiom of Order (Anordnung). Poincaré notes: “so there is a kind of opposition between the geometries. For instance the sum of the angles in a triangle is equal to two right angles in Euclid's geometry, less than two right angles in that of Lobatschewsky, and greater than two right angles in that of Riemann. The number of parallel lines that can be drawn through a given point to a given line is one in Euclid's geometry, none in Riemann's, and an infinite number in the geometry of Lobatschewsky. Let us add that Riemann's space is finite, although unbounded.” As a further distinction, the ratio of the circumference to the diameter of a circle is equal to π in Euclid's, greater than π in Lobatschewsky's and smaller than π in Riemann's geometry. Another difference between these geometries concerns the degree of curvature (Krümmungsmass k), which is 0 for a Euclidean surface, smaller than 0 for a Lobatschewsky surface and greater than 0 for a Riemann surface. The differences in curvature can be roughly compared with plane, concave and convex surfaces: the inner geometric structure of a Riemann plane resembles the surface of a Euclidean sphere, and a Lobatschewsky plane resembles that of a Euclidean pseudosphere (the negatively curved geometry of a saddle). Which geometry is true? Poincaré asked “Ought we, then, to conclude that the axioms of geometry are experimental truths?” and continued: “If geometry were an experimental science, it would not be an exact science. The geometric axioms are therefore neither synthetic a priori intuitions as affirmed by Kant nor experimental facts. They are conventions. Our choice among all possible conventions is guided by experimental facts; but it remains free and is only limited by the necessity of avoiding contradictions. In other words, the axioms of geometry are only definitions in disguise. What then are we to think of the question: Is Euclidean geometry true? It has no meaning. One geometry cannot be more true than another; it can only be more convenient. Now, Euclidean geometry is, and will remain, the most convenient, first because it is the simplest and second because it sufficiently agrees with the properties of natural bodies” (Poincaré,  1854 ‐1912a).
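The contrasts Poincaré draws between the three geometries can be set side by side (a tabulation of the statements above, not a table from the original):

```latex
\begin{tabular}{lccc}
\hline
                               & Euclid  & Lobatschewsky    & Riemann \\
\hline
Angle sum of a triangle        & $= \pi$ & $< \pi$          & $> \pi$ \\
Parallels through a point      & one     & infinitely many  & none    \\
Circumference/diameter ratio   & $= \pi$ & $> \pi$          & $< \pi$ \\
Curvature $k$                  & $= 0$   & $< 0$            & $> 0$   \\
\hline
\end{tabular}
```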

Poincaré's book was published in 1903, and only a few years later Einstein published his general theory of relativity ( 1916 ), in which he used a non‑Euclidean, Riemannian geometry and demonstrated a structure of space that deviates from Euclidean geometry in the vicinity of strong gravitational fields. In 1919, astronomical observations during a solar eclipse showed that light rays from a distant star were indeed “bent” when passing next to the sun. These physical observations challenged the view of Poincaré, and we should now address some aspects of hypotheses in physics (Carnap,  1891 ‐1970b).


The long life of the five elements hypothesis

Physical sciences, not to speak of biological sciences, were less developed in antiquity than mathematics, as is already demonstrated by the primitive ideas on the elements constituting physical bodies. Plato and Aristotle spoke of the four elements, which they took over from Thales (water), Anaximenes (air) and Parmenides (fire and earth), and added a fifth element (quinta essentia, our quintessence), namely ether. Ether was imagined as a heavenly element belonging to the supralunar world. In Plato's dialogue Timaios (Plato,  c.424‐c.348 BC a ), the five elements were associated with the regular polyhedra of geometry, which became known as the Platonic bodies: tetrahedron (fire), octahedron (air), cube (earth), icosahedron (water) and dodecahedron (ether). In a regular polyhedron, the faces are congruent (identical in shape and size), all angles and all edges are congruent, and the same number of faces meet at each vertex. The number of elements was limited to five because in Euclidean space there are exactly five regular polyhedra. There is in Plato's writing even a kind of geometrical chemistry: since two octahedra (air) plus one tetrahedron (fire) can be combined into one icosahedron (water), these “liquid” elements can combine, while this is not the case for combinations with the cube (earth). The 12 faces of the dodecahedron were compared with the 12 zodiac signs (Mittelstrass,  1980e ). This geometry‑based hypothesis of physics had a long life. As late as 1612, Kepler in his Mysterium cosmographicum tried to fit the Platonic bodies into the planetary shells of his solar system model. The ether theory even survived into the scientific discussion of 19th‑century physics, and the idea of a mathematical structure of the universe dominated by symmetry operations fertilized 20th‑century ideas about symmetry concepts in the physics of elementary particles.
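That exactly five regular polyhedra exist follows from a short angle argument (a standard derivation, sketched here; it is not part of the original text). If $m \ge 3$ regular $p$-gons ($p \ge 3$) meet at each vertex, the face angles at a vertex must sum to less than $2\pi$:

```latex
% Each interior angle of a regular p-gon is (1 - 2/p)\pi,
% and m of them meeting at a vertex must leave the corner "open":
\[
  m \left( 1 - \frac{2}{p} \right) \pi < 2\pi
  \quad\Longleftrightarrow\quad
  (p-2)(m-2) < 4 .
\]
% The only integer solutions with p, m >= 3 are
% (p,m) = (3,3), (4,3), (3,4), (5,3), (3,5):
% tetrahedron, cube, octahedron, dodecahedron, icosahedron.
```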

Huygens on sound waves in air

The ether hypothesis figures prominently in the 1690 Treatise on Light by Huygens ( 1629–1695 ). He first reports on the transmission of sound by air: “this may be proved by shutting up a sounding body in a glass vessel from which the air is withdrawn and care was taken to place the sounding body on cotton that it cannot communicate its tremor to the glass vessel which encloses it. After having exhausted all the air, one hears no sound from the metal though it is struck.” Huygens shows some foresight when suspecting that “the air is of such a nature that it can be compressed and reduced to a much smaller space than that it normally occupies. Air is made up of small bodies which float about and which are agitated very rapidly. So that the spreading of sound is the effort which these little bodies make in collisions with one another, to regain freedom when they are a little more squeezed together in the circuit of these waves than elsewhere.”

Huygens on light waves in ether

Huygens continues: “That is not the same air but another kind of matter in which light spreads; since if the air is removed from the vessel the light does not cease to traverse it as before. The extreme velocity of light cannot admit such a propagation of motion” as sound waves. To achieve the propagation of light, Huygens invokes the ether “as a substance approaching to perfect hardness and possessing springiness as prompt as we choose. One may conceive light to spread successively by spherical waves. The propagation consists nowise in the transport of those particles but merely in a small agitation which they cannot help communicate to those surrounding.” The hypothesis of an ether in outer space fills libraries of physical discussions, but all experimental approaches led to contradictions with respect to the postulated properties of this hypothetical material, for example when optical experiments showed that light waves display transversal and not longitudinal oscillations.

The demise of ether

Mechanical models for the transmission of light or gravitation waves requiring an ether were finally put to rest by Einstein's theory of relativity (Mittelstrass,  1980f ). This theory posits that the speed of light in empty space is constant and does not depend on the movement of the light source or of an observer, as would be required by the ether hypothesis. The theory of relativity also provides an answer to how the force of gravitation is transmitted from one mass to another across an essentially empty space. In the non‑Euclidean formulation of the theory of relativity (Einstein used Riemannian geometry), there is no gravitational force in the sense of mechanical or electromagnetic forces; the gravitational force is in this formulation simply replaced by a geometric structure (space curvature near large, dense masses) of a four‑dimensional space–time system (Carnap,  1891 ‐1970c; Einstein & Imfeld,  1956 ). Gravitational waves and gravitational lens effects have indeed been experimentally demonstrated by astrophysicists (Dorfmüller et al.,  1998 ).

For Aristotle on physical hypotheses , see Appendix  S3 .


In the following, the opinions of a number of famous scientists and philosophers on hypotheses are quoted to provide a historical overview of the subject.

Copernicus' hypothesis: a calculus which fits observations

In his book Revolutions of the Heavenly Spheres , Copernicus ( 1473–1543 ) reasoned in the preface about hypotheses in physics: “Since the newness of the hypotheses of this work ‐which sets the earth in motion and puts an immovable sun at the center of the universe‐ has already received a great deal of publicity, I have no doubt that certain of the savants have taken great offense.” He defended his heliocentric thesis by stating: “For it is the job of the astronomer to use painstaking and skilled observations in gathering together the history of the celestial movements‐ and then – since he cannot by any line of reasoning reach the true causes of these movements‐ to think up or construct whatever causes or hypotheses he pleases such that, by the assumption of these causes, those same movements can be calculated from the principles of geometry for the past and the future too. This artist is markedly outstanding in both of these respects: for it is not necessary that these hypotheses should be true, or even probable; but it is enough if they provide a calculus which fits the observations.” This preface, written in 1543, sounds in its arguments like very modern physics. However, historians of science have discovered that it was probably written by a theologian friend of Copernicus to defend the book against criticism by the church.

Bacon's intermediate hypotheses

In his book Novum Organum , Francis Bacon ( 1561–1626 ) claims for hypotheses and scientific reasoning “that they augur well for the sciences, when the ascent shall proceed by a true scale and successive steps, without interruption or breach, from particulars to the lesser axioms, thence to the intermediates and lastly to the most general.” He then notes “that the lowest axioms differ but little from bare experiments, the highest and most general are notional, abstract, and of no real weight. The intermediate are true, solid, full of life, and upon them depend the business and fortune of mankind.” He warns that “we must not then add wings, but rather lead and ballast to the understanding, to prevent its jumping and flying, which has not yet been done; but whenever this takes place we may entertain greater hopes of the sciences.” With respect to methodology, Bacon claims that “we must invent a different form of induction. The induction which proceeds by simple enumeration is puerile, leads to uncertain conclusions, …deciding generally from too small a number of facts. Sciences should separate nature by proper rejections and exclusions and then conclude for the affirmative, after collecting a sufficient number of negatives.”

Gilbert and Descartes for plausible hypotheses

William Gilbert introduced in his book On the Loadstone (Gilbert,  1544‐1603 ) the argument of plausibility into physical hypothesis building. “From these arguments, therefore, we infer not with mere probability, but with certainty, the diurnal rotation of the earth; for nature ever acts with fewer than with many means; and because it is more accordant to reason that the one small body, the earth, should make a daily revolution than the whole universe should be whirled around it.”

Descartes ( 1596‐1650 ) reflected on the sources of understanding in his book Rules for Direction and distinguished what “comes about by impulse, by conjecture, or by deduction. Impulse can assign no reason for their belief and when determined by fanciful disposition, it is almost always a source of error.” When speaking about the working of conjectures he quotes thoughts of Aristotle: “water which is at a greater distance from the center of the globe than earth is likewise less dense substance, and likewise the air which is above the water, is still rarer. Hence, we hazard the guess that above the air nothing exists but a very pure ether which is much rarer than air itself. Moreover nothing that we construct in this way really deceives, if we merely judge it to be probable and never affirm it to be true; in fact it makes us better instructed. Deduction is thus left to us as the only means of putting things together so as to be sure of their truth. Yet in it, too, there may be many defects.”

Care in formulating hypotheses

Locke ( 1632‐1704 ) in his treatise Concerning Human Understanding admits that “we may make use of any probable hypotheses whatsoever. Hypotheses if they are well made are at least great helps to the memory and often direct us to new discoveries. However, we should not take up any one too hastily.” Practising scientists, too, argued against the careless use of hypotheses and proposed remedies. Lavoisier ( 1743‐1794 ) in the preface to his Elements of Chemistry warned about beaten‑track hypotheses: “Instead of applying observation to the things we wished to know, we have chosen rather to imagine them. Advancing from one ill‐founded supposition to another, we have at last bewildered ourselves amidst a multitude of errors. These errors becoming prejudices, are adopted as principles and we thus bewilder ourselves more and more. We abuse words which we do not understand. There is but one remedy: this is to forget all that we have learned, to trace back our ideas to their sources and as Bacon says to frame the human understanding anew.”

Faraday (1791–1867) in a Speculation Touching Electric Conduction and the Nature of Matter highlighted the fundamental difference between hypotheses and facts, noting that "he who has most power of penetrating the secrets of nature, and guessing by hypothesis at her mode of working, will also be most careful, for his own safe progress and that of others, to distinguish that knowledge which consists of assumption, by which I mean theory and hypothesis, from that which is the knowledge of facts and laws; never raising the former to the dignity or authority of the latter."

Explicatory power justifies hypotheses

Darwin (1809–1882a) defended the conclusions and hypothesis of his book The Origin of Species: "that species have been modified in a long course of descent. This has been effected chiefly through the natural selection of numerous, slight, favorable variations." He uses a post hoc argument for this hypothesis: "It can hardly be supposed that a false theory would explain, in so satisfactory a manner as does the theory of natural selection, the several large classes of facts" described in his book.

The natural selection of hypotheses

In the concluding chapter of The Descent of Man, Darwin (1809–1882b) admits "that many of the views which have been advanced in this book are highly speculative and some no doubt will prove erroneous." However, he distinguished that "false facts are highly injurious to the progress of science, for they often endure long; but false views do little harm, for everyone takes a salutary pleasure in proving their falseness; and when this is done, one path to error is closed and the road to truth is often at the same time opened."

The American philosopher William James (1842–1907) concurred with Darwin's view when he wrote in his Principles of Psychology: "every scientific conception is in the first instance a spontaneous variation in someone's brain. For one that proves useful and applicable there are a thousand that perish through their worthlessness. The scientific conceptions must prove their worth by being verified. This test, however, is the cause of their preservation, not of their production."

The American philosopher J. Dewey (1859–1952) in his treatise Experience and Education notes that "the experimental method of science attaches more importance, not less, to ideas than do other methods. There is no such thing as experiment in the scientific sense unless action is directed by some leading idea. The fact that the ideas employed are hypotheses, not final truths, is the reason why ideas are more jealously guarded and tested in science than anywhere else. As fixed truths they must be accepted and that is the end of the matter. But as hypotheses, they must be continuously tested and revised, a requirement that demands they be accurately formulated. Ideas or hypotheses are tested by the consequences which they produce when they are acted upon. The method of intelligence manifested in the experimental method demands keeping track of ideas, activities, and observed consequences. Keeping track is a matter of reflective review."

The reductionist principle

James (1842–1907) pushed this idea further: "Scientific thought goes by selection. We break the solid plenitude of fact into separate essences, conceive generally what only exists particularly, and by our classifications leave nothing in its natural neighborhood. The reality exists as a plenum. All its parts are contemporaneous, but we can neither experience nor think this plenum. What we experience is a chaos of fragmentary impressions; what we think is an abstract system of hypothetical data and laws. We must decompose each chaos into single facts. We must learn to see in the chaotic antecedent a multitude of distinct antecedents, in the chaotic consequent a multitude of distinct consequents." From these considerations James concluded: "even those experiences which are used to prove a scientific truth are for the most part artificial experiences of the laboratory gained after the truth itself has been conjectured. Instead of experiences engendering the inner relations, the inner relations are what engender the experience here."

Following curiosity

Freud (1856–1939) considered curiosity and imagination driving forces of hypothesis building, which need to be confronted as quickly as possible with observations. In Beyond the Pleasure Principle, Freud wrote: "One may surely give oneself up to a line of thought and follow it up as far as it leads, simply out of scientific curiosity. These innovations were direct translations of observation into theory, subject to no greater sources of error than is inevitable in anything of the kind. At all events there is no way of working out this idea except by combining facts with pure imagination and thereby departing far from observation." This can quickly go astray when trusting intuition. Freud recommends "that one may inexorably reject theories that are contradicted by the very first steps in the analysis of observation and be aware that those one holds have only a tentative validity."

Feed‐forward aspects of hypotheses

The geneticist Waddington (1905–1975) in his essay The Nature of Life states that "a scientific theory cannot remain a mere structure within the world of logic, but must have implications for action, and that in two rather different ways. It must involve the consequence that if you do so and so, such and such a result will follow. That is to say, it must give, or at least offer, the possibility of controlling the process. Secondly, its value is quite largely dependent on its power of suggesting the next step in scientific advance. Any complete piece of scientific work starts with an activity essentially the same as that of an artist. It starts by asking a relevant question. The first step may be a new awareness of some facet of the world that no one else had previously thought worth attending to, or some new imaginative idea which depends on a sensitive receptiveness to the oddity of nature essentially similar to that of the artist. In his logical analysis and manipulative experimentation, the scientist is behaving arrogantly towards nature, trying to force her into his categories of thought or to trick her into doing what he wants. But finally he has to be humble. He has to take his intuition, his logical theory and his manipulative skill to the bar of Nature and see whether she answers yes or no; and he has to abide by the result. Science is often quite ready to tolerate some logical inadequacy in a theory (or even a flat logical contradiction, like that between the particle and wave theories of matter) so long as it finds itself in possession of a hypothesis which offers both the possibility of control and a guide to worthwhile avenues of exploration."

Poincaré: the dialogue between experiment and hypothesis

Poincaré (1854–1912b) also dealt with physics in Science and Hypothesis: "Experiment is the sole source of truth. It alone can teach us certainty. Cannot we be content with experiment alone? What place is left for mathematical physics? The man of science must work with method. Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house. It is often said that experiments should be made without preconceived ideas. That is impossible. Without the hypothesis, no conclusion could have been drawn; nothing extraordinary would have been seen; and only one fact the more would have been catalogued, without deducing from it the remotest consequence." Poincaré compares science to a library. Experimental physics alone can enrich the library with new books, but mathematical (theoretical) physics draws up the catalogue: it helps to find the books and reveals gaps which have to be closed by the purchase of new books.

Poincaré: false, true, fruitful and dangerous hypotheses

Poincaré continues: "we all know that there are good and bad experiments. The latter accumulate in vain. Whether there are a hundred or a thousand, one single piece of work will be sufficient to sweep them into oblivion. Bacon invented the term experimentum crucis for such experiments. What then is a good experiment? It is that which teaches us something more than an isolated fact. It is that which enables us to predict and to generalize. Experiments only give us a certain number of isolated points. They must be connected by a continuous line, and that is true generalization. Every generalization is a hypothesis. It should be submitted to verification as soon as possible. If it cannot stand the test, it must be abandoned without any hesitation. The physicist who has just given up one of his hypotheses should rejoice, for he has found an unexpected opportunity of discovery. The hypothesis took into account all the known factors which seemed capable of intervening in the phenomenon. If it is not verified, it is because there is something unexpected. Has the hypothesis thus rejected been sterile? Far from it." Poincaré notes that "with a true hypothesis only one fact the more would have been catalogued, without deducing from it the remotest consequence. It may be said that the wrong hypothesis has rendered more service than a true hypothesis." However, Poincaré warns that "some hypotheses are dangerous – first and foremost those which are tacit and unconscious. And since we make them without knowing them, we cannot get rid of them." Poincaré notes that here mathematical physics is of help, because its precision compels one to formulate all the hypotheses, revealing also the tacit ones.

Arguments for the reductionist principle

Poincaré also warned against multiplying hypotheses indefinitely: "If we construct a theory upon multiple hypotheses, and if experiment condemns it, which of the premisses must be changed?" He also recommended that we "resolve the complex phenomenon given directly by experiment into a very large number of elementary phenomena. First, with respect to time. Instead of embracing in its entirety the progressive development of a phenomenon, we simply try to connect each moment with the one immediately preceding. Next, we try to decompose the phenomenon in space. We must try to deduce the elementary phenomenon localized in a very small region of space." Poincaré suggested that the physicist should "be guided by the instinct of simplicity, and that is why in physical science generalization so readily takes the mathematical form" of stating the problem as an equation. This argument goes back to Galilei (1564–1642b), who wrote in The Two Sciences: "when I observe a stone initially at rest falling from an elevated position and continually acquiring new increments of speed, why should I not believe that such increases take place in a manner which is exceedingly simple and rather obvious to everybody? If now we examine the matter carefully, we find no addition or increment more simple than that which repeats itself always in the same manner. It seems we shall not be far wrong if we put the increment of speed as proportional to the increment of time." With a bit of geometrical reasoning, Galilei deduced that the distance travelled by a freely falling body varies as the square of the time. However, Galilei was not naïve and continued, "I grant that these conclusions proved in the abstract will be different when applied in the concrete", and considered disturbances caused by friction and air resistance that complicate the initially conceived simplicity.
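Galilei's guess and its consequence can be restated as a short modern derivation (a sketch in calculus notation; Galilei himself argued geometrically):

```latex
% Galilei's simplest-possible hypothesis: increments of speed are
% proportional to increments of time, i.e. constant acceleration g.
v(t) = g\,t
% The distance fallen is the accumulated velocity -- the area of the
% triangle under the velocity-time line in Galilei's geometric argument:
s(t) = \int_0^t v(\tau)\,\mathrm{d}\tau = \tfrac{1}{2}\,g\,t^2
% Hence distances travelled from rest scale as the square of the time:
\frac{s(t_2)}{s(t_1)} = \left(\frac{t_2}{t_1}\right)^2
```

In particular, the distances covered in successive equal time intervals stand in the ratios 1 : 3 : 5 : 7 …, a prediction Galilei could check with inclined-plane experiments.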

Four sequential steps of discovery…

Some philosophers of science attributed a fundamental importance to observations for the acquisition of experience in science. The process starts with accidental observations (Aristotle), proceeds to systematic observations (Bacon), leads to quantitative rules obtained with exact measurements (Newton and Kant), and culminates in observations under artificially created conditions in experiments (Galilei) (Mittelstrass, 1980g).

…rejected by Popper and Kant

In fact, Newton wrote that he had developed his theory of gravitation from experience followed by induction. K. Popper (1902–1994), in his book Conjectures and Refutations, did not agree with this logical flow of "experience leading to theory", for several reasons. According to Popper, this scheme is intuitively false because observations are always inexact, while theory makes absolutely exact assertions. It is also historically false because Copernicus and Kepler were not led to their theories by experimental observations but by the geometry and number theories of Plato and Pythagoras, for which they searched for verification in observational data. Kepler, for example, tried to prove the concept of circular planetary movement, influenced by the Greek theory of the circle as the perfect geometric figure, and only when he could not demonstrate this with observational data did he try elliptical movements. Popper noted that it was Kant who realized that even physical experiments are not prior to theories, quoting Kant's preface to the Critique of Pure Reason: "When Galilei let his globes run down an inclined plane with a gravity which he has chosen himself, then a light dawned on all natural philosophers. They learnt that our reason can only understand what it creates according to its own design; that we must compel Nature to answer our questions, rather than cling to Nature's apron strings and allow her to guide us. For purely accidental observations, made without any plan having been thought out in advance, cannot be connected by a law, which is what reason is searching for." From that reasoning Popper concluded that "we ourselves must confront nature with hypotheses and demand a reply to our questions; and that lacking such hypotheses, we can only make haphazard observations which follow no plan and which can therefore never lead to a natural law. Everyday experience, too, goes far beyond all observations. Everyday experience must interpret observations, for without theoretical interpretation, observations remain blind and uninformative. Everyday experience constantly operates with abstract ideas, such as that of cause and effect, and so it cannot be derived from observation." Popper agreed with Kant, who said "Our intellect does not draw its laws from nature… but imposes them on nature". Popper modified this statement to "Our intellect does not draw its laws from nature, but tries, with varying degrees of success, to impose upon nature laws which it freely invents. Theories are seen to be free creations of our mind, the result of almost poetic intuition. While theories cannot be logically derived from observations, they can, however, clash with observations. This fact makes it possible to infer from observations that a theory is false. The possibility of refuting theories by observations is the basis of all empirical tests. All empirical tests are therefore attempted refutations."


Is biology special?

Waddington notes that "living organisms are much more complicated than the non‐living things. Biology has therefore developed more slowly than sciences such as physics and chemistry and has tended to rely on them for many of its basic ideas. These older physical sciences have provided biology with many firm foundations which have been of the greatest value to it, but throughout most of its history biology has found itself faced with the dilemma as to how far its reliance on physics and chemistry should be pushed", both with respect to its experimental methods and its theoretical foundations. Vitalism is indeed such a theory: it maintains that organisms cannot be explained solely by physicochemical laws and claims that specific biological forces are active in organisms. However, efforts to prove the existence of such vital forces have failed, and today most biologists consider vitalism a superseded theory.

Biology as a branch of science is as old as physics. If one takes Aristotle as a reference, he wrote more on biology than on physics. Sophisticated animal experiments were already conducted in antiquity by Galen (Brüssow, 2022). Albertus Magnus displayed biological research interests during medieval times. Knowledge of plants provided the basis of medical drugs in early modern times. What explains biology's decreasing influence compared with the rapid development of physics by Galilei and Newton? One reason is the possibility of using mathematical equations to describe physical phenomena, which was not possible for biological phenomena. Physics has from the beginning displayed a trend towards a few fundamental underlying principles. This is not the case for biology. With the discovery of new continents, biologists were fascinated by the diversity of life. Diversity was the guiding theme of biological thinking. This changed only when taxonomists and comparative anatomists revealed recurring patterns in this stunning biological variety, and when Darwin provided a theoretical concept for understanding variation as a driving force in biology. Even when genetics and molecular biology made it possible to understand biology from a few universally shared properties, such as a universal genetic code, biology differed in fundamental aspects from physics and chemistry. First, biology is so far restricted to the planet Earth, while the laws of physics and chemistry apply in principle to the entire universe. Second, biology is to a great extent a historical discipline; many biological processes cannot be understood from present‐day observations because they are the result of historical developments in evolution. Hence the importance of Dobzhansky's dictum that nothing makes sense in biology except in the light of evolution.
The great diversity of life forms, the complexity of processes occurring in cells and their integration in higher organisms, and the importance of a historical past for the understanding of extant organisms have all delayed the successful application of mathematical methods in biology and the construction of theoretical frameworks. Theoretical biology has by far not achieved a role comparable to that of theoretical physics, which stands on an equal footing with experimental physics. Many biologists are even rather sceptical towards a theoretical biology and see progress in the development of ever more sophisticated experimental methods rather than in theoretical concepts expressed by new hypotheses.

Knowledge from data without hypothesis?

Philosophers distinguish rational knowledge (cognitio ex principiis) from knowledge from data (cognitio ex data). Kant associates these two branches with the natural sciences and natural history, respectively: the latter comprises descriptions of natural objects, as prominently done in the systematic classification of animals and plants, or, where it is really history, descriptions of events in the evolution of life forms on Earth. Cognitio ex data has thus played a much more prominent role in biology than in physics, which explains why the compilation of data, and in the extreme the collection of museum specimens, characterizes biological research. To account for this difference, philosophers of logical empiricism developed a two‐level concept of scientific language consisting of a language of observations (Beobachtungssprache) and a language of theories (Theoriesprache), which are linked by certain rules of correspondence (Korrespondenzregeln) (Carnap, 1891–1970d). If one looks into leading biological research journals, it becomes clear that biology has a sophisticated language of observation and a much less developed language of theories.

Do we need more philosophical thinking in biology, or at least a more vigorous theoretical biology? The breathtaking speed of progress in experimental biology seems to indicate that biology can develop well without much theoretical or philosophical thinking. At the same time, one could argue that some fields in biology might need more theoretical rigour. Microbiologists might think of microbiome research, one of the breakthrough developments of microbiology research in recent years. The field teems with fascinating but ill‐defined terms (our second genome; holobionts; the gut–brain axis; dysbiosis; symbionts; probiotics; health benefits) that call for stricter definitions. One might also argue that biologists should at least consider the criticism of Goethe (1749–1832), a poet who was also an active scientist. In Faust, the devil ironically teaches biology to a young student.

“Wer will was Lebendigs erkennen und beschreiben, Sucht erst den Geist herauszutreiben, Dann hat er die Teile in seiner Hand, Fehlt, leider! nur das geistige Band.” (To docket living things past any doubt. You cancel first the living spirit out: The parts lie in the hollow of your hand, You only lack the living thing you banned).

We probably need both in biology: more data and more theory and hypotheses.


Conflict of interest

The author reports no conflict of interest.


Funding information

No funding information provided.

Supporting information

Appendix S1

Brüssow, H. (2022) On the role of hypotheses in science. Microbial Biotechnology, 15, 2687–2698. Available from: 10.1111/1751-7915.14141

References

  • Bacon, F. (1561–1626) Novum Organum. In: Adler, M.J. (editor-in-chief) Great Books of the Western World, 2nd edition, 1992, vols. 1–60. Chicago, IL: Encyclopaedia Britannica, Inc. (abbreviated below as GBWW); here: GBWW, vol. 28, 128.
  • Brüssow, H. (2022) What is Truth – in science and beyond. Environmental Microbiology, 24, 2895–2906.
  • Carnap, R. (1891–1970a) Philosophical foundations of physics, Ch. 14. New York: Basic Books, 1969.
  • Carnap, R. (1891–1970b) Philosophical foundations of physics, Ch. 15. New York: Basic Books, 1969.
  • Carnap, R. (1891–1970c) Philosophical foundations of physics, Ch. 16. New York: Basic Books, 1969.
  • Carnap, R. (1891–1970d) Philosophical foundations of physics, Ch. 27–28. New York: Basic Books, 1969.
  • Copernicus (1473–1543) Revolutions of heavenly spheres. GBWW, vol. 15, 505–506.
  • Darwin, C. (1809–1882a) The origin of species. GBWW, vol. 49, 239.
  • Darwin, C. (1809–1882b) The descent of man. GBWW, vol. 49, 590.
  • Descartes, R. (1596–1650) Rules for direction. GBWW, vol. 28, 245.
  • Dewey, J. (1859–1952) Experience and education. GBWW, vol. 55, 124.
  • Dorfmüller, T., Hering, W.T. & Stierstadt, K. (1998) Bergmann Schäfer Lehrbuch der Experimentalphysik: Band 1, Mechanik, Relativität, Wärme. In: Was ist Schwerkraft: Von Newton zu Einstein. Berlin, New York: Walter de Gruyter, pp. 197–203.
  • Einstein, A. (1916) Relativity. GBWW, vol. 56, 191–243.
  • Einstein, A. & Imfeld, L. (1956) Die Evolution der Physik. Hamburg: Rowohlts deutsche Enzyklopädie, Rowohlt Verlag.
  • Euclid (c. 323–c. 283) The elements. GBWW, vol. 10, 1–2.
  • Faraday, M. (1791–1867) Speculation touching electric conduction and the nature of matter. GBWW, vol. 42, 758–763.
  • Freud, S. (1856–1939) Beyond the pleasure principle. GBWW, vol. 54, 661–662.
  • Galilei, G. (1564–1642a) The Assayer. As translated by S. Drake (1957) Discoveries and Opinions of Galileo, pp. 237–238; abridged pdf at Stanford University.
  • Galilei, G. (1564–1642b) The two sciences. GBWW, vol. 26, 200.
  • Gilbert, W. (1544–1603) On the loadstone. GBWW, vol. 26, 108–110.
  • Goethe, J.W. (1749–1832) Faust. GBWW, vol. 45, 20.
  • Hilbert, D. (1899) Grundlagen der Geometrie. Leipzig, Germany: Verlag Teubner.
  • Huygens, C. (1617–1670) Treatise on light. GBWW, vol. 32, 557–560.
  • James, W. (1842–1907) Principles of psychology. GBWW, vol. 53, 862–866.
  • Kant, I. (1724–1804) Critique of pure reason. GBWW, vol. 39, 227–230.
  • Lavoisier, A.L. (1743–1794) Elements of chemistry. GBWW, vol. 42, pp. 2, 6–7, 9–10.
  • Locke, J. (1632–1704) Concerning human understanding. GBWW, vol. 33, 317–362.
  • Mittelstrass, J. (1980a) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 1, 239–241.
  • Mittelstrass, J. (1980b) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 3, 307.
  • Mittelstrass, J. (1980c) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 1, 439–442.
  • Mittelstrass, J. (1980d) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 2, 157–158.
  • Mittelstrass, J. (1980e) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 3, 264–267, 449–450.
  • Mittelstrass, J. (1980f) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 1, 209–210.
  • Mittelstrass, J. (1980g) Enzyklopädie Philosophie und Wissenschaftstheorie. Mannheim, Wien, Zürich: Bibliographisches Institut, B.I. Wissenschaftsverlag, vol. 1, 281–282.
  • Pascal, B. (1623–1662a) Pensées. GBWW, vol. 30, 171–173.
  • Pascal, B. (1623–1662b) Scientific treatises on geometric demonstrations. GBWW, vol. 30, 442–443.
  • Plato (c. 424–c. 348 BC a) Timaeus. GBWW, vol. 6, 442–477.
  • Poincaré, H. (1854–1912a) Science and hypothesis. GBWW, vol. 56, XV–XVI, 1–5, 10–15.
  • Poincaré, H. (1854–1912b) Science and hypothesis. GBWW, vol. 56, 40–52.
  • Popper, K. (1902–1994) Conjectures and Refutations: The Growth of Scientific Knowledge. London and New York: Routledge Classics, 2002, pp. 249–261.
  • Syntopicon (1992) Hypothesis. GBWW, vol. 1, 576–587.
  • Waddington, C.H. (1905–1975) The nature of life. GBWW, vol. 56, 697–699.

Understanding Science

How science REALLY works...


A blueprint for scientific investigations

The process of science involves many layers of complexity, but the key points of that process are straightforward:

There are many routes into the process, including serendipity (e.g., being hit on the head by the proverbial apple), concern over a practical problem (e.g., finding a new treatment for diabetes), and a technological development (e.g., the launch of a more advanced telescope). Scientists often begin an investigation by plain old poking around: tinkering, brainstorming, trying to make some new observations , chatting with colleagues about an idea, or doing some reading.

Scientific testing is at the heart of the process. In science, all ideas are tested with evidence from the natural world , which may take many different forms —Antarctic ice cores, particle accelerator experiments , or detailed descriptions of sedimentary rock layers. You can’t move through the process of science without examining how that evidence reflects on your ideas about how the world works — even if that means giving up a favorite hypothesis .

The scientific community helps ensure science’s accuracy. Members of the scientific community (i.e., researchers, technicians, educators, and students, to name a few) play many roles in the process of science, but are especially important in generating ideas, scrutinizing ideas, and weighing the evidence for and against them. Through the action of this community, science is self-correcting. For example, in the 1990s, John Christy and Roy Spencer reported that temperature measurements taken by satellite, instead of from the Earth’s surface, seemed to indicate that the Earth was cooling, not warming. However, other researchers soon pointed out that those measurements didn’t correct for the fact that satellites slowly lose altitude as they orbit. Once these corrections were made, the satellite measurements were much more consistent with the warming trend observed at the surface. Christy and Spencer immediately acknowledged the need for that correction.

The process of science is intertwined with society. The process of science both influences society (e.g., investigations of X-rays leading to the development of CT scanners) and is influenced by society (e.g., a society’s concern about the spread of HIV leading to studies of the molecular interactions within the immune system).



National Academies Press: OpenBook

Taking Science to School: Learning and Teaching Science in Grades K-8 (2007)

Chapter 5: Generating and Evaluating Scientific Evidence and Explanations

Major Findings in the Chapter:

Children are far more competent in their scientific reasoning than first suspected and adults are less so. Furthermore, there is great variation in the sophistication of reasoning strategies across individuals of the same age.

In general, children are less sophisticated than adults in their scientific reasoning. However, experience plays a critical role in facilitating the development of many aspects of reasoning, often trumping age.

Scientific reasoning is intimately intertwined with conceptual knowledge of the natural phenomena under investigation. This conceptual knowledge sometimes acts as an obstacle to reasoning, but often facilitates it.

Many aspects of scientific reasoning require experience and instruction to develop. For example, distinguishing between theory and evidence and many aspects of modeling do not emerge without explicit instruction and opportunities for practice.

In this chapter, we discuss the various lines of research related to Strand 2: generate and evaluate evidence and explanations. The ways in which scientists generate and evaluate scientific evidence and explanations have long been the focus of study in philosophy, history, anthropology, and sociology. More recently, psychologists and learning scientists have begun to study the cognitive and social processes involved in building scientific knowledge. For our discussion, we draw primarily from the past 20 years of research in developmental and cognitive psychology that investigates how children’s scientific thinking develops across the K-8 years.

We begin by developing a broad sketch of how key aspects of scientific thinking develop across the K-8 years, contrasting children’s abilities with those of adults. This contrast allows us to illustrate both how children’s knowledge and skill can develop over time and situations in which adults’ and children’s scientific thinking are similar. Where age differences exist, we comment on what underlying mechanisms might be responsible for them. In this research literature, two broad themes emerge, which we take up in detail in subsequent sections of the chapter. The first is the role of prior knowledge in scientific thinking at all ages. The second is the importance of experience and instruction.

Scientific investigation, broadly defined, includes numerous procedural and conceptual activities, such as asking questions, hypothesizing, designing experiments, making predictions, using apparatus, observing, measuring, being concerned with accuracy, precision, and error, recording and interpreting data, consulting data records, evaluating evidence, verification, reacting to contradictions or anomalous data, presenting and assessing arguments, constructing explanations (to oneself and others), constructing various representations of the data (graphs, maps, three-dimensional models), coordinating theory and evidence, performing statistical calculations, making inferences, and formulating and revising theories or models (e.g., Carey et al., 1989; Chi et al., 1994; Chinn and Malhotra, 2001; Keys, 1994; McNay and Melville, 1993; Schauble et al., 1995; Slowiaczek et al., 1992; Zachos et al., 2000). As noted in Chapter 2 , over the past 20 to 30 years, the image of “doing science” emerging from across multiple lines of research has shifted from depictions of lone scientists conducting experiments in isolated laboratories to the image of science as both an individual and a deeply social enterprise that involves problem solving and the building and testing of models and theories.

Across this same period, the psychological study of science has evolved from a focus on scientific reasoning as a highly developed form of logical thinking that cuts across scientific domains to the study of scientific thinking as the interplay of general reasoning strategies, knowledge of the natural phenomena being studied, and a sense of how scientific evidence and explanations are generated. Much early research on scientific thinking and inquiry tended to focus primarily either on conceptual development or on the development of reasoning strategies and processes, often using very simplified reasoning tasks. In contrast, many recent studies have attempted to describe a larger number of the complex processes that are deployed in the context of scientific inquiry and to describe their coordination. These studies often engage children in firsthand investigations in which they actively explore multivariable systems. In such tasks, participants initiate all phases of scientific discovery with varying amounts of guidance provided by the researcher. These studies have revealed that, in the context of inquiry, reasoning processes and conceptual knowledge are interdependent and in fact facilitate each other (Schauble, 1996; Lehrer et al., 2001).

It is important to note that, across the studies reviewed in this chapter, researchers have made different assumptions about what scientific reasoning entails and which aspects of scientific practice are most important to study. For example, some emphasize the design of well-controlled experiments, while others emphasize building and critiquing models of natural phenomena. In addition, some researchers study scientific reasoning in stripped down, laboratory-based tasks, while others examine how children approach complex inquiry tasks in the context of the classroom. As a result, the research base is difficult to integrate and does not offer a complete picture of students’ skills and knowledge related to generating and evaluating evidence and explanations. Nor does the underlying view of scientific practice guiding much of the research fully reflect the image of science and scientific understanding we developed in Chapter 2 .


Generating evidence.

The evidence-gathering phase of inquiry includes designing the investigation as well as carrying out the steps required to collect the data. Generating evidence entails asking questions, deciding what to measure, developing measures, collecting data from the measures, structuring the data, systematically documenting outcomes of the investigations, interpreting and evaluating the data, and using the empirical results to develop and refine arguments, models, and theories.

Asking Questions and Formulating Hypotheses

Asking questions and formulating hypotheses is often seen as the first step in the scientific method; however, it can better be viewed as one of several phases in an iterative cycle of investigation. In an exploratory study, for example, work might start with structured observation of the natural world, which would lead to formulation of specific questions and hypotheses. Further data might then be collected, which lead to new questions, revised hypotheses, and yet another round of data collection. The phase of asking questions also includes formulating the goals of the activity and generating hypotheses and predictions (Kuhn, 2002).

Children differ from adults in their strategies for formulating hypotheses and in the appropriateness of the hypotheses they generate. Children often propose different hypotheses from adults (Klahr, 2000), and younger children (age 10) often conduct experiments without explicit hypotheses, unlike 12- to 14-year-olds (Penner and Klahr, 1996a). In self-directed experimental tasks, children tend to focus on plausible hypotheses and often get stuck focusing on a single hypothesis (e.g., Klahr, Fay, and Dunbar, 1993). Adults are more likely to consider multiple hypotheses (e.g., Dunbar and Klahr, 1989; Klahr, Fay, and Dunbar, 1993). For both children and adults, the ability to consider many alternative hypotheses is a factor contributing to success.

At all ages, prior knowledge of the domain under investigation plays an important role in the formulation of questions and hypotheses (Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, both children and adults are more likely to focus initially on variables they believe to be causal (Kanari and Millar, 2004; Schauble, 1990, 1996). Hypotheses that predict expected results are proposed more frequently than hypotheses that predict unexpected results (Echevarria, 2003). The role of prior knowledge in hypothesis formulation is discussed in greater detail later in the chapter.

Designing Experiments

The design of experiments has received extensive attention in the research literature, with an emphasis on developmental changes in children’s ability to build experiments that allow them to identify causal variables. Experimentation can serve to generate observations in order to induce a hypothesis to account for the pattern of data produced (discovery context) or to test the tenability of an existing hypothesis under consideration (confirmation/ verification context) (Klahr and Dunbar, 1988). At a minimum, one must recognize that the process of experimentation involves generating observations that will serve as evidence that will be related to hypotheses.

Ideally, experimentation should produce evidence or observations that are interpretable in order to make the process of evidence evaluation uncomplicated. One aspect of experimentation skill is to isolate variables in such a way as to rule out competing hypotheses. The control of variables is a basic strategy that allows valid inferences and narrows the number of possible experiments to consider (Klahr, 2000). Confounded experiments, those in which variables have not been isolated correctly, yield indeterminate evidence, thereby making valid inferences and subsequent knowledge gain difficult, if not impossible.
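The control-of-variables logic can be made concrete with a short sketch (the variable names below are illustrative, not drawn from any of the cited studies): a comparison between two experimental setups licenses an inference about a target variable only when that variable is the sole difference between them.

```python
def is_controlled(setup_a: dict, setup_b: dict, target: str) -> bool:
    """A comparison isolates `target` only if it is the single
    variable that differs between the two setups."""
    differing = [v for v in setup_a if setup_a[v] != setup_b[v]]
    return differing == [target]

# Hypothetical ramp experiments (illustrative variable names).
a = {"surface": "smooth", "slope": "steep", "ball": "golf"}
b = {"surface": "rough", "slope": "steep", "ball": "golf"}
c = {"surface": "rough", "slope": "shallow", "ball": "golf"}

print(is_controlled(a, b, "surface"))  # True: a valid inference is possible
print(is_controlled(a, c, "surface"))  # False: confounded, so the evidence is indeterminate
```

The sketch assumes both setups list the same variables; its point is only that a confounded comparison, where more than one variable changes at once, cannot support a valid inference about any single variable.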

Early approaches to examining experimentation skills involved minimizing the role of prior knowledge in order to focus on the strategies that participants used. That is, the goal was to examine the domain-general strategies that apply regardless of the content to which they are applied. For example, building on the research tradition of Piaget (e.g., Inhelder and Piaget, 1958), Siegler and Liebert (1975) examined the acquisition of experimental design skills by fifth and eighth graders. The problem involved determining how to make an electric train run. The train was connected to a set of four switches, and the children needed to determine the particular on/off configuration required. The train was in reality controlled by a secret switch, so that the discovery of the correct solution was postponed until all 16 combinations were generated. In this task, there was no principled reason why any one of the combinations would be more or less likely, and success was achieved by systematically testing all combinations of a set of four switches. Thus the task involved no domain-specific knowledge that would constrain the hypotheses about which configuration was most likely. A similarly knowledge-lean task, modeled on one originally used by Inhelder and Piaget (1958), was employed by Kuhn and Phelps (1982): identifying the reaction properties of a set of colorless fluids. Success on the task was dependent on the ability to isolate and control variables in the set of all possible fluid combinations in order to determine which was causally related to the outcome. The study extended over several weeks with variations in the fluids used and the difficulty of the problem.
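The search space in the train task is small enough to enumerate directly. A sketch of the exhaustive strategy the task rewards (four binary switches, hence 2^4 = 16 configurations; the code is an illustration, not part of the original study):

```python
from itertools import product

# Four on/off switches give 2**4 = 16 possible configurations.
# With no domain knowledge to prune the space, success on the task
# means generating every configuration systematically, without
# repeats or omissions (the role the tree diagram played in the
# instructional support Siegler and Liebert provided).
configurations = list(product(["off", "on"], repeat=4))

print(len(configurations))  # 16
print(configurations[0])    # ('off', 'off', 'off', 'off')
```

Because the train was secretly rigged, no configuration worked until all 16 had been tried, so systematic, duplicate-free enumeration was the only route to success.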

In both studies, the importance of practice and instructional support was apparent. Siegler and Liebert’s study included two experimental groups of children who received different kinds of instructional support. Both groups were taught about factors, levels, and tree diagrams. One group received additional, more elaborate support that included practice and help representing all possible solutions with a tree diagram. For fifth graders, the more elaborate instructional support improved their performance compared with a control group that did not receive any support. For eighth graders, both kinds of instructional support led to improved performance. In the Kuhn and Phelps task, some students improved over the course of the study, although an abrupt change from invalid to valid strategies was not common. Instead, the more typical pattern was one in which valid and invalid strategies coexisted both within and across sessions, with a pattern of gradual attainment of stable valid strategies by some students (the stabilization point varied but was typically around weeks 5-7).

Since this early work, researchers have tended to investigate children’s and adults’ performance on experimental design tasks that are more knowledge rich and less constrained. Results from these studies indicate that, in general, adults are more proficient than children at designing informative experiments. In a study comparing adults with third and sixth graders, adults were more likely to focus on experiments that would be informative (Klahr, Fay, and Dunbar, 1993). Similarly, Schauble (1996) found that during the initial 3 weeks of exploring a domain, children and adults considered about the same number of possible experiments. However, when they began experimenting with another domain in the second 3 weeks of the study, adults considered a greater range of possible experiments. Over the full 6 weeks, children and adults conducted approximately the same number of experiments. Thus, children were more likely to conduct unintended duplicate or triplicate experiments, making their experimentation efforts less informative relative to the adults, who were selecting a broader range of experiments. Similarly, children were more likely to devote multiple experimental trials to variables that were already well understood, whereas adults moved on to exploring variables they did not understand as well (Klahr, Fay, and Dunbar, 1993; Schauble, 1996). Evidence also indicates, however, that dimensions of the task often have a greater influence on performance than age (Linn, 1978, 1980; Linn, Chen, and Thier, 1977; Linn and Levine, 1978).

With respect to attending to one feature at a time, children are less likely than adults to control one variable at a time. For example, Schauble (1996) found that across two task domains, children used controlled comparisons about a third of the time. In contrast, adults improved from 50 percent usage on the first task to 63 percent on the second task. Children usually begin by designing confounded experiments (often as a means to produce a desired outcome), but with repeated practice begin to use a strategy of changing one variable at a time (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Kuhn et al., 1995; Schauble, 1990).

Reminiscent of the results of the earlier study by Kuhn and Phelps, both children and adults display intraindividual variability in strategy usage. That is, multiple strategy usage is not unique to childhood or periods of developmental transition (Kuhn et al., 1995). A robust finding is the coexistence of valid and invalid strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Garcia-Mila and Andersen, 2005; Gleason and Schauble, 2000; Schauble, 1990; Siegler and Crowley, 1991; Siegler and Shipley, 1995). That is, participants may progress to the use of a valid strategy, but then return to an inefficient or invalid strategy. Similar use of multiple strategies has been found in research on the development of other academic skills, such as mathematics (e.g., Bisanz and LeFevre, 1990; Siegler and Crowley, 1991), reading (e.g., Perfetti, 1992), and spelling (e.g., Varnhagen, 1995). With respect to experimentation strategies, an individual may begin with an invalid strategy, but once the usefulness of changing one variable at a time is discovered, it is not immediately used exclusively. The newly discovered, effective strategy is only slowly incorporated into an individual’s set of strategies.

An individual’s perception of the goals of an investigation also has an important effect on the hypotheses they generate and their approach to experimentation. Individuals tend to differ in whether they see the overarching goal of an inquiry task as seeking to identify which factors make a difference (scientific) or seeking to produce a desired effect (engineering). It is a question for further research whether these different approaches characterize an individual or whether they are invoked by task demands or implicit assumptions.

In a direct exploration of the effect of adopting scientific versus engineering goals, Schauble, Klopfer, and Raghavan (1991) provided fifth and sixth graders with an “engineering context” and a “science context.” When the children were working as scientists, their goal was to determine which factors made a difference and which ones did not. When the children were working as engineers, their goal was optimization, that is, to produce a desired effect (i.e., the fastest boat in the canal task). When working in the science context, the children worked more systematically, by establishing the effect of each variable, alone and in combination. There was an effort to make inclusion inferences (i.e., an inference that a factor is causal) and exclusion inferences (i.e., an inference that a factor is not causal). In the engineering context, children selected highly contrastive combinations and focused on factors believed to be causal while overlooking factors believed or demonstrated to be noncausal. Typically, children took a “try-and-see” approach to experimentation while acting as engineers, but they took a theory-driven approach to experimentation when acting as scientists. Schauble et al. (1991) found that children who received the engineering instructions first, followed by the scientist instructions, made the greatest improvements. Similarly, Sneider et al. (1984) found that students’ ability to plan and critique experiments improved when they first engaged in an engineering task of designing rockets.

Another pair of contrasting approaches to scientific investigation is the theorist versus the experimentalist (Klahr and Dunbar, 1988; Schauble, 1990). Similar variation in strategies for problem solving has been observed for chess, puzzles, physics problems, science reasoning, and even elementary arithmetic (Chase and Simon, 1973; Klahr and Robinson, 1981; Klayman and Ha, 1989; Kuhn et al., 1995; Larkin et al., 1980; Lovett and Anderson, 1995, 1996; Simon, 1975; Siegler, 1987; Siegler and Jenkins, 1989). Individuals who take a theory-driven approach tend to generate hypotheses and then test the predictions of the hypotheses. Experimenters tend to make data-driven discoveries, by generating data and finding the hypothesis that best summarizes or explains that data. For example, Penner and Klahr (1996a) asked 10- to 14-year-olds to conduct experiments to determine how the shape, size, material, and weight of an object influence sinking times. Students’ approaches to the task could be classified as either “prediction oriented” (i.e., a theorist: “I believe that weight makes a difference”) or “hypothesis oriented” (i.e., an experimenter: “I wonder if …”). The 10-year-olds were more likely to take a prediction (or demonstration) approach, whereas the 14-year-olds were more likely to explicitly test a hypothesis about an attribute without a strong belief or need to demonstrate that belief. Although these patterns may characterize approaches to any given task, it has yet to be determined if such styles are idiosyncratic to the individual and likely to remain stable across varying tasks, or if different styles might emerge for the same person depending on task demands or the domain under investigation.

Observing and Recording

Record keeping is an important component of scientific investigation in general, and of self-directed experimental tasks especially, because access to and consulting of cumulative records are often important in interpreting evidence. Early studies of experimentation demonstrated that children are often not aware of their own memory limitations, and this plays a role in whether they document their work during an investigation (e.g., Siegler and Liebert, 1975). Recent studies corroborate the importance of an awareness of one’s own memory limitations while engaged in scientific inquiry tasks, regardless of age. Spontaneous note-taking or other documentation of experimental designs and results may be a factor contributing to the observed developmental differences in performance on both experimental design tasks and in evaluation of evidence. Carey et al. (1989) reported that, prior to instruction, seventh graders did not spontaneously keep records when trying to determine and keep track of which substance was responsible for producing a bubbling reaction in a mixture of yeast, flour, sugar, salt, and warm water. Nevertheless, even though preschoolers are likely to produce inadequate and uninformative notations, they can distinguish between the two when asked to choose between them (Triona and Klahr, in press). Dunbar and Klahr (1988) also noted that children (grades 3-6) were unlikely to check if a current hypothesis was or was not consistent with previous experimental results. In a study by Trafton and Trickett (2001), undergraduates solving scientific reasoning problems in a computer environment were more likely to achieve correct performance when using the notebook function (78 percent) than were nonusers (49 percent), showing that this issue is not unique to childhood.

In a study of fourth graders’ and adults’ spontaneous use of notebooks during a 10-week investigation of multivariable systems, all but one of the adults took notes, whereas only half of the children took notes. Moreover, despite variability in the amount of notebook usage in both groups, on average adults made three times more notebook entries than children did. Adults’ note-taking remained stable across the 10 weeks, but children’s frequency of use decreased over time, dropping to about half of their initial usage. Children rarely reviewed their notes, which typically consisted of conclusions, but not the variables used or the outcomes of the experimental tests (i.e., the evidence for the conclusion was not recorded) (Garcia-Mila and Andersen, 2005).

Children may differentially record the results of experiments, depending on familiarity or strength of prior theories. For example, 10- to 14-year-olds recorded more data points when experimenting with factors affecting force produced by the weight and surface area of boxes than when they were experimenting with pendulums (Kanari and Millar, 2004). Overall, it is a fairly robust finding that children are less likely than adults to record experimental designs and outcomes or to review what notes they do keep, despite task demands that clearly necessitate a reliance on external memory aids.

Given the increasing attention to the importance of metacognition for proficient performance on such tasks (e.g., Kuhn and Pearsall, 1998, 2000), it is important to determine at what point children and early adolescents recognize their own memory limitations as they navigate through a complex task. Some studies show that children’s understanding of how their own memories work continues to develop across the elementary and middle school grades (Siegler and Alibali, 2005). The implication is that there is no particular age or grade level when memory and limited understanding of one’s own memory are no longer a consideration. As such, knowledge of how one’s own memory works may represent an important moderating variable in understanding the development of scientific reasoning (Kuhn, 2001). For example, if a student is aware that it will be difficult for her to remember the results of multiple trials, she may be more likely to carefully record each outcome. However, it may also be the case that children, like adult scientists, need to be inducted into the practice of record keeping and the use of records. They are likely to need support to understand the important role of records in generating scientific evidence and supporting scientific arguments.

Evaluating Evidence

The important role of evidence evaluation in the process of scientific activity has long been recognized. Kuhn (1989), for example, has argued that the defining feature of scientific thinking is the set of skills involved in differentiating and coordinating theory and evidence. Various strands of research provide insight on how children learn to engage in this phase of scientific inquiry. There is an extensive literature on the evaluation of evidence, beginning with early research on identifying patterns of covariation and cause that used highly structured experimental tasks. More recently, researchers have studied how children evaluate evidence in the context of self-directed experimental tasks. In real-world contexts (in contrast to highly controlled laboratory tasks) the process of evidence evaluation is very messy and requires an understanding of error and variation. As was the case for hypothesis generation and the design of experiments, the role of prior knowledge and beliefs has emerged as an important influence on how individuals evaluate evidence.

Covariation Evidence

A number of early studies on the development of evidence evaluation skills used knowledge-lean tasks that asked participants to evaluate existing data. These data were typically in the form of covariation evidence—that is, the frequency with which two events do or do not occur together. Evaluation of covariation evidence is potentially important in regard to scientific thinking because covariation is one potential cue that two events are causally related. Deanna Kuhn and her colleagues carried out pioneering work on children’s and adults’ evaluation of covariation evidence, with a focus on how participants coordinate their prior beliefs about the phenomenon with the data presented to them (see Box 5-1 ).
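Covariation evidence of this kind is typically presented as frequencies in a 2x2 contingency table. One common summary statistic (not named in the chapter; shown here purely as an illustration, with made-up numbers) is the difference of conditional probabilities, often written as delta-P:

```python
def delta_p(a: int, b: int, c: int, d: int) -> float:
    """Summarize a 2x2 covariation table:
    a = cause present, effect present    b = cause present, effect absent
    c = cause absent, effect present     d = cause absent, effect absent
    Returns P(effect | cause) - P(effect | no cause).
    """
    return a / (a + b) - c / (c + d)

# Made-up numbers in the style of the breakfast-roll example:
# 8 of 10 roll-eaters caught colds, but only 2 of 10 non-eaters did.
print(round(delta_p(8, 2, 2, 8), 2))  # 0.6 -> strong positive covariation
print(round(delta_p(5, 5, 5, 5), 2))  # 0.0 -> no covariation
```

A positive value is only a cue that two events covary; as the studies below show, the hard part for reasoners of all ages is coordinating such a pattern with their prior beliefs about whether the relationship is causal.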

Results across a series of studies revealed continuous improvement of the skills involved in differentiating and coordinating theory and evidence, as well as bracketing prior belief while evaluating evidence, from middle childhood (grades 3 and 6) to adolescence (grade 9) to adulthood (Kuhn, Amsel, and O’Loughlin, 1988). These skills, however, did not appear to develop to an optimal level even among adults. Even adults had a tendency to meld theory and evidence into a single mental representation of “the way things are.”

Participants had a variety of strategies for keeping theory and evidence in alignment with one another when they were in fact discrepant. One tendency was to ignore, distort, or selectively attend to evidence that was inconsistent with a favored theory. For example, the protocol from one ninth grader demonstrated that upon repeated instances of covariation between type of breakfast roll and catching colds, he would not acknowledge this relationship: “They just taste different … the breakfast roll to me don’t cause so much colds because they have pretty much the same thing inside” (Kuhn, Amsel, and O’Loughlin, 1988, p. 73).

Another tendency was to adjust a theory to fit the evidence, a process that was most often outside an individual’s conscious awareness and control. For example, when asked to recall their original beliefs, participants would often report a theory consistent with the evidence that was presented, and not the theory as originally stated. Take the case of one ninth grader who did not believe that type of condiment (mustard versus ketchup) was causally related to catching colds. With each presentation of an instance of covariation evidence, he acknowledged the evidence and elaborated a theory based on the amount of ingredients or vitamins and the temperature of the food the condiment was served with to make sense of the data (Kuhn, Amsel, and O’Loughlin, 1988, p. 83). Kuhn argued that this tendency suggests that the student’s theory does not exist as an object of cognition. That is, a theory and the evidence for that theory are undifferentiated; they do not exist as separate cognitive entities. If they do not exist as separate entities, it is not possible to flexibly and consciously reflect on the relation of one to the other.

A number of researchers have criticized Kuhn’s findings on both methodological and theoretical grounds. Sodian, Zaitchik, and Carey (1991), for example, questioned the finding that third and sixth grade children cannot distinguish between their beliefs and the evidence, pointing to the complexity of the tasks Kuhn used as problematic. They chose to employ simpler tasks that involved story problems about phenomena for which children did not hold strong beliefs. Children’s performance on these tasks demonstrated that even first and second graders could differentiate a hypothesis from the evidence. Likewise, Ruffman et al. (1993) used a simplified task and showed that 6-year-olds were able to form a causal hypothesis based on a pattern of covariation evidence. A study of children and adults (Amsel and Brock, 1996) indicated an important role of prior beliefs, especially for children. When presented with evidence that disconfirmed prior beliefs, children from both grade levels tended to make causal judgments consistent with their prior beliefs. When confronted with confirming evidence, however, both groups of children and adults made similar judgments. Looking across these studies provides insight into the conditions under which children are more or less proficient at coordinating theory and evidence. In some situations, children are better at distinguishing prior beliefs from evidence than the results of Kuhn et al. suggest.

Koslowski (1996) criticized Kuhn et al.’s work on more theoretical grounds. She argued that reliance on knowledge-lean tasks in which participants are asked to suppress their prior knowledge may lead to an incomplete or distorted picture of the reasoning abilities of children and adults. Instead, Koslowski suggested that using prior knowledge when gathering and evaluating evidence is a valid strategy. She developed a series of experiments to support her thesis and to explore the ways in which prior knowledge might play a role in evaluating evidence. The results of these investigations are described in detail in the later section of this chapter on the role of prior knowledge.

Evidence in the Context of Investigations

Researchers have also looked at reasoning about cause in the context of full investigations of causal systems. Two main types of multivariable systems are used in these studies. In the first type of system, participants are involved in a hands-on manipulation of a physical system, such as a ramp (e.g., Chen and Klahr, 1999; Masnick and Klahr, 2003) or a canal (e.g., Gleason and Schauble, 2000; Kuhn, Schauble, and Garcia-Mila, 1992). The second type of system is a computer simulation, such as the Daytona microworld in which participants discover the factors affecting the speed of race cars (Schauble, 1990). A variety of virtual environments have been created in domains such as electric circuits (Schauble et al., 1992), genetics (Echevarria, 2003), earthquake risk, and flooding risk (e.g., Keselman, 2003).

The inferences that are made based on self-generated experimental evidence are typically classified as either causal (or inclusion), noncausal (or exclusion), indeterminate, or false inclusion. All inference types can be further classified as valid or invalid. Invalid inclusion, by definition, is of particular interest because in self-directed experimental contexts, both children and adults often infer based on prior beliefs that a variable is causal, when in reality it is not.
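This inference taxonomy can be sketched as a simple decision rule (a simplification, with hypothetical pendulum variables and outcomes): a controlled comparison whose outcomes differ supports inclusion, one whose outcomes match supports exclusion, and a confounded comparison supports only indeterminacy.

```python
def classify_inference(setup_a, setup_b, outcome_a, outcome_b, target):
    """Classify what a single comparison supports about `target`:
    'inclusion' (causal), 'exclusion' (noncausal), or 'indeterminate'."""
    differing = [v for v in setup_a if setup_a[v] != setup_b[v]]
    if differing != [target]:
        return "indeterminate"  # confounded: other variables changed too
    return "inclusion" if outcome_a != outcome_b else "exclusion"

# Hypothetical pendulum trials (swing period in seconds).
heavy_long = {"weight": "heavy", "string": "long"}
light_long = {"weight": "light", "string": "long"}
light_short = {"weight": "light", "string": "short"}

# Controlled test of weight; equal periods -> weight is noncausal.
print(classify_inference(heavy_long, light_long, 2.0, 2.0, "weight"))   # exclusion
# String length also changed -> no valid inference about weight.
print(classify_inference(heavy_long, light_short, 2.0, 1.4, "weight"))  # indeterminate
```

A false inclusion, in these terms, is declaring "inclusion" from a confounded or misread comparison, which is exactly the error the studies below find children making when prior beliefs mark a variable as causal.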

Children tend to focus on making causal inferences during their initial explorations of a causal system. In a study in which children worked to discover the causal structure of a computerized microworld, fifth and sixth graders began by producing confounded experiments and relied on prior knowledge or expectations (Schauble, 1990). As a result, in their early explorations of the causal system, they were more likely to make incorrect causal inferences. In a direct comparison of adults and children (Schauble, 1996), adults also focused on making causal inferences, but they made more valid inferences because their experimentation was more often done using a control-of-variables strategy. Overall, children’s inferences were valid 44 percent of the time, compared with 72 percent for adults. The fifth and sixth graders improved over the course of six sessions, starting at 25 percent but improving to almost 60 percent valid inferences (Schauble, 1996). Adults were more likely than children to make inferences about which variables were noncausal or inferences of indeterminacy (80 and 30 percent, respectively) (Schauble, 1996).

Children’s difficulty with inferences of noncausality also emerged in a study of 10- to 14-year-olds who explored factors influencing the swing of a pendulum or the force needed to pull a box along a level surface (Kanari and Millar, 2004). Only half of the students were able to draw correct conclusions about factors that did not covary with outcome. Students were likely to either selectively record data, selectively attend to data, distort or reinterpret the data, or state that noncovariation experimental trials were “inconclusive.” Such tendencies are reminiscent of other findings that some individuals selectively attend to or distort data in order to preserve a prior theory or belief (Kuhn, Amsel, and O’Loughlin, 1988; Zimmerman, Raghavan, and Sartoris, 2003).

Some researchers suggest children’s difficulty with noncausal or indeterminate inferences may be due both to experience and to the inherent complexity of the problem. In terms of experience, in the science classroom it is typical to focus on variables that “make a difference,” and therefore students struggle when testing variables that do not covary with the outcome (e.g., the weight of a pendulum does not affect the time of swing, or the vertical height of a weight does not affect balance) (Kanari and Millar, 2004). Also, valid exclusion and indeterminacy inferences may be conceptually more complex, because they require one to consider a pattern of evidence produced from several experimental trials (Kuhn et al., 1995; Schauble, 1996). Looking across several trials may require one to review cumulative records of previous outcomes. As has been suggested previously, children often lack the memory skills to record information, to record sufficient information, or to consult such information when it has been recorded.

The importance of experience is highlighted by the results of studies conducted over several weeks with fifth and sixth graders. After several weeks with a task, children started making more exclusion inferences (that factors are not causal) and indeterminacy inferences (that one cannot make a conclusive judgment about a confounded comparison) and did not focus solely on causal inferences (e.g., Keselman, 2003; Schauble, 1996). They also began to distinguish between an informative and an uninformative experiment by attending to or controlling other factors, leading to an improved ability to make valid inferences. Through repeated exposure, invalid inferences, such as invalid inclusions, dropped in frequency. The tendency to begin to make inferences of indeterminacy suggests that students developed more awareness of the adequacy or inadequacy of their experimentation strategies for generating sufficient and interpretable evidence.

Children and adults also differ in generating sufficient evidence to support inferences. In contexts in which it is possible, children often terminate their search early, believing that they have determined a solution to the problem (e.g., Dunbar and Klahr, 1989). In studies over several weeks in which children must continue their investigation (e.g., Schauble et al., 1991), this is less likely because of the task requirements. Children are also more likely to refer to the most recently generated evidence. They may jump to a conclusion after a single experiment, whereas adults typically need to see the results of several experiments (e.g., Gleason and Schauble, 2000).

As was found with experimentation, children and adults display intraindividual variability in strategy usage with respect to inference types. Likewise, the existence of multiple inference strategies is not unique to childhood (Kuhn et al., 1995). In general, early in an investigation, individuals focus primarily on identifying factors that are causal and are less likely to consider definitely ruling out factors that are not causal. However, a mix of valid and invalid inference strategies co-occur during the course of exploring a causal system. As with experimentation, the addition of a valid inference strategy to an individual’s repertoire does not mean that they immediately give up the others. Early in investigations, there is a focus on causal hypotheses and inferences, whether they are warranted or not. Only with additional exposure do children start to make inferences of noncausality and indeterminacy. Knowledge change and experience—gaining a better understanding of the causal system via experimentation—was associated with the use of valid experimentation and inference strategies.


In the previous section we reviewed evidence on developmental differences in using scientific strategies. Across multiple studies, prior knowledge emerged as an important influence on several parts of the process of generating and evaluating evidence. In this section we look more closely at the specific ways that prior knowledge may shape part of the process. Prior knowledge includes conceptual knowledge, that is, knowledge of the natural world and specifically of the domain under investigation, as well as prior knowledge and beliefs about the purpose of an investigation and the goals of science more generally. This latter kind of prior knowledge is touched on here and discussed in greater detail in the next chapter.

Beliefs About Causal Mechanism and Plausibility

In response to research on evaluation of covariation evidence that used knowledge-lean tasks or even required participants to suppress prior knowledge, Koslowski (1996) argued that it is legitimate and even helpful to consider prior knowledge when gathering and evaluating evidence. The world is full of correlations, and consideration of plausibility, causal mechanism, and alternative causes can help to determine which correlations between events should be taken seriously and which should be viewed as spurious. For example, the identification of the E. coli bacterium makes plausible a causal relationship between hamburger consumption and certain types of illness or mortality. In the absence of a causal mechanism, one does not seriously consider the correlation between ice cream consumption and violent crime rate to be causal; instead, one looks for other covarying quantities (such as high temperatures) that may be causal for both behaviors and thus explain the correlation.
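The ice cream and crime example can be illustrated with a toy simulation. In this sketch (all numbers and variable names are hypothetical, chosen only for illustration), temperature drives both quantities, and neither causes the other, yet the two correlate strongly:

```python
import random
import statistics

random.seed(42)

# Hypothetical common-cause model: temperature influences both
# ice cream sales and (in this toy model only) violent incidents.
temps = [random.gauss(20, 8) for _ in range(1000)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temps]  # driven by temperature
crime = [1.5 * t + random.gauss(0, 5) for t in temps]      # also driven by temperature

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Ice cream and crime correlate strongly (well above 0.8 here)
# even though neither variable causes the other.
print(round(pearson(ice_cream, crime), 2))
```

The correlation is an artifact of the shared cause; controlling for temperature would make it largely disappear, which is exactly the kind of reasoning about alternative causes that Koslowski argues people legitimately bring to evidence evaluation.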

Koslowski (1996) presented a series of experiments that demonstrate the interdependence of theory and evidence in legitimate scientific reasoning (see Box 5-2 for an example). In most of these studies, all participants (sixth graders, ninth graders, and adults) did take mechanism into consideration when evaluating evidence in relation to a hypothesis about a causal relationship. Even sixth graders considered more than patterns of covariation when making causal judgments (Koslowski and Okagaki, 1986; Koslowski et al., 1989). In fact, as discussed in the previous chapter, results of studies by Koslowski (1996) and others (Ahn et al., 1995) indicate that children and adults have naïve theories about the world that incorporate information about both covariation and causal mechanism.

The plausibility of a mechanism also plays a role in reasoning about cause. In some situations, scientific progress occurs by taking seemingly implausible correlations seriously (Wolpert, 1993). Similarly, Koslowski argued that if people rely on covariation and mechanism information in an interdependent and judicious manner, then they should pay attention to implausible correlations (i.e., those with no apparent mechanism) when the implausible correlation occurs repeatedly. For example, discovering the cause of Kawasaki’s syndrome depended on taking seriously the implausible correlation between the illness and having recently cleaned carpets. Similarly, Thagard (1998a, 1998b) describes the case of researchers Warren and Marshall, who proposed that peptic ulcers could be caused by a bacterium, and their efforts to have their theory accepted by the medical community. The bacterial theory of ulcers was initially rejected as implausible, given the assumption that the stomach is too acidic to allow bacteria to survive.

Studies with both children and adults reveal links between reasoning about mechanism and the plausibility of that mechanism (Koslowski, 1996). When presented with an implausible covariation (e.g., improved gas mileage and color of car), participants rated the causal status of the implausible cause (color) before and after learning about a possible way that the cause could bring about the effect (improved gas mileage). In this example, participants learned that the color of the car affects the driver’s alertness (which affects driving quality, which in turn affects gas mileage). At all ages, participants increased their causal ratings after learning about a possible mediating mechanism. The presence of a possible mechanism in addition to a large number of covariations (four or more) was taken to indicate the possibility of a causal relationship for both plausible and implausible covariations. When either generating or assessing mechanisms for plausible covariations, all age groups (sixth and ninth graders and adults) were comparable. When the covariation was implausible, sixth graders were more likely to generate dubious mechanisms to account for the correlation.

The role of prior knowledge, especially beliefs about causal mechanism and plausibility, is also evident in hypothesis formation and the design of investigations. Individuals’ prior beliefs influence the choice of which hypotheses to test, including which hypotheses are tested first, repeatedly, or receive the most time and attention (e.g., Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, children’s favored theories sometimes result in the selection of invalid experimentation and evidence evaluation heuristics (e.g., Dunbar and Klahr, 1989; Schauble, 1990). Plausibility of a hypothesis may serve as a guide for which experiments to pursue. Klahr, Fay, and Dunbar (1993) provided third and sixth grade children and adults with hypotheses to test that were incorrect but either plausible or implausible. For plausible hypotheses, children and adults tended to go about demonstrating the correctness of the hypothesis rather than setting up experiments to decide between rival hypotheses. For implausible hypotheses, adults and some sixth graders proposed a plausible rival hypothesis and set up an experiment that would discriminate between the two. Third graders tended to propose a plausible hypothesis but then ignore or forget the initial implausible hypothesis, getting sidetracked in an attempt to demonstrate that the plausible hypothesis was correct.

Recognizing the interdependence of theory and data in the evaluation of evidence and explanations, Chinn and Brewer (2001) proposed that people evaluate evidence by building a mental model of the interrelationships between theories and data. These models integrate patterns of data, procedural details, and the theoretical explanation of the observed findings (which may include unobservable mechanisms, such as molecules, electrons, enzymes, or intentions and desires). The information and events can be linked by different kinds of connections, including causal, contrastive, analogical, and inductive links. The mental model may then be evaluated by considering the plausibility of these links. In addition to considering the links between, for example, data and theory, the model might also be evaluated by appealing to alternate causal mechanisms or alternate explanations. Essentially, an individual seeks to “undermine one or more of the links in the model” (p. 337). If no reasons to be critical can be identified, the individual may accept the new evidence or theoretical interpretation.

Some studies suggest that the strength of prior beliefs, as well as the personal relevance of those beliefs, may influence the evaluation of the mental model (Chinn and Malhotra, 2002; Klaczynski, 2000; Klaczynski and Narasimham, 1998). For example, when individuals have reason to disbelieve evidence (e.g., because it is inconsistent with prior belief), they will search harder for flaws in the data (Kunda, 1990). As a result, individuals may not find the evidence compelling enough to reassess their cognitive model. In contrast, beliefs about simple empirical regularities may not be held with such conviction (e.g., the falling speed of heavy versus light objects), making it easier to change a belief in response to evidence.

Evaluating Evidence That Contradicts Prior Beliefs

Anomalous data or evidence refers to results that do not fit with one’s current beliefs. Anomalous data are considered very important by scientists because of their role in theory change, and they have been used by science educators to promote conceptual change. The idea that anomalous evidence promotes conceptual change (in the scientist or the student) rests on a number of assumptions, including that individuals have beliefs or theories about natural or social phenomena, that they are capable of noticing that some evidence is inconsistent with those theories, that such evidence calls into question those theories, and, in some cases, that a belief or theory will be altered or changed in response to the new (anomalous) evidence (Chinn and Brewer, 1998). Chinn and Brewer propose that there are eight possible responses to anomalous data. Individuals can (1) ignore the data; (2) reject the data (e.g., because of methodological error, measurement error, bias); (3) acknowledge uncertainty about the validity of the data; (4) exclude the data as being irrelevant to the current theory; (5) hold the data in abeyance (i.e., withhold a judgment about the relation of the data to the initial theory); (6) reinterpret the data as consistent with the initial theory; (7) accept the data and make peripheral change or minor modification to the theory; or (8) accept the data and change the theory. Examples of all of these responses were found in undergraduates’ responses to data that contradicted theories to explain the mass extinction of dinosaurs and theories about whether dinosaurs were warm-blooded or cold-blooded.

In a series of studies, Chinn and Malhotra (2002) examined how fourth, fifth, and sixth graders responded to experimental data that were inconsistent with their existing beliefs. Experiments from physical science domains were selected in which the outcomes produced either ambiguous or unambiguous data, and for which the findings were counterintuitive for most children. For example, most children assume that a heavy object falls faster than a light object. When the two objects are dropped simultaneously, there is some ambiguity because it is difficult to observe both objects. An example of a topic that is counterintuitive but results in unambiguous evidence is the reaction temperature of baking soda added to vinegar. Children believe that either no change in temperature will occur, or that the fizzing causes an increase in temperature. Thermometers unambiguously show a temperature drop of about 4 degrees centigrade.

When examining the anomalous evidence produced by these experiments, children’s difficulties seemed to occur in one of four cognitive processes: observation, interpretation, generalization, or retention (Chinn and Malhotra, 2002). For example, prior belief may influence what is “observed,” especially in the case of data that are ambiguous, and children may not perceive the two objects as landing simultaneously. Inferences based on this faulty observation will then be incorrect. At the level of interpretation, even if individuals accurately observed the outcome, they might not shift their theory to align with the evidence. They can fail to do so in many ways, such as ignoring or distorting the data or discounting the data because they are considered flawed. At the level of generalization, an individual may accept, for example, that these particular heavy and light objects fell at the same rate but insist that the same rule may not hold for other situations or objects. Finally, even when children appeared to change their beliefs about an observed phenomenon in the immediate context of the experiment, their prior beliefs reemerged later, indicating a lack of long-term retention of the change.

Penner and Klahr (1996a) investigated the extent to which children’s prior beliefs affect their ability to design and interpret experiments. They used a domain in which most children hold a strong belief that heavier objects sink in fluid faster than light objects, and they examined children’s ability to design unconfounded experiments to test that belief. In this study, for objects of a given composition and shape, sink times for heavy and light objects are nearly indistinguishable to an observer. For example, the sink times for stainless steel spheres weighing 65 g and 19 g were .58 sec and .62 sec, respectively. Of the 30 children, eight chose to directly contrast these two objects, and only one of them continued to explore the reason for the unexpected finding that the large and small spheres had equivalent sink times. The process of knowledge change was not straightforward. For example, some children suggested that the size of the smaller steel ball offset the fact that it weighed less, allowing it to move through the water as fast as the larger, heavier steel ball. Others concluded that both weight and shape make a difference. That is, there was an attempt to reconcile the evidence with prior knowledge and expectations by appealing to causal mechanisms, alternate causes, or enabling conditions.

What is also important to note about the children in the Penner and Klahr study is that they did in fact notice the surprising finding, rather than ignore or misrepresent the data. They tried to make sense of the outcome by acting as a theorist who conjectures about the causal mechanisms, boundary conditions, or other ad hoc explanations (e.g., shape) to account for the results of an experiment. In Chinn and Malhotra’s (2002) study of students’ evaluation of observed evidence (e.g., watching two objects fall simultaneously), the process of noticing was found to be an important mediator of conceptual change.

Echevarria (2003) examined seventh graders’ reactions to anomalous data in the domain of genetics and whether they served as a catalyst for knowledge construction during the course of self-directed experimentation. Students in the study completed a 3-week unit on genetics that involved genetics simulation software and observing plant growth. In both the software and the plants, students investigated or observed the transmission of one trait. Anomalies in the data were defined as outcomes that were not readily explainable on the basis of the appearance of the parents.

In general, the number of hypotheses generated, the number of tests conducted, and the number of explanations generated were a function of students’ ability to encounter, notice, and take seriously an anomalous finding. The majority of students (80 percent) developed some explanation for the pattern of anomalous data. For those who were unable to generate an explanation, it was suggested that the initial knowledge was insufficient and therefore could not undergo change as a result of the encounter with “anomalous” evidence. Analogous to case studies in the history of science (e.g., Simon, 2001), these students’ ability to notice and explore anomalies was related to their level of domain-specific knowledge (as suggested by Pasteur’s oft-quoted maxim that chance favors the prepared mind). Surprising findings were associated with an increase in hypotheses and experiments to test these potential explanations, but without the domain knowledge to “notice,” anomalies could not be exploited.

There is some evidence that, with instruction, students’ ability to evaluate anomalous data improves (Chinn and Malhotra, 2002). In a study of fourth, fifth, and sixth graders, one group of students was instructed to predict the outcomes of three experiments that produce counterintuitive but unambiguous data (e.g., reaction temperature). A second group answered questions that were designed to promote unbiased observations and interpretations by reflecting on the data. A third group was provided with an explanation of what scientists expected to find and why. All students reported their prediction of the outcome, what they observed, and their interpretation of the experiment. They were then tested for generalizations, and a retention test followed 9-10 days later. Fifth and sixth graders performed better than did fourth graders. Students who heard an explanation of what scientists expected to find and why did best. Further analyses suggest that the explanation-based intervention worked by influencing students’ initial predictions. This correct prediction then influenced what was observed. A correct observation then led to correct interpretations and generalizations, which resulted in conceptual change that was retained. A similar pattern of results was found using interventions employing either full or reduced explanations prior to the evaluation of evidence.

Thus, it appears that children were able to change their beliefs on the basis of anomalous or unexpected evidence, but only when they were capable of making the correct observations. Difficulty in making observations was found to be the main cognitive process responsible for impeding conceptual change (i.e., rather than interpretation, generalization, or retention). Certain interventions, in particular those involving an explanation of what scientists expected to happen and why, were very effective in mediating conceptual change when encountering counterintuitive evidence. With particular scaffolds, children made observations independent of theory, and they changed their beliefs based on observed evidence.


There is increasing evidence that, as in the case of intellectual skills in general, the development of the component skills of scientific reasoning “cannot be counted on to routinely develop” (Kuhn and Franklin, 2006, p. 47). That is, young children have many requisite skills needed to engage in scientific thinking, but there are also ways in which even adults do not show full proficiency in investigative and inference tasks. Recent research efforts have therefore been focused on how such skills can be promoted by determining which types of educational interventions (e.g., amount of structure, amount of support, emphasis on strategic or metastrategic skills) will contribute most to learning, retention, and transfer, and which types of interventions are best suited to different students. There is a developing picture of what children are capable of with minimal support, and research is moving in the direction of ascertaining what children are capable of, and when, under conditions of practice, instruction, and scaffolding. It may one day be possible to tailor educational opportunities that neither under- nor overestimate children’s ability to extract meaningful experiences from inquiry-based science classes.

Very few of the early studies focusing on the development of experimentation and evidence evaluation skills explicitly addressed issues of instruction and experience. Those that did, however, indicated an important role of experience and instruction in supporting scientific thinking. For example, Siegler and Liebert (1975) incorporated instructional manipulations aimed at teaching children about variables and variable levels with or without practice on analogous tasks. In the absence of both instruction and extended practice, no fifth graders and a small minority of eighth graders were successful. Kuhn and Phelps (1982) reported that, in the absence of explicit instruction, extended practice over several weeks was sufficient for the development and modification of experimentation and inference strategies. Later studies of self-directed experimentation also indicate that frequent engagement with the inquiry environment alone can lead to the development and modification of cognitive strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Schauble et al., 1991).

Some researchers have suggested that even simple prompts, which are often used in studies of students’ investigation skills, may provide a subtle form of instructional intervention (Klahr and Carver, 1995). Such prompts may cue the strategic requirements of the task, or they may promote explanation or the type of reflection that could induce a metacognitive or metastrategic awareness of task demands. Because prompts play this role in many studies as a means of revealing students’ thinking, it may be very difficult to tease apart the relative contributions of practice and the scaffolding provided by researcher prompts.

In the absence of instruction or prompts, students may not routinely ask questions of themselves, such as “What are you going to do next?” “What outcome do you predict?” “What did you learn?” and “How do you know?” Questions such as these may promote self-explanation, which has been shown to enhance understanding in part because it facilitates the integration of newly learned material with existing knowledge (Chi et al., 1994). Questions such as the prompts used by researchers may serve to promote such integration. Chinn and Malhotra (2002) incorporated different kinds of interventions, aimed at promoting conceptual change in response to anomalous experimental evidence. Interventions included practice at making predictions, reflecting on data, and explanation. The explanation-based interventions were most successful at promoting conceptual change, retention, and generalization. The prompts used in some studies of self-directed experimentation are very likely to serve the same function as the prompts used by Chi et al. (1994). Incorporating such prompts in classroom-based inquiry activities could serve as a powerful teaching tool, given that the use of self-explanation in tutoring systems (human and computer interface) has been shown to be quite effective (e.g., Chi, 1996; Hausmann and Chi, 2002).

Studies that compare the effects of different kinds of instruction and practice opportunities have been conducted in the laboratory, with some translation to the classroom. For example, Chen and Klahr (1999) examined the effects of direct and indirect instruction of the control-of-variables strategy on students’ (grades 2-4) experimentation and knowledge acquisition. The instructional intervention involved didactic teaching of the control-of-variables strategy, along with examples and probes. Indirect (or implicit) training involved the use of systematic probes during the course of children’s experimentation. A control group did not receive instruction or probes. No group received instruction on domain knowledge for any task used (springs, ramps, sinking objects). For the students who received instruction, use of the control-of-variables strategy increased from 34 percent prior to instruction to 65 percent after, with 61-64 percent usage maintained on transfer tasks that followed after 1 day and again after 7 months, respectively. No such gains were evident for the implicit training or control groups.
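The logic of the control-of-variables strategy can be stated compactly: a contrast between two experimental setups is informative only if the setups differ in exactly one variable. A minimal sketch (the function names and the ramp variables below are illustrative simplifications, not taken from the studies themselves):

```python
# Control-of-variables strategy (CVS): a comparison between two setups
# licenses a causal inference only when exactly one variable differs.

def contrast(setup_a, setup_b):
    """Return the list of variables on which two setups differ."""
    return [k for k in setup_a if setup_a[k] != setup_b[k]]

def is_unconfounded(setup_a, setup_b):
    """True when the comparison varies exactly one variable (a valid CVS test)."""
    return len(contrast(setup_a, setup_b)) == 1

# Hypothetical ramp setups, loosely inspired by the ramps task.
a = {"slope": "steep", "surface": "smooth", "ball": "golf"}
b = {"slope": "shallow", "surface": "smooth", "ball": "golf"}
c = {"slope": "shallow", "surface": "rough", "ball": "rubber"}

print(is_unconfounded(a, b))  # True: only slope varies, so slope can be tested
print(is_unconfounded(a, c))  # False: slope, surface, and ball all vary (confounded)
```

Comparing `a` with `c` is the kind of confounded experiment young children often produce: if the outcomes differ, any of the three varying factors could be responsible, so no valid causal or noncausal inference follows.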

Instruction about control of variables improved children’s ability to design informative experiments, which in turn facilitated conceptual change in a number of domains. They were able to design unconfounded experiments, which facilitated valid causal and noncausal inferences, resulting in a change in knowledge about how various multivariable causal systems worked. Significant gains in domain knowledge were evident only for the instruction group. Fourth graders showed better skill retention at long-term assessment than second or third graders.

The positive impact of instruction on control of variables also appears to translate to the classroom (Toth, Klahr, and Chen, 2000; Klahr, Chen, and Toth, 2001). Fourth graders who received instruction in the control-of-variables strategy in their classroom increased their use of the strategy, and their domain knowledge improved. The percentage of students who were able to correctly evaluate others’ research increased from 28 to 76 percent.

Instruction also appears to promote longer term use of the control-of-variables strategy and transfer of the strategy to a new task (Klahr and Nigam, 2004). Third and fourth graders who received instruction were more likely to master the control-of-variables strategy than students who explored a multivariable system on their own. Interestingly, although the group that received instruction performed better overall, a quarter of the students who explored the system on their own also mastered the strategy. These results raise questions about the kinds of individual differences that may allow for some students to benefit from the discovery context, but not others. That is, which learner traits are associated with the success of different learning experiences?

Similar effects of experience and instruction have been demonstrated for improving students’ ability to use evidence from multiple records and make correct inferences about noncausal variables (Keselman, 2003). In many cases, students show some improvement when they are given the opportunity for practice, but greater improvement when they receive instruction (Kuhn and Dean, 2005).

Long-term studies of students’ learning in the classroom with instructional support and structured experiences over months and years reveal children’s potential to engage in sophisticated investigations given the appropriate experiences (Metz, 2004; Lehrer and Schauble, 2005). For example, in one classroom-based study, second, fourth, and fifth graders took part in a curriculum unit on animal behavior that emphasized domain knowledge, whole-class collaboration, scaffolded instruction, and discussions about the kinds of questions that can and cannot be answered by observational records (Metz, 2004). Pairs or triads of students then developed a research question, designed an experiment, collected and analyzed data, and presented their findings on a research poster. Such studies have demonstrated that, with appropriate support, students in grades K-8 and students from a variety of socioeconomic, cultural, and linguistic backgrounds can be successful in generating and evaluating scientific evidence and explanations (Kuhn and Dean, 2005; Lehrer and Schauble, 2005; Metz, 2004; Warren, Rosebery, and Conant, 1994).


The picture that emerges from developmental and cognitive research on scientific thinking is one of a complex intertwining of knowledge of the natural world, general reasoning processes, and an understanding of how scientific knowledge is generated and evaluated. Science and scientific thinking are not only about logical thinking or conducting carefully controlled experiments. Instead, building knowledge in science is a complex process of building and testing models and theories, in which knowledge of the natural world and strategies for generating and evaluating evidence are closely intertwined. Working from this image of science, a few researchers have begun to investigate the development of children’s knowledge and skills in modeling.

The kinds of models that scientists construct vary widely, both within and across disciplines. Nevertheless, the rhetoric and practice of science are governed by efforts to invent, revise, and contest models. By modeling, we refer to the construction and test of representations that serve as analogues to systems in the real world (Lehrer and Schauble, 2006). These representations can be of many forms, including physical models, computer programs, mathematical equations, or propositions. Objects and relations in the model are interpreted as representing theoretically important objects and relations in the represented world. Models are useful in summarizing known features and predicting outcomes—that is, they can become elements of or representations of theories. A key hurdle for students is to understand that models are not copies; they are deliberate simplifications. Error is a component of all models, and the precision required of a model depends on the purpose for its current use.

The forms of thinking required for modeling do not progress very far without explicit instruction and fostering (Lehrer and Schauble, 2000). For this reason, studies of modeling have most often taken place in classrooms over sustained periods of time, often years. These studies provide a provocative picture of the sophisticated scientific thinking that can be supported in classrooms if students are provided with the right kinds of experiences over extended periods of time. The instructional approaches used in studies of students’ modeling, as well as the approach to curriculum that may be required to support the development of modeling skills over multiple years of schooling, are discussed in the chapters in Part III.

Lehrer and Schauble (2000, 2003, 2006) reported observing characteristic shifts in the understanding of modeling over the span of the elementary school grades, from an early emphasis on literal depictional forms, to representations that are progressively more symbolic and mathematically powerful. Diversity in representational and mathematical resources both accompanied and produced conceptual change. As children developed and used new mathematical means for characterizing growth, they understood biological change in increasingly dynamic ways. For example, once students understood the mathematics of ratio and changing ratios, they began to conceive of growth not as simple linear increase, but as a patterned rate of change. These transitions in conception and representation appeared to support each other, and they opened up new lines of inquiry. Children wondered whether plant growth was like animal growth, and whether the growth of yeast and bacteria on a Petri dish would show a pattern like the growth of a single plant. These forms of conceptual development required a context in which teachers systematically supported a restricted set of central ideas, building successively on earlier concepts over the grades of schooling.

Representational Systems That Support Modeling

The development of specific representational forms and notations, such as graphs, tables, computer programs, and mathematical expressions, is a critical part of engaging in mature forms of modeling. Mathematics, data and scale models, diagrams, and maps are particularly important for supporting science learning in grades K-8.

Mathematics
Mathematics and science are, of course, separate disciplines. Nevertheless, for the past 200 years, the steady press in science has been toward increasing quantification, visualization, and precision (Kline, 1980). Mathematics in all its forms is a symbol system that is fundamental to both expressing and understanding science. Often, expressing an idea mathematically results in noticing new patterns or relationships that otherwise would not be grasped. For example, elementary students studying the growth of organisms (plants, tobacco hornworms, populations of bacteria) noted that when they graphed changes in heights over the life span, all the organisms studied produced an emergent S-shaped curve. However, seeing this pattern depended on developing a “disciplined perception” (Stevens and Hall, 1998), a firm grounding in a Cartesian system. Moreover, the shape of the curve was determined in light of variation, accounted for by selecting and connecting midpoints of intervals that defined piecewise linear segments. This way of representing typical growth was contentious, because some midpoints did not correspond to any particular case value. This debate was therefore a pathway toward the idealization and imagined qualities of the world necessary for adopting a modeling stance. The form of the growth curve was eventually tested in other systems, and its replication inspired new questions. For example, why would bacteria populations and plants be describable by the same growth curve? In this case and in others, explanatory models and data models mutually bootstrapped conceptual development (Lehrer and Schauble, 2002).
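The S-shaped pattern described above can be sketched with a logistic growth model. The parameters, time points, and heights below are invented for illustration; they are not taken from the students’ data.

```python
# A minimal sketch (invented parameters) of logistic, S-shaped growth and of
# why growth is a patterned rate of change rather than a simple linear increase.

import math

def logistic(t, carrying_capacity=30.0, rate=0.5, inflection=10.0):
    """Height at time t under a logistic growth model."""
    return carrying_capacity / (1.0 + math.exp(-rate * (t - inflection)))

days = list(range(0, 21, 2))
heights = [round(logistic(t), 2) for t in days]

# Interval-by-interval gains: small at first, largest near the inflection
# point, then small again as growth levels off toward the carrying capacity.
gains = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
middle = len(gains) // 2
print(gains[middle] > gains[0] and gains[middle] > gains[-1])  # True
```

The same changing-ratio structure is what allowed the students to see plant, hornworm, and bacterial growth as instances of one pattern.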

It is not feasible in this report to summarize the extensive body of research in mathematics education, but one point is especially critical for science education: the need to expand elementary school mathematics beyond arithmetic to include space and geometry, measurement, and data/uncertainty. The National Council of Teachers of Mathematics standards (2000) strongly support this extension of early mathematics, reflecting the council’s judgment that arithmetic alone does not constitute a sufficient mathematics education. Moreover, if mathematics is to be used as a resource for science, the resource base widens considerably with a broader mathematical base, affording students a greater repertoire for making sense of the natural world.

For example, consider the role of geometry and visualization in comparing crystalline structures or evaluating the relationship between the body weights and body structures of different animals. Measurement is a ubiquitous part of the scientific enterprise, although its subtleties are almost always overlooked. Students are usually taught procedures for measuring but are rarely taught a theory of measure. Educators often overestimate children’s understanding of measurement because measuring tools, such as rulers or scales, resolve many of the conceptual challenges of measurement for children, who may therefore fail to grasp the idea that measurement entails the iteration of constant units, and that these units can be partitioned. It is reasonably common, for example, for even upper elementary students who seem proficient at measuring lengths with rulers to tacitly hold the theory that measuring merely entails the counting of units between boundaries. If these students are given unconnected units (say, tiles of a constant length) and asked to demonstrate how to measure a length, some of them almost always place the units against the object being measured in such a way that the first and last tile are lined up flush with the ends of the object measured. This arrangement often requires leaving spaces between units. Diagnostically, these spaces do not trouble a student who holds this “boundary-filling” conception of measurement (Lehrer, 2003; McClain et al., 1999).
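The two conceptions at stake, iterating a constant unit (and partitioning it for any remainder) versus counting boundary-filling tiles, can be contrasted in a short sketch. The function names and numbers are hypothetical.

```python
# A small sketch (invented numbers) contrasting the normative theory of
# length measure with the "boundary-filling" conception described above.

def measure(length, unit):
    """Normative measure: iterate the unit, then partition it for the rest."""
    whole_units = int(length // unit)
    remainder_fraction = (length - whole_units * unit) / unit
    return whole_units + remainder_fraction  # length expressed in units

def boundary_filling_count(length, unit, n_tiles):
    """The misconception: line the first and last tiles up flush with the
    ends, space the rest evenly, and report the tile count as the measure."""
    gap = (length - n_tiles * unit) / (n_tiles - 1)  # space left between tiles
    return n_tiles, round(gap, 2)

print(measure(13.0, 2.0))                    # 6.5 units: six iterations plus half a unit
print(boundary_filling_count(13.0, 2.0, 4))  # (4, 1.67): "4" tiles, gaps ignored
```

The gaps returned by the second function are exactly the spaces that, diagnostically, do not trouble a student holding the boundary-filling conception.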

Researchers agree that scientific thinking entails the coordination of theory with evidence (Klahr and Dunbar, 1988; Kuhn, Amsel, and O’Loughlin, 1988), but there are many ways in which evidence may vary in both form and complexity. Achieving this coordination therefore requires tools for structuring and interpreting data and error. Otherwise, students’ interpretations of evidence cannot be held accountable to the data. There have been many studies of students’ reasoning about data, variation, and uncertainty, conducted both by psychologists (Kahneman, Slovic, and Tversky, 1982; Konold, 1989; Nisbett et al., 1983) and by educators (Mokros and Russell, 1995; Pollatsek, Lima, and Well, 1981; Strauss and Bichler, 1988). Particularly pertinent here are studies that focus on data modeling (Lehrer and Romberg, 1996), that is, how reasoning with data is recruited as a way of investigating genuine questions about the world.

Data modeling is, in fact, what professionals do when they reason with data and statistics. It is central to a variety of enterprises, including engineering, medicine, and natural science. Scientific models are generated with acute awareness of their entailments for data, and data are recorded and structured as a way of making progress in articulating a scientific model or adjudicating among rival models. The tight relationship between model and data holds generally in domains in which inquiry is conducted by inscribing, representing, and mathematizing key aspects of the world (Goodwin, 2000; Kline, 1980; Latour, 1990).

Understanding the qualities and meaning of data may be enhanced if students spend as much attention on their generation as on their analysis. First and foremost, students need to grasp the notion that data are constructed to answer questions (Lehrer, Giles, and Schauble, 2002). The National Council of Teachers of Mathematics (2000) emphasizes that the study of data should be firmly anchored in students’ inquiry, so that they “address what is involved in gathering and using the data wisely” (p. 48). Questions motivate the collection of certain types of information and not others, and many aspects of data coding and structuring also depend on the question that motivated their collection. Defining the variables involved in addressing a research question, considering the methods and timing for collecting data, and finding efficient ways to record them are all involved in the initial phases of data modeling. Debates about the meaning of an attribute often provoke more precise questions.

For example, a group of first graders who wanted to learn which student’s pumpkin was the largest eventually understood that they needed to agree whether they were interested in the heights of the pumpkins, their circumferences, or their weights (Lehrer et al., 2001). Deciding what to measure is bound up with deciding how to measure. As the students went on to count the seeds in their pumpkins (they were pursuing a question about whether there might be a relationship between pumpkin size and number of seeds), they had to make decisions about whether they would include seeds that were not full grown and what criteria would be used to decide whether any particular seed should be considered mature.

Data are inherently a form of abstraction: an event is replaced by a video recording, a sensation of heat is replaced by a pointer reading on a thermometer, and so on. Here again, the tacit complexity of tools may need to be explained. Students often have a fragile grasp of the relationship between the event of interest and the operation (hence, the output) of a tool, whether that tool is a microscope, a pan balance, or a “simple” ruler. Some students, for example, do not initially consider measurement to be a form of comparison and may find a balance a very confusing tool. In their minds, the number displayed on a scale is the weight of the object. If no number is displayed, weight cannot be found.

Once the data are recorded, making sense of them requires that they be structured. At this point, students sometimes discover that their data require further abstraction. For example, as they categorized features of self-portraits drawn by other students, a group of fourth graders realized that it would not be wise to follow their original plan of creating 23 categories of “eye type” for the 25 portraits that they wished to categorize (DiPerna, 2002). Data do not come with an inherent structure; rather, structure must be imposed (Lehrer, Giles, and Schauble, 2002). The only structure for a set of data comes from the inquirers’ prior and developing understanding of the phenomenon under investigation. Inquirers impose structure by selecting categories around which to describe and organize the data.
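The idea that structure is imposed rather than found can be made concrete with a small sketch. The “eye type” descriptions and the category scheme below are invented; a different choice of categories would organize the same records differently.

```python
# A hypothetical sketch of imposing structure on raw data by choosing
# categories. All descriptions below are invented for illustration.

from collections import Counter

raw_eye_types = ["dot", "dot", "circle", "circle with pupil", "oval",
                 "oval", "circle", "dot", "almond", "circle with pupil"]

def categorize(description):
    """One possible category scheme, chosen by the inquirer, not the data."""
    if "pupil" in description:
        return "detailed"
    if description == "dot":
        return "schematic"
    return "simple outline"

structured = Counter(categorize(d) for d in raw_eye_types)
print(structured.most_common())
# [('simple outline', 5), ('schematic', 3), ('detailed', 2)]
```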

Students also need to mentally back away from the objects or events under study to attend to the data as objects in their own right, by counting them, manipulating them to discover relationships, and asking new questions of already collected data. Students often believe that new questions can be addressed only with new data; they rarely think of querying existing data sets to explore questions that were not initially conceived when the data were collected (Lehrer and Romberg, 1996).
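Querying an existing data set for a question not anticipated at collection time can be sketched briefly. Suppose, hypothetically, that the records below were gathered to compare pumpkin sizes; all values are invented.

```python
# A small sketch of treating already collected data as objects in their own
# right: the records answer a question nobody posed at collection time.

records = [
    {"pumpkin": "A", "circumference_cm": 60, "seeds": 320},
    {"pumpkin": "B", "circumference_cm": 75, "seeds": 410},
    {"pumpkin": "C", "circumference_cm": 52, "seeds": 295},
    {"pumpkin": "D", "circumference_cm": 81, "seeds": 455},
]

# New question, old data: do larger pumpkins tend to have more seeds?
by_size = sorted(records, key=lambda r: r["circumference_cm"])
seed_counts = [r["seeds"] for r in by_size]
print(seed_counts)                         # [295, 320, 410, 455]
print(seed_counts == sorted(seed_counts))  # True: seeds rise with size here
```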

Finally, data are represented in various ways in order to see or understand general trends. Different kinds of displays highlight certain aspects of the data and hide others. An important educational agenda for students, one that extends over several years, is to come to understand the conventions and properties of different kinds of data displays. We do not review here the extensive literature on students’ understanding of different kinds of representational displays (tables, graphs of various kinds, distributions), but, for purposes of science, students should not only understand the procedures for generating and reading displays, but they should also be able to critique them and to grasp the communicative advantages and disadvantages of alternative forms for a given purpose (diSessa, 2004; Greeno and Hall, 1997). The structure of the data will affect the interpretation. Data interpretation often entails seeking and confirming relationships in the data, which may be at varying levels of complexity. For example, simple linear relationships are easier to spot than inverse relationships or interactions (Schauble, 1990), and students often fail to entertain the possibility that more than one relationship may be operating.

The desire to interpret data may further inspire the creation of statistics, such as measures of center and spread. These measures are a further step of abstraction beyond the objects and events originally observed. Even primary grade students can learn to consider the overall shape of data displays to make interpretations based on the “clumps” and “holes” in the data. Students often employ multiple criteria when trying to identify a “typical value” for a set of data. Many young students tend to favor the mode and justify their choice on the basis of repetition—if more than one student obtained this value, perhaps it is to be trusted. However, students tend to be less satisfied with modes if they do not appear near the center of the data, and they also shy away from measures of center that do not have several other values clustered near them (“part of a clump”). Understanding the mean requires an understanding of ratio, and if students are merely taught to “average” data in a procedural way without having a well-developed sense of ratio, their performance notoriously tends to degrade into “average stew”—eccentric procedures for adding and dividing things that make no sense (Strauss and Bichler, 1988). With good instruction, middle and upper elementary students can simultaneously consider the center and the spread of the data. Students can also generate various forms of mathematical descriptions of error, especially in contexts of measurement, where they can readily grasp the relationships between their own participation in the act of measuring and the resulting variation in measures (Petrosino, Lehrer, and Schauble, 2003).
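The measures discussed above can be computed for a small invented set of repeated measurements. The mean is a ratio (a total divided by a count), which is why it demands more of students than the mode, a value found simply by looking for repeats.

```python
# A minimal sketch of center and spread for ten invented measurements of
# the same object (in centimeters).

from statistics import mean, mode

measurements = [24, 25, 25, 26, 25, 27, 24, 25, 26, 23]

center_mode = mode(measurements)                 # the most repeated value
center_mean = mean(measurements)                 # total / count, a ratio
spread = max(measurements) - min(measurements)   # range, a simple spread measure

print(center_mode, center_mean, spread)
```

Here the mode happens to sit near the center of a clump, the configuration young students find most trustworthy; when it does not, they tend to abandon it.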

Scale Models, Diagrams, and Maps

Although data representations are central to science, they are not, of course, the only representations students need to use and understand. Perhaps the most easily interpretable form of representation widely used in science is the scale model. Physical models of this kind are used in science education to make it possible for students to visualize objects or processes at a scale that makes their direct perception impossible or, alternatively, to directly manipulate something that they otherwise could not handle. The ease or difficulty with which students understand these models depends on the complexity of the relationships being communicated. Even preschoolers can understand scale models used to depict location in a room (DeLoache, 2004). Primary grade students can fairly readily overcome the influence of the appearance of the model to focus on and investigate the way it functions (Penner et al., 1997), but middle school students (and some adults) struggle to work out the positional relationships of the earth, the sun, and the moon. Doing so involves not only reconciling different views with respect to perspective and frame (what one sees standing on the earth, what one would see from a hypothetical point in space), but also visualizing how these views would change over days and months (see, for example, the detailed curricular suggestions at http://www.wcer.wisc.edu/ncisla/muse/).

Frequently, students are expected to read or produce diagrams, often integrating the information from the diagram with information from accompanying text (Hegarty and Just, 1993; Mayer, 1993). The comprehensibility of diagrams seems to be governed less by domain-general principles than by the specifics of the diagram and its viewer. Comprehensibility seems to vary with the complexity of what is portrayed, the particular diagrammatic details and features, and the prior knowledge of the user.

Diagrams can be difficult to understand for a host of reasons. Sometimes the desired information is missing in the first place; sometimes, features of the diagram unwittingly play into an incorrect preconception. For example, it has been suggested that the common student misconception that the earth is closer to the sun in the summer than in the winter may be due in part to the fact that two-dimensional representations of the three-dimensional orbit make it appear as if the foreshortened orbit is indeed closer to the sun at some points than at others.

Mayer (1993) proposes three common reasons why diagrams miscommunicate: some do not include explanatory information (they are illustrative or decorative rather than explanatory), some lack a causal chain, and some fail to map the explanation to a familiar or recognizable context. It is not clear that school students misperceive diagrams in ways that are fundamentally different from the perceptions of adults. There may be some diagrammatic conventions that are less familiar to children, and children may well have less knowledge about the phenomena being portrayed, but there is no reason to expect that adult novices would respond in fundamentally different ways. Although they have been studied for a much briefer period of time, the same is probably true of complex computer displays.

Finally, there is a growing developmental literature on students’ understanding of maps. Maps can be particularly confusing because they preserve some analog qualities of the space being represented (e.g., relative position and distance) but also omit or alter features of the landscape in ways that require understanding of mapping conventions. Young children often initially confuse maps of the landscape with pictures of objects in the landscape. It is much easier for youngsters to represent objects than to represent large-scale space (which is the absence of, or frame for, objects). Students also may struggle with orientation, perspective (the traditional bird’s-eye view), and mathematical descriptions of space, such as polar coordinate representations (Lehrer and Pritchard, 2002; Liben and Downs, 1993).


There is a common thread throughout the observations of this chapter that has deep implications for what one expects from children in grades K-8 and how their science learning should be structured. In almost all cases, the studies converge on the position that the skills under study develop with age, but also that this development is significantly enhanced by prior knowledge, experience, and instruction.

One of the continuing themes evident from studies on the development of scientific thinking is that children are far more competent than first suspected, and likewise that adults are less so. Young children experiment, but their experimentation is generally not systematic, and their observations as well as their inferences may be flawed. The progression of ability is seen with age, but it is not uniform, either across individuals or for a given individual. There is variation across individuals at the same age, as well as variation within single individuals in the strategies they use. Any given individual uses a collection of strategies, some more valid than others. Discovering a valid strategy does not mean that an individual, whether a child or an adult, will use the strategy consistently across all contexts. As Schauble (1996, p. 118) noted:

The complex and multifaceted nature of the skills involved in solving these problems, and the variability in performance, even among the adults, suggest that the developmental trajectory of the strategies and processes associated with scientific reasoning is likely to be a very long one, perhaps even lifelong. Previous research has established the existence of both early precursors and competencies … and errors and biases that persist regardless of maturation, training, and expertise.

One aspect of cognition that appears to be particularly important for supporting scientific thinking is awareness of one’s own thinking. Children may be less aware of their own memory limitations and therefore may be unsystematic in recording plans, designs, and outcomes, and they may fail to consult such records. Self-awareness of the cognitive strategies available is also important in order to determine when and why to employ various strategies. Finally, awareness of the status of one’s own knowledge, such as recognizing the distinctions between theory and evidence, is important for reasoning in the context of scientific investigations. This last aspect of cognition is discussed in detail in the next chapter.

Prior knowledge, particularly beliefs about causality and plausibility, shapes the approach to investigations in multiple ways. These beliefs influence which hypotheses are tested, how experiments are designed, and how evidence is evaluated. Characteristics of prior knowledge, such as its type, strength, and relevance, are potential determinants of how new evidence is evaluated and whether anomalies are noticed. Knowledge change occurs as a result of the encounter between prior knowledge and new evidence.

Finally, we conclude that experience and instruction are crucial mediators of the development of a broad range of scientific skills and of the degree of sophistication that children exhibit in applying these skills in new contexts. This means that time spent doing science in appropriately structured instructional frames is a crucial part of science education. It affects not only the level of skills that children develop, but also their ability to think about the quality of evidence and to interpret evidence presented to them. Students need instructional support and practice in order to become better at coordinating their prior theories and the evidence generated in investigations. Instructional support is also critical for developing skills for experimental design, record keeping during investigations, dealing with anomalous data, and modeling.

Ahn, W., Kalish, C.W., Medin, D.L., and Gelman, S.A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299-352.

Amsel, E., and Brock, S. (1996). The development of evidence evaluation skills. Cognitive Development, 11 , 523-550.

Bisanz, J., and LeFevre, J. (1990). Strategic and nonstrategic processing in the development of mathematical cognition. In D. Bjorklund (Ed.), Children’s strategies: Contemporary views of cognitive development (pp. 213-243). Hillsdale, NJ: Lawrence Erlbaum Associates.

Carey, S., Evans, R., Honda, M., Jay, E., and Unger, C. (1989). An experiment is when you try it and see if it works: A study of grade 7 students’ understanding of the construction of scientific knowledge. International Journal of Science Education, 11 , 514-529.

Chase, W.G., and Simon, H.A. (1973). The mind’s eye in chess. In W.G. Chase (Ed.), Visual information processing . New York: Academic.

Chen, Z., and Klahr, D. (1999). All other things being equal: Children’s acquisition of the control of variables strategy. Child Development, 70, 1098-1120.

Chi, M.T.H. (1996). Constructing self-explanations and scaffolded explanations in tutoring. Applied Cognitive Psychology, 10, 33-49.

Chi, M.T.H., de Leeuw, N., Chiu, M., and Lavancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477.

Chinn, C.A., and Brewer, W.F. (1998). An empirical test of a taxonomy of responses to anomalous data in science. Journal of Research in Science Teaching, 35, 623-654.

Chinn, C.A., and Brewer, W. (2001). Model of data: A theory of how people evaluate data. Cognition and Instruction , 19 (3), 323-343.

Chinn, C.A., and Malhotra, B.A. (2001). Epistemologically authentic scientific reasoning. In K. Crowley, C.D. Schunn, and T. Okada (Eds.), Designing for science: Implications from everyday, classroom, and professional settings (pp. 351-392). Mahwah, NJ: Lawrence Erlbaum Associates.

Chinn, C.A., and Malhotra, B.A. (2002). Children’s responses to anomalous scientific data: How is conceptual change impeded? Journal of Educational Psychology, 94, 327-343.

DeLoache, J.S. (2004). Becoming symbol-minded. Trends in Cognitive Sciences, 8 , 66-70.

DiPerna, E. (2002). Data models of ourselves: Body self-portrait project. In R. Lehrer and L. Schauble (Eds.), Investigating real data in the classroom: Expanding children’s understanding of math and science. Ways of knowing in science and mathematics series. New York: Teachers College Press.

diSessa, A.A. (2004). Metarepresentation: Native competence and targets for instruction. Cognition and Instruction, 22 (3), 293-331.

Dunbar, K., and Klahr, D. (1989). Developmental differences in scientific discovery strategies. In D. Klahr and K. Kotovsky (Eds.), Complex information processing: The impact of Herbert A. Simon (pp. 109-143). Hillsdale, NJ: Lawrence Erlbaum Associates.

Echevarria, M. (2003). Anomalies as a catalyst for middle school students’ knowledge construction and scientific reasoning during science inquiry. Journal of Educational Psychology, 95, 357-374.

Garcia-Mila, M., and Andersen, C. (2005). Developmental change in notetaking during scientific inquiry. Manuscript submitted for publication.

Gleason, M.E., and Schauble, L. (2000). Parents’ assistance of their children’s scientific reasoning. Cognition and Instruction, 17 (4), 343-378.

Goodwin, C. (2000). Introduction: Vision and inscription in practice. Mind, Culture, and Activity , 7 , 1-3.

Greeno, J., and Hall, R. (1997). Practicing representation: Learning with and about representational forms. Phi Delta Kappan, January, 361-367.

Hausmann, R., and Chi, M. (2002) Can a computer interface support self-explaining? The International Journal of Cognitive Technology , 7 (1).

Hegarty, M., and Just, A. (1993). Constructing mental models of machines from text and diagrams. Journal of Memory and Language , 32 , 717-742.

Inhelder, B., and Piaget, J. (1958). The growth of logical thinking from childhood to adolescence . New York: Basic Books.

Kahneman, D., Slovic, P., and Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.

Kanari, Z., and Millar, R. (2004). Reasoning from data: How students collect and interpret data in science investigations. Journal of Research in Science Teaching , 41 , 17.

Keselman, A. (2003). Supporting inquiry learning by promoting normative understanding of multivariable causality. Journal of Research in Science Teaching, 40, 898-921.

Keys, C.W. (1994). The development of scientific reasoning skills in conjunction with collaborative writing assignments: An interpretive study of six ninth-grade students. Journal of Research in Science Teaching, 31, 1003-1022.

Klaczynski, P.A. (2000). Motivated scientific reasoning biases, epistemological beliefs, and theory polarization: A two-process approach to adolescent cognition. Child Development , 71 (5), 1347-1366.

Klaczynski, P.A., and Narasimham, G. (1998). Development of scientific reasoning biases: Cognitive versus ego-protective explanations. Developmental Psychology, 34 (1), 175-187.

Klahr, D. (2000). Exploring science: The cognition and development of discovery processes. Cambridge, MA: MIT Press.

Klahr, D., and Carver, S.M. (1995). Scientific thinking about scientific thinking. Monographs of the Society for Research in Child Development, 60, 137-151.

Klahr, D., Chen, Z., and Toth, E.E. (2001). From cognition to instruction to cognition: A case study in elementary school science instruction. In K. Crowley, C.D. Schunn, and T. Okada (Eds.), Designing for science: Implications from everyday, classroom, and professional settings (pp. 209-250). Mahwah, NJ: Lawrence Erlbaum Associates.

Klahr, D., and Dunbar, K. (1988). Dual search space during scientific reasoning. Cognitive Science, 12, 1-48.

Klahr, D., Fay, A., and Dunbar, K. (1993). Heuristics for scientific experimentation: A developmental study. Cognitive Psychology, 25, 111-146.

Klahr, D., and Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15 (10), 661-667.

Klahr, D., and Robinson, M. (1981). Formal assessment of problem solving and planning processes in preschool children. Cognitive Psychology , 13 , 113-148.

Klayman, J., and Ha, Y. (1989). Hypothesis testing in rule discovery: Strategy, structure, and content. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15 (4), 596-604.

Kline, M. (1980). Mathematics: The loss of certainty . New York: Oxford University Press.

Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction , 6 , 59-98.

Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cambridge, MA: MIT Press.

Koslowski, B., and Okagaki, L. (1986). Non-human indices of causation in problem-solving situations: Causal mechanisms, analogous effects, and the status of rival alternative accounts. Child Development, 57, 1100-1108.

Koslowski, B., Okagaki, L., Lorenz, C., and Umbach, D. (1989). When covariation is not enough: The role of causal mechanism, sampling method, and sample size in causal reasoning. Child Development, 60, 1316-1327.

Kuhn, D. (1989). Children and adults as intuitive scientists . Psychological Review, 96 , 674-689.

Kuhn, D. (2001). How do people know? Psychological Science, 12, 1-8.

Kuhn, D. (2002). What is scientific thinking and how does it develop? In U. Goswami (Ed.), Blackwell handbook of childhood cognitive development (pp. 371-393). Oxford, England: Blackwell.

Kuhn, D., Amsel, E., and O’Loughlin, M. (1988). The development of scientific thinking skills. Orlando, FL: Academic Press.

Kuhn, D., and Dean, D. (2005). Is developing scientific thinking all about learning to control variables? Psychological Science, 16 (11), 886-870.

Kuhn, D., and Franklin, S. (2006). The second decade: What develops (and how)? In W. Damon, R.M. Lerner, D. Kuhn, and R.S. Siegler (Eds.), Handbook of child psychology, volume 2, cognition, perception, and language, 6th edition (pp. 954-994). Hoboken, NJ: Wiley.

Kuhn, D., Garcia-Mila, M., Zohar, A., and Andersen, C. (1995). Strategies of knowledge acquisition. Monographs of the Society for Research in Child Development, Serial No. 245 (60), 4.

Kuhn, D., and Pearsall, S. (1998). Relations between metastrategic knowledge and strategic performance. Cognitive Development, 13, 227-247.

Kuhn, D., and Pearsall, S. (2000). Developmental origins of scientific thinking. Journal of Cognition and Development, 1, 113-129.

Kuhn, D., and Phelps, E. (1982). The development of problem-solving strategies. In H. Reese (Ed.), Advances in child development and behavior ( vol. 17, pp. 1-44). New York: Academic Press.

Kuhn, D., Schauble, L., and Garcia-Mila, M. (1992). Cross-domain development of scientific reasoning. Cognition and Instruction, 9, 285-327.

Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480-498.

Larkin, J.H., McDermott, J., Simon, D.P., and Simon, H.A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.

Latour, B. (1990). Drawing things together. In M. Lynch and S. Woolgar (Eds.), Representation in scientific practice (pp. 19-68). Cambridge, MA: MIT Press.

Lehrer, R. (2003). Developing understanding of measurement. In J. Kilpatrick, W.G. Martin, and D.E. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 179-192). Reston, VA: National Council of Teachers of Mathematics.

Lehrer, R., Giles, N., and Schauble, L. (2002). Data modeling. In R. Lehrer and L. Schauble (Eds.), Investigating real data in the classroom: Expanding children’s understanding of math and science (pp. 1-26). New York: Teachers College Press.

Lehrer, R., and Pritchard, C. (2002). Symbolizing space into being. In K. Gravemeijer, R. Lehrer, B. van Oers, and L. Verschaffel (Eds.), Symbolization, modeling and tool use in mathematics education (pp. 59-86). Dordrecht, The Netherlands: Kluwer Academic.

Lehrer, R., and Romberg, T. (1996). Exploring children’s data modeling. Cognition and Instruction , 14 , 69-108.

Lehrer, R., and Schauble, L. (2000). The development of model-based reasoning. Journal of Applied Developmental Psychology, 21 (1), 39-48.

Lehrer, R., and Schauble, L. (2002). Symbolic communication in mathematics and science: Co-constituting inscription and thought. In E.D. Amsel and J. Byrnes (Eds.), Language, literacy, and cognitive development: The development and consequences of symbolic communicat i on (pp. 167-192). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L. (2003). Origins and evolution of model-based reasoning in mathematics and science. In R. Lesh and H.M. Doerr (Eds.), Beyond constructivism: A models and modeling perspective on mathematics problem-solving, learning, and teaching (pp. 59-70). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L., (2005). Developing modeling and argument in the elementary grades. In T.A. Rombert, T.P. Carpenter, and F. Dremock (Eds.), Understanding mathematics and science matters (Part II: Learning with understanding). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L. (2006). Scientific thinking and science literacy. In W. Damon, R. Lerner, K.A. Renninger, and I.E. Sigel (Eds.), Handbook of child psychology, 6th edition (vol. 4). Hoboken, NJ: Wiley.

Lehrer, R., Schauble, L., Strom, D., and Pligge, M. (2001). Similarity of form and substance: Modeling material kind. In D. Klahr and S. Carver (Eds.), Cognition and instruction: 25 years of progress (pp. 39-74). Mahwah, NJ: Lawrence Erlbaum Associates.

Liben, L.S., and Downs, R.M. (1993). Understanding per son-space-map relations: Cartographic and developmental perspectives. Developmental Psychology, 29 , 739-752.

Linn, M.C. (1978). Influence of cognitive style and training on tasks requiring the separation of variables schema. Child Development , 49 , 874-877.

Linn, M.C. (1980). Teaching students to control variables: Some investigations using free choice experiences. In S. Modgil and C. Modgil (Eds.), Toward a theory of psychological development within the Piagettian framework . Windsor Berkshire, England: National Foundation for Educational Research.

Linn, M.C., Chen, B., and Thier, H.S. (1977). Teaching children to control variables: Investigations of a free choice environment. Journal of Research in Science Teaching , 14 , 249-255.

Linn, M.C., and Levine, D.I. (1978). Adolescent reasoning: Influence of question format and type of variables on ability to control variables. Science Education , 62 (3), 377-388.

Lovett, M.C., and Anderson, J.R. (1995). Making heads or tails out of selecting problem-solving strategies. In J.D. Moore and J.F. Lehman (Eds.), Proceedings of the seventieth annual conference of the Cognitive Science Society (pp. 265-270). Hillsdale, NJ: Lawrence Erlbaum Associates.

Lovett, M.C., and Anderson, J.R. (1996). History of success and current context in problem solving. Cognitive Psychology , 31 (2), 168-217.

Masnick, A.M., and Klahr, D. (2003). Error matters: An initial exploration of elementary school children’s understanding of experimental error. Journal of Cognition and Development, 4 , 67-98.

Mayer, R. (1993). Illustrations that instruct. In R. Glaser (Ed.), Advances in instructional psychology (vol. 4, pp. 253-284). Hillsdale, NJ: Lawrence Erlbaum Associates.

McClain, K., Cobb, P., Gravemeijer, K., and Estes, B. (1999). Developing mathematical reasoning within the context of measurement. In L. Stiff (Ed.), Developing mathematical reasoning, K-12 (pp. 93-106). Reston, VA: National Council of Teachers of Mathematics.

McNay, M., and Melville, K.W. (1993). Children’s skill in making predictions and their understanding of what predicting means: A developmental study. Journal of Research in Science Teaching , 30, 561-577.

Metz, K.E. (2004). Children’s understanding of scientific inquiry: Their conceptualization of uncertainty in investigations of their own design. Cognition and Instruction, 22( 2), 219-290.

Mokros, J., and Russell, S. (1995). Children’s concepts of average and representativeness. Journal for Research in Mathematics Education, 26 (1), 20-39.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

Nisbett, R.E., Krantz, D.H., Jepson, C., and Kind, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90 , 339-363.

Penner, D., Giles, N.D., Lehrer, R., and Schauble, L. (1997). Building functional models: Designing an elbow. Journal of Research in Science Teaching, 34(2) , 125-143.

Penner, D.E., and Klahr, D. (1996a). The interaction of domain-specific knowledge and domain-general discovery strategies: A study with sinking objects. Child Development, 67, 2709-2727.

Penner, D.E., and Klahr, D. (1996b). When to trust the data: Further investigations of system error in a scientific reasoning task. Memory and Cognition, 24, 655-668 .

Perfetti, CA. (1992). The representation problem in reading acquisition. In P.B. Gough, L.C. Ehri, and R. Treiman (Eds.), Reading acquisition (pp. 145-174). Hillsdale, NJ: Lawrence Erlbaum Associates.

Petrosino, A., Lehrer, R., and Schauble, L. (2003). Structuring error and experimental variation as distribution in the fourth grade. Mathematical Thinking and Learning, 5 (2-3), 131-156.

Pollatsek, A., Lima, S., and Well, A.D. (1981). Concept or computation: Students’ misconceptions of the mean. Educational Studies in Mathematics , 12, 191-204.

Ruffman, T., Perner, I., Olson, D.R., and Doherty, M. (1993). Reflecting on scientific thinking: Children’s understanding of the hypothesis-evidence relation. Child Development, 64 (6), 1617-1636.

Schauble, L. (1990). Belief revision in children: The role of prior knowledge and strategies for generating evidence. Journal of Experimental Child Psychology , 49 (1), 31-57.

Schauble, L. (1996). The development of scientific reasoning in knowledge-rich contexts. Developmental Psychology , 32 (1), 102-119.

Schauble, L., Glaser, R., Duschl, R., Schulze, S., and John, J. (1995). Students’ understanding of the objectives and procedures of experimentation in the science classroom. Journal of the Learning Sciences , 4 (2), 131-166.

Schauble, L., Glaser, R., Raghavan, K., and Reiner, M. (1991). Causal models and experimentation strategies in scientific reasoning. Journal of the Learning Sciences , 1 (2), 201-238.

Schauble, L., Glaser, R., Raghavan, K., and Reiner, M. (1992). The integration of knowledge and experimentation strategies in understanding a physical system. Applied Cognitive Psychology , 6 , 321-343.

Schauble, L., Klopfer, L.E., and Raghavan, K. (1991). Students’ transition from an engineering model to a science model of experimentation. Journal of Research in Science Teaching , 28 (9), 859-882.

Siegler, R.S. (1987). The perils of averaging data over strategies: An example from children’s addition. Journal of Experimental Psychology: General, 116, 250-264 .

Siegler, R.S., and Alibali, M.W. (2005). Children’s thinking (4th ed.). Upper Saddle River, NJ: Prentice Hall.

Siegler, R.S., and Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive development. American Psychologist , 46 , 606-620.

Siegler, R.S., and Jenkins, E. (1989). How children discover new strategies . Hillsdale, NJ: Lawrence Erlbaum Associates.

Siegler, R.S., and Liebert, R.M. (1975). Acquisition of formal experiment. Developmental Psychology , 11 , 401-412.

Siegler, R.S., and Shipley, C. (1995). Variation, selection, and cognitive change. In T. Simon and G. Halford (Eds.), Developing cognitive competence: New approaches to process modeling (pp. 31-76). Hillsdale, NJ: Lawrence Erlbaum Associates.

Simon, H.A. (1975). The functional equivalence of problem solving skills. Cognitive Psychology, 7 , 268-288.

Simon, H.A. (2001). Learning to research about learning. In S.M. Carver and D. Klahr (Eds.), Cognition and instruction: Twenty-five years of progress (pp. 205-226). Mahwah, NJ: Lawrence Erlbaum Associates.

Slowiaczek, L.M., Klayman, J., Sherman, S.J., and Skov, R.B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer. Memory and Cognition, 20 (4), 392-405.

Sneider, C., Kurlich, K., Pulos, S., and Friedman, A. (1984). Learning to control variables with model rockets: A neo-Piagetian study of learning in field settings. Science Education , 68 (4), 463-484.

Sodian, B., Zaitchik, D., and Carey, S. (1991). Young children’s differentiation of hypothetical beliefs from evidence. Child Development, 62 (4), 753-766.

Stevens, R., and Hall, R. (1998). Disciplined perception: Learning to see in technoscience. In M. Lampert and M.L. Blunk (Eds.), Talking mathematics in school: Studies of teaching and learning (pp. 107-149). Cambridge, MA: Cambridge University Press.

Strauss, S., and Bichler, E. (1988). The development of children’s concepts of the arithmetic average. Journal for Research in Mathematics Education, 19 (1), 64-80.

Thagard, P. (1998a). Ulcers and bacteria I: Discovery and acceptance. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29, 107-136.

Thagard, P. (1998b). Ulcers and bacteria II: Instruments, experiments, and social interactions. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29 (2), 317-342.

Toth, E.E., Klahr, D., and Chen, Z. (2000). Bridging research and practice: A cognitively-based classroom intervention for teaching experimentation skills to elementary school children. Cognition and Instruction , 18 (4), 423-459.

Trafton, J.G., and Trickett, S.B. (2001). Note-taking for self-explanation and problem solving. Human-Computer Interaction, 16, 1-38.

Triona, L., and Klahr, D. (in press). The development of children’s abilities to produce external representations. In E. Teubal, J. Dockrell, and L. Tolchinsky (Eds.), Notational knowledge: Developmental and historical perspectives . Rotterdam, The Netherlands: Sense.

Varnhagen, C. (1995). Children’s spelling strategies. In V. Berninger (Ed.), The varieties of orthographic knowledge: Relationships to phonology, reading and writing (vol. 2, pp. 251-290). Dordrecht, The Netherlands: Kluwer Academic.

Warren, B., Rosebery, A., and Conant, F. (1994). Discourse and social practice: Learning science in language minority classrooms. In D. Spencer (Ed.), Adult biliteracy in the United States (pp. 191-210). McHenry, IL: Delta Systems.

Wolpert, L. (1993). The unnatural nature of science . London, England: Faber and Faber.

Zachos, P., Hick, T.L., Doane, W.E.I., and Sargent, C. (2000). Setting theoretical and empirical foundations for assessing scientific inquiry and discovery in educational programs. Journal of Research in Science Teaching, 37 (9), 938-962.

Zimmerman, C., Raghavan, K., and Sartoris, M.L. (2003). The impact of the MARS curriculum on students’ ability to coordinate theory and evidence. International Journal of Science Education, 25, 1247-1271.



Scientific Method

Illustration by J.R. Bee. ThoughtCo. 

The scientific method is a series of steps followed by scientific investigators to answer specific questions about the natural world. It involves making observations, formulating a hypothesis, and conducting scientific experiments. Scientific inquiry starts with an observation, followed by the formulation of a question about what has been observed. The steps of the scientific method are as follows:


The first step of the scientific method involves making an observation about something that interests you. This is very important if you are doing a science project, because you want your project to be focused on something that will hold your attention. Your observation can be on anything from plant movement to animal behavior, as long as it is something you really want to know more about. This is where you come up with the idea for your science project.

Once you've made your observation, you must formulate a question about what you have observed. Your question should state what it is that you are trying to discover or accomplish in your experiment. When stating your question, be as specific as possible. For example, if you are doing a project on plants, you may want to know how plants interact with microbes. Your question might be: Do plant spices inhibit bacterial growth?

The hypothesis is a key component of the scientific process. A hypothesis is an idea that is suggested as an explanation for a natural event, a particular experience, or a specific condition, and that can be tested through definable experimentation. It states the purpose of your experiment, the variables used, and the predicted outcome. It is important to note that a hypothesis must be testable: you should be able to support or falsify it through experimentation. An example of a good hypothesis is: If there is a relation between listening to music and heart rate, then listening to music will cause a person's resting heart rate to either increase or decrease.

Once you've developed a hypothesis, you must design and conduct an experiment that will test it. You should develop a procedure that states very clearly how you plan to conduct your experiment, and that identifies your independent variable (the factor you deliberately change), your dependent variable (the factor you measure), and your controls (the conditions held constant). Controls allow us to test a single variable in an experiment because they are unchanged. We can then make observations and compare the control group with the experimental group to develop an accurate conclusion.
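To make the design concrete, here is a minimal sketch of how the music-and-heart-rate example maps onto these terms. The data, group names, and numbers below are invented purely for illustration:

```python
from statistics import mean

# Hypothetical resting heart rates in beats per minute (bpm).
# Independent variable: whether participants listened to music.
# Dependent variable: the resting heart rate we measure.
# Control group: identical conditions, except no music.
control_group = [72, 75, 71, 74, 73]       # no music
experimental_group = [68, 70, 66, 69, 71]  # listened to music

control_mean = mean(control_group)
experimental_mean = mean(experimental_group)

print(f"Control mean: {control_mean:.1f} bpm")            # 73.0 bpm
print(f"Experimental mean: {experimental_mean:.1f} bpm")  # 68.8 bpm
```

Because everything except the independent variable is held constant, any difference between the two group means can more plausibly be attributed to the music.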

The results are where you report what happened in the experiment. That includes detailing all observations and data collected during your experiment. Most people find it easier to visualize the data by charting or graphing the information.

The final step of the scientific method is developing a conclusion. This is where all of the results from the experiment are analyzed and a determination is reached about the hypothesis. Did the experiment support or refute your hypothesis? If your hypothesis was supported, great. If not, repeat the experiment or think of ways to improve your procedure.
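As a hypothetical sketch of this last step (the data and the 2 bpm threshold are invented stand-ins, not a real analysis), drawing a conclusion amounts to comparing the measured outcome against the prediction:

```python
from statistics import mean

def conclude(control, experimental, threshold=2.0):
    """Compare group means and report whether the observed difference
    exceeds a (hypothetical) threshold in beats per minute."""
    diff = mean(experimental) - mean(control)
    return diff, abs(diff) >= threshold

# Hypothetical heart-rate data: control (no music) vs. experimental (music).
diff, supported = conclude([72, 75, 71, 74, 73], [68, 70, 66, 69, 71])
print(f"Mean difference: {diff:+.1f} bpm; hypothesis supported: {supported}")
```

A real analysis would use a significance test rather than a fixed cutoff, but the logic is the same: compare the outcome to the prediction, then decide whether the hypothesis is supported.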



Scientific Method

Science is an enormously successful human enterprise. The study of scientific method is the attempt to discern the activities by which that success is achieved. Among the activities often identified as characteristic of science are systematic observation and experimentation, inductive and deductive reasoning, and the formation and testing of hypotheses and theories. How these are carried out in detail can vary greatly, but characteristics like these have been looked to as a way of demarcating scientific activity from non-science, where only enterprises which employ some canonical form of scientific method or methods should be considered science (see also the entry on science and pseudo-science ). Others have questioned whether there is anything like a fixed toolkit of methods which is common across science and only science. Some reject privileging one view of method as part of rejecting broader views about the nature of science, such as naturalism (Dupré 2004); some reject any restriction in principle (pluralism).

Scientific method should be distinguished from the aims and products of science, such as knowledge, predictions, or control. Methods are the means by which those goals are achieved. Scientific method should also be distinguished from meta-methodology, which includes the values and justifications behind a particular characterization of scientific method (i.e., a methodology) — values such as objectivity, reproducibility, simplicity, or past successes. Methodological rules are proposed to govern method and it is a meta-methodological question whether methods obeying those rules satisfy given values. Finally, method is distinct, to some degree, from the detailed and contextual practices through which methods are implemented. The latter might range over: specific laboratory techniques; mathematical formalisms or other specialized languages used in descriptions and reasoning; technological or other material means; ways of communicating and sharing results, whether with other scientists or with the public at large; or the conventions, habits, enforced customs, and institutional controls over how and what science is carried out.

While it is important to recognize these distinctions, their boundaries are fuzzy. Hence, accounts of method cannot be entirely divorced from their methodological and meta-methodological motivations or justifications. Moreover, each aspect plays a crucial role in identifying methods. Disputes about method have therefore played out at the detail, rule, and meta-rule levels. Changes in beliefs about the certainty or fallibility of scientific knowledge, for instance (which is a meta-methodological consideration of what we can hope for methods to deliver), have meant different emphases on deductive and inductive reasoning, or on the relative importance attached to reasoning over observation (i.e., differences over particular methods). Beliefs about the role of science in society will affect the place one gives to values in scientific method.

The issue which has shaped debates over scientific method the most in the last half century is the question of how pluralist we need to be about method. Unificationists continue to hold out for one method essential to science; nihilism is a form of radical pluralism, which considers the effectiveness of any methodological prescription to be so context-sensitive as to render it not explanatory on its own. Some middle degree of pluralism regarding the methods embodied in scientific practice seems appropriate. But the details of scientific practice vary with time and place, from institution to institution, across scientists and their subjects of investigation. How significant are the variations for understanding science and its success? How much can method be abstracted from practice? This entry describes some of the attempts to characterize scientific method or methods, as well as arguments for a more context-sensitive approach to methods embedded in actual scientific practices.

  • 1. Overview and organizing themes
  • 2. Historical review: Aristotle to Mill
  • 3.1 Logical constructionism and operationalism
  • 3.2 H-D as a logic of confirmation
  • 3.3 Popper and falsificationism
  • 3.4 Meta-methodology and the end of method
  • 4. Statistical methods for hypothesis testing
  • 5.1 Creative and exploratory practices
  • 5.2 Computer methods and the ‘new ways’ of doing science
  • 6.1 “The scientific method” in science education and as seen by scientists
  • 6.2 Privileged methods and ‘gold standards’
  • 6.3 Scientific method in the court room
  • 6.4 Deviating practices
  • 7. Conclusion
  • Other internet resources
  • Related entries

1. Overview and organizing themes

This entry could have been given the title Scientific Methods and gone on to fill volumes, or it could have been extremely short, consisting of a brief summary rejection of the idea that there is any such thing as a unique Scientific Method at all. Both unhappy prospects are due to the fact that scientific activity varies so much across disciplines, times, places, and scientists that any account which manages to unify it all will either consist of overwhelming descriptive detail, or trivial generalizations.

The choice of scope for the present entry is more optimistic, taking a cue from the recent movement in philosophy of science toward a greater attention to practice: to what scientists actually do. This “turn to practice” can be seen as the latest form of studies of methods in science, insofar as it represents an attempt at understanding scientific activity, but through accounts that are neither meant to be universal and unified, nor singular and narrowly descriptive. To some extent, different scientists at different times and places can be said to be using the same method even though, in practice, the details are different.

Whether the context in which methods are carried out is relevant, or to what extent, will depend largely on what one takes the aims of science to be and what one’s own aims are. For most of the history of scientific methodology the assumption has been that the most important output of science is knowledge and so the aim of methodology should be to discover those methods by which scientific knowledge is generated.

Science was seen to embody the most successful form of reasoning (but which form?) to the most certain knowledge claims (but how certain?) on the basis of systematically collected evidence (but what counts as evidence, and should the evidence of the senses take precedence, or rational insight?) Section 2 surveys some of the history, pointing to two major themes. One theme is seeking the right balance between observation and reasoning (and the attendant forms of reasoning which employ them); the other is how certain scientific knowledge is or can be.

Section 3 turns to 20th century debates on scientific method. In the second half of the 20th century the epistemic privilege of science faced several challenges and many philosophers of science abandoned the reconstruction of the logic of scientific method. Views changed significantly regarding which functions of science ought to be captured and why. For some, the success of science was better identified with social or cultural features. Historical and sociological turns in the philosophy of science were made, with a demand that greater attention be paid to the non-epistemic aspects of science, such as sociological, institutional, material, and political factors. Even outside of those movements there was an increased specialization in the philosophy of science, with more and more focus on specific fields within science. The combined upshot was very few philosophers arguing any longer for a grand unified methodology of science. Sections 3 and 4 survey the main positions on scientific method in 20th century philosophy of science, focusing on where they differ in their preference for confirmation or falsification or for waiving the idea of a special scientific method altogether.

In recent decades, attention has primarily been paid to scientific activities traditionally falling under the rubric of method, such as experimental design and general laboratory practice, the use of statistics, the construction and use of models and diagrams, interdisciplinary collaboration, and science communication. Sections 4–6 attempt to construct a map of the current domains of the study of methods in science.

As these sections illustrate, the question of method is still central to the discourse about science. Scientific method remains a topic for education, for science policy, and for scientists. It arises in the public domain where the demarcation or status of science is at issue. Some philosophers have recently returned, therefore, to the question of what it is that makes science a unique cultural product. This entry will close with some of these recent attempts at discerning and encapsulating the activities by which scientific knowledge is achieved.

Attempting a history of scientific method compounds the vast scope of the topic. This section briefly surveys the background to modern methodological debates. What can be called the classical view goes back to antiquity, and represents a point of departure for later divergences.

We begin with a point made by Laudan (1968) in his historical survey of scientific method:

Perhaps the most serious inhibition to the emergence of the history of theories of scientific method as a respectable area of study has been the tendency to conflate it with the general history of epistemology, thereby assuming that the narrative categories and classificatory pigeon-holes applied to the latter are also basic to the former. (1968: 5)

To see knowledge about the natural world as falling under knowledge more generally is an understandable conflation. Histories of theories of method would naturally employ the same narrative categories and classificatory pigeon holes. An important theme of the history of epistemology, for example, is the unification of knowledge, a theme reflected in the question of the unification of method in science. Those who have identified differences in kinds of knowledge have often likewise identified different methods for achieving that kind of knowledge (see the entry on the unity of science ).

Different views on what is known, how it is known, and what can be known are connected. Plato distinguished the realms of things into the visible and the intelligible ( The Republic , 510a, in Cooper 1997). Only the latter, the Forms, could be objects of knowledge. The intelligible truths could be known with the certainty of geometry and deductive reasoning. What could be observed of the material world, however, was by definition imperfect and deceptive, not ideal. The Platonic way of knowledge therefore emphasized reasoning as a method, downplaying the importance of observation. Aristotle disagreed, locating the Forms in the natural world as the fundamental principles to be discovered through the inquiry into nature ( Metaphysics Z , in Barnes 1984).

Aristotle is recognized as giving the earliest systematic treatise on the nature of scientific inquiry in the western tradition, one which embraced observation and reasoning about the natural world. In the Prior and Posterior Analytics, Aristotle reflects first on the aims and then the methods of inquiry into nature. A number of features can be found which are still considered by most to be essential to science. For Aristotle, empiricism, careful observation (but passive observation, not controlled experiment), is the starting point. The aim is not merely recording of facts, though. For Aristotle, science (epistêmê) is a body of properly arranged knowledge or learning—the empirical facts, but also their ordering and display are of crucial importance. The aims of discovery, ordering, and display of facts partly determine the methods required of successful scientific inquiry. Also determinant is the nature of the knowledge being sought, and the explanatory causes proper to that kind of knowledge (see the discussion of the four causes in the entry on Aristotle on causality).

In addition to careful observation, then, scientific method requires a logic as a system of reasoning for properly arranging, but also inferring beyond, what is known by observation. Methods of reasoning may include induction, prediction, or analogy, among others. Aristotle’s system (along with his catalogue of fallacious reasoning) was collected under the title the Organon. This title would be echoed in later works on scientific reasoning, such as the Novum Organum of Francis Bacon and the Novum Organon Renovatum of William Whewell (see below). In Aristotle’s Organon, reasoning is divided primarily into two forms, a rough division which persists into modern times. The division, known most commonly today as deductive versus inductive method, appears in other eras and methodologies as analysis/synthesis, non-ampliative/ampliative, or even confirmation/verification. The basic idea is there are two “directions” to proceed in our methods of inquiry: one away from what is observed, to the more fundamental, general, and encompassing principles; the other, from the fundamental and general to instances or implications of principles.

The basic aim and method of inquiry identified here can be seen as a theme running throughout the next two millennia of reflection on the correct way to seek after knowledge: carefully observe nature and then seek rules or principles which explain or predict its operation. The Aristotelian corpus provided the framework for a commentary tradition on scientific method independent of science itself (cosmos versus physics). During the medieval period, figures such as Albertus Magnus (1206–1280), Thomas Aquinas (1225–1274), Robert Grosseteste (1175–1253), Roger Bacon (1214/1220–1292), William of Ockham (1287–1347), Andreas Vesalius (1514–1564), and Giacomo Zabarella (1533–1589) all worked to clarify the kind of knowledge obtainable by observation and induction, the source of justification of induction, and the best rules for its application. Many of their contributions we now think of as essential to science (see also Laudan 1968). As Aristotle and Plato had employed a framework of reasoning either “to the forms” or “away from the forms”, medieval thinkers employed directions away from the phenomena or back to the phenomena. In analysis, a phenomenon was examined to discover its basic explanatory principles; in synthesis, explanations of a phenomenon were constructed from first principles.

During the Scientific Revolution these various strands of argument, experiment, and reason were forged into a dominant epistemic authority. The 16th–18th centuries were a period not only of dramatic advance in knowledge about the operation of the natural world—advances in mechanical, medical, biological, political, and economic explanations—but also of self-awareness of the revolutionary changes taking place, and of intense reflection on the source and legitimation of the method by which the advances were made. The struggle to establish the new authority included methodological moves. The Book of Nature, according to the metaphor of Galileo Galilei (1564–1642) or Francis Bacon (1561–1626), was written in the language of mathematics, of geometry and number. This motivated an emphasis on mathematical description and mechanical explanation as important aspects of scientific method. Through figures such as Henry More and Ralph Cudworth, a neo-Platonic emphasis on the importance of metaphysical reflection on nature behind appearances, particularly regarding the spiritual as a complement to the purely mechanical, remained an important methodological thread of the Scientific Revolution (see the entries on the Cambridge Platonists; Boyle; Henry More; Galileo).

In Novum Organum (1620), Bacon was critical of the Aristotelian method for leaping from particulars to universals too quickly. The syllogistic form of reasoning readily mixed those two types of propositions. Bacon aimed at the invention of new arts, principles, and directions. His method would be grounded in methodical collection of observations, coupled with correction of our senses (and particularly, directions for the avoidance of the Idols, as he called them, kinds of systematic errors to which naïve observers are prone.) The community of scientists could then climb, by a careful, gradual and unbroken ascent, to reliable general claims.

Bacon's method has been criticized as impractical and too inflexible for the practicing scientist. Whewell would later criticize Bacon in his Philosophy of the Inductive Sciences for paying too little attention to the practices of scientists. It is hard to find convincing examples of Bacon's method being put into practice in the history of science, but there are a few who have been held up as real examples of 17th century scientific, inductive method, even if not in the rigid Baconian mold: figures such as Robert Boyle (1627–1691) and William Harvey (1578–1657) (see the entry on Bacon).

It is to Isaac Newton (1642–1727), however, that historians of science and methodologists have paid greatest attention. Given the enormous success of his Principia Mathematica and Opticks , this is understandable. The study of Newton’s method has had two main thrusts: the implicit method of the experiments and reasoning presented in the Opticks, and the explicit methodological rules given as the Rules for Philosophising (the Regulae) in Book III of the Principia . [ 3 ] Newton’s law of gravitation, the linchpin of his new cosmology, broke with explanatory conventions of natural philosophy, first for apparently proposing action at a distance, but more generally for not providing “true”, physical causes. The argument for his System of the World ( Principia , Book III) was based on phenomena, not reasoned first principles. This was viewed (mainly on the continent) as insufficient for proper natural philosophy. The Regulae counter this objection, re-defining the aims of natural philosophy by re-defining the method natural philosophers should follow. (See the entry on Newton’s philosophy .)

To his list of methodological prescriptions should be added Newton's famous phrase "hypotheses non fingo" (commonly translated as "I frame no hypotheses"). The scientist was not to invent systems but infer explanations from observations, as Bacon had advocated. This would come to be known as inductivism. In the century after Newton, significant clarifications of the Newtonian method were made. Colin Maclaurin (1698–1746), for instance, reconstructed the essential structure of the method as having complementary analysis and synthesis phases, one proceeding away from the phenomena in generalization, the other from the general propositions to derive explanations of new phenomena. Denis Diderot (1713–1784) and the editors of the Encyclopédie did much to consolidate and popularize Newtonianism, as did Francesco Algarotti (1712–1764). The emphasis was often as much on the character of the scientist as on their process, a character which is still commonly assumed. The scientist is humble in the face of nature, not beholden to dogma, obeys only his eyes, and follows the truth wherever it leads. It was certainly Voltaire (1694–1778) and du Châtelet (1706–1749) who were most influential in propagating this vision of the scientist and their craft, with Newton as hero. Scientific method became a revolutionary force of the Enlightenment. (See also the entries on Newton, Leibniz, Descartes, Boyle, Hume, enlightenment, as well as Shank 2008 for a historical overview.)

Not all 18th century reflections on scientific method were so celebratory. Famous also are George Berkeley's (1685–1753) attack on the mathematics of the new science, as well as the over-emphasis of Newtonians on observation; and David Hume's (1711–1776) undermining of the warrant offered for scientific claims by inductive justification (see the entries on: George Berkeley; David Hume; Hume's Newtonianism and Anti-Newtonianism). Hume's problem of induction motivated Immanuel Kant (1724–1804) to seek new foundations for empirical method, though as an epistemic reconstruction, not as any set of practical guidelines for scientists. Both Hume and Kant influenced the methodological reflections of the next century, such as the debate between Mill and Whewell over the certainty of inductive inferences in science.

The debate between John Stuart Mill (1806–1873) and William Whewell (1794–1866) has become the canonical methodological debate of the 19th century. Although often characterized as a debate between inductivism and hypothetico-deductivism, the role of the two methods on each side is actually more complex. On the hypothetico-deductive account, scientists work to come up with hypotheses from which true observational consequences can be deduced—hence, hypothetico-deductive. Because Whewell emphasizes both hypotheses and deduction in his account of method, he can be seen as a convenient foil to the inductivism of Mill. However, equally if not more important to Whewell's portrayal of scientific method is what he calls the "fundamental antithesis". Knowledge is a product of the objective (what we see in the world around us) and the subjective (the contributions of our mind to how we perceive and understand what we experience, which he called the Fundamental Ideas). Both elements are essential according to Whewell, and he was therefore critical of Kant for too much focus on the subjective, and of John Locke (1632–1704) and Mill for too much focus on the senses. Whewell's fundamental ideas can be discipline-relative. An idea can be fundamental even if it is necessary for knowledge only within a given scientific discipline (e.g., chemical affinity for chemistry). This distinguishes fundamental ideas from the forms and categories of intuition of Kant. (See the entry on Whewell.)

Clarifying fundamental ideas would therefore be an essential part of scientific method and scientific progress. Whewell called this process "Discoverer's Induction". It was induction, following Bacon or Newton, but Whewell sought to revive Bacon's account by emphasising the role of ideas in the clear and careful formulation of inductive hypotheses. Whewell's induction is not merely the collecting of objective facts. The subjective plays a role through what Whewell calls the Colligation of Facts, a creative act of the scientist, the invention of a theory. A theory is then confirmed by testing, where more facts are brought under the theory, called the Consilience of Inductions. Whewell felt that this was the method by which the true laws of nature could be discovered: clarification of fundamental concepts, clever invention of explanations, and careful testing. Mill, in his critique of Whewell, and others who have cast Whewell as a fore-runner of the hypothetico-deductivist view, seem to have under-estimated the importance of this discovery phase in Whewell's understanding of method (Snyder 1997a,b, 1999). Down-playing the discovery phase would come to characterize methodology of the early 20th century (see section 3).

Mill, in his System of Logic, put forward a narrower view of induction as the essence of scientific method. For Mill, induction is the search first for regularities among events. Among those regularities, some will continue to hold for further observations, eventually gaining the status of laws. One can also look for regularities among the laws discovered in a domain, i.e., for a law of laws. Which law of laws will hold is time and discipline dependent and open to revision. One example is the Law of Universal Causation, and Mill put forward specific methods for identifying causes—now commonly known as Mill's methods. These five methods look for circumstances which are common to the phenomena of interest, circumstances which are absent when the phenomena are absent, or circumstances for which both vary together. Mill's methods are still seen as capturing basic intuitions about experimental methods for finding the relevant explanatory factors (System of Logic (1843); see the entry on Mill). The methods advocated by Whewell and Mill, in the end, look similar. Both involve inductive generalization to covering laws. They differ dramatically, however, with respect to the necessity of the knowledge arrived at; that is, at the meta-methodological level (see the entries on Whewell and Mill).
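Two of Mill's five methods, agreement and difference, can be mimicked as simple set operations over recorded circumstances. The following is a toy sketch of the underlying intuition, not Mill's own formalism; the circumstance labels are invented:

```python
# Toy illustration of two of Mill's methods, treating each observation
# as a set of antecedent circumstances plus a flag for whether the
# phenomenon of interest occurred.

def method_of_agreement(cases):
    """Circumstances common to every case in which the phenomenon occurs."""
    present = [circumstances for circumstances, occurred in cases if occurred]
    return set.intersection(*present) if present else set()

def method_of_difference(positive, negative):
    """Circumstances present when the phenomenon occurs but absent when it does not."""
    return set(positive) - set(negative)

cases = [({"A", "B", "C"}, True),
         ({"A", "D", "E"}, True),
         ({"B", "D"}, False)]

print(method_of_agreement(cases))               # {'A'}
print(method_of_difference({"A", "B"}, {"B"}))  # {'A'}
```

The method of agreement keeps what all positive cases share; the method of difference keeps what distinguishes an occurrence from a non-occurrence.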

3. Logic of method and critical responses

The quantum and relativistic revolutions in physics in the early 20th century had a profound effect on methodology. Conceptual foundations of both theories were taken to show the defeasibility of even the most seemingly secure intuitions about space, time and bodies. Certainty of knowledge about the natural world was therefore recognized as unattainable. Instead a renewed empiricism was sought which rendered science fallible but still rationally justifiable.

Analyses of the reasoning of scientists emerged, according to which the aspects of scientific method which were of primary importance were the means of testing and confirming of theories. A distinction in methodology was made between the contexts of discovery and justification. The distinction could be used as a wedge between the particularities of where and how theories or hypotheses are arrived at, on the one hand, and the underlying reasoning scientists use (whether or not they are aware of it) when assessing theories and judging their adequacy on the basis of the available evidence, on the other. By and large, for most of the 20th century, philosophy of science focused on the second context, although philosophers differed on whether to focus on confirmation or refutation as well as on the many details of how confirmation or refutation could or could not be brought about. By the mid-20th century these attempts at defining the method of justification, and the context distinction itself, came under pressure. During the same period, philosophy of science developed rapidly, and from section 4 this entry will therefore shift from a primarily historical treatment of the scientific method towards a primarily thematic one.

Advances in logic and probability held out promise of the possibility of elaborate reconstructions of scientific theories and empirical method, the best example being Rudolf Carnap's The Logical Structure of the World (1928). Carnap attempted to show that a scientific theory could be reconstructed as a formal axiomatic system—that is, a logic. That system could refer to the world because some of its basic sentences could be interpreted as observations or operations which one could perform to test them. The rest of the theoretical system, including sentences using theoretical or unobservable terms (like electron or force), would then either be meaningful because they could be reduced to observations, or they had purely logical meanings (called analytic, like mathematical identities). This has been referred to as the verifiability criterion of meaning. According to the criterion, any statement not either analytic or verifiable was strictly meaningless. Although the view was endorsed by Carnap in 1928, he would later come to see it as too restrictive (Carnap 1956). Another familiar version of this idea is the operationalism of Percy Williams Bridgman. In The Logic of Modern Physics (1927) Bridgman asserted that every physical concept could be defined in terms of the operations one would perform to verify the application of that concept. Making good on the operationalisation of a concept even as simple as length, however, can easily become enormously complex (for measuring very small lengths, for instance) or impractical (measuring large distances like light years).

Carl Hempel’s (1950, 1951) criticisms of the verifiability criterion of meaning had enormous influence. He pointed out that universal generalizations, such as most scientific laws, were not strictly meaningful on the criterion. Verifiability and operationalism both seemed too restrictive to capture standard scientific aims and practice. The tenuous connection between these reconstructions and actual scientific practice was criticized in another way. In both approaches, scientific methods are instead recast in methodological roles. Measurements, for example, were looked to as ways of giving meanings to terms. The aim of the philosopher of science was not to understand the methods per se , but to use them to reconstruct theories, their meanings, and their relation to the world. When scientists perform these operations, however, they will not report that they are doing them to give meaning to terms in a formal axiomatic system. This disconnect between methodology and the details of actual scientific practice would seem to violate the empiricism the Logical Positivists and Bridgman were committed to. The view that methodology should correspond to practice (to some extent) has been called historicism, or intuitionism. We turn to these criticisms and responses in section 3.4 . [ 4 ]

Positivism also had to contend with the recognition that a purely inductivist approach, along the lines of Bacon-Newton-Mill, was untenable. There was no pure observation, for starters. All observation was theory laden. Theory is required to make any observation; therefore, not all theory can be derived from observation alone. (See the entry on theory and observation in science.) Even granting an observational basis, Hume had already pointed out that one could not deductively justify inductive conclusions without begging the question by presuming the success of the inductive method. Likewise, positivist attempts at analyzing how a generalization can be confirmed by observations of its instances were subject to a number of criticisms. Goodman (1965) and Hempel (1965) both point to paradoxes inherent in standard accounts of confirmation. Recent attempts at explaining how observations can serve to confirm a scientific theory are discussed in section 4 below.

The standard starting point for a non-inductive analysis of the logic of confirmation is known as the Hypothetico-Deductive (H-D) method. In its simplest form, a sentence of a theory which expresses some hypothesis is confirmed by its true consequences. As noted in section 2, this method had been advanced by Whewell in the 19th century, as well as by Nicod (1924) and others in the 20th century. Often, Hempel's (1966) description of the H-D method, illustrated by the case of Semmelweis's inferential procedures in establishing the cause of childbed fever, has been presented as a key account of H-D as well as a foil for criticism of the H-D account of confirmation (see, for example, Lipton's (2004) discussion of inference to the best explanation; also the entry on confirmation). Hempel described Semmelweis's procedure as examining various hypotheses explaining the cause of childbed fever. Some hypotheses conflicted with observable facts and could be rejected as false immediately. Others needed to be tested experimentally by deducing which observable events should follow if the hypothesis were true (what Hempel called the test implications of the hypothesis), then conducting an experiment and observing whether or not the test implications occurred. If the experiment showed a test implication to be false, the hypothesis could be rejected. If the experiment showed the test implications to be true, however, this did not prove the hypothesis true. The confirmation of a test implication does not verify a hypothesis, though Hempel did allow that "it provides at least some support, some corroboration or confirmation for it" (Hempel 1966: 8). The degree of this support then depends on the quantity, variety and precision of the supporting evidence.
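Hempel's schema lends itself to a compact procedural sketch. The hypothesis and implication names below are invented placeholders; the code only mirrors the logical asymmetry Hempel describes, in which a failed test implication refutes while passed tests merely support:

```python
def hd_assess(hypothesis, test_implications, observed):
    """Hypothetico-deductive assessment: reject the hypothesis on any
    failed test implication; surviving every test yields support, never proof."""
    for implication in test_implications:
        if not observed[implication]:
            return ("rejected", implication)
    return ("corroborated, not proven", None)

# Placeholder example: two deduced implications, both observed to hold.
observed = {"implication 1": True, "implication 2": True}
verdict, failed = hd_assess("H", ["implication 1", "implication 2"], observed)
print(verdict)  # corroborated, not proven
```

Note the asymmetry the last return value encodes: a single false implication suffices for rejection, but no number of true ones verifies the hypothesis.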

Another approach that took off from the difficulties with inductive inference was Karl Popper’s critical rationalism or falsificationism (Popper 1959, 1963). Falsification is deductive and similar to H-D in that it involves scientists deducing observational consequences from the hypothesis under test. For Popper, however, the important point was not the degree of confirmation that successful prediction offered to a hypothesis. The crucial thing was the logical asymmetry between confirmation, based on inductive inference, and falsification, which can be based on a deductive inference. (This simple opposition was later questioned, by Lakatos, among others. See the entry on historicist theories of scientific rationality. )

Popper stressed that, regardless of the amount of confirming evidence, we can never be certain that a hypothesis is true without committing the fallacy of affirming the consequent. Instead, Popper introduced the notion of corroboration as a measure for how well a theory or hypothesis has survived previous testing—but without implying that this is also a measure for the probability that it is true.

Popper was also motivated by his doubts about the scientific status of theories like the Marxist theory of history or psycho-analysis, and so wanted to demarcate between science and pseudo-science. Popper saw this as an importantly different distinction than demarcating science from metaphysics. The latter demarcation was the primary concern of many logical empiricists. Popper used the idea of falsification to draw a line instead between pseudo and proper science. Science was science because its method involved subjecting theories to rigorous tests which offered a high probability of failing and thus refuting the theory.

A commitment to the risk of failure was important. Avoiding falsification could be done all too easily. If a consequence of a theory is inconsistent with observations, an exception can be added by introducing auxiliary hypotheses designed explicitly to save the theory, so-called ad hoc modifications. This Popper saw done in pseudo-science where ad hoc theories appeared capable of explaining anything in their field of application. In contrast, science is risky. If observations showed the predictions from a theory to be wrong, the theory would be refuted. Hence, scientific hypotheses must be falsifiable. Not only must there exist some possible observation statement which could falsify the hypothesis or theory, were it observed, (Popper called these the hypothesis’ potential falsifiers) it is crucial to the Popperian scientific method that such falsifications be sincerely attempted on a regular basis.

The more potential falsifiers of a hypothesis, the more falsifiable it would be, and the more the hypothesis claimed. Conversely, hypotheses without falsifiers claimed very little or nothing at all. Originally, Popper thought that this meant the introduction of ad hoc hypotheses only to save a theory should not be countenanced as good scientific method. These would undermine the falsifiability of a theory. However, Popper later came to recognize that the introduction of modifications (immunizations, he called them) was often an important part of scientific development. Responding to surprising or apparently falsifying observations often generated important new scientific insights. Popper's own example was the observed motion of Uranus, which originally did not agree with Newtonian predictions. The ad hoc hypothesis of an outer planet explained the disagreement and led to further falsifiable predictions. Popper sought to reconcile the view by blurring the distinction between falsifiable and not falsifiable, and speaking instead of degrees of testability (Popper 1985: 41f.).

From the 1960s on, sustained meta-methodological criticism emerged that drove philosophical focus away from scientific method. A brief look at those criticisms follows, with recommendations for further reading at the end of the entry.

Thomas Kuhn’s The Structure of Scientific Revolutions (1962) begins with a well-known shot across the bow for philosophers of science:

History, if viewed as a repository for more than anecdote or chronology, could produce a decisive transformation in the image of science by which we are now possessed. (1962: 1)

The image Kuhn thought needed transforming was the a-historical, rational reconstruction sought by many of the Logical Positivists, though Carnap and other positivists were actually quite sympathetic to Kuhn's views. (See the entry on the Vienna Circle.) Kuhn shares with others of his contemporaries, such as Feyerabend and Lakatos, a commitment to a more empirical approach to philosophy of science: the history of science provides important data, and necessary checks, for philosophy of science, including any theory of scientific method.

The history of science reveals, according to Kuhn, that scientific development occurs in alternating phases. During normal science, the members of the scientific community adhere to the paradigm in place. Their commitment to the paradigm means a commitment to the puzzles to be solved and the acceptable ways of solving them. Confidence in the paradigm remains so long as steady progress is made in solving the shared puzzles. Method in this normal phase operates within a disciplinary matrix (Kuhn’s later concept of a paradigm) which includes standards for problem solving, and defines the range of problems to which the method should be applied. An important part of a disciplinary matrix is the set of values which provide the norms and aims for scientific method. The main values that Kuhn identifies are prediction, problem solving, simplicity, consistency, and plausibility.

An important by-product of normal science is the accumulation of puzzles which cannot be solved with the resources of the current paradigm. Once the accumulation of these anomalies has reached some critical mass, it can trigger a communal shift to a new paradigm and a new phase of normal science. Importantly, the values that provide the norms and aims for scientific method may have transformed in the meantime. Method may therefore be relative to discipline, time, or place.

Feyerabend also identified the aim of science as progress, but argued that any methodological prescription would only stifle that progress (Feyerabend 1988). His arguments are grounded in re-examining accepted "myths" about the history of science. Heroes of science, like Galileo, are shown to be just as reliant on rhetoric and persuasion as they are on reason and demonstration. Others, like Aristotle, are shown to be far more reasonable and far-reaching in their outlooks than they are given credit for. As a consequence, the only rule that could provide what he took to be sufficient freedom was the vacuous "anything goes". More generally, even the methodological restriction that science is the best way to pursue knowledge, and to increase knowledge, is too restrictive. Feyerabend suggested instead that science might, in fact, be a threat to a free society, because it and its myth had become so dominant (Feyerabend 1978).

An even more fundamental kind of criticism was offered by several sociologists of science from the 1970s onwards who rejected the methodology of providing philosophical accounts for the rational development of science and sociological accounts of the irrational mistakes. Instead, they adhered to a symmetry thesis on which any causal explanation of how scientific knowledge is established needs to be symmetrical in explaining truth and falsity, rationality and irrationality, success and mistakes, by the same causal factors (see, e.g., Barnes and Bloor 1982, Bloor 1991). Movements in the Sociology of Science, like the Strong Programme, or in the social dimensions and causes of knowledge more generally led to extended and close examination of detailed case studies in contemporary science and its history. (See the entries on the social dimensions of scientific knowledge and social epistemology .) Well-known examinations by Latour and Woolgar (1979/1986), Knorr-Cetina (1981), Pickering (1984), Shapin and Schaffer (1985) seem to bear out that it was social ideologies (on a macro-scale) or individual interactions and circumstances (on a micro-scale) which were the primary causal factors in determining which beliefs gained the status of scientific knowledge. As they saw it therefore, explanatory appeals to scientific method were not empirically grounded.

A late, and largely unexpected, criticism of scientific method came from within science itself. Beginning in the early 2000s, a number of scientists attempting to replicate the results of published experiments found that they could not do so. There may be a close conceptual connection between reproducibility and method. For example, if reproducibility means that the same scientific methods ought to produce the same result, and all scientific results ought to be reproducible, then whatever it takes to reproduce a scientific result ought to be called scientific method. Space limits us to the observation that, insofar as reproducibility is a desired outcome of proper scientific method, it is not strictly a part of scientific method. (See the entry on reproducibility of scientific results.)

By the close of the 20th century the search for the scientific method was flagging. Nola and Sankey (2000b) could introduce their volume on method by remarking that "For some, the whole idea of a theory of scientific method is yester-year's debate …".

4. Statistical methods for hypothesis testing

Despite the many difficulties that philosophers encountered in trying to provide a clear methodology of confirmation (or refutation), important progress has nonetheless been made on understanding how observation can provide evidence for a given theory. Work in statistics has been crucial for understanding how theories can be tested empirically, and in recent decades a huge literature has developed that attempts to recast confirmation in Bayesian terms. Here these developments can be covered only briefly, and we refer to the entry on confirmation for further details and references.

Statistics has come to play an increasingly important role in the methodology of the experimental sciences from the 19th century onwards. At that time, statistics and probability theory took on a methodological role as an analysis of inductive inference, and attempts to ground the rationality of induction in the axioms of probability theory have continued throughout the 20th century and into the present. Developments in the theory of statistics itself, meanwhile, have had a direct and immense influence on the experimental method, including methods for measuring the uncertainty of observations such as the Method of Least Squares developed by Legendre and Gauss in the early 19th century, criteria for the rejection of outliers proposed by Peirce by the mid-19th century, and the significance tests developed by Gosset (a.k.a. "Student"), Fisher, Neyman & Pearson and others in the 1920s and 1930s (see, e.g., Swijtink 1987 for a brief historical overview; and also the entry on C.S. Peirce).
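The Method of Least Squares mentioned above can be illustrated with the closed-form fit of a straight line, y = a + bx. This is a minimal sketch using the standard one-predictor normal-equation solution, not Legendre's or Gauss's original presentation:

```python
def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x via the normal equations:
    b = cov(x, y) / var(x), a = mean(y) - b * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Points lying exactly on y = 1 + 2x recover the coefficients:
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

With noisy observations the same formula returns the line minimizing the sum of squared vertical residuals, which is what made the method so useful for combining uncertain measurements.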

These developments within statistics then in turn led to a reflective discussion among both statisticians and philosophers of science on how to perceive the process of hypothesis testing: whether it was a rigorous statistical inference that could provide a numerical expression of the degree of confidence in the tested hypothesis, or if it should be seen as a decision between different courses of actions that also involved a value component. This led to a major controversy among Fisher on the one side and Neyman and Pearson on the other (see especially Fisher 1955, Neyman 1956 and Pearson 1955, and for analyses of the controversy, e.g., Howie 2002, Marks 2000, Lenhard 2006). On Fisher’s view, hypothesis testing was a methodology for when to accept or reject a statistical hypothesis, namely that a hypothesis should be rejected by evidence if this evidence would be unlikely relative to other possible outcomes, given the hypothesis were true. In contrast, on Neyman and Pearson’s view, the consequence of error also had to play a role when deciding between hypotheses. Introducing the distinction between the error of rejecting a true hypothesis (type I error) and accepting a false hypothesis (type II error), they argued that it depends on the consequences of the error to decide whether it is more important to avoid rejecting a true hypothesis or accepting a false one. Hence, Fisher aimed for a theory of inductive inference that enabled a numerical expression of confidence in a hypothesis. To him, the important point was the search for truth, not utility. In contrast, the Neyman-Pearson approach provided a strategy of inductive behaviour for deciding between different courses of action. Here, the important point was not whether a hypothesis was true, but whether one should act as if it was.
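The Neyman-Pearson distinction between the two error types can be illustrated by simulation. In this toy setup (all numbers invented), the null hypothesis is "the coin is fair" and the decision rule is "reject if more than 60 heads appear in 100 flips":

```python
import random

def reject_fair_coin(n_heads, threshold=60):
    """Toy decision rule: reject "the coin is fair" when heads exceed the threshold."""
    return n_heads > threshold

def rejection_rate(p_true, trials=10_000, n_flips=100, threshold=60):
    """Estimate how often the rule rejects when the true heads-probability is p_true."""
    rng = random.Random(0)  # fixed seed for reproducibility
    rejections = sum(
        reject_fair_coin(sum(rng.random() < p_true for _ in range(n_flips)), threshold)
        for _ in range(trials)
    )
    return rejections / trials

type_I = rejection_rate(0.5)       # rejecting a true null hypothesis (fair coin)
type_II = 1 - rejection_rate(0.7)  # failing to reject when the coin is in fact biased
print(type_I, type_II)
```

Raising the threshold lowers the type I rate but raises the type II rate; deciding where to set it is exactly the trade-off Neyman and Pearson argued should be settled by the consequences of each kind of error.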

Similar discussions are found in the philosophical literature. On the one side, Churchman (1948) and Rudner (1953) argued that because scientific hypotheses can never be completely verified, a complete analysis of the methods of scientific inference includes ethical judgments, in which the scientists must decide whether the evidence is sufficiently strong, or the probability sufficiently high, to warrant the acceptance of the hypothesis; this again will depend on the importance of making a mistake in accepting or rejecting the hypothesis. Others, such as Jeffrey (1956) and Levi (1960), disagreed and instead defended a value-neutral view of science, on which scientists should bracket their attitudes, preferences, temperament, and values when assessing the correctness of their inferences. For more details on this value-free ideal in the philosophy of science and its historical development, see Douglas (2009) and Howard (2003). For a broad set of case studies examining the role of values in science, see e.g. Elliott & Richards 2017.

In recent decades, philosophical discussions of the evaluation of probabilistic hypotheses by statistical inference have largely focused on Bayesianism, which understands probability as a measure of a person's degree of belief in an event, given the available information, and frequentism, which instead understands probability as a long-run frequency of a repeatable event. Hence, for Bayesians probabilities refer to a state of knowledge, whereas for frequentists probabilities refer to frequencies of events (see, e.g., Sober 2008, chapter 1 for a detailed introduction to Bayesianism and frequentism as well as to likelihoodism). Bayesianism aims at providing a quantifiable, algorithmic representation of belief revision, where belief revision is a function of prior beliefs (i.e., background knowledge) and incoming evidence. Bayesianism employs a rule based on Bayes' theorem, a theorem of the probability calculus which relates conditional probabilities. The probability that a particular hypothesis is true is interpreted as a degree of belief, or credence, of the scientist. There will also be a probability and a degree of belief that a hypothesis will be true conditional on a piece of evidence (an observation, say) being true. Bayesianism prescribes that it is rational for the scientist to update their belief in the hypothesis to that conditional probability should it turn out that the evidence is, in fact, observed (see, e.g., Sprenger & Hartmann 2019 for a comprehensive treatment of Bayesian philosophy of science). Originating in the work of Neyman and Pearson, frequentism aims at providing the tools for reducing long-run error rates, such as the error-statistical approach developed by Mayo (1996) that focuses on how experimenters can avoid both type I and type II errors by building up a repertoire of procedures that detect errors if and only if they are present.
Both Bayesianism and frequentism have developed over time, they are interpreted in different ways by their various proponents, and their relations to earlier criticisms of attempts at defining scientific method are seen differently by proponents and critics. The literature, surveys, reviews, and criticism in this area are vast, and the reader is referred to the entries on Bayesian epistemology and confirmation.

5. Method in Practice

Attention to scientific practice, as we have seen, is not itself new. However, the recent turn to practice in the philosophy of science can be seen as a correction to the pessimism about method in the philosophy of science of the later 20th century, and as an attempted reconciliation between sociological and rationalist explanations of scientific knowledge. Much of this work sees methods as detailed and context-specific problem-solving procedures, and methodological analyses as simultaneously descriptive, critical, and advisory (see Nickles 1987 for an exposition of this view). The following subsections survey some of these practice-oriented discussions; here we turn fully to topics rather than chronology.

5.1 Creative and exploratory practices

A problem with the distinction between the contexts of discovery and justification that figured so prominently in philosophy of science in the first half of the 20th century (see section 2) is that no such distinction can be clearly seen in scientific activity (see Arabatzis 2006). Thus, in recent decades it has been recognized that the study of conceptual innovation and change should not be confined to the psychology and sociology of science, but that these are also important aspects of scientific practice which philosophy of science should address (see also the entry on scientific discovery). Looking for the practices that drive conceptual innovation has led philosophers to examine both the reasoning practices of scientists and the wide realm of experimental practices that are not directed narrowly at testing hypotheses, that is, exploratory experimentation.

Examining the reasoning practices of historical and contemporary scientists, Nersessian (2008) has argued that new scientific concepts are constructed as solutions to specific problems by systematic reasoning, and that analogy, visual representation, and thought-experimentation are among the important reasoning practices employed. These ubiquitous forms of reasoning are reliable, but also fallible, methods of conceptual development and change. On her account, model-based reasoning consists of cycles of construction, simulation, evaluation, and adaptation of models that serve as interim interpretations of the target problem to be solved. Often this process will lead to modifications or extensions of the model, and a new cycle of simulation and evaluation. However, Nersessian also emphasizes that

creative model-based reasoning cannot be applied as a simple recipe, is not always productive of solutions, and even its most exemplary usages can lead to incorrect solutions. (Nersessian 2008: 11)

Thus, while on the one hand she agrees with many previous philosophers that there is no logic of discovery, on the other hand she holds that discoveries can derive from reasoned processes, such that a large and integral part of scientific practice is

the creation of concepts through which to comprehend, structure, and communicate about physical phenomena …. (Nersessian 1987: 11)

Similarly, work on heuristics for discovery and theory construction by scholars such as Darden (1991) and Bechtel & Richardson (1993) presents science as problem solving and investigates scientific problem solving as a special case of problem-solving in general. Drawing largely on cases from the biological sciences, much of their focus has been on reasoning strategies for the generation, evaluation, and revision of mechanistic explanations of complex systems.

Addressing another aspect of the context distinction, namely the traditional view that the primary role of experiments is to test theoretical hypotheses according to the H-D model, other philosophers of science have argued for additional roles that experiments can play. The notion of exploratory experimentation was introduced to describe experiments driven by the desire to obtain empirical regularities and to develop concepts and classifications in which these regularities can be described (Steinle 1997, 2002; Burian 1997; Waters 2007). However, the difference between theory-driven experimentation and exploratory experimentation should not be seen as a sharp distinction. Theory-driven experiments are not always directed at testing hypotheses, but may also be directed at various kinds of fact-gathering, such as determining numerical parameters. Conversely, exploratory experiments are usually informed by theory in various ways and are therefore not theory-free. Instead, in exploratory experiments phenomena are investigated without first limiting the possible outcomes of the experiment on the basis of extant theory about the phenomena.

The development of high-throughput instrumentation in molecular biology and neighbouring fields has given rise to a special type of exploratory experimentation that collects and analyses very large amounts of data. These new ‘omics’ disciplines are often said to represent a break with the ideal of hypothesis-driven science (Burian 2007; Elliott 2007; Waters 2007; O’Malley 2007), and are instead described as data-driven research (Leonelli 2012; Strasser 2012) or as a special kind of “convenience experimentation” in which many experiments are done simply because they are extraordinarily convenient to perform (Krohs 2012).

5.2 Computer methods and ‘new ways’ of doing science

The field of omics just described is possible because of the ability of computers to process, in a reasonable amount of time, the huge quantities of data required. Computers allow for more elaborate experimentation (higher speed, better filtering, more variables, sophisticated coordination and control), but also, through modelling and simulations, might constitute a form of experimentation themselves. Here, too, we can pose a version of the general question of method versus practice: does the practice of using computers fundamentally change scientific method, or merely provide a more efficient means of implementing standard methods?

Because computers can be used to automate measurements, quantifications, calculations, and statistical analyses where, for practical reasons, these operations cannot be otherwise carried out, many of the steps involved in reaching a conclusion on the basis of an experiment are now made inside a “black box”, without the direct involvement or awareness of a human. This has epistemological implications, regarding what we can know, and how we can know it. To have confidence in the results, computer methods are therefore subjected to tests of verification and validation.

The distinction between verification and validation is easiest to characterize in the case of computer simulations. In a typical computer simulation scenario computers are used to numerically integrate differential equations for which no analytic solution is available. The equations are part of the model the scientist uses to represent a phenomenon or system under investigation. Verifying a computer simulation means checking that the equations of the model are being correctly approximated. Validating a simulation means checking that the equations of the model are adequate for the inferences one wants to make on the basis of that model.
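The verification half of this distinction can be illustrated with a minimal sketch. The toy equation, step counts, and function name below are illustrative assumptions chosen so that an analytic solution exists for comparison; in real simulation scenarios no such solution is available, which is why verification relies on convergence checks rather than direct comparison:

```python
import math

# Toy model equation: dy/dt = -k*y, with analytic solution y(t) = y0 * exp(-k*t).

def euler_simulate(y0, k, t_end, steps):
    """Numerically integrate dy/dt = -k*y with the explicit Euler method."""
    y, dt = y0, t_end / steps
    for _ in range(steps):
        y += dt * (-k * y)
    return y

# Verification: check that the numerical scheme correctly approximates the
# model's own equations, i.e., the error shrinks as the step size shrinks.
exact = 1.0 * math.exp(-1.0)
coarse_error = abs(euler_simulate(1.0, 1.0, 1.0, 10) - exact)
fine_error = abs(euler_simulate(1.0, 1.0, 1.0, 1000) - exact)
assert fine_error < coarse_error  # the approximation converges

# Validation, by contrast, would compare the model's output against data from
# the target system itself; no amount of numerical checking can settle that.
```

The point of the sketch is that verification is an internal matter between the scheme and the model's equations, while validation is an external matter between the model and the world.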

A number of issues related to computer simulations have been raised. The identification of verification and validation as the testing methods has been criticized. Oreskes et al. (1994) raise concerns that the term “validation”, because it suggests deductive inference, might lead to over-confidence in the results of simulations. The distinction itself is probably too clean, since actual practice in the testing of simulations mixes and moves back and forth between the two (Weissert 1997; Parker 2008a; Winsberg 2010). Computer simulations do seem to have a non-inductive character, given that the principles by which they operate are built in by the programmers, and any results of the simulation follow from those in-built principles in such a way that those results could, in principle, be deduced from the program code and its inputs. The status of simulations as experiments has therefore been examined (Kaufmann and Smarr 1993; Humphreys 1995; Hughes 1999; Norton and Suppe 2001). This literature considers the epistemology of such experiments: what we can learn by simulation, and also the kinds of justifications which can be given in applying that knowledge to the “real” world (Mayo 1996; Parker 2008b). As pointed out above, part of the advantage of computer simulation derives from the fact that huge numbers of calculations can be carried out without requiring direct observation by the experimenter/​simulator. At the same time, many of these calculations are approximations to the calculations which would be performed first-hand in an ideal situation. Both factors introduce uncertainties into the inferences drawn from what is observed in the simulation.

For many of the reasons described above, computer simulations do not seem to belong clearly to either the experimental or the theoretical domain. Rather, they seem to crucially involve aspects of both. This has led some authors, such as Fox Keller (2003: 200), to argue that we ought to consider computer simulation a “qualitatively different way of doing science”. The literature in general tends to follow Kaufmann and Smarr (1993) in referring to computer simulation as a “third way” for scientific methodology (theoretical reasoning and experimental practice are the first two ways). It should also be noted that the debates around these issues have tended to focus on the form of computer simulation typical in the physical sciences, where models are based on dynamical equations. Other forms of simulation might not have the same problems, or might have problems of their own (see the entry on computer simulations in science).

In recent years, the rapid development of machine learning techniques has prompted some scholars to suggest that the scientific method has become “obsolete” (Anderson 2008; Carrol and Goodstein 2009). This has resulted in an intense debate on the relative merits of data-driven and hypothesis-driven research (for samples, see, e.g., Mazzocchi 2015 or Succi and Coveney 2018). For a detailed treatment of this topic, we refer to the entry on scientific research and big data.

6. Discourse on scientific method

Despite philosophical disagreements, the idea of the scientific method still figures prominently in contemporary discourse on many different topics, both within science and in society at large. Often, references to scientific method are used in ways that either convey the legend of a single, universal method characteristic of all science, or grant a particular method or set of methods privileged status as a special ‘gold standard’, often with reference to particular philosophers to vindicate the claims. Discourse on scientific method also typically arises when there is a need to distinguish between science and other activities, or to justify the special status conveyed to science. In these areas, the philosophical attempts at identifying a set of methods characteristic of scientific endeavors are closely related to the philosophy of science’s classical problem of demarcation (see the entry on science and pseudo-science) and to the philosophical analysis of the social dimension of scientific knowledge and the role of science in democratic society.

One of the settings in which the legend of a single, universal scientific method has been particularly strong is science education (see, e.g., Bauer 1992; McComas 1996; Wivagg & Allchin 2002). [ 5 ] Often, ‘the scientific method’ is presented in textbooks and educational web pages as a fixed four- or five-step procedure: starting from observation and description of a phenomenon, progressing through the formulation of a hypothesis which explains the phenomenon, designing and conducting experiments to test the hypothesis, and analyzing the results, and ending with drawing a conclusion. Such references to a universal scientific method can be found in educational material at all levels of science education (Blachowicz 2009), and numerous studies have shown that the idea of a general and universal scientific method often forms part of both students’ and teachers’ conceptions of science (see, e.g., Aikenhead 1987; Osborne et al. 2003). In response, it has been argued that science education needs to focus more on teaching about the nature of science, although views have differed on whether this is best done through student-led investigations, contemporary cases, or historical cases (Allchin, Andersen & Nielsen 2014).

Although occasionally phrased with reference to the H-D method, important historical roots of the legend of a single, universal scientific method in science education are the American philosopher and psychologist John Dewey’s account of inquiry in How We Think (1910) and the British mathematician Karl Pearson’s account of science in The Grammar of Science (1892). On Dewey’s account, inquiry is divided into the five steps of

(i) a felt difficulty, (ii) its location and definition, (iii) suggestion of a possible solution, (iv) development by reasoning of the bearing of the suggestions, (v) further observation and experiment leading to its acceptance or rejection. (Dewey 1910: 72)

Similarly, on Pearson’s account, scientific investigations start with measurement of data and observation of their correlation and sequence, from which scientific laws can be discovered with the aid of creative imagination. These laws have to be subjected to criticism, and their final acceptance will have equal validity for “all normally constituted minds”. Both Dewey’s and Pearson’s accounts should be seen as generalized abstractions of inquiry, not restricted to the realm of science, although both Dewey and Pearson referred to their respective accounts as ‘the scientific method’.

Occasionally, scientists make sweeping statements about a simple and distinct scientific method, as exemplified by Feynman’s simplified version of a conjectures and refutations method presented, for example, in the last of his 1964 Cornell Messenger lectures. [ 6 ] However, just as often scientists have come to the same conclusion as recent philosophy of science that there is not any unique, easily described scientific method. For example, the physicist and Nobel Laureate Weinberg described in the paper “The Methods of Science … And Those By Which We Live” (1995) how

The fact that the standards of scientific success shift with time does not only make the philosophy of science difficult; it also raises problems for the public understanding of science. We do not have a fixed scientific method to rally around and defend. (1995: 8)

Interview studies with scientists on their conception of method show that scientists often find it hard to figure out whether available evidence confirms their hypothesis, and that there are no direct translations between general ideas about method and specific strategies to guide how research is conducted (Schickore & Hangel 2019; Hangel & Schickore 2017).

Reference to the scientific method has also often been used to argue for the scientific nature or special status of a particular activity. Philosophical positions that argue for a simple and unique scientific method as a criterion of demarcation, such as Popperian falsification, have often attracted practitioners who felt that they had a need to defend their domain of practice. For example, references to conjectures and refutation as the scientific method are abundant in much of the literature on complementary and alternative medicine (CAM)—alongside the competing position that CAM, as an alternative to conventional biomedicine, needs to develop its own methodology different from that of science.

Also within mainstream science, reference to the scientific method is used in arguments regarding the internal hierarchy of disciplines and domains. A frequently seen argument is that research based on the H-D method is superior to research based on induction from observations because in deductive inferences the conclusion follows necessarily from the premises. (See, e.g., Parascandola 1998 for an analysis of how this argument has been made to downgrade epidemiology compared to the laboratory sciences.) Similarly, based on an examination of the practices of major funding institutions such as the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Biotechnology and Biological Sciences Research Council (BBSRC) in the UK, O’Malley et al. (2009) have argued that funding agencies seem to have a tendency to adhere to the view that the primary activity of science is to test hypotheses, while descriptive and exploratory research is seen as merely preparatory activity that is valuable only insofar as it fuels hypothesis-driven research.

In some areas of science, scholarly publications are structured in a way that may convey the impression of a neat and linear process of inquiry that runs from stating a question, through devising the methods by which to answer it and collecting the data, to drawing a conclusion from the analysis of the data. For example, the codified format of publications in most biomedical journals known as the IMRAD format (Introduction, Method, Results, Analysis, Discussion) is explicitly described by the journal editors as “not an arbitrary publication format but rather a direct reflection of the process of scientific discovery” (see the so-called “Vancouver Recommendations”, ICMJE 2013: 11). However, scientific publications do not in general reflect the process by which the reported scientific results were produced. For example, under the provocative title “Is the scientific paper a fraud?”, Medawar argued that scientific papers generally misrepresent how the results have been produced (Medawar 1963/1996). Similar views have been advanced by philosophers, historians, and sociologists of science (Gilbert 1976; Holmes 1987; Knorr-Cetina 1981; Schickore 2008; Suppe 1998) who have argued that scientists’ experimental practices are messy and often do not follow any recognizable pattern. Publications of research results, they argue, are retrospective reconstructions of these activities that often do not preserve the temporal order or the logic of these activities, but are instead often constructed in order to screen off potential criticism (see Schickore 2008 for a review of this work).

Philosophical positions on the scientific method have also made it into the court room, especially in the US, where judges have drawn on philosophy of science in deciding when to confer special status to scientific expert testimony. A key case is Daubert v. Merrell Dow Pharmaceuticals (92–102, 509 U.S. 579, 1993). In this case, the Supreme Court argued in its 1993 ruling that trial judges must ensure that expert testimony is reliable, and that in doing this the court must look at the expert’s methodology to determine whether the proffered evidence is actually scientific knowledge. Further, referring to the works of Popper and Hempel, the court stated that

ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge … is whether it can be (and has been) tested. (Justice Blackmun, Daubert v. Merrell Dow Pharmaceuticals; see Other Internet Resources for a link to the opinion)

But as argued by Haack (2005a,b, 2010) and by Foster & Huber (1999), by equating the question of whether a piece of testimony is reliable with the question of whether it is scientific as indicated by a special methodology, the court was producing an inconsistent mixture of Popper’s and Hempel’s philosophies, and this has led to considerable confusion in subsequent case rulings that drew on the Daubert case (see Haack 2010 for a detailed exposition).

The difficulties around identifying the methods of science are also reflected in the difficulties of identifying scientific misconduct in the form of improper application of the method or methods of science. One of the first and most influential attempts at defining misconduct in science was the US definition from 1989 that defined misconduct as

fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the scientific community . (Code of Federal Regulations, part 50, subpart A., August 8, 1989, italics added)

However, the “other practices that seriously deviate” clause was heavily criticized because it could be used to suppress creative or novel science. For example, the National Academy of Sciences stated in its report Responsible Science (1992) that it

wishes to discourage the possibility that a misconduct complaint could be lodged against scientists based solely on their use of novel or unorthodox research methods. (NAS: 27)

This clause was therefore later removed from the definition. For an entry point into the key philosophical literature on conduct in science, see Shamoo & Resnik (2009).

7. Conclusion

The question of the source of the success of science has been at the core of philosophy since the beginning of modern science. If viewed as a matter of epistemology more generally, scientific method is a part of the entire history of philosophy. Over that time, science and whatever methods its practitioners may employ have changed dramatically. Today, many philosophers have taken up the banners of pluralism or of practice to focus on what are, in effect, fine-grained and contextually limited examinations of scientific method. Others hope to shift perspectives in order to provide a renewed general account of what characterizes the activity we call science.

One such perspective has been offered recently by Hoyningen-Huene (2008, 2013), who argues from the history of philosophy of science that after three lengthy phases of characterizing science by its method, we are now in a phase where the belief in the existence of a positive scientific method has eroded, and what is left to characterize science is only its fallibility. First was a phase from Plato and Aristotle up until the 17th century, in which the specificity of scientific knowledge was seen in its absolute certainty, established by proof from evident axioms; next was a phase up to the mid-19th century, in which the means of establishing the certainty of scientific knowledge had been generalized to include inductive procedures as well. In the third phase, which lasted until the last decades of the 20th century, it was recognized that empirical knowledge was fallible, but it was still granted a special status due to its distinctive mode of production. But now, in the fourth phase, according to Hoyningen-Huene, historical and philosophical studies have shown how “scientific methods with the characteristics as posited in the second and third phase do not exist” (2008: 168), and there is no longer any consensus among philosophers and historians of science about the nature of science. For Hoyningen-Huene, this is too negative a stance, and he therefore poses the question of the nature of science anew. His own answer to this question is that “scientific knowledge differs from other kinds of knowledge, especially everyday knowledge, primarily by being more systematic” (Hoyningen-Huene 2013: 14). Systematicity can have several different dimensions: among them are more systematic descriptions, explanations, predictions, defense of knowledge claims, epistemic connectedness, ideal of completeness, knowledge generation, representation of knowledge, and critical discourse. Hence, what characterizes science is the greater care in excluding possible alternative explanations, the more detailed elaboration with respect to the data on which predictions are based, the greater care in detecting and eliminating sources of error, the more articulate connections to other pieces of knowledge, and so on. On this position, what characterizes science is not that the methods employed are unique to science, but that the methods are more carefully employed.

Another, similar approach has been offered by Haack (2003). Like Hoyningen-Huene, she sets off from a dissatisfaction with the recent clash between what she calls Old Deferentialism and New Cynicism. The Old Deferentialist position is that science progresses inductively by accumulating true theories confirmed by empirical evidence, or deductively by testing conjectures against basic statements; the New Cynics’ position is that science has no epistemic authority and no uniquely rational method and is merely politics. Haack insists that, contrary to the views of the New Cynics, there are objective epistemic standards, and there is something epistemologically special about science, even though the Old Deferentialists pictured this in the wrong way. Instead, she offers a new Critical Commonsensist account on which standards of good, strong, supportive evidence and well-conducted, honest, thorough, and imaginative inquiry are not exclusive to the sciences, but are the standards by which we judge all inquirers. In this sense, science does not differ in kind from other kinds of inquiry, but it may differ in the degree to which it requires broad and detailed background knowledge and a familiarity with a technical vocabulary that only specialists may possess.

Bibliography

  • Aikenhead, G.S., 1987, “High-school graduates’ beliefs about science-technology-society. III. Characteristics and limitations of scientific knowledge”, Science Education, 71(4): 459–487.
  • Allchin, D., H.M. Andersen and K. Nielsen, 2014, “Complementary Approaches to Teaching Nature of Science: Integrating Student Inquiry, Historical Cases, and Contemporary Cases in Classroom Practice”, Science Education , 98: 461–486.
  • Anderson, C., 2008, “The end of theory: The data deluge makes the scientific method obsolete”, Wired magazine, 16(7).
  • Arabatzis, T., 2006, “On the inextricability of the context of discovery and the context of justification”, in Revisiting Discovery and Justification , J. Schickore and F. Steinle (eds.), Dordrecht: Springer, pp. 215–230.
  • Barnes, J. (ed.), 1984, The Complete Works of Aristotle, Vols I and II , Princeton: Princeton University Press.
  • Barnes, B. and D. Bloor, 1982, “Relativism, Rationalism, and the Sociology of Knowledge”, in Rationality and Relativism , M. Hollis and S. Lukes (eds.), Cambridge: MIT Press, pp. 1–20.
  • Bauer, H.H., 1992, Scientific Literacy and the Myth of the Scientific Method , Urbana: University of Illinois Press.
  • Bechtel, W. and R.C. Richardson, 1993, Discovering complexity , Princeton, NJ: Princeton University Press.
  • Berkeley, G., 1734, The Analyst in De Motu and The Analyst: A Modern Edition with Introductions and Commentary , D. Jesseph (trans. and ed.), Dordrecht: Kluwer Academic Publishers, 1992.
  • Blachowicz, J., 2009, “How science textbooks treat scientific method: A philosopher’s perspective”, The British Journal for the Philosophy of Science , 60(2): 303–344.
  • Bloor, D., 1991, Knowledge and Social Imagery , Chicago: University of Chicago Press, 2 nd edition.
  • Boyle, R., 1682, New experiments physico-mechanical, touching the air , Printed by Miles Flesher for Richard Davis, bookseller in Oxford.
  • Bridgman, P.W., 1927, The Logic of Modern Physics , New York: Macmillan.
  • –––, 1956, “The Methodological Character of Theoretical Concepts”, in The Foundations of Science and the Concepts of Science and Psychology , Herbert Feigl and Michael Scriven (eds.), Minnesota: University of Minneapolis Press, pp. 38–76.
  • Burian, R., 1997, “Exploratory Experimentation and the Role of Histochemical Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life Sciences , 19(1): 27–45.
  • –––, 2007, “On microRNA and the need for exploratory experimentation in post-genomic molecular biology”, History and Philosophy of the Life Sciences , 29(3): 285–311.
  • Carnap, R., 1928, Der logische Aufbau der Welt , Berlin: Bernary, transl. by R.A. George, The Logical Structure of the World , Berkeley: University of California Press, 1967.
  • –––, 1956, “The methodological character of theoretical concepts”, Minnesota studies in the philosophy of science , 1: 38–76.
  • Carrol, S., and D. Goodstein, 2009, “Defining the scientific method”, Nature Methods , 6: 237.
  • Churchman, C.W., 1948, “Science, Pragmatics, Induction”, Philosophy of Science , 15(3): 249–268.
  • Cooper, J. (ed.), 1997, Plato: Complete Works , Indianapolis: Hackett.
  • Darden, L., 1991, Theory Change in Science: Strategies from Mendelian Genetics, Oxford: Oxford University Press.
  • Dewey, J., 1910, How we think , New York: Dover Publications (reprinted 1997).
  • Douglas, H., 2009, Science, Policy, and the Value-Free Ideal , Pittsburgh: University of Pittsburgh Press.
  • Dupré, J., 2004, “Miracle of Monism ”, in Naturalism in Question , Mario De Caro and David Macarthur (eds.), Cambridge, MA: Harvard University Press, pp. 36–58.
  • Elliott, K.C., 2007, “Varieties of exploratory experimentation in nanotoxicology”, History and Philosophy of the Life Sciences , 29(3): 311–334.
  • Elliott, K. C., and T. Richards (eds.), 2017, Exploring inductive risk: Case studies of values in science , Oxford: Oxford University Press.
  • Falcon, Andrea, 2005, Aristotle and the science of nature: Unity without uniformity , Cambridge: Cambridge University Press.
  • Feyerabend, P., 1978, Science in a Free Society, London: New Left Books.
  • –––, 1988, Against Method , London: Verso, 2 nd edition.
  • Fisher, R.A., 1955, “Statistical Methods and Scientific Induction”, Journal of The Royal Statistical Society. Series B (Methodological) , 17(1): 69–78.
  • Foster, K. and P.W. Huber, 1999, Judging Science. Scientific Knowledge and the Federal Courts , Cambridge: MIT Press.
  • Fox Keller, E., 2003, “Models, Simulation, and ‘computer experiments’”, in The Philosophy of Scientific Experimentation , H. Radder (ed.), Pittsburgh: Pittsburgh University Press, 198–215.
  • Gilbert, G., 1976, “The transformation of research findings into scientific knowledge”, Social Studies of Science , 6: 281–306.
  • Gimbel, S., 2011, Exploring the Scientific Method , Chicago: University of Chicago Press.
  • Goodman, N., 1965, Fact , Fiction, and Forecast , Indianapolis: Bobbs-Merrill.
  • Haack, S., 1995, “Science is neither sacred nor a confidence trick”, Foundations of Science , 1(3): 323–335.
  • –––, 2003, Defending science—within reason , Amherst: Prometheus.
  • –––, 2005a, “Disentangling Daubert: an epistemological study in theory and practice”, Journal of Philosophy, Science and Law, 5, available online. doi:10.5840/jpsl2005513
  • –––, 2005b, “Trial and error: The Supreme Court’s philosophy of science”, American Journal of Public Health , 95: S66-S73.
  • –––, 2010, “Federal Philosophy of Science: A Deconstruction-and a Reconstruction”, NYU Journal of Law & Liberty, 5: 394.
  • Hangel, N. and J. Schickore, 2017, “Scientists’ conceptions of good research practice”, Perspectives on Science , 25(6): 766–791
  • Harper, W.L., 2011, Isaac Newton’s Scientific Method: Turning Data into Evidence about Gravity and Cosmology , Oxford: Oxford University Press.
  • Hempel, C., 1950, “Problems and Changes in the Empiricist Criterion of Meaning”, Revue Internationale de Philosophie , 41(11): 41–63.
  • –––, 1951, “The Concept of Cognitive Significance: A Reconsideration”, Proceedings of the American Academy of Arts and Sciences , 80(1): 61–77.
  • –––, 1965, Aspects of scientific explanation and other essays in the philosophy of science , New York–London: Free Press.
  • –––, 1966, Philosophy of Natural Science , Englewood Cliffs: Prentice-Hall.
  • Holmes, F.L., 1987, “Scientific writing and scientific discovery”, Isis , 78(2): 220–235.
  • Howard, D., 2003, “Two left turns make a right: On the curious political career of North American philosophy of science at midcentury”, in Logical Empiricism in North America , G.L. Hardcastle & A.W. Richardson (eds.), Minneapolis: University of Minnesota Press, pp. 25–93.
  • Hoyningen-Huene, P., 2008, “Systematicity: The nature of science”, Philosophia , 36(2): 167–180.
  • –––, 2013, Systematicity. The Nature of Science , Oxford: Oxford University Press.
  • Howie, D., 2002, Interpreting probability: Controversies and developments in the early twentieth century , Cambridge: Cambridge University Press.
  • Hughes, R., 1999, “The Ising Model, Computer Simulation, and Universal Physics”, in Models as Mediators , M. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 97–145
  • Hume, D., 1739, A Treatise of Human Nature , D. Fate Norton and M.J. Norton (eds.), Oxford: Oxford University Press, 2000.
  • Humphreys, P., 1995, “Computational science and scientific method”, Minds and Machines , 5(1): 499–512.
  • ICMJE, 2013, “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals”, International Committee of Medical Journal Editors, available online , accessed August 13 2014
  • Jeffrey, R.C., 1956, “Valuation and Acceptance of Scientific Hypotheses”, Philosophy of Science , 23(3): 237–246.
  • Kaufmann, W.J., and L.L. Smarr, 1993, Supercomputing and the Transformation of Science , New York: Scientific American Library.
  • Knorr-Cetina, K., 1981, The Manufacture of Knowledge , Oxford: Pergamon Press.
  • Krohs, U., 2012, “Convenience experimentation”, Studies in History and Philosophy of Biological and BiomedicalSciences , 43: 52–57.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions , Chicago: University of Chicago Press
  • Latour, B. and S. Woolgar, 1986, Laboratory Life: The Construction of Scientific Facts , Princeton: Princeton University Press, 2 nd edition.
  • Laudan, L., 1968, “Theories of scientific method from Plato to Mach”, History of Science , 7(1): 1–63.
  • Lenhard, J., 2006, “Models and statistical inference: The controversy between Fisher and Neyman-Pearson”, The British Journal for the Philosophy of Science , 57(1): 69–91.
  • Leonelli, S., 2012, “Making Sense of Data-Driven Research in the Biological and the Biomedical Sciences”, Studies in the History and Philosophy of the Biological and Biomedical Sciences , 43(1): 1–3.
  • Levi, I., 1960, “Must the scientist make value judgments?”, Philosophy of Science , 57(11): 345–357
  • Lindley, D., 1991, Theory Change in Science: Strategies from Mendelian Genetics , Oxford: Oxford University Press.
  • Lipton, P., 2004, Inference to the Best Explanation , London: Routledge, 2 nd edition.
  • Marks, H.M., 2000, The progress of experiment: science and therapeutic reform in the United States, 1900–1990 , Cambridge: Cambridge University Press.
  • Mazzochi, F., 2015, “Could Big Data be the end of theory in science?”, EMBO reports , 16: 1250–1255.
  • Mayo, D.G., 1996, Error and the Growth of Experimental Knowledge , Chicago: University of Chicago Press.
  • McComas, W.F., 1996, “Ten myths of science: Reexamining what we think we know about the nature of science”, School Science and Mathematics , 96(1): 10–16.
  • Medawar, P.B., 1963/1996, “Is the scientific paper a fraud”, in The Strange Case of the Spotted Mouse and Other Classic Essays on Science , Oxford: Oxford University Press, 33–39.
  • Mill, J.S., 1963, Collected Works of John Stuart Mill , J. M. Robson (ed.), Toronto: University of Toronto Press
  • NAS, 1992, Responsible Science: Ensuring the integrity of the research process , Washington DC: National Academy Press.
  • Nersessian, N.J., 1987, “A cognitive-historical approach to meaning in scientific theories”, in The process of science , N. Nersessian (ed.), Berlin: Springer, pp. 161–177.
  • –––, 2008, Creating Scientific Concepts , Cambridge: MIT Press.
  • Newton, I., 1726, Philosophiae naturalis Principia Mathematica (3 rd edition), in The Principia: Mathematical Principles of Natural Philosophy: A New Translation , I.B. Cohen and A. Whitman (trans.), Berkeley: University of California Press, 1999.
  • –––, 1704, Opticks or A Treatise of the Reflections, Refractions, Inflections & Colors of Light , New York: Dover Publications, 1952.
  • Neyman, J., 1956, “Note on an Article by Sir Ronald Fisher”, Journal of the Royal Statistical Society. Series B (Methodological) , 18: 288–294.
  • Nickles, T., 1987, “Methodology, heuristics, and rationality”, in Rational changes in science: Essays on Scientific Reasoning , J.C. Pitt (ed.), Berlin: Springer, pp. 103–132.
  • Nicod, J., 1924, Le problème logique de l’induction , Paris: Alcan. (Engl. transl. “The Logical Problem of Induction”, in Foundations of Geometry and Induction , London: Routledge, 2000.)
  • Nola, R. and H. Sankey, 2000a, “A selective survey of theories of scientific method”, in Nola and Sankey 2000b: 1–65.
  • –––, 2000b, After Popper, Kuhn and Feyerabend. Recent Issues in Theories of Scientific Method , London: Springer.
  • –––, 2007, Theories of Scientific Method , Stocksfield: Acumen.
  • Norton, S., and F. Suppe, 2001, “Why atmospheric modeling is good science”, in Changing the Atmosphere: Expert Knowledge and Environmental Governance , C. Miller and P. Edwards (eds.), Cambridge, MA: MIT Press, 88–133.
  • O’Malley, M., 2007, “Exploratory experimentation and scientific practice: Metagenomics and the proteorhodopsin case”, History and Philosophy of the Life Sciences , 29(3): 337–360.
  • O’Malley, M., C. Haufe, K. Elliot, and R. Burian, 2009, “Philosophies of Funding”, Cell , 138: 611–615.
  • Oreskes, N., K. Shrader-Frechette, and K. Belitz, 1994, “Verification, Validation and Confirmation of Numerical Models in the Earth Sciences”, Science , 263(5147): 641–646.
  • Osborne, J., S. Simon, and S. Collins, 2003, “Attitudes towards science: a review of the literature and its implications”, International Journal of Science Education , 25(9): 1049–1079.
  • Parascandola, M., 1998, “Epidemiology—2 nd -Rate Science”, Public Health Reports , 113(4): 312–320.
  • Parker, W., 2008a, “Franklin, Holmes and the Epistemology of Computer Simulation”, International Studies in the Philosophy of Science , 22(2): 165–83.
  • –––, 2008b, “Computer Simulation through an Error-Statistical Lens”, Synthese , 163(3): 371–84.
  • Pearson, K. 1892, The Grammar of Science , London: J.M. Dents and Sons, 1951
  • Pearson, E.S., 1955, “Statistical Concepts in Their Relation to Reality”, Journal of the Royal Statistical Society , B, 17: 204–207.
  • Pickering, A., 1984, Constructing Quarks: A Sociological History of Particle Physics , Edinburgh: Edinburgh University Press.
  • Popper, K.R., 1959, The Logic of Scientific Discovery , London: Routledge, 2002
  • –––, 1963, Conjectures and Refutations , London: Routledge, 2002.
  • –––, 1985, Unended Quest: An Intellectual Autobiography , La Salle: Open Court Publishing Co..
  • Rudner, R., 1953, “The Scientist Qua Scientist Making Value Judgments”, Philosophy of Science , 20(1): 1–6.
  • Rudolph, J.L., 2005, “Epistemology for the masses: The origin of ‘The Scientific Method’ in American Schools”, History of Education Quarterly , 45(3): 341–376
  • Schickore, J., 2008, “Doing science, writing science”, Philosophy of Science , 75: 323–343.
  • Schickore, J. and N. Hangel, 2019, “‘It might be this, it should be that…’ uncertainty and doubt in day-to-day science practice”, European Journal for Philosophy of Science , 9(2): 31. doi:10.1007/s13194-019-0253-9
  • Shamoo, A.E. and D.B. Resnik, 2009, Responsible Conduct of Research , Oxford: Oxford University Press.
  • Shank, J.B., 2008, The Newton Wars and the Beginning of the French Enlightenment , Chicago: The University of Chicago Press.
  • Shapin, S. and S. Schaffer, 1985, Leviathan and the air-pump , Princeton: Princeton University Press.
  • Smith, G.E., 2002, “The Methodology of the Principia”, in The Cambridge Companion to Newton , I.B. Cohen and G.E. Smith (eds.), Cambridge: Cambridge University Press, 138–173.
  • Snyder, L.J., 1997a, “Discoverers’ Induction”, Philosophy of Science , 64: 580–604.
  • –––, 1997b, “The Mill-Whewell Debate: Much Ado About Induction”, Perspectives on Science , 5: 159–198.
  • –––, 1999, “Renovating the Novum Organum: Bacon, Whewell and Induction”, Studies in History and Philosophy of Science , 30: 531–557.
  • Sober, E., 2008, Evidence and Evolution. The logic behind the science , Cambridge: Cambridge University Press
  • Sprenger, J. and S. Hartmann, 2019, Bayesian philosophy of science , Oxford: Oxford University Press.
  • Steinle, F., 1997, “Entering New Fields: Exploratory Uses of Experimentation”, Philosophy of Science (Proceedings), 64: S65–S74.
  • –––, 2002, “Experiments in History and Philosophy of Science”, Perspectives on Science , 10(4): 408–432.
  • Strasser, B.J., 2012, “Data-driven sciences: From wonder cabinets to electronic databases”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences , 43(1): 85–87.
  • Succi, S. and P.V. Coveney, 2018, “Big data: the end of the scientific method?”, Philosophical Transactions of the Royal Society A , 377: 20180145. doi:10.1098/rsta.2018.0145
  • Suppe, F., 1998, “The Structure of a Scientific Paper”, Philosophy of Science , 65(3): 381–405.
  • Swijtink, Z.G., 1987, “The objectification of observation: Measurement and statistical methods in the nineteenth century”, in The probabilistic revolution. Ideas in History, Vol. 1 , L. Kruger (ed.), Cambridge MA: MIT Press, pp. 261–285.
  • Waters, C.K., 2007, “The nature and context of exploratory experimentation: An introduction to three case studies of exploratory research”, History and Philosophy of the Life Sciences , 29(3): 275–284.
  • Weinberg, S., 1995, “The methods of science… and those by which we live”, Academic Questions , 8(2): 7–13.
  • Weissert, T., 1997, The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem , New York: Springer Verlag.
  • William H., 1628, Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus , in On the Motion of the Heart and Blood in Animals , R. Willis (trans.), Buffalo: Prometheus Books, 1993.
  • Winsberg, E., 2010, Science in the Age of Computer Simulation , Chicago: University of Chicago Press.
  • Wivagg, D. & D. Allchin, 2002, “The Dogma of the Scientific Method”, The American Biology Teacher , 64(9): 645–646




How the Scientific Method Works


Importance of the Scientific Method

(Image: Gregor Mendel)

The scientific method attempts to minimize the influence of bias or prejudice on the experimenter. Even the best-intentioned scientists can't escape bias. Bias arises from personal as well as cultural beliefs; every human filters information through his or her own experience. Unfortunately, this filtering can lead a scientist to prefer one outcome over another. For someone trying to solve a problem around the house, succumbing to such biases is not a big deal. But in the scientific community, where results must be reviewed and replicated, bias must be avoided at all costs.

That's the job of the scientific method. It provides an objective, standardized approach to conducting experiments and, in doing so, improves their results. By using a standardized approach in their investigations, scientists can feel confident that they will stick to the facts and limit the influence of personal, preconceived notions. Even with such a rigorous methodology in place, some scientists still make mistakes. For example, they can mistake a hypothesis for an explanation of a phenomenon without performing experiments. Or they can fail to accurately account for errors, such as measurement errors. Or they can ignore data that does not support the hypothesis.

Gregor Mendel (1822–1884), an Austrian priest who studied the inheritance of traits in pea plants and helped pioneer the study of genetics, may have fallen victim to a kind of error known as confirmation bias. Confirmation bias is the tendency to see data that supports a hypothesis while ignoring data that does not. Some argue that Mendel obtained a certain result using a small sample size, then continued collecting and censoring data to make sure his original result was confirmed. Although subsequent experiments have proven Mendel's hypothesis, many people still question his methods of experimentation.

Most of the time, however, the scientific method works and works well. When a hypothesis or a group of related hypotheses has been confirmed through repeated experimental tests, it may become a theory, which can be thought of as the pot of gold at the end of the scientific method rainbow.

Theories are much broader in scope than hypotheses and hold enormous predictive power. The theory of relativity, for example, predicted the existence of black holes long before there was evidence to support the idea. It should be noted, however, that one of the goals of science is not to prove theories right, but to prove them wrong. When this happens, a theory must be modified or discarded altogether.



  • Registered Report
  • Open access
  • Published: 27 May 2024

Comparing researchers’ degree of dichotomous thinking using frequentist versus Bayesian null hypothesis testing

Jasmine Muradchanian, Rink Hoekstra, Henk Kiers, Dustin Fife & Don van Ravenzwaaij

Scientific Reports, volume 14, Article number: 12120 (2024)


  • Human behaviour
  • Neuroscience

A large amount of the scientific literature in the social and behavioural sciences bases its conclusions on one or more hypothesis tests. As such, it is important to learn more about how researchers in the social and behavioural sciences interpret the quantities that result from hypothesis test metrics, such as p-values and Bayes factors. In the present study, we explored the relationship between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest. In particular, we were interested in the existence of a so-called cliff effect: a qualitative drop in the degree of belief that there is a positive effect around certain threshold values of statistical evidence (e.g., at p = 0.05). We compared this relationship for p-values to the relationship for corresponding degrees of evidence quantified through Bayes factors, and we examined whether this relationship was affected by two different modes of presentation (in one mode the functional form of the relationship across values was implicit to the participant, whereas in the other mode it was explicit). We found evidence for a higher proportion of cliff effects in p-value conditions than in BF conditions (N = 139), but we did not get a clear indication of whether presentation mode had an effect on the proportion of cliff effects.

Protocol registration

The stage 1 protocol for this Registered Report was accepted in principle on 2 June 2023. The protocol, as accepted by the journal, can be found at: https://doi.org/10.17605/OSF.IO/5CW6P .


In applied science, researchers typically conduct statistical tests to learn whether an effect of interest differs from zero. Such tests usually quantify evidence by means of p-values (but see e.g., Lakens 1, who warns against such an interpretation of p-values). A Bayesian alternative to the p-value is the Bayes factor (BF), a tool for quantifying statistical evidence in hypothesis testing 2, 3. P-values and BFs are related to one another 4, with BFs being used much less frequently. Given two contrasting hypotheses (i.e., a null hypothesis, H0, and an alternative hypothesis, H1), a p-value is the probability of obtaining a result as extreme as, or more extreme than, the actual observed sample result, given that H0 is true (and given that the assumptions hold). A BF, on the other hand, quantifies the probability of the data given H1 relative to the probability of the data given H0 (called BF10 3).
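The two quantities can be put side by side in a small sketch. The code below (not from the paper) computes a one-sided p-value for a two-group comparison and a BIC-approximate Bayes factor BF10, a rough, prior-free stand-in for the default Bayes factors discussed in the literature; the toy data and the `one_sided_p_and_bic_bf` helper are invented for illustration.

```python
import numpy as np
from scipy import stats

def one_sided_p_and_bic_bf(x, y):
    """One-sided p-value (H1: mean of x > mean of y) plus a BIC-approximate
    Bayes factor BF10 for a two-sample comparison. The BIC approximation
    is an illustration, not the method used in the paper."""
    t, p_two = stats.ttest_ind(x, y, equal_var=True)
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2

    def gauss_loglik(resid):
        # maximized Gaussian log-likelihood given mean-zero residuals
        sigma2 = np.mean(resid ** 2)
        return -0.5 * len(resid) * (np.log(2 * np.pi * sigma2) + 1)

    pooled = np.concatenate([x, y])
    ll0 = gauss_loglik(pooled - pooled.mean())                        # H0: one common mean
    ll1 = gauss_loglik(np.concatenate([x - x.mean(), y - y.mean()]))  # H1: two means
    N = len(pooled)
    bic0 = 2 * np.log(N) - 2 * ll0    # parameters under H0: mu, sigma
    bic1 = 3 * np.log(N) - 2 * ll1    # parameters under H1: mu1, mu2, sigma
    return p_one, np.exp((bic0 - bic1) / 2)

# Deterministic toy data: group 1 sits 0.5 above group 2, n = 250 each.
base = np.tile([-1.0, 1.0], 125)
x, y = base + 0.5, base
p, bf = one_sided_p_and_bic_bf(x, y)
print(f"one-sided p = {p:.2e}, BIC-approximate BF10 = {bf:.3g}")
```

Here a small one-sided p-value and a BF10 well above 1 both point toward a positive effect, but they sit on different scales, which is exactly why how researchers translate each into a degree of belief is worth studying.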

There is ample evidence that researchers often find it difficult to interpret quantities such as p-values 5, 6, 7. Although there has been growing awareness of the dangers of misinterpreting p-values, these misinterpretations remain prevalent. One of the key reasons is that these concepts are neither simple nor intuitive, and interpreting them correctly requires considerable cognitive effort. Because of this high cognitive demand, academics have resorted to shortcut interpretations, which are simply wrong 6. An example of such a misinterpretation is that the p-value represents the probability of the null hypothesis being true 6. Research is typically conducted to reduce uncertainty about the existence of an effect in the population of interest, and measures such as p-values and Bayes factors are tools to that end. Hence, especially given the mistakes researchers make when interpreting quantities such as p-values, it is interesting to study how these measures affect people's beliefs regarding the existence of an effect, that is, how outcomes like p-values and Bayes factors translate to subjective beliefs in practice.

One of the first studies that focused on how researchers interpret statistical quantities was conducted by Rosenthal and Gaito 8, who specifically studied how researchers interpret p-values of varying magnitude. Nineteen researchers and graduate students at their psychology faculty were asked to indicate their degree of belief or confidence in 14 p-values, varying from 0.001 to 0.90, on a 6-point scale ranging from "5 extreme confidence or belief" to "0 complete absence of confidence or belief" 8, pp. 33–34. These individuals were shown p-values for sample sizes of 10 and 100. The authors wanted to measure the degree of belief or confidence in research findings as a function of associated p-values, but stated this way it is not entirely clear what was meant. We assume that the authors actually wanted to assess the degree of belief or confidence in the existence of an effect, given the p-value. Their findings suggested that subjects' degree of belief or confidence was a decreasing exponential function of the p-value. Additionally, for any p-value, self-rated confidence was greater for the larger sample size (i.e., n = 100). Furthermore, the authors argued in favor of the existence of a cliff effect around p = 0.05, which refers to an abrupt drop in the degree of belief or confidence in a p-value just beyond the 0.05 level 8, 9. This finding has been confirmed in several subsequent studies 10, 11, 12. The studies described so far focused on the average response and did not take individual differences into account.

The cliff effect suggests that p-values invite dichotomous thinking, which according to some authors is a common type of reasoning when interpreting p-values in the context of Null Hypothesis Significance Testing (NHST 13). As studies focusing on the cliff effect suggest 8, 9, 10, 11, 12, 13, the outcome of a significance test is usually interpreted dichotomously: one makes a binary choice between rejecting or not rejecting a null hypothesis 14. This practice has distracted some academics from the main task of finding out the size of the effect of interest and the precision with which it has been measured 5. However, Poitevineau and Lecoutre 15 argued that the cliff effect around p = 0.05 is probably overstated. According to them, previous studies paid insufficient attention to individual differences. To demonstrate this, they explored the individual data and found qualitative heterogeneity in the respondents' answers. The authors identified three categories of response functions across 12 p-values: (1) a decreasing exponential curve, (2) a decreasing linear curve, and (3) an all-or-none curve representing a very high degree of confidence when p ≤ 0.05 and quasi-zero confidence otherwise. Out of 18 participants, the responses of 10 followed a decreasing exponential curve, 4 a decreasing linear curve, and 4 an all-or-none curve. The authors concluded that the cliff effect may be an artifact of averaging, resulting from a few participants having an all-or-none interpretation of statistical significance 15.
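The three response categories can be illustrated with a toy classifier. The template curves, the ratings, and the `classify` helper below are all hypothetical; Poitevineau and Lecoutre fitted curves to real respondents' data rather than matching fixed templates by squared error, so this is only a sketch of the idea.

```python
import numpy as np

# Hypothetical confidence ratings (0-5 scale) for twelve p-values,
# echoing the three curve types identified in the literature.
p_vals = np.array([0.001, 0.005, 0.01, 0.02, 0.03, 0.05,
                   0.07, 0.10, 0.20, 0.30, 0.50, 0.70])

def classify(conf, p=p_vals):
    """Pick the best-fitting of three template curves by sum of squared error."""
    templates = {
        "exponential": 5 * np.exp(-30 * p),                  # smooth decay
        "linear": np.clip(5 * (1 - p / 0.7), 0, 5),          # straight decline
        "all-or-none": np.where(p <= 0.05, 5.0, 0.0),        # cliff at 0.05
    }
    sse = {name: float(np.sum((conf - t) ** 2)) for name, t in templates.items()}
    return min(sse, key=sse.get)

step_rater = np.array([5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0], dtype=float)
print(classify(step_rater))  # → all-or-none
```

Averaging a few such all-or-none raters with many exponential raters produces a mean curve with a visible drop near 0.05, which is the averaging artifact the authors describe.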

Although NHST has been used frequently, it has been argued that it should be replaced by effect sizes, confidence intervals (CIs), and meta-analyses. Doing so may allegedly invite a shift from dichotomous thinking to estimation and meta-analytic thinking 14 . Lai et al. 13 studied whether using CIs rather than p -values would reduce the cliff effect, and thereby dichotomous thinking. Similar to the classification by Poitevineau and Lecoutre 15 , the responses were divided into three classes: decreasing exponential, decreasing linear, or all-or-none. In addition, Lai et al. 13 found patterns in the responses of some of the participants that corresponded with what they called a “moderate cliff model”, which refers to using statistical significance as both a decision-making criterion and a measure of evidence 13 .

In contrast to Poitevineau and Lecoutre 15, Lai et al. 13 concluded that the cliff effect is probably not just a byproduct of the all-or-none class, because cliff models accounted for around 21% of the responses in NHST interpretation and around 33% of the responses in CI interpretation. Furthermore, a notable finding was that the prevalence of the cliff effect in CI interpretations was more than 50% higher than in NHST interpretations 13. Something similar was found in a study by Hoekstra, Johnson, and Kiers 16. They also predicted that the cliff effect would be stronger for results presented in the NHST format than in the CI format, and, like Lai et al. 13, they actually found more evidence of a cliff effect in the CI format 16.

The studies discussed so far seem to provide evidence for the existence of a cliff effect around p = 0.05. Table 1 shows an overview of evidence related to the cliff effect. Interestingly, in a recent study, Helske et al. 17 examined how various visualizations can aid in reducing the cliff effect when researchers interpret inferential statistics. They found that, compared to a textual representation of the CI with p-values and a classic CI visualization, adding more complex visual information to the classic CI representation seemed to decrease the cliff effect (i.e., dichotomous interpretations 17).

Although Bayesian methods have become more popular within different scientific fields 18, 19, we know of no studies that have examined whether researchers' self-reported degree of belief in the existence of an effect shows a similar cliff effect when interpreting BFs as it does for p-values and CIs. Another matter conspicuously absent from previous examinations of the cliff effect is a comparison between the presentation methods used to investigate it: in some cliff effect studies the p-values were presented to participants on separate pages 15, whereas in others they were presented on the same page 13. The cliff effect may manifest itself in (some) researchers without their explicit awareness. For those researchers, presenting p-values/Bayes factors in isolation could produce a cliff effect, whereas presenting all p-values/Bayes factors at once could lead to a cognitive override: when participants see their own cliff effect, they may think that they should not think dichotomously and change their responses to be more in line with how they believe they should think, thereby removing their cliff effect. To our knowledge, no direct comparison of the two presentation modes has yet been conducted. Therefore, to see whether the method matters, both presentation modes are included in the present study.

All of this gives rise to the following three research questions: (1) What is the relation between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest across participants? (2) What is the difference in this relationship when the statistical evidence is quantified through p -values versus Bayes factors? (3) What is the difference in this relationship when the statistical evidence is presented in isolation versus all at once?

In the present study, we will investigate the relationship between method (i.e., p-values and Bayes factors) and the degree of belief or confidence that there is a positive effect in the population of interest, with special attention to the cliff effect. We chose this wording ("positive effect in the population of interest") because we believe it is more precise than the phrasings used in previous cliff effect studies. We will examine the relationship between different levels of strength of evidence, expressed as p-values or corresponding Bayes factors, and participants' degree of belief or confidence in two scenarios: (1) a scenario in which values are presented in isolation (such that the functional form of the relationship across values is implicit to the participant) and (2) a scenario in which all values are presented simultaneously (such that the functional form of the relationship across values is explicit to the participant).

In what follows, we will first describe the set-up of the present study. In the results section, we will explore the relationship between obtained statistical evidence and the degree of belief or confidence, and in turn, we will compare this relationship for p -values to the corresponding relationship for BFs. All of this will be done in scenarios in which researchers are either made aware or not made aware of the functional form of the relationship. In the discussion, we will discuss implications for applied researchers using p -values and/or BFs in order to quantify statistical evidence.

Ethics information

Our study protocol has been approved by the ethics committee of the University of Groningen, and our study complies with all relevant ethical regulations of the University of Groningen. Informed consent will be obtained from all participants. As an incentive for participating, we will raffle 10 Amazon vouchers worth 25 USD each among participants who successfully complete our study.

Sampling plan

Our target population will consist of researchers in the social and behavioural sciences who are at least somewhat familiar with interpreting Bayes factors. We will obtain our prospective sample by collecting the e-mail addresses of (approximately) 2000 corresponding authors from the 20 journals in the social and behavioural sciences with the highest impact factors. Specifically, we will collect the e-mail addresses of 100 researchers per journal who published an article in that journal in 2021, starting with the first issue and continuing until we have 100 e-mail addresses per journal. We will contact the authors by e-mail. In the e-mail we will mention that we are looking for researchers who are familiar with interpreting Bayes factors. If they are familiar with interpreting Bayes factors, we will ask them to participate in our study; if not, we will ask them to ignore our e-mail.

If the currently unknown response rate is too low to answer our research questions, we will collect additional e-mail addresses of corresponding authors from articles published in 2022 in the same 20 journals. Based on a projected response rate of 10%, we expect a final completion rate of 200 participants. This should be enough to obtain a BF higher than 10 in favor of an effect if the proportions differ by 0.2 (see section “ Planned analyses ” for details).
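As a rough illustration of what "a BF higher than 10 for a difference in proportions of 0.2" involves, the sketch below computes a BIC-approximate Bayes factor for a difference between two proportions. This is not the authors' planned analysis (their procedure is described in the section they cite), and the counts and the `bic_bf10_two_proportions` helper are invented.

```python
from math import log, exp

def binom_loglik(k, n, p):
    # Binomial log-likelihood of k successes in n trials, dropping the
    # binomial coefficient (it cancels between the two models below).
    if p <= 0.0 or p >= 1.0:
        return 0.0 if k in (0, n) else float("-inf")
    return k * log(p) + (n - k) * log(1 - p)

def bic_bf10_two_proportions(k1, n1, k2, n2):
    """BIC-approximate BF10 for 'the two proportions differ' versus
    'they are equal' -- a rough sketch, not the authors' planned test."""
    p_pool = (k1 + k2) / (n1 + n2)
    ll0 = binom_loglik(k1, n1, p_pool) + binom_loglik(k2, n2, p_pool)   # H0: shared p
    ll1 = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k2, n2, k2 / n2) # H1: separate p
    N = n1 + n2
    bic0 = 1 * log(N) - 2 * ll0   # H0 has one free parameter
    bic1 = 2 * log(N) - 2 * ll1   # H1 has two free parameters
    return exp((bic0 - bic1) / 2)

# e.g., 100 respondents per condition with cliff-effect proportions
# 0.5 versus 0.3 (a difference of 0.2, as mentioned in the text):
print(bic_bf10_two_proportions(50, 100, 30, 100))
```

Under this crude approximation, 100 respondents per condition with a 0.2 difference yields a BF10 above 1 but not yet above 10, which illustrates why the authors plan for around 200 completed responses.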

Materials and procedure

The relationship between the different magnitudes of p-values/BFs and the degree of belief or confidence will be examined in a scenario in which values are presented in isolation and in a scenario in which values are presented simultaneously. This results in four conditions: (1) p-value questions in the isolation scenario (isolated p-value), (2) BF questions in the isolation scenario (isolated BF), (3) p-value questions in the simultaneous scenario (all at once p-value), and (4) BF questions in the simultaneous scenario (all at once BF). To reduce boredom and to avoid making the underlying goals of the study too apparent, each participant will be randomly assigned one of the four scenarios, so the study has a between-person design.

The participants will receive an e-mail with an anonymous Qualtrics survey link. The first page of the survey will consist of the informed consent. We will ask all participants to indicate their level of familiarity with both Bayes factors and p -values on a 3-point scale (“completely unfamiliar/somewhat familiar/very familiar”), and we will include everyone who is at least somewhat familiar with both. To get a better picture of our sample population, we will include the following demographic variables in the survey: gender, main continent, career stage, and broad research area. We will then randomly assign respondents to one of four conditions (see below for a detailed description). After completing the content part of the survey, all respondents will be asked whether they want to provide their e-mail address if they are interested in (1) being included in the random draw of the Amazon vouchers; or (2) receiving information on our study outcomes.

In the isolated p -value condition, the following fabricated experimental scenario will be presented:

“Suppose you conduct an experiment comparing two independent groups, with n = 250 in each group. The null hypothesis states that the population means of the two groups do not differ. The alternative hypothesis states that the population mean in group 1 is larger than the population mean in group 2. Suppose a two-sample t test was conducted and a one-sided p value calculated.”

Then a set of possible findings of the fabricated experiment will be presented on separate pages. We will vary the strength of evidence for the existence of a positive effect using the following ten p -values, presented in a random order: 0.001, 0.002, 0.004, 0.008, 0.016, 0.032, 0.065, 0.131, 0.267, and 0.543. A screenshot of a part of the isolated p -value questions is presented in S1 in the Supplementary Information.
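The one-sided p -values in the scenario follow from the t statistic of the two-sample test. As an illustration only (the authors' own materials are on OSF), a minimal Python sketch is given below; with n = 250 per group the test has 498 degrees of freedom, so the t distribution is well approximated by a standard normal, which the sketch assumes:

```python
from statistics import NormalDist

def one_sided_p(t_stat: float) -> float:
    """One-sided p-value for a two-sample t test, H1: mean group 1 > mean group 2.

    With n = 250 per group (df = 498) the t distribution is essentially
    standard normal, so the normal tail area is used as an approximation.
    """
    return 1.0 - NormalDist().cdf(t_stat)

# A t statistic of about 3.09 yields roughly the smallest p-value
# used in the survey (p = 0.001).
print(round(one_sided_p(3.09), 3))
```

Larger t statistics map to smaller one-sided p -values, which is the ordering of evidential strength the survey items rely on.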

In the all at once BF condition, a fabricated experimental scenario will be presented that is identical to that in the isolated p -value condition, except that the last part is replaced by:

“Suppose a Bayesian two-sample t test was conducted and a one-sided Bayes factor (BF) calculated, with the alternative hypothesis in the numerator and the null hypothesis in the denominator, denoted BF 10 .”

A set of possible findings of the fabricated experiment will be presented on the same page. These findings vary in the strength of evidence for the existence of a positive effect, quantified with the following ten BF 10 values in the following order: 22.650, 12.008, 6.410, 3.449, 1.873, 1.027, 0.569, 0.317, 0.175, and 0.091. These BF values correspond one-to-one to the p -values presented in the isolated p -value condition (the R code for the findings of the fabricated experiment can be found on https://osf.io/sq3fp ). A screenshot of a part of the all at once BF questions can be found in S2 in the Supplementary Information.

In both conditions, the respondents will be asked to rate their degree of belief or confidence that there is a positive effect in the population of interest based on these findings on a scale ranging from 0 (completely convinced that there is no effect), through 50 (somewhat convinced that there is a positive effect), to 100 (completely convinced that there is a positive effect).

The other two conditions (i.e., the isolated BF condition and the all at once p -value condition) mirror the conditions described above. The only differences are that in the isolated BF condition the findings of the fabricated experiment for the BF questions will be presented on separate pages in a random order, and in the all at once p -value condition the findings for the p -value questions will be presented on the same page in a non-random order.

To keep things as simple as possible for the participants, all fictitious scenarios will include a two-sample t test with either a one-tailed p -value or a BF. The total sample size will be large ( n  = 250 in each group) in order to have sufficiently large power to detect even small effects.

Planned analyses

Poitevineau and Lecoutre 15 have suggested the following three models for the relationships between the different levels of statistical evidence and researchers’ subjective belief that a non-zero effect exists: all-or-none ( y  =  a for p  < 0.05, y  =  b for p  ≥ 0.05), linear ( y  =  a  +  bp ), and exponential ( y  = exp( a  +  bp )). In addition, Lai et al. 13 have suggested the moderate cliff model (a more gradual version of all-or-none), which they did not define more specifically. In the study by Lai et al. 13 (Fig.  4 ), the panel that represents the moderate cliff seems to be a combination of the exponential and the all-or-none function. In the present study, we will classify responses as moderate cliff if we observe a steep drop in the degree of belief or confidence around a certain p -value/BF, while for the remaining p -values/BFs the decline in confidence is more gradual. So, for example, a combination of the decreasing linear and the all-or-none function will also be classified as moderate cliff in the present study. Plots of the four models with examples of reasonable choices for the parameters are presented in Fig.  1 (the R code for Fig.  1 can be found on https://osf.io/j6d8c ).
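The candidate models can be written as simple functions of the p -value (or, analogously, the BF). The following Python sketch is purely illustrative: the parameter values, and the particular way the moderate cliff is built from a gradual decline plus a discrete drop, are our own assumptions, not estimates from the study.

```python
import math

# Confidence (on a 0-100 scale) as a function of the p-value.
# a and b are free parameters in each model; the defaults below
# are illustrative choices only.

def all_or_none(p, a=90.0, b=20.0):
    # Constant high confidence below .05, constant low confidence above.
    return a if p < 0.05 else b

def linear(p, a=90.0, b=-120.0):
    # Confidence declines linearly with the p-value.
    return a + b * p

def exponential(p, a=math.log(90.0), b=-5.0):
    # Confidence decays exponentially with the p-value.
    return math.exp(a + b * p)

def moderate_cliff(p, drop_at=0.05, drop=25.0):
    # One way to mimic a "more gradual all-or-none": a gradual
    # (here exponential) decline plus a discrete drop at a threshold.
    return exponential(p) - (drop if p >= drop_at else 0.0)
```

Under this parameterization the moderate cliff inherits the gradual decline of the exponential model but still shows a sharp drop around the threshold, matching the verbal classification rule above.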

figure 1

Plots are shown for fictitious outcomes for the four models (all-or-none, linear, exponential, and moderate cliff). The x-axis represents the different p -values. In the two BF conditions, the x-axis represents the different BF values. The y-axis represents the degree of belief or confidence that there is a positive effect in the population of interest. Note that these are prototype responses; different variations on these response patterns are possible.

We will manually classify each participant's data for each scenario as one of the relationship models, with the coders blinded to the conditions associated with the data. Specifically, author JM will organize the data from each of the four conditions and remove the p -value or BF labels. Subsequently, authors DvR and RH will classify the data independently from one another. In order to improve the objectivity of the classification, authors DvR and RH will classify the data according to specific instructions constructed before collecting the data (see Appendix 1 ). After coding, we will compute Cohen's kappa for these data. For each set of scores per condition per subject for which there is no agreement on classification, authors DvR and RH will try to reach consensus in a discussion of no longer than 5 min. If after this discussion no agreement is reached, then author DF will classify these data. If author DF chooses the same class as either DvR or RH, then the data will be classified accordingly. However, if author DF chooses another class, then the data will be placed in a so-called rest category. This rest category will also include data that deviate extremely from the four relationship models, and we will assess these data by running exploratory analyses. Before classifying the real data, we will conduct a small pilot study to give authors DvR and RH an opportunity to practice classifying the data. In the Qualtrics survey, respondents cannot continue to the next question without answering the current question. However, some respondents may quit filling out the survey; the responses of participants who did not answer all questions will be removed from the dataset. This means that we will use complete case analysis to deal with missing data, because we do not expect to find specific patterns in the missing values.
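Chance-corrected inter-rater agreement of this kind is standardly quantified with Cohen's kappa. A minimal, self-contained Python sketch is given below; the example labels are hypothetical and are not data from the study.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters classifying the same items."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement under independence, from each rater's
    # marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical classifications of ten response patterns by two coders:
r1 = ["linear", "cliff", "cliff", "exp", "linear",
      "cliff", "exp", "linear", "cliff", "exp"]
r2 = ["linear", "cliff", "exp", "exp", "linear",
      "cliff", "exp", "cliff", "cliff", "exp"]
print(round(cohens_kappa(r1, r2), 2))
```

Here the two hypothetical coders agree on 8 of 10 items, but kappa is lower than 0.8 because part of that agreement is expected by chance alone.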

Our approach to answering Research Question 1 (RQ1; “What is the relation between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest across participants?”) will be descriptive in nature. We will explore the results visually, by assessing the four models (i.e., all-or-none, linear, exponential, and moderate cliff) in each of the four conditions (i.e., isolated p -value, all at once p -value, isolated BF, and all at once BF), followed by zooming in on the ‘cliff effect’ classifications. This means that we will compare the frequencies of the four classification models with one another within each of the four conditions.

In order to answer Research Question 2 (RQ2; “What is the difference in this relationship when the statistical evidence is quantified through p -values versus Bayes factors?”), we will first combine categories as follows: the p -value condition will encompass the data from both the isolated and the all at once p -value conditions, and the BF condition will encompass the data from both the isolated and the all at once BF conditions. Furthermore, the cliff condition will encompass the all-or-none and the moderate cliff models, and the non-cliff condition will encompass the linear and the exponential models. This classification ensures that we distinguish between curves that reflect a sudden change in the relationship between the level of statistical evidence and the degree of confidence that a positive effect exists in the population of interest, and those that represent a gradual relationship between the level of statistical evidence and the degree of confidence. We will then compare the proportions of cases with a cliff in the p -value conditions to those in the BF conditions, and we will add inferential information for this comparison by means of a Bayesian chi square test on the 2 × 2 table ( p -value/BF x cliff/non-cliff), as will be specified below.

Finally, in order to answer Research Question 3 (RQ3; “What is the difference in this relationship when the statistical evidence is presented in isolation versus all at once?”), we will first combine categories again, as follows: the isolation condition will encompass the data from both the isolated p -value and the isolated BF conditions, and the all at once condition will encompass the data from both the all at once p -value and the all at once BF conditions. The cliff/non-cliff distinction is made analogous to the one employed for RQ2. We will then compare the proportions of cases with a cliff in the isolated conditions to those in the all at once conditions, and we will add inferential information for this comparison by means of a Bayesian chi square test on the 2 × 2 table (all at once/isolated x cliff/non-cliff), as will be specified below.

For both chi square tests, the null hypothesis states that there is no difference in the proportion of cliff classifications between the two conditions, and the alternative hypothesis states that there is a difference in the proportion of cliff classifications between the two conditions. Under the null hypothesis, we specify a single beta(1,1) prior for the proportion of cliff classifications and under the alternative hypothesis we specify two independent beta(1,1) priors for the proportion of cliff classifications 20 , 21 . A beta(1,1) prior is a flat or uniform prior from 0 to 1. The Bayes factor that will result from both chi square tests gives the relative evidence for the alternative hypothesis over the null hypothesis (BF 10 ) provided by the data. Both tests will be carried out in RStudio 22 (the R code for calculating the Bayes factors can be found on https://osf.io/5xbzt ). Additionally, the posterior of the difference in proportions will be provided (the R code for the posterior of the difference in proportions can be found on https://osf.io/3zhju ).
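Under these priors, the marginal likelihoods of the 2 × 2 table have closed forms in terms of Beta functions, and the binomial coefficients cancel in the ratio. The Python sketch below is one common formulation of this independent-beta test of two proportions, written to be consistent with the description above; the authors' own R implementation is at the OSF links:

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function via log-gamma, for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bf10_two_proportions(k1, n1, k2, n2):
    """BF10 for H1 (two independent beta(1,1) proportions) versus
    H0 (a single common beta(1,1) proportion), given k successes
    out of n trials in each condition."""
    # Marginal likelihood under H1: product of two beta-binomial terms.
    log_m1 = log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
    # Marginal likelihood under H0: pooled counts, one shared proportion.
    log_m0 = log_beta(k1 + k2 + 1, n1 + n2 - k1 - k2 + 1)
    # The binomial coefficients appear in both marginals and cancel.
    return exp(log_m1 - log_m0)

# 27/68 cliff classifications in the p-value conditions vs 7/60 in the
# BF conditions (RQ2), and 21/63 isolated vs 13/65 all at once (RQ3):
print(round(bf10_two_proportions(27, 68, 7, 60), 2))   # close to the reported 140.01
print(round(bf10_two_proportions(21, 63, 13, 65), 2))  # close to the reported 0.81
```

That this formulation closely reproduces the Bayes factors reported in the Results section suggests it matches the intended test, but the OSF code remains the authoritative specification.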

If, after having computed results on the obtained sample, we observe that our BFs are not higher than 10 or smaller than 0.1, we will expand our sample in the way explained at the end of section “Sampling Plan”. To see whether this approach is likely to lead to useful results, we conducted a Bayesian power simulation study for the case of population proportions of 0.2 and 0.4 (i.e., a 20% cliff effect in the p -value group and a 40% cliff effect in the BF group) in order to determine the Bayesian power for reaching the BF threshold with a sample size of n  = 200. The results show that for population proportions of 0.2 and 0.4, our estimated sample size of 200 participants (a 10% response rate) would lead to reaching a BF threshold 96% of the time, indicating very high power under this alternative hypothesis. We also conducted a Bayesian power simulation study for the case of population proportions of 0.3 in both groups (i.e., a 30% cliff effect in the p -value group and a 30% cliff effect in the BF group) in order to determine how often a BF threshold would be reached under a zero effect. The results show that for proportions of 0.3 in both populations, our estimated sample size of 200 participants would lead to reaching a BF threshold 7% of the time, whereas under the more optimistic scenario of a 20% response rate, a sample size of 400 participants would lead to reaching a BF threshold 70% of the time (the R code for the power simulations can be found on https://osf.io/vzdce ). It is well known that it is harder to find strong evidence for the absence of an effect than for the presence of an effect 23 . In light of this, we deem a 70% chance of reaching a BF threshold under the null hypothesis given a 20% response rate acceptable.
If, after sampling the first 2000 participants and factoring in the response rate, we have not reached either BF threshold, we will continue sampling participants in increments of 200 (10 per journal) until we reach a BF threshold or until we have an effective sample size of 400, or until we reach a total of 4000 participants.
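The logic of such a power simulation can be sketched as follows. This is a deliberately simplified illustration: it assumes a beta-binomial Bayes factor for two independent proportions, simple binomial sampling, and an even split of participants across groups, so it will not necessarily reproduce the percentages reported above, which depend on the authors' exact simulation settings (their R code is on OSF).

```python
import random
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bf10(k1, n1, k2, n2):
    """BF10: two independent beta(1,1) proportions vs one common proportion."""
    log_m1 = log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
    log_m0 = log_beta(k1 + k2 + 1, n1 + n2 - k1 - k2 + 1)
    return exp(log_m1 - log_m0)

def simulate_power(p1, p2, n_per_group=100, reps=1000, seed=1):
    """Proportion of simulated data sets reaching BF10 > 10 or BF10 < 0.1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw the number of "cliff" classifications in each group.
        k1 = sum(rng.random() < p1 for _ in range(n_per_group))
        k2 = sum(rng.random() < p2 for _ in range(n_per_group))
        bf = bf10(k1, n_per_group, k2, n_per_group)
        if bf > 10 or bf < 0.1:
            hits += 1
    return hits / reps

# Cliff prevalences of 0.2 vs 0.4 (an alternative scenario) and
# 0.3 vs 0.3 (a null scenario), with 200 participants in total:
print(simulate_power(0.2, 0.4))
print(simulate_power(0.3, 0.3))
```

As expected, a threshold is reached far more often under the alternative scenario than under the null scenario, mirroring the asymmetry discussed in the text.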

In sum, RQ1 is exploratory in nature, so we will descriptively explore the patterns in our data. For RQ2, we will determine what proportion of applied researchers make a binary distinction regarding the existence of a positive effect in the population of interest, and we will test whether this binary distinction is different when research results are expressed in the p -value versus the BF condition. Finally, for RQ3, we will determine whether this binary distinction is different in the isolated versus all at once condition (see Table 2 for a summary of the study design).

Sampling process

We deviated from our preregistered sampling plan in the following way: we collected the e-mail addresses of all corresponding authors who published in the 20 social and behavioural science journals in 2021 and 2022 at the same time. In total, we contacted 3152 academics, and 89 of them completed our survey (i.e., 2.8% of the contacted academics). We computed the BFs based on the responses of these 89 academics: the BF for RQ2 was BF 10  = 16.13 and the BF for RQ3 was BF 10  = 0.39, so the latter was neither higher than 10 nor smaller than 0.1.

In order to reach at least 4000 potential participants (see “ Planned analyses ” section), we decided to collect additional e-mail addresses of corresponding authors from articles published in 2019 and 2020 in the same 20 journals. In total, we thus reached another 2247 academics (total N = 5399), and 50 of them completed our survey (i.e., 2.2% of the contacted academics, effective N = 139).

In light of the large number of academics we had contacted at this point, we decided to do an ‘interim power analysis’ to calculate the upper and lower bounds of the BF for RQ3, to see whether it made sense to continue collecting data up to N = 200. The data collected so far, with 21 cliffs out of 63 in the isolated conditions and 13 out of 65 in the all-at-once conditions, yield a Bayes factor of 0.8 (see “ Results ” section below). We analytically verified that by increasing the number of participants to a total of 200, the strongest possible pro-null evidence we could obtain given the data we already had would be BF 10  = 0.14, or BF 01  = 6.99 (for 21 cliffs out of 100 in both conditions). In light of this, we judged that continuing to collect data was not the best use of human resources, so we proceeded with a final sample of N = 139.

To summarize our sampling procedure: we contacted 5399 academics in total. Via Qualtrics, 220 participants responded. After removing the responses of participants who did not complete the content part of our survey (i.e., the questions about the p -values or BFs), 181 cases remained. After removing the cases that were completely unfamiliar with p -values, 177 cases remained. After removing the cases that were completely unfamiliar with BFs, 139 cases remained. Note that many people also responded via e-mail to inform us that they were not familiar with interpreting BFs. Since the Qualtrics survey was anonymous, it was impossible for us to know the overlap between the people who told us via e-mail and those who indicated in Qualtrics that they were unfamiliar with interpreting BFs.

Of the 5399 academics we contacted, 139 filled out the survey completely (2.6% of the total sample; note that this figure reflects both the response rate and our requirement that researchers self-report familiarity with interpreting BFs). Our entire Qualtrics survey can be found on https://osf.io/6gkcj .

Five “difficult to classify” pilot plots were created so that authors RH and DvR could practice before classifying the real data. These plots can be found on https://osf.io/ndaw6/ (see folder “Pilot plots”). Authors RH and DvR discussed these plots qualitatively; no adjustments were made to the classification protocol.

We manually classified each participant's data for each scenario as one of the relationship models (i.e., all-or-none, moderate cliff, linear, and exponential). Author JM organized the data from each of the four conditions and removed the p -value or BF labels. Authors RH and DvR classified the data according to the protocol provided in Appendix 1 , and the plot for each participant (including the condition each participant was in and the model to which each participant was assigned) can be found in Appendix 2 . After coding, Cohen's kappa was computed for these data and was equal to κ = 0.47. Authors RH and DvR independently reached the same conclusion for 113 out of 139 data sets (i.e., 81.3%). For the remaining 26 data sets, RH and DvR were able to reach consensus within 5 min per data set, as laid out in the protocol. In Fig.  2 , plots are provided that include the prototype lines as well as the actual responses plotted along with them. This way, all responses can be seen at once, along with how they match up with the prototype response for each category.

To get a better picture of our sample population, we included the following demographic variables in the survey: gender, main continent, career stage, and broad research area.
The results are presented in Table 3 . Most of the respondents who filled out our survey were male (71.2%), lived in Europe (51.1%), held a faculty position (94.1%), and worked in the field of psychology (56.1%). The total responses (i.e., including those of the respondents who quit filling out our survey) were very similar to the responses of the respondents who completed our survey.

figure 2

Plots including the prototype lines and the actual responses.

To answer RQ1 (“What is the relation between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest across participants?”), we compared the frequency of the four classification models (i.e., all-or-none, moderate cliff, linear, and exponential) with one another within each of the four conditions (i.e., all at once and isolated p -values, and all at once and isolated BFs). The results are presented in Table 4 . In order to enhance the interpretability of the results in Table 4 , we have plotted them in Fig.  3 .

figure 3

Plotted frequency of classification models within each condition.

We observe that within the all at once p -value condition, the cliff models accounted for a proportion of (0 + 11)/33 = 0.33 of the responses. The non-cliff models accounted for a proportion of (1 + 21)/33 = 0.67 of the responses. Looking at the isolated p -value condition, we can see that the cliff models accounted for a proportion of (1 + 15)/35 = 0.46 of the responses. The non-cliff models accounted for a proportion of (0 + 19)/35 = 0.54 of the responses. In the all at once BF condition, we observe that the cliff models accounted for a proportion of (2 + 0)/32 = 0.06 of the responses. The non-cliff models accounted for a proportion of (0 + 30)/32 = 0.94 of the responses. Finally, we observe that within the isolated BF condition, the cliff models accounted for a proportion of (2 + 3)/28 = 0.18 of the responses. The non-cliff models accounted for a proportion of (0 + 23)/28 = 0.82 of the responses.

Thus, we observed a higher proportion of cliff models in the p -value conditions than in the BF conditions (27/68 = 0.40 vs 7/60 = 0.12), and a higher proportion of cliff models in the isolated conditions than in the all-at-once conditions (21/63 = 0.33 vs 13/65 = 0.20). Next, we used inferential statistics to examine these observations more closely.

To answer RQ2 (“What is the difference in this relationship when the statistical evidence is quantified through p -values versus Bayes factors?”), we compared the sample proportions mentioned above (27/68 = 0.40 and 7/60 = 0.12, respectively, a difference of 0.40 − 0.12 = 0.28), and we tested whether the proportion of cliff classifications in the p -value conditions differed from that in the BF conditions in the population by means of a Bayesian chi square test. For this test, the null hypothesis was that there is no difference in the proportion of cliff classifications between the two conditions, and the alternative hypothesis was that there is such a difference.

The BF that resulted from the chi square test was equal to BF 10  = 140.01 and gives the relative evidence for the alternative hypothesis over the null hypothesis provided by the data. This means that the data are 140.01 times as likely under the alternative hypothesis as under the null hypothesis: we found strong support for the alternative hypothesis that there is a difference in the proportion of cliff classifications between the p -value and BF conditions. Inspection of Table 4 or Fig.  3 shows that the proportion of cliff classifications is higher in the p -value conditions.

Additionally, the posterior distribution of the difference in proportions is provided in Fig.  4 , and the 95% credible interval was found to be [0.13, 0.41]. This means that there is a 95% probability that the population parameter for the difference of proportions of cliff classifications between p -value conditions and BF conditions lies within this interval, given the evidence provided by the observed data.
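With beta(1,1) priors, the posterior for each proportion is conjugate, so this credible interval can be approximated by Monte Carlo draws: 27 cliffs out of 68 responses give a beta(28, 42) posterior for the p -value conditions, and 7 out of 60 give beta(8, 54) for the BF conditions. A hypothetical Python sketch (the authors' own R code is at the OSF link above):

```python
import random

rng = random.Random(42)
N_DRAWS = 100_000

# Posterior draws for each proportion: beta(k + 1, n - k + 1)
# given a beta(1,1) prior and k cliff classifications out of n.
diffs = sorted(
    rng.betavariate(27 + 1, 68 - 27 + 1) - rng.betavariate(7 + 1, 60 - 7 + 1)
    for _ in range(N_DRAWS)
)

# Equal-tailed 95% credible interval from the empirical quantiles.
lower = diffs[int(0.025 * N_DRAWS)]
upper = diffs[int(0.975 * N_DRAWS)]
print(f"95% credible interval: [{lower:.2f}, {upper:.2f}]")
```

The simulated interval closely matches the reported [0.13, 0.41], up to Monte Carlo error.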

figure 4

The posterior density of difference of proportions of cliff models in p -value conditions versus BF conditions.

To answer RQ3 (“What is the difference in this relationship when the statistical evidence is presented in isolation versus all at once?”), we compared the sample proportions mentioned above (21/63 = 0.33 vs 13/65 = 0.20, respectively, a difference of 0.33 − 0.20 = 0.13), and we tested whether the proportion of cliff classifications in the isolated conditions differed from that in the all at once conditions in the population by means of a Bayesian chi square test analogous to the test above.

The BF that resulted from the chi square test was equal to BF 10  = 0.81, and gives the relative evidence for the alternative hypothesis over the null hypothesis provided by the data. This means that the data are 0.81 times as likely under the alternative hypothesis as under the null hypothesis: the evidence on whether there is a difference in the proportion of cliff classifications between the isolation and all at once conditions is ambiguous.

Additionally, the posterior distribution of the difference in proportions is provided in Fig.  5 . The 95% credible interval is [− 0.28, 0.02].

figure 5

The posterior density of difference of proportions of cliff models in all at once conditions versus isolated conditions.

There were 11 respondents who provided responses that deviated extremely from the four relationship models, so they were included in the rest category and were left out of the analyses. Eight of these were in the isolated BF condition, one was in the isolated p -value condition, one was in the all at once BF condition, and one was in the all at once p -value condition. For five of these, the outcomes showed a roughly decreasing trend with large bumps. For four of these, there were one or more considerable increases in the plotted outcomes. For two of these, the line was flat. All these graphs are available in Appendix 2 .

In the present study, we explored the relationship between obtained statistical evidence and the degree of belief or confidence that there is a positive effect in the population of interest. We were particularly interested in the existence of a cliff effect. We compared this relationship for p -values to the relationship for corresponding degrees of evidence quantified through Bayes factors, and we examined whether this relationship was affected by two different modes of presentation. In the isolated presentation mode, a possible clear functional form of the relationship across values was not visible to the participants, whereas in the all-at-once presentation mode, such a functional form could easily be seen by the participants.

The observed proportion of cliff models was substantially higher for the p -values than for the BFs, and the credible interval, as well as the high BF test value, indicates that a (substantial) difference will also hold more generally at the population level. Based on our literature review (summarized in Table 1 ), we are not aware of studies that have compared the prevalence of the cliff effect when interpreting p -values to that when interpreting BFs, so we believe this part of our study is new to the literature. However, our findings are consistent with previous literature regarding the presence of a cliff effect when using p -values. Although we observed a higher proportion of cliff models for isolated presentation than for all-at-once presentation, the present results do not give a clear indication of whether this difference in proportions will also hold at the population level. We believe that this comparison between the presentation methods used to investigate the cliff effect is also new. In previous research, the p -values were presented on separate pages in some studies 15 , while in other studies the p -values were presented on the same page 13 .

We deviated from our preregistered sampling plan by collecting the e-mail addresses of all corresponding authors who published in the 20 social and behavioural science journals in 2021 and 2022 simultaneously, rather than sequentially. We do not believe that this approach created any bias in our study results. Furthermore, we decided that it would not make sense to collect additional data (after approaching 5399 academics who published in 2019, 2020, 2021, and 2022 in the 20 journals) in order to reach an effective sample size of 200. Based on our interim power analysis, the strongest possible pro-null evidence we could obtain by continuing data collection up to an effective sample size of 200, given the data we already had, would be BF 10  = 0.14 or BF 01  = 6.99. Therefore, we decided that it would be unethical to continue collecting additional data.

There were several limitations in this study. Firstly, the response rate was very low, probably because many of the academics we contacted mentioned that they were not familiar with interpreting Bayes factors. It is important to note that our findings apply only to researchers who are at least somewhat familiar with interpreting Bayes factors, and our sample probably does not represent the average researcher in the social and behavioural sciences. Indeed, it is quite possible that people who are less familiar with Bayes factors (and possibly with statistics in general) would give responses even more in line with cliff models, because we expect that researchers who exhibit a cliff effect will generally have less statistical expertise or understanding: there is nothing special about certain p -value or Bayes factor thresholds that merits a qualitative drop in the perceived strength of evidence. Furthermore, a salient finding was that the proportion of graduate students was very small. In our sample, the proportion of graduate students showing a cliff effect is 25% and the proportion of more senior researchers showing a cliff effect is 23%. Although we see no clear difference in our sample, we cannot rule out that our findings might be different if the proportion of graduate students in our sample were higher.

There were several limitations related to the survey. Some of the participants mentioned via e-mail that insufficient information was provided in the scenarios. For example, we did not provide effect sizes or any information about the research topic. We had decided to leave out this information to make sure that the participants could focus only on the p -values and the Bayes factors. Furthermore, the questions in our survey referred to posterior probabilities. A respondent noted that without being able to evaluate the prior plausibility of the rival hypotheses, the questions were difficult to answer. Although this observation is correct, we suspect that many respondents nevertheless believe they can do this.

The respondents could indicate their degree of belief or confidence that there is a positive effect in the population of interest based on the fictitious findings on a scale ranging from 0 (completely convinced that there is no effect), through 50 (somewhat convinced that there is a positive effect), to 100 (completely convinced that there is a positive effect). A respondent mentioned that it might be unclear where the midpoint lies between being somewhat convinced that there is no effect and being somewhat convinced that there is a positive effect, thereby biasing the scale towards a positive response. Another respondent mentioned that there was no possibility to indicate no confidence in either the null or the alternative hypothesis. Although this is true, we do not think that many participants experienced this as problematic.

In our exploratory analyses, we observed that eight out of eleven unclassifiable responses were in the isolated BF condition. In our survey, the all at once and isolated presentation conditions differed not only in the way the pieces of statistical evidence were presented, but also in their order: in the all at once conditions, the different pieces were presented in sequential order, while in the isolated conditions, they were presented in a random order. This might explain why the isolated BF condition contained most of the unclassifiable responses: academics may be more familiar with single p -values and can more easily place them along a line of “possible values”, even when they are presented out of order.

This study indicates that a substantial proportion of researchers who are at least somewhat familiar with interpreting BFs experience a sharp drop in their confidence that an effect exists around certain p -values, and to a much smaller extent around certain Bayes factor values. But how do people act on these beliefs? In a recent study, Muradchanian et al. 24 showed that editors, reviewers, and authors alike are much less likely to accept for publication, endorse, and submit papers with non-significant results than papers with significant results, suggesting that these beliefs about the existence of an effect translate into considering certain findings more publication-worthy.

Allowing for these caveats, our findings showed that cliff models were more prevalent when interpreting p-values than when interpreting BFs, based on a sample of academics who were at least somewhat familiar with interpreting BFs. However, the high prevalence of the non-cliff models (i.e., linear and exponential) implies that p-values do not entail dichotomous thinking for everyone. Nevertheless, it is important to note that cliff models still accounted for 37.5% of responses for p-values, whereas for BFs they accounted for only 12.3% of responses.
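
The contrast between these two prevalences can be quantified with a posterior for the difference between two proportions, as in the paper's RQ2/RQ3 analyses (done there in R; see the OSF scripts). The sketch below uses Python with conjugate Beta posteriors and hypothetical counts chosen only to match the reported percentages; the actual sample sizes are in the paper's data.

```python
# Sketch: posterior for the difference between two proportions
# (cliff-model prevalence for p-values vs. BFs). The counts below are
# hypothetical stand-ins matching the reported 37.5% and 12.3%;
# the authors' own analysis is in their OSF R scripts.
import numpy as np

rng = np.random.default_rng(1)

k_p, n_p = 48, 128    # hypothetical: 48 of 128 p-value responses (37.5%)
k_bf, n_bf = 16, 130  # hypothetical: 16 of 130 BF responses (~12.3%)

# conjugate Beta(1, 1) priors give Beta posteriors for each proportion
theta_p = rng.beta(1 + k_p, 1 + n_p - k_p, size=100_000)
theta_bf = rng.beta(1 + k_bf, 1 + n_bf - k_bf, size=100_000)

diff = theta_p - theta_bf
print(f"posterior mean difference: {diff.mean():.3f}")
print(f"95% credible interval: {np.percentile(diff, [2.5, 97.5])}")
print(f"P(cliff more prevalent for p-values): {(diff > 0).mean():.4f}")
```

Under these hypothetical counts, essentially all posterior mass lies above zero, mirroring the paper's conclusion that cliff thinking is markedly more common for p-values.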

We note that dichotomous thinking has a place in interpreting scientific evidence, for instance in the context of decision criteria (if the evidence is more compelling than some a priori agreed level, then we bring this new medicine to the market) or in the context of sampling plans (we stop collecting data once the evidence or level of certainty hits some a priori agreed level). However, we claim that it is not rational for someone's subjective belief that some effect is non-zero to make a big jump around, for example, a p-value of 0.05 or a BF of 10, but not at any other point along the range of potential values.
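
The sampling-plan use of a threshold can be sketched as a pre-registered stopping rule: data accumulate until the evidence hits an a priori agreed level. The toy example below uses a closed-form Bayes factor for a binomial rate against the point null θ = 0.5 with a uniform prior under H1; this is an illustrative model only, not the analysis used in the paper.

```python
# Sketch: stop collecting data once a Bayes factor crosses an a priori
# threshold (in either direction). Toy binomial model, uniform Beta(1, 1)
# prior under H1 vs. point null theta = 0.5 under H0.
from math import comb

def bf10_binomial(k, n):
    """BF10 for H1: theta ~ Uniform(0, 1) vs. H0: theta = 0.5."""
    m1 = 1 / (n + 1)             # marginal likelihood under H1 (uniform prior)
    m0 = comb(n, k) * 0.5 ** n   # likelihood under H0
    return m1 / m0

def run_until_threshold(data_stream, threshold=10):
    """Stop once BF10 >= threshold or BF01 (= 1/BF10) >= threshold."""
    k = n = 0
    bf = 1.0
    for x in data_stream:
        n += 1
        k += x
        bf = bf10_binomial(k, n)
        if bf >= threshold or bf <= 1 / threshold:
            break
    return n, bf

# With an unbroken run of successes, BF10 = 2**n / (n + 1),
# which first reaches 10 at n = 7 (128 / 8 = 16).
n, bf = run_until_threshold([1] * 20)
print(n, bf)  # 7 16.0
```

The point of the passage above is precisely that such a threshold is a decision device: the stopping rule is rational as a plan, while a belief that jumps discontinuously at the same threshold is not.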

Based on our findings, one might think that replacing p-values with BFs would be sufficient to overcome dichotomous thinking. We think this is probably too simplistic: rejecting or not rejecting a null hypothesis is likely so deep-seated in academic culture that dichotomous thinking may become more and more prevalent in the interpretation of BFs over time. Beyond the choice of tool (p-values, BFs, etc.), we agree with Lai et al.13 that dichotomous thinking can be countered by teaching (future) academics to formulate research questions that require quantitative answers (for example, evaluating the extent to which therapy A is superior to therapy B, rather than only whether it is superior), and by adopting effect size estimation, in addition to statistical hypotheses, in both thinking and communication.
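
Effect size estimation of the kind advocated here can be as simple as reporting a standardized difference with an interval rather than a bare reject/not-reject verdict. The sketch below computes Cohen's d with an approximate 95% confidence interval for two hypothetical therapy groups; the data and group labels are invented for illustration.

```python
# Sketch: report the *extent* to which therapy A outperforms therapy B
# (effect size plus interval) instead of a dichotomous decision.
# The scores below are hypothetical illustrative data.
import math

therapy_a = [5.1, 4.9, 5.3, 5.5, 4.8, 5.2]
therapy_b = [4.6, 4.4, 4.9, 4.7, 4.5, 4.3]

def cohens_d(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled_sd = math.sqrt((ssa + ssb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

d = cohens_d(therapy_a, therapy_b)

# large-sample standard error of d (normal approximation)
na, nb = len(therapy_a), len(therapy_b)
se = math.sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
lo, hi = d - 1.96 * se, d + 1.96 * se
print(f"d = {d:.2f}, approx. 95% CI [{lo:.2f}, {hi:.2f}]")
```

A reader of such a report learns how much better therapy A performed and how uncertain that estimate is, which is the quantitative framing the passage above recommends.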

In light of these results on dichotomous thinking among researchers, future research could focus on developing comprehensive teaching methods that cultivate the skills needed to formulate research questions requiring quantitative answers, and on investigating pedagogical methods and curricula that encourage adopting effect size estimation, in addition to statistical hypotheses, in both thinking and communication.

Data availability

The raw data are available within the OSF repository: https://osf.io/ndaw6/ .

Code availability

The following R scripts can be used to reproduce our analyses:

  • Generation of the p-values and BFs: “2022-11-04 psbfs.R”
  • Fig. 1: “2021-06-03 ProtoCliffPlots.R”
  • Posterior for the difference between the two proportions in RQ2 and RQ3: “2022-02-17 R script posterior for difference between two proportions.R”
  • Bayesian power simulation: “2022-11-04 Bayes Power Sim Cliff.R”
  • Bayes factors in RQ2 and RQ3: “2022-10-21 BFs RQ2 and RQ3.R”
  • Calculation of Cohen's kappa: “2023-07-23 Cohens kappa.R”
  • Data preparation: “2023-07-23 data preparation.R”
  • Fig. 2: “2024-03-11 data preparation including Fig. 2.R”
  • Interim power analysis: “2024-03-16 Interim power analysis.R”
  • Fig. 3: “2024-03-16 Plot for Table 4 R”

The scripts were written in RStudio (version 2022.2.0.443) and are uploaded as part of the supplementary material; they are also available within the OSF repository: https://osf.io/ndaw6/ .

Lakens, D. Why p-Values Should be Interpreted as p-Values and Not as Measures of Evidence [Blog Post] . http://daniellakens.blogspot.com/2021/11/why-p-values-should-be-interpreted-as-p.html . Accessed 20 Nov 2021.

Jeffreys, H. Theory of Probability (Clarendon Press, 1939).

van Ravenzwaaij, D. & Etz, A. Simulation studies as a tool to understand Bayes factors. Adv. Methods Pract. Psychol. Sci. 4 , 1–20. https://doi.org/10.1177/2515245920972624 (2021).

Wetzels, R. et al. Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspect. Psychol. Sci. 6 , 291–298. https://doi.org/10.1177/1745691611406923 (2011).

Dhaliwal, S. & Campbell, M. J. Misinterpreting p -values in research. Austral. Med. J. 1 , 1–2. https://doi.org/10.4066/AMJ.2009.191 (2010).

Greenland, S. et al. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur. J. Epidemiol. 31 , 337–350. https://doi.org/10.1007/s10654-016-0149-3 (2016).

Wasserstein, R. L. & Lazar, N. A. The ASA statement on p -values: context, process, and purpose. Am. Stat. 70 , 129–133. https://doi.org/10.1080/00031305.2016.1154108 (2016).

Rosenthal, R. & Gaito, J. The interpretation of levels of significance by psychological researchers. J. Psychol. Interdiscipl. Appl. 55 , 33–38. https://doi.org/10.1080/00223980.1963.9916596 (1963).

Rosenthal, R. & Gaito, J. Further evidence for the cliff effect in interpretation of levels of significance. Psychol. Rep. 15 , 570. https://doi.org/10.2466/pr0.1964.15.2.570 (1964).

Beauchamp, K. L. & May, R. B. Replication report: Interpretation of levels of significance by psychological researchers. Psychol. Rep. 14 , 272. https://doi.org/10.2466/pr0.1964.14.1.272 (1964).

Minturn, E. B., Lansky, L. M. & Dember, W. N. The Interpretation of Levels of Significance by Psychologists: A Replication and Extension. Quoted in Nelson, Rosenthal, & Rosnow, 1986. (1972).

Nelson, N., Rosenthal, R. & Rosnow, R. L. Interpretation of significance levels and effect sizes by psychological researchers. Am. Psychol. 41 , 1299–1301. https://doi.org/10.1037/0003-066X.41.11.1299 (1986).

Lai, J., Kalinowski, P., Fidler, F., & Cumming, G. Dichotomous thinking: A problem beyond NHST. in Data and Context in Statistics Education: Towards an Evidence Based Society , 1–4. http://icots.info/8/cd/pdfs/contributed/ICOTS8_C101_LAI.pdf (2010).

Cumming, G. Statistics education in the social and behavioural sciences: From dichotomous thinking to estimation thinking and meta-analytic thinking. in International Association of Statistical Education , 1–4 . https://www.stat.auckland.ac.nz/~iase/publications/icots8/ICOTS8_C111_CUMMING.pdf (2010).

Poitevineau, J. & Lecoutre, B. Interpretation of significance levels by psychological researchers: The .05 cliff effect may be overstated. Psychon. Bull. Rev. 8 , 847–850. https://doi.org/10.3758/BF03196227 (2001).

Hoekstra, R., Johnson, A. & Kiers, H. A. L. Confidence intervals make a difference: Effects of showing confidence intervals on inferential reasoning. Educ. Psychol. Meas. 72 , 1039–1052. https://doi.org/10.1177/0013164412450297 (2012).

Helske, J., Helske, S., Cooper, M., Ynnerman, A. & Besancon, L. Can visualization alleviate dichotomous thinking: Effects of visual representations on the cliff effect. IEEE Trans. Vis. Comput. Graph. 27 , 3379–3409. https://doi.org/10.1109/TVCG.2021.3073466 (2021).

van de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M. & Depaoli, S. A systematic review of Bayesian articles in psychology: The last 25 years. Psychol. Methods 22 , 217–239. https://doi.org/10.1037/met0000100 (2017).

Lartillot, N. & Philippe, H. Computing Bayes factors using thermodynamic integration. Syst. Biol. 55 , 195–207. https://doi.org/10.1080/10635150500433722 (2006).

Gunel, E. & Dickey, J. Bayes factors for independence in contingency tables. Biometrika 61 , 545–557. https://doi.org/10.2307/2334738 (1974).

Jamil, T. et al. Default, “Gunel and Dickey” Bayes factors for contingency tables. Behav. Res. Methods 49 , 638–652. https://doi.org/10.3758/s13428-016-0739-8 (2017).

RStudio Team. RStudio: Integrated Development Environment for R . RStudio, PBC. http://www.rstudio.com/ (2022).

van Ravenzwaaij, D. & Wagenmakers, E.-J. Advantages masquerading as “issues” in Bayesian hypothesis testing: A commentary on Tendeiro and Kiers (2019). Psychol. Methods 27 , 451–465. https://doi.org/10.1037/met0000415 (2022).

Muradchanian, J., Hoekstra, R., Kiers, H. & van Ravenzwaaij, D. The role of results in deciding to publish. MetaArXiv. https://doi.org/10.31222/osf.io/dgshk (2023).


Acknowledgements

We would like to thank Maximilian Linde for writing R code that we could use to collect the e-mail addresses of our potential participants. We would also like to thank Julia Bottesini and an anonymous reviewer for helping us improve the quality of our manuscript.

Author information

Authors and affiliations

Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands

Jasmine Muradchanian, Rink Hoekstra, Henk Kiers & Don van Ravenzwaaij

Psychology, Rowan University, Glassboro, USA

Dustin Fife


Contributions

J.M., R.H., H.K., D.F., and D.v.R. meet the following authorship conditions: substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data; or the creation of new software used in the work; or have drafted the work or substantively revised it; and approved the submitted version (and any substantially modified version that involves the author's contribution to the study); and agreed both to be personally accountable for the author's own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature. J.M. participated in data/statistical analysis, participated in the design of the study, drafted the manuscript, and critically revised the manuscript; R.H. participated in data/statistical analysis, participated in the design of the study, and critically revised the manuscript; H.K. participated in the design of the study, and critically revised the manuscript; D.F. participated in the design of the study, and critically revised the manuscript; D.v.R. participated in data/statistical analysis, participated in the design of the study, and critically revised the manuscript.

Corresponding author

Correspondence to Jasmine Muradchanian .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

  • Supplementary Information 1
  • Supplementary Information 2
  • Supplementary Information 3
  • Supplementary Information 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Muradchanian, J., Hoekstra, R., Kiers, H. et al. Comparing researchers’ degree of dichotomous thinking using frequentist versus Bayesian null hypothesis testing. Sci Rep 14 , 12120 (2024). https://doi.org/10.1038/s41598-024-62043-w

Download citation

Received : 07 June 2022

Accepted : 09 May 2024

Published : 27 May 2024

DOI : https://doi.org/10.1038/s41598-024-62043-w


importance of a hypothesis in a scientific investigation


  1. What is an Hypothesis

    importance of a hypothesis in a scientific investigation

  2. Hypothesis Meaning In Research Methodology

    importance of a hypothesis in a scientific investigation

  3. Understanding the importance of a research hypothesis

    importance of a hypothesis in a scientific investigation

  4. 😍 Importance of formulating a hypothesis. HOW TO: Defining Your

    importance of a hypothesis in a scientific investigation

  5. 13 Different Types of Hypothesis (2024)

    importance of a hypothesis in a scientific investigation

  6. What is a Hypothesis

    importance of a hypothesis in a scientific investigation


  1. Importance of Hypothesis Testing in Quality Management #statistics

  2. What Is A Hypothesis?

  3. What is the Role of Hypotheses in Scientific Investigations?

  4. Research Methodology Hypothesis : Meaning , Sources & Importance

  5. The Power of Hypothesis Testing: Unveiling Insights in Seconds! ⚖️🔬

  6. Importance of Hypothesis


  1. 1.1: Scientific Investigation

    Forming a Hypothesis. The next step in a scientific investigation is forming a hypothesis.A hypothesis is a possible answer to a scientific question, but it isn't just any answer. A hypothesis must be based on scientific knowledge, and it must be logical. A hypothesis also must be falsifiable. In other words, it must be possible to make observations that would disprove the hypothesis if it ...

  2. Scientific hypothesis

    Scientific hypothesis, idea that proposes an explanation for an observed phenomenon or narrow set of phenomena. ... The investigation of scientific hypotheses is an important component in the development of scientific theory. ... the latter is a broad general explanation that incorporates data from many different scientific investigations ...

  3. Scientific Hypotheses: Writing, Promoting, and Predicting Implications

    A snapshot analysis of citation activity of hypothesis articles may reveal interest of the global scientific community towards their implications across various disciplines and countries. As a prime example, Strachan's hygiene hypothesis, published in 1989,10 is still attracting numerous citations on Scopus, the largest bibliographic database ...

  4. Formulating Hypotheses for Different Study Designs

    Thus, hypothesis generation is an important initial step in the research workflow, reflecting accumulating evidence and experts' stance. In this article, we overview the genesis and importance of scientific hypotheses and their relevance in the era of the coronavirus disease 2019 (COVID-19) pandemic.

  5. On the scope of scientific hypotheses

    2. The scientific hypothesis. In this section, we will describe a functional and descriptive role regarding how scientists use hypotheses. Jeong & Kwon [] investigated and summarized the different uses the concept of 'hypothesis' had in philosophical and scientific texts.They identified five meanings: assumption, tentative explanation, tentative cause, tentative law, and prediction.

  6. The scientific method (article)

    The scientific method. At the core of biology and other sciences lies a problem-solving approach called the scientific method. The scientific method has five basic steps, plus one feedback step: Make an observation. Ask a question. Form a hypothesis, or testable explanation. Make a prediction based on the hypothesis.

  7. Biology and the scientific method review

    Meaning. Biology. The study of living things. Observation. Noticing and describing events in an orderly way. Hypothesis. A scientific explanation that can be tested through experimentation or observation. Controlled experiment. An experiment in which only one variable is changed.

  8. The Research Hypothesis: Role and Construction

    A hypothesis (from the Greek, foundation) is a logical construct, interposed between a problem and its solution, which represents a proposed answer to a research question. It gives direction to the investigator's thinking about the problem and, therefore, facilitates a solution. Unlike facts and assumptions (presumed true and, therefore, not ...

  9. 1.4 Scientific Investigations

    Other scientific investigations are experimental — for example, treating a cell with a drug while recording changes in the behavior of the cell. The flow chart below shows the typical steps followed in an experimental scientific investigation. The series of steps shown in the flow chart is frequently referred to as the scientific method.

  10. 1.1: Scientific Investigation

    Steps of the Scientific Method . The scientific method consists of the following steps: Making an observation; Asking a question based on that observation; Forming a logical AND testable answer to that question (stated in terms of a hypothesis); Designing a controlled experiment to see if the hypothesis is supported or rejected; Collecting, analyzing, and interpreting the data generated by the ...

  11. Science and Hypothesis

    Importance of a hypothesis for a research study. A hypothesis plays an important role in a scientific investigation or a research study. It provides provisional explanations (suggestions) to a research problem. It guides researchers in finding out the appropriate methodology to carry out the research tasks.

  12. Research Hypothesis: What It Is, Types + How to Develop?

    A research hypothesis helps test theories. A hypothesis plays a pivotal role in the scientific method by providing a basis for testing existing theories. For example, a hypothesis might test the predictive power of a psychological theory on human behavior. It serves as a great platform for investigation activities.

  13. What is a scientific hypothesis?

    A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method.Many describe it as an "educated guess ...

  14. On the role of hypotheses in science

    INTRODUCTION. Philosophy of science and the theory of knowledge (epistemology) are important branches of philosophy. However, philosophy has over the centuries lost its dominant role it enjoyed in antiquity and became in Medieval Ages the maid of theology (ancilla theologiae) and after the rise of natural sciences and its technological applications many practising scientists and the general ...

  15. A blueprint for scientific investigations

    A scaffold for scientific investigations The process of science involves many layers of complexity, but the key points of that process are straightforward:. There are many routes into the process, including serendipity (e.g., being hit on the head by the proverbial apple), concern over a practical problem (e.g., finding a new treatment for diabetes), and a technological development (e.g., the ...

  16. Generating and Evaluating Scientific Evidence and Explanations

    The evidence-gathering phase of inquiry includes designing the investigation as well as carrying out the steps required to collect the data. Generating evidence entails asking questions, deciding what to measure, developing measures, collecting data from the measures, structuring the data, systematically documenting outcomes of the investigations, interpreting and evaluating the data, and ...

  17. The Role of Hypothesis in Scientific Investigation

    INVESTIGATION. ONE of the most intriguing phases of scientific procedure is the nature, role and origin of the hypothesis. Actually if one ex- amines any discussion of scientific method, he would find no hint at all concerning how to construct or find a " good " hypothesis. We are told that the hypothesis is necessary to " explain " facts, and ...

  18. Scientific Method: Definition and Examples

    The scientific method is a series of steps followed by scientific investigators to answer specific questions about the natural world. It involves making observations, formulating a hypothesis, and conducting scientific experiments. Scientific inquiry starts with an observation followed by the formulation of a question about what has been ...

  19. The Scientific Method: 5 Steps for Investigating Our World

    A hypothesis is a possible answer to a question. It is based on: their own observations, existing theories, and information they gather from other sources. Scientists use their hypothesis to make a prediction, a testable statement that describes what they think the outcome of an investigation will be. 3. Gather Data

  20. Scientific Method

    Science is an enormously successful human enterprise. The study of scientific method is the attempt to discern the activities by which that success is achieved. Among the activities often identified as characteristic of science are systematic observation and experimentation, inductive and deductive reasoning, and the formation and testing of ...

  21. What Is Scientific Investigation? (With Types and Steps)

    A scientific investigation is a process of finding the answer to a question using various research methods. An investigation usually begins when someone observes the world around them and asks questions to which they don't know the answer. Then, they make more observations or develop an experiment to test a hypothesis.

  22. Importance of the Scientific Method

    Gregor Johann Mendel, the Austrian priest, biologist and botanist whose work laid the foundation for the study of genetics. The scientific method attempts to minimize the influence of bias or prejudice in the experimenter. Even the best-intentioned scientists can't escape bias. It results from personal beliefs, as well as cultural beliefs ...

  23. Comparing researchers' degree of dichotomous thinking using ...

    A large amount of scientific literature in social and behavioural sciences bases their conclusions on one or more hypothesis tests. As such, it is important to obtain more knowledge about how ...