Dtsch Arztebl Int. v.107(44); 2010 Nov

Linear Regression Analysis

Astrid Schneider

1 Department of Medical Biometrics, Epidemiology, and Computer Sciences, Johannes Gutenberg University, Mainz, Germany

Gerhard Hommel

Maria Blettner

Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication.

This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience.

After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately.

The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

The purpose of statistical evaluation of medical data is often to describe relationships between two variables or among several variables. For example, one would like to know not just whether patients have high blood pressure, but also whether the likelihood of having high blood pressure is influenced by factors such as age and weight. The variable to be explained (blood pressure) is called the dependent variable, or, alternatively, the response variable; the variables that explain it (age, weight) are called independent variables or predictor variables. Measures of association provide an initial impression of the extent of statistical dependence between variables. If the dependent and independent variables are continuous, as is the case for blood pressure and weight, then a correlation coefficient can be calculated as a measure of the strength of the relationship between them ( box 1 ).

Interpretation of the correlation coefficient (r)

Spearman’s coefficient:

Describes a monotone relationship

A monotone relationship is one in which the dependent variable consistently rises or consistently falls as the independent variable rises.

Pearson’s correlation coefficient:

Describes a linear relationship

Interpretation/meaning:

Correlation coefficients provide information about the strength and direction of a relationship between two continuous variables. No distinction between the explaining variable and the variable to be explained is necessary:

  • r = ± 1: perfect linear and monotone relationship. The closer r is to 1 or –1, the stronger the relationship.
  • r = 0: no linear or monotone relationship
  • r < 0: negative, inverse relationship (high values of one variable tend to occur together with low values of the other variable)
  • r > 0: positive relationship (high values of one variable tend to occur together with high values of the other variable)
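The two coefficients in Box 1 can be computed from first principles; a minimal sketch with invented height/weight data (note that this simple rank routine ignores ties, which a real implementation would average):

```python
# Pearson's r (linear association) and Spearman's rho (monotone association),
# computed from first principles on a small, invented sample.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    # Spearman's rho is Pearson's r applied to the ranks of the data.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    return pearson_r(ranks(x), ranks(y))

height = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]   # meters
weight = [55.0, 58.0, 63.0, 67.0, 74.0, 79.0]   # kilograms

print(round(pearson_r(height, weight), 3))    # close to +1: strong positive linear relationship
print(round(spearman_rho(height, weight), 3))  # 1.0: weight rises monotonically with height
```

Because the invented weights rise strictly with height, Spearman's rho is exactly 1 even though the points do not lie perfectly on a line, while Pearson's r is close to, but below, 1.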

Graphical representation of a linear relationship:

Scatter plot with regression line

A negative relationship is represented by a falling regression line (regression coefficient b < 0), a positive one by a rising regression line (b > 0).

Regression analysis is a type of statistical evaluation that enables three things:

  • Description: Relationships among the dependent variables and the independent variables can be statistically described by means of regression analysis.
  • Estimation: The values of the dependent variables can be estimated from the observed values of the independent variables.
  • Prognostication: Risk factors that influence the outcome can be identified, and individual prognoses can be determined.

Regression analysis employs a model that describes the relationships between the dependent variables and the independent variables in a simplified mathematical form. There may be biological reasons to expect a priori that a certain type of mathematical function will best describe such a relationship, or simple assumptions have to be made that this is the case (e.g., that blood pressure rises linearly with age). The best-known types of regression analysis are the following ( table 1 ):

  • Linear regression,
  • Logistic regression, and
  • Cox regression.

The goal of this article is to introduce the reader to linear regression. The theory is briefly explained, and the interpretation of statistical parameters is illustrated with examples. The methods of regression analysis are comprehensively discussed in many standard textbooks ( 1 – 3 ).

Cox regression will be discussed in a later article in this journal.

Linear regression is used to study the linear relationship between a dependent variable Y (blood pressure) and one or more independent variables X (age, weight, sex).

The dependent variable Y must be continuous, while the independent variables may be either continuous (age), binary (sex), or categorical (social status). The initial judgment of a possible relationship between two continuous variables should always be made on the basis of a scatter plot (scatter graph). This type of plot will show whether the relationship is linear ( figure 1 ) or nonlinear ( figure 2 ).

Figure 1. A scatter plot showing a linear relationship

Figure 2. A scatter plot showing an exponential relationship. In this case, it would not be appropriate to compute a coefficient of determination or a regression line

Performing a linear regression makes sense only if the relationship is linear. Other methods must be used to study nonlinear relationships. The variable transformations and other, more complex techniques that can be used for this purpose will not be discussed in this article.

Univariable linear regression

Univariable linear regression studies the linear relationship between the dependent variable Y and a single independent variable X. The linear regression model describes the dependent variable with a straight line that is defined by the equation Y = a + b × X, where a is the y-intercept of the line, and b is its slope. First, the parameters a and b of the regression line are estimated from the values of the dependent variable Y and the independent variable X with the aid of statistical methods. The regression line enables one to predict the value of the dependent variable Y from that of the independent variable X. Thus, for example, after a linear regression has been performed, one would be able to estimate a person’s weight (dependent variable) from his or her height (independent variable) ( figure 3 ).
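The estimation of a and b can be sketched with the standard least-squares formulas b = Sxy / Sxx and a = ȳ − b × x̄; a minimal example with invented data:

```python
# Least-squares estimates of the intercept a and slope b in Y = a + b*X:
# b = cov(X, Y) / var(X), a = mean(Y) - b * mean(X). Data are invented.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.1, 5.9, 8.1, 9.8]   # roughly y = 2x

a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))   # 0.18 1.94
```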

Figure 3. A scatter plot and the corresponding regression line and regression equation for the relationship between the dependent variable body weight (kg) and the independent variable height (m).

r = Pearson’s correlation coefficient

R-squared linear = coefficient of determination

The slope b of the regression line is called the regression coefficient. It provides a measure of the contribution of the independent variable X toward explaining the dependent variable Y. If the independent variable is continuous (e.g., body height in centimeters), then the regression coefficient represents the change in the dependent variable (body weight in kilograms) per unit of change in the independent variable (body height in centimeters). The proper interpretation of the regression coefficient thus requires attention to the units of measurement. The following example should make this relationship clear:

In a fictitious study, data were obtained from 135 women and men aged 18 to 27. Their height ranged from 1.59 to 1.93 meters. The relationship between height and weight was studied: weight in kilograms was the dependent variable that was to be estimated from the independent variable, height in centimeters. On the basis of the data, the following regression line was determined: Y = –133.18 + 1.16 × X, where X is height in centimeters and Y is weight in kilograms. The y-intercept a = –133.18 is the value of the dependent variable when X = 0, but X cannot possibly take on the value 0 in this study (one obviously cannot expect a person of height 0 centimeters to weigh negative 133.18 kilograms). Therefore, interpretation of the constant is often not useful. In general, only values within the range of observations of the independent variables should be used in a linear regression model; prediction of the value of the dependent variable becomes increasingly inaccurate the further one goes outside this range.

The regression coefficient of 1.16 means that, in this model, a person’s weight increases by 1.16 kg with each additional centimeter of height. If height had been measured in meters, rather than in centimeters, the regression coefficient b would have been 115.91 instead. The constant a, in contrast, is independent of the unit chosen to express the independent variables. Proper interpretation thus requires that the regression coefficient should be considered together with the units of all of the involved variables. Special attention to this issue is needed when publications from different countries use different units to express the same variables (e.g., feet and inches vs. centimeters, or pounds vs. kilograms).
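The unit effect described above can be checked numerically: refitting the same invented data with height in meters instead of centimeters multiplies the slope by 100 and leaves the constant unchanged.

```python
# Effect of the unit of the independent variable on the regression coefficient:
# measuring height in meters instead of centimeters multiplies the slope by 100
# but leaves the intercept a unchanged. Data are invented for illustration.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

height_cm = [160, 165, 170, 175, 180, 185]
weight_kg = [55.0, 58.0, 63.0, 67.0, 74.0, 79.0]
height_m = [h / 100 for h in height_cm]

a_cm, b_cm = fit_line(height_cm, weight_kg)
a_m, b_m = fit_line(height_m, weight_kg)
print(round(b_m / b_cm))   # 100: slope per meter = 100 * slope per centimeter
print(abs(a_m - a_cm) < 1e-6)   # True: the intercept does not depend on the unit
```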

Figure 3 shows the regression line that represents the linear relationship between height and weight.

For a person whose height is 1.74 m, the predicted weight is 68.50 kg (Y = –133.18 + 115.91 × 1.74). The data set contains 6 persons whose height is 1.74 m, and their weights vary from 63 to 75 kg.

Linear regression can be used to estimate the weight of any person whose height lies within the observed range (1.59 m to 1.93 m). The data set need not include any person with this precise height. Mathematically it is possible to estimate the weight of a person whose height is outside the range of values observed in the study. However, such an extrapolation is generally not useful.

If the independent variables are categorical or binary, then the regression coefficient must be interpreted in reference to the numerical encoding of these variables. Binary variables should generally be encoded with two consecutive whole numbers (usually 0/1 or 1/2). In interpreting the regression coefficient, one should recall which category of the independent variable is represented by the higher number (e.g., 2, when the encoding is 1/2). The regression coefficient reflects the change in the dependent variable that corresponds to a change in the independent variable from 1 to 2.

For example, if one studies the relationship between sex and weight, one obtains the regression line Y = 47.64 + 14.93 × X, where X = sex (1 = female, 2 = male). The regression coefficient of 14.93 reflects the fact that men are an average of 14.93 kg heavier than women.

When categorical variables are used, the reference category should be defined first, and all other categories are to be considered in relation to this category.
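A sex/weight example of this kind can be reproduced numerically with invented data; with a 1/2 coding, the regression coefficient equals the difference between the two group means:

```python
# With a binary independent variable coded 1/2 (here: 1 = female, 2 = male),
# the regression coefficient b equals the difference between the two group
# means. The weights below are invented for illustration.

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

sex = [1, 1, 1, 2, 2, 2]                       # 1 = female, 2 = male
weight = [58.0, 62.0, 60.0, 75.0, 78.0, 72.0]  # kg

a, b = fit_line(sex, weight)
print(round(b, 1))          # 15.0: men are on average 15 kg heavier in this sample
print(round(a + b * 1, 1))  # 60.0: prediction for women = female group mean
print(round(a + b * 2, 1))  # 75.0: prediction for men = male group mean
```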

The coefficient of determination, r 2 , is a measure of how well the regression model describes the observed data ( Box 2 ). In univariable regression analysis, r 2 is simply the square of Pearson’s correlation coefficient. In the particular fictitious case that is described above, the coefficient of determination for the relationship between height and weight is 0.785. This means that 78.5% of the variance in weight is explained by height. The remaining 21.5% is due to individual variation and might be explained by other factors that were not taken into account in the analysis, such as eating habits, exercise, sex, or age.
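The equivalence just stated, that in univariable regression r 2 is both the square of Pearson’s r and the explained fraction of the variance, can be checked numerically; a minimal sketch with invented data:

```python
# The coefficient of determination computed two equivalent ways for a
# univariable regression: as the square of Pearson's r, and as the explained
# fraction of variance 1 - RSS/TSS. Data are invented.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.1, 5.9, 8.1, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5          # Pearson's correlation coefficient
b = sxy / sxx                          # regression slope
a = my - b * mx                        # regression intercept
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r2_from_r = r ** 2
r2_from_variance = 1 - rss / syy
print(abs(r2_from_r - r2_from_variance) < 1e-9)   # True: both routes agree
```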

Coefficient of determination (R-squared)

Definition:

  • n be the number of observations (e.g., subjects in the study)
  • ŷ i be the estimated value of the dependent variable for the i th observation, as computed with the regression equation
  • y i be the observed value of the dependent variable for the i th observation
  • ȳ be the mean of all n observations of the dependent variable

The coefficient of determination is then defined as follows:

r 2 = Σ (ŷ i – ȳ) 2 / Σ (y i – ȳ) 2 , where both sums run over i = 1, …, n

In formal terms, the null hypothesis that b = 0 (no relationship between the variables, so the regression coefficient is 0) can be tested with a t-test. One can also compute the 95% confidence interval for the regression coefficient ( 4 ).
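This t-test can be sketched as follows; the critical value 2.776 (two-sided 5% level, n – 2 = 4 degrees of freedom) is taken from a t table, and the data are invented:

```python
# A minimal sketch of the t-test for H0: b = 0 in univariable regression.
# se(b) = sqrt(s^2 / Sxx) with s^2 = RSS / (n - 2); t = b / se(b) is compared
# with the two-sided 5% critical value of the t distribution with n - 2
# degrees of freedom (2.776 for n = 6, taken from a t table).

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, 5.8, 8.2, 9.9, 12.1]   # roughly y = 2x, invented data

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = (rss / (n - 2) / sxx) ** 0.5
t = b / se_b
print(round(b, 2), t > 2.776)   # slope close to 2; t far exceeds 2.776, so H0 is rejected
```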

Multivariable linear regression

In many cases, the contribution of a single independent variable does not alone suffice to explain the dependent variable Y. If this is so, one can perform a multivariable linear regression to study the effect of multiple variables on the dependent variable.

In the multivariable regression model, the dependent variable is described as a linear function of the independent variables X i , as follows: Y = a + b 1 × X 1 + b 2 × X 2 + … + b n × X n . The model permits the computation of a regression coefficient b i for each independent variable X i ( box 3 ).

Regression line for a multivariable regression

Y= a + b 1 × X 1 + b 2 × X 2 + …+ b n × X n ,

Y = dependent variable

X i = independent variables

a = constant (y-intercept)

b i = regression coefficient of the variable X i

Example: regression line for a multivariable regression Y = –120.07 + 100.81 × X 1 + 0.38 × X 2 + 3.41 × X 3 ,

X 1 = height (meters)

X 2 = age (years)

X 3 = sex (1 = female, 2 = male)

Y = the weight to be estimated (kg)
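A model of this form can be fitted by solving the normal equations (XᵀX)β = Xᵀy; a minimal sketch with invented data constructed so that the true coefficients a = 2, b 1 = 3, b 2 = 0.5 are recovered:

```python
# Fitting a multivariable linear model Y = a + b1*X1 + b2*X2 by solving the
# normal equations (X'X) beta = X'y with Gaussian elimination. The data are
# constructed so that the true coefficients (a=2, b1=3, b2=0.5) are recovered.

def solve(A, rhs):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

x1 = [1.60, 1.70, 1.75, 1.80, 1.85, 1.90]   # e.g., height in meters
x2 = [20, 25, 23, 30, 27, 22]               # e.g., age in years
y = [2 + 3 * u + 0.5 * v for u, v in zip(x1, x2)]

X = [[1.0, u, v] for u, v in zip(x1, x2)]   # design matrix with intercept column
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
a, b1, b2 = solve(XtX, Xty)
print(round(a, 6), round(b1, 6), round(b2, 6))   # approximately 2, 3, 0.5
```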

Just as in univariable regression, the coefficient of determination describes the overall relationship between the independent variables X i (weight, age, body-mass index) and the dependent variable Y (blood pressure). It corresponds to the square of the multiple correlation coefficient, which is the correlation between Y and b 1 × X 1 + … + b n × X n .

It is better practice, however, to give the corrected coefficient of determination, as discussed in Box 2 . Each of the coefficients b i reflects the effect of the corresponding individual independent variable X i on Y, where the potential influences of the remaining independent variables on X i have been taken into account, i.e., eliminated by an additional computation. Thus, in a multiple regression analysis with age and sex as independent variables and weight as the dependent variable, the adjusted regression coefficient for sex represents the amount of variation in weight that is due to sex alone, after age has been taken into account. This is done by a computation that adjusts for age, so that the effect of sex is not confounded by a simultaneously operative age effect ( box 4 ).

Two important terms

  • Confounder (in non-randomized studies): an independent variable that is associated, not only with the dependent variable, but also with other independent variables. The presence of confounders can distort the effect of the other independent variables. Age and sex are frequent confounders.
  • Adjustment: a statistical technique to eliminate the influence of one or more confounders on the treatment effect. Example: Suppose that age is a confounding variable in a study of the effect of treatment on a certain dependent variable. Adjustment for age involves a computational procedure to mimic a situation in which the men and women in the data set were of the same age. This computation eliminates the influence of age on the treatment effect.

In this way, multivariable regression analysis permits the study of multiple independent variables at the same time, with adjustment of their regression coefficients for possible confounding effects between variables.

Multivariable analysis does more than describe a statistical relationship; it also permits individual prognostication and the evaluation of the state of health of a given patient. A linear regression model can be used, for instance, to determine the optimal values for respiratory function tests depending on a person’s age, body-mass index (BMI), and sex. Comparing a patient’s measured respiratory function with these computed optimal values yields a measure of his or her state of health.

Medical questions often involve the effect of a very large number of factors (independent variables). The goal of statistical analysis is to find out which of these factors truly have an effect on the dependent variable. The art of statistical evaluation lies in finding the variables that best explain the dependent variable.

One way to carry out a multivariable regression is to include all potentially relevant independent variables in the model (complete model). The problem with this method is that the number of observations that can practically be made is often less than the model requires. In general, the number of observations should be at least 20 times greater than the number of variables under study.

Moreover, if too many irrelevant variables are included in the model, overadjustment is likely to be the result: that is, some of the irrelevant independent variables will be found to have an apparent effect, purely by chance. The inclusion of irrelevant independent variables in the model will indeed allow a better fit with the data set under study, but, because of random effects, the findings will not generally be applicable outside of this data set ( 1 ). The inclusion of irrelevant independent variables also strongly distorts the coefficient of determination, so that it no longer provides a useful index of the quality of fit between the model and the data ( Box 2 ).

In the following sections, we will discuss how these problems can be circumvented.

The selection of variables

For the regression model to be robust and to explain Y as well as possible, it should include only independent variables that explain a large portion of the variance in Y. Variable selection can be performed so that only such independent variables are included ( 1 ).

Variable selection should be carried out on the basis of medical expert knowledge and a good understanding of biometrics. This is optimally done as a collaborative effort of the physician-researcher and the statistician. There are various methods of selecting variables:

Forward selection

Forward selection is a stepwise procedure that includes variables in the model as long as they make an additional contribution toward explaining Y. This is done iteratively until there are no variables left that make any appreciable contribution to Y.
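Forward selection can be sketched as follows; the r²-gain criterion and the stopping threshold of 0.01 are illustrative assumptions (statistical packages typically use F-tests or p-values instead), and the data are generated so that only x1 and x2 truly influence Y:

```python
# A minimal sketch of forward selection: starting from the empty model, add at
# each step the variable that most increases r^2, and stop when the best gain
# falls below a (hypothetical) threshold. x3 is pure noise and should be left out.
import random

def ols_r2(cols, y):
    # Least-squares fit of y on the given predictor columns plus an intercept,
    # via the normal equations; returns the coefficient of determination.
    n, k = len(y), len(cols) + 1
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):                      # Gaussian elimination with pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (M[r][k] - sum(M[r][j] * beta[j] for j in range(r + 1, k))) / M[r][r]
    yhat = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    my = sum(y) / n
    return 1 - sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / \
               sum((yi - my) ** 2 for yi in y)

def forward_select(candidates, y, gain=0.01):
    chosen, best = [], 0.0
    while len(chosen) < len(candidates):
        scores = {name: ols_r2([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        name = max(scores, key=scores.get)
        if scores[name] - best < gain:      # no appreciable additional contribution
            break
        chosen.append(name)
        best = scores[name]
    return chosen

random.seed(1)
x1 = [random.random() for _ in range(40)]
x2 = [random.random() for _ in range(40)]
x3 = [random.random() for _ in range(40)]                  # irrelevant variable
y = [3 * u + 2 * v + random.gauss(0, 0.05) for u, v in zip(x1, x2)]

selected = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)   # x1 and x2 enter the model; x3 is left out
```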

Backward selection

Backward selection, on the other hand, starts with a model that contains all potentially relevant independent variables. The variable whose removal least worsens the prediction of the dependent variable is then removed from the model. This procedure is iterated until no independent variables are left that can be removed without markedly worsening the prediction of the dependent variable.

Stepwise selection

Stepwise selection combines certain aspects of forward and backward selection. Like forward selection, it begins with a null model, adds the single independent variable that makes the greatest contribution toward explaining the dependent variable, and then iterates the process. Additionally, a check is performed after each such step to see whether one of the variables has now become irrelevant because of its relationship to the other variables. If so, this variable is removed.

Block inclusion

There are often variables that should be included in the model in any case—for example, the effect of a certain form of treatment, or independent variables that have already been found to be relevant in prior studies. One way of taking such variables into account is their block inclusion into the model. In this way, one can combine the forced inclusion of some variables with the selective inclusion of further independent variables that turn out to be relevant to the explanation of variation in the dependent variable.

The evaluation of a regression model requires the performance of both forward and backward selection of variables. If these two procedures result in the selection of the same set of variables, then the model can be considered robust. If not, a statistician should be consulted for further advice.

The study of relationships between variables and the generation of risk scores are very important elements of medical research. The proper performance of regression analysis requires that a number of important factors should be considered and tested:

1. Causality

Before a regression analysis is performed, the causal relationships among the variables to be considered must be examined from the point of view of their content and/or temporal relationship. The fact that an independent variable turns out to be significant says nothing about causality. This is an especially relevant point with respect to observational studies ( 5 ).

2. Planning of sample size

The number of cases needed for a regression analysis depends on the number of independent variables and of their expected effects (strength of relationships). If the sample is too small, only very strong relationships will be demonstrable. The sample size can be planned in the light of the researchers’ expectations regarding the coefficient of determination (r 2 ) and the regression coefficient (b). Furthermore, at least 20 times as many observations should be made as there are independent variables to be studied; thus, if one wants to study 2 independent variables, one should make at least 40 observations.

3. Missing values

Missing values are a common problem in medical data. Whenever the value of either a dependent or an independent variable is missing, this particular observation has to be excluded from the regression analysis. If many values are missing from the dataset, the effective sample size will be appreciably diminished, and the sample may then turn out to be too small to yield significant findings, despite seemingly adequate advance planning. If this happens, real relationships can be overlooked, and the study findings may not be generally applicable. Moreover, selection effects can be expected in such cases. There are a number of ways to deal with the problem of missing values ( 6 ).
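The exclusion rule described here (listwise deletion) can be sketched as follows, with None marking a missing value in an invented sample of (height, weight) pairs:

```python
# Listwise deletion, the behaviour described above: an observation is excluded
# whenever the dependent or the independent variable is missing. None marks a
# missing value in this invented sample.

pairs = [(1.60, 55.0), (1.70, None), (None, 70.0), (1.80, 74.0), (1.85, None)]
complete = [(x, y) for x, y in pairs if x is not None and y is not None]
print(len(pairs), len(complete))   # 5 2 -> only 2 of 5 observations remain usable
```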

4. The data sample

A further important point to be considered is the composition of the study population. If there are subpopulations within it that behave differently with respect to the independent variables in question, then a real effect (or the lack of an effect) may be masked from the analysis and remain undetected. Suppose, for instance, that one wishes to study the effect of sex on weight, in a study population consisting half of children under age 8 and half of adults. Linear regression analysis over the entire population reveals an effect of sex on weight. If, however, a subgroup analysis is performed in which children and adults are considered separately, an effect of sex on weight is seen only in adults, and not in children. Subgroup analysis should only be performed if the subgroups have been predefined, and the questions already formulated, before the data analysis begins; furthermore, multiple testing should be taken into account ( 7 , 8 ).

5. The selection of variables

If multiple independent variables are considered in a multivariable regression, some of these may turn out to be interdependent. An independent variable that would be found to have a strong effect in a univariable regression model might not turn out to have any appreciable effect in a multivariable regression with variable selection. This will happen if this particular variable itself depends so strongly on the other independent variables that it makes no additional contribution toward explaining the dependent variable. For related reasons, when the independent variables are mutually dependent, different independent variables might end up being included in the model depending on the particular technique that is used for variable selection.

Linear regression is an important tool for statistical analysis. Its broad spectrum of uses includes relationship description, estimation, and prognostication. The technique has many applications, but it also has prerequisites and limitations that must always be considered in the interpretation of findings ( Box 5 ).

What special points require attention in the interpretation of a regression analysis?

  • How big is the study sample?
  • Is causality demonstrable or plausible, in view of the content or temporal relationship of the variables?
  • Has there been adjustment for potential confounding effects?
  • Is the inclusion of the independent variables that were used justified, in view of their content?
  • What is the corrected coefficient of determination (R-squared)?
  • Is the study sample homogeneous?
  • In what units were the potentially relevant independent variables reported?
  • Was a selection of the independent variables (potentially relevant independent variables) performed, and, if so, what kind of selection?
  • If a selection of variables was performed, was its result confirmed by a second selection of variables that was performed by a different procedure?
  • Are predictions of the dependent variable made on the basis of extrapolated data?


r 2 is the fraction of the overall variance that is explained. The closer the regression model’s estimated values ŷ i lie to the observed values y i , the nearer the coefficient of determination is to 1 and the more accurate the regression model is.

Meaning: In practice, the coefficient of determination is often taken as a measure of the validity of a regression model or a regression estimate. It reflects the fraction of variation in the Y-values that is explained by the regression line.

Problem: The coefficient of determination can easily be made artificially high by including a large number of independent variables in the model. The more independent variables one includes, the higher the coefficient of determination becomes. This, however, lowers the precision of the estimate (estimation of the regression coefficients b i ).

Solution: Instead of the raw (uncorrected) coefficient of determination, the corrected coefficient of determination should be given: the latter takes the number of explanatory variables in the model into account. Unlike the uncorrected coefficient of determination, the corrected one is high only if the independent variables have a sufficiently large effect.
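The correction can be sketched numerically, assuming the usual adjustment formula r²_adj = 1 – (1 – r²)(n – 1)/(n – k – 1), where k is the number of explanatory variables:

```python
# Corrected (adjusted) coefficient of determination: penalises the raw r^2 for
# the number k of explanatory variables relative to the sample size n.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The same raw r^2 = 0.90 looks much less impressive when 10 variables were
# fitted to only 20 observations than when 2 variables were fitted.
print(round(adjusted_r2(0.90, 20, 2), 3))    # 0.888
print(round(adjusted_r2(0.90, 20, 10), 3))   # 0.789
```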

Acknowledgments

Translated from the original German by Ethan Taub, MD

Conflict of interest statement

The authors declare that they have no conflict of interest as defined by the guidelines of the International Committee of Medical Journal Editors.

linear regression Recently Published Documents

Total documents.

  • Latest Documents
  • Most Cited Documents
  • Contributed Authors
  • Related Sources
  • Related Keywords

Modeling the impact of some independent parameters on the syngas characteristics during plasma gasification of municipal solid waste using artificial neural network and stepwise linear regression methods

The effect of conflict and termination of employment on employee's work spirit.

This study aims to find out the conflict and termination of employment both partially and simultaneously have a significant effect on the morale of employees at PT. The benefits of Medan Technique and how much it affects. The method used in this research is quantitative method with several tests namely reliability analysis, classical assumption deviation test and linear regression. Based on the results of primary data regression processed using SPSS 20, multiple linear regression equations were obtained as follows: Y = 1,031 + 0.329 X1+ 0.712 X2.In part, the conflict variable (X1)has a significant effect on the employee's work spirit (Y) at PT. Medan Technical Benefits. This means that the hypothesis in this study was accepted, proven from the value of t calculate > t table (3,952 < 2,052). While the variable termination of employment (X2) has a significant influence on the work spirit of employees (Y) in PT. Medan Technical Benefits. This means that the hypothesis in this study was accepted, proven from the value of t calculate > t table (7,681 > 2,052). Simultaneously, variable conflict (X1) and termination of employment (X2) have a significant influence on the morale of employees (Y) in PT. Medan Technical Benefits. This means that the hypothesis in this study was accepted, as evidenced by the calculated F value > F table (221,992 > 3.35). Conflict variables (X1) and termination of employment (X2) were able to contribute an influence on employee morale variables (Y) of 94.3% while the remaining 5.7% was influenced by other variables not studied in this study. From the above conclusions, the author advises that employees and leaders should reduce prolonged conflict so that the spirit of work can increase. Leaders should be more selective in severing employment relationships so that decent employees are not dismissed unilaterally. Employees should work in a high spirit so that the company can see the quality that employees have.

Truncated $L^1$ Regularized Linear Regression: Theory and Algorithm

Hypothesis testing in high-dimensional linear regression: a normal reference scale-invariant test, new tests for high-dimensional linear regression based on random projection, asymptotic optimality of cp-type criteria in high-dimensional multivariate linear regression models, scaled partial envelope model in multivariate linear regression, implementation of a non-linear regression model in rolling bearing diagnostics, factors affecting employees’ job satisfaction at vinh long radio and television station.

The study’s objective is to determine factors impacting employees’ job satisfaction at Vinh Long Radio and Television (TV) Station. The authors used convenient sampling to collect data from 233 employees working at Vinh Long Radio and TV Station. The exploratory factor analysis and multivariable linear regression help the study find seven factors affecting job satisfaction. They include nature of work, training and promotion, income, leadership, colleagues, working environment, work pressure, and work autonomy. In which, work pressure has the most negative influence on job satisfaction of employees working at Vinh Long Radio and TV Station.

Pengaruh Produk Domestik Regional Bruto (PDRB) terhadap Tingkat Kemiskinan Kota Medan

This study has the benefit of analyzing the effect of regional gross domestic product on poverty in the city of Medan in 2010-2020. The research method used is a quantitative method with reference to a descriptive approach. The data used is time series data on economic growth and poverty at the Central Statistics Agency (BPS) of Medan City in 2010-2020. Data collection techniques used are journals, book documentation, and previous reports. The technique of analyzing the data uses simple linear regression analysis which is carried out to determine whether the model used is free from deviations from the classical assumption test. The equations obtained from the simple linear regression analysis test Y = 24576.325 – 0.365X and have the understanding that the GRDP variable (X) has a significant effect on Poverty (Y). Obtained a value of R2 (R square) of 0.556 with the understanding that the independent variable, namely GRDP, affects the variable of the poverty level in Medan City by 55.6%. Meanwhile, the remaining 44.4% are influenced by different independent variables and are not included in this study. For this reason, it can be concluded that when GRDP increases, it will have an impact on decreasing the value of Poverty in Medan City, and vice versa. Keywords: Gross Regional Domestic Product; Poverty; Medan city


Linear Regression Analyisis 2nd edition[George A.F.Seber,Alan J.Lee].pdf


Linear Regression Analysis, 2nd edition (Wiley Series in Probability and Statistics). George A. F. Seber, Alan J. Lee. Year: 2003. Edition: 2. Language: English. Pages: 582.


A GPU-Accelerated Bi-linear ADMM Algorithm for Distributed Sparse Machine Learning

25 May 2024 · Alireza Olama, Andreas Lundell, Jan Kronqvist, Elham Ahmadi, Eduardo Camponogara

This paper introduces the Bi-linear consensus Alternating Direction Method of Multipliers (Bi-cADMM), aimed at solving large-scale regularized Sparse Machine Learning (SML) problems defined over a network of computational nodes. Mathematically, these are stated as minimization problems with convex local loss functions over a global decision vector, subject to an explicit $\ell_0$ norm constraint to enforce the desired sparsity. The considered SML problem generalizes different sparse regression and classification models, such as sparse linear and logistic regression, sparse softmax regression, and sparse support vector machines. Bi-cADMM leverages a bi-linear consensus reformulation of the original non-convex SML problem and a hierarchical decomposition strategy that divides the problem into smaller sub-problems amenable to parallel computing. In Bi-cADMM, this decomposition strategy is based on a two-phase approach. Initially, it performs a sample decomposition of the data and distributes local datasets across computational nodes. Subsequently, a delayed feature decomposition of the data is conducted on Graphics Processing Units (GPUs) available to each node. This methodology allows Bi-cADMM to undertake computationally intensive data-centric computations on GPUs, while CPUs handle more cost-effective computations. The proposed algorithm is implemented within an open-source Python package called Parallel Sparse Fitting Toolbox (PsFiT), which is publicly available. Finally, computational experiments demonstrate the efficiency and scalability of our algorithm through numerical benchmarks across various SML problems featuring distributed datasets.
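The Bi-cADMM algorithm itself is beyond a short snippet, but the ℓ0 constraint it enforces has a simple geometric core: projecting a weight vector onto the set {w : ‖w‖0 ≤ k} keeps only the k largest-magnitude entries. A minimal illustrative sketch follows; `project_l0` is a hypothetical helper, not part of the PsFiT package:

```python
import numpy as np

def project_l0(w: np.ndarray, k: int) -> np.ndarray:
    """Euclidean projection of w onto {v : ||v||_0 <= k}:
    zero out all but the k largest-magnitude entries."""
    out = np.zeros_like(w)
    if k <= 0:
        return out
    keep = np.argsort(np.abs(w))[-k:]   # indices of the k largest |w_i|
    out[keep] = w[keep]
    return out

w = np.array([0.1, -2.5, 0.03, 1.7, -0.4])
print(project_l0(w, 2))  # keeps -2.5 and 1.7, zeros the rest
```

Projections of this kind are a standard building block in algorithms for ℓ0-constrained sparse regression and classification.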


COMMENTS

  1. (PDF) Linear regression analysis study

    Linear regression is a statistical procedure for calculating the value of a dependent variable from an independent variable. Linear regression measures the association between two variables. It is ...

  2. (PDF) Research on linear regression algorithm

    Linear regression is one of the most widely used predictive models in statistics and machine learning. This paper aims to comprehensively discuss ...

  3. (PDF) A Review on Linear Regression Comprehensive in ...

    simplest and most common machine learning algorithms. It is a mathematical approach used to perform predictive analysis. Linear regression allows continuous/real or mathematical variables ...

  4. PDF Multiple Linear Regression (2nd Edition) Mark Tranmer Jen Murphy Mark

    In both cases, we still use the term 'linear' because we assume that the response variable is directly related to a linear combination of the explanatory variables. The equation for multiple linear regression has the same form as that for simple linear regression but has more terms: y = β0 + β1x1 + β2x2 + ⋯ + βkxk.
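A minimal sketch of fitting such a multiple linear regression by ordinary least squares with NumPy. The data are synthetic and the coefficient names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from y = 2 + 1.5*x1 - 0.8*x2 + noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.1 * rng.normal(size=n)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: solve min ||X b - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")  # close to 2, 1.5, -0.8
```

The same design-matrix construction extends to any number of regressors: one column of ones plus one column per explanatory variable.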

  5. PDF Using regression analysis to establish the relationship between home

    Home environment and reading achievement research has been largely dominated by a focus on early reading acquisition, while research on the relationship between home environments and reading success with preadolescents (Grades 4-6) has been largely overlooked. There are other limitations as well. Clarke and Kurtz-Costes (1997) argued that prior ...

  6. PDF Chapter 9 Simple Linear Regression

    9.1 The Model Behind Linear Regression. (Figure 9.1: Mnemonic for the simple regression model.) ... than ANOVA. If the truth is non-linearity, regression will make inappropriate predictions, but at least regression will have a chance to detect the non-linearity.

  7. PDF Applied Linear Regression

    3.2 The Multiple Linear Regression Model, 55 3.3 Predictors and Regressors, 55 3.4 Ordinary Least Squares, 58 3.4.1 Data and Matrix Notation, 60 3.4.2 The Errors e, 61 3.4.3 Ordinary Least Squares Estimators, 61 ... 6.6.2 Why Most Published Research Findings Are False, 147 ...

  8. PDF LINEAR REGRESSION USING R

    to develop linear regression models. It uses a large, publicly available data set as a running example throughout the text and employs the R programming language environment as the computational engine for developing the models. This tutorial will not make you an expert in regression modeling, nor a complete programmer in R.

  9. Linear Regression Analysis

    Univariable linear regression. Univariable linear regression studies the linear relationship between the dependent variable Y and a single independent variable X. The linear regression model describes the dependent variable with a straight line that is defined by the equation Y = a + b × X, where a is the y-intercept of the line, and b is its ...

  10. PDF LINEAR MODELS IN STATISTICS

    11.2.1 A Bayesian Multiple Regression Model with a Conjugate Prior, 280; 11.2.2 Marginal Posterior Density of β, 282; 11.2.3 Marginal Posterior Densities of τ and σ², 284; 11.3 Inference in Bayesian Multiple Linear Regression, 285; 11.3.1 Bayesian Point and Interval Estimates of Regression Coefficients, 285; 11.3.2 Hypothesis Tests for Regression ...

  11. PDF A Study on Multiple Linear Regression Analysis


  12. PDF Intro to Linear Regression

    Recall the slope-intercept form of a line, y = mx + b. For instance, in the red equation, m = 1/2 and b = 2. In the blue equation, m = 1 and b = 5. Review: slope-intercept form of a line. b is the y-intercept, or where the line crosses the y-axis. It is the predicted value of y when x = 0. m is the slope, which tells us the predicted increase ...

  13. PDF An Insight of Linear Regression Analysis

    this paper. The effectiveness of the model utility test in testing the significance of a regression model is evaluated using a simple linear regression model with significance levels α = 0.01, 0.025, and 0.05. The study in this paper shows that a regression model that is declared to be a significant model by using ...

  14. linear regression Latest Research Papers

    The method used in this research is a quantitative method with several tests, namely reliability analysis, the classical assumption deviation test, and linear regression. Based on the results of primary data regression processed using SPSS 20, the following multiple linear regression equation was obtained: Y = 1.031 + 0.329 X1 + 0.712 X2. In part, the ...

  15. (PDF) Regression Analysis

    7.1 Introduction. Regression analysis is one of the most frequently used tools in market research. In its simplest form, regression analysis allows market researchers to analyze relationships ...

  16. PDF Linear regression reporting practices for health researchers, a cross

    Methods. Reporting practices for linear regression were assessed in 95 randomly sampled published papers in the health field from PLOS ONE in 2019, which were randomly allocated to statisticians for post-publication review. The prevalence of reporting practices is described using frequencies, percentages, and Wilson 95% confidence intervals.

  17. Linear Regression Analysis on Net Income of an Agrochemical Company in

    Simple linear regression: Simple linear regression is a model with a single regressor x that has a straight-line relationship with a response y. This simple linear regression model can be expressed as y = β0 + β1x + ε, where the intercept β0 and the slope β1 are unknown constants and ε is a random error term.
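The unknown constants β0 and β1 can be estimated by least squares in closed form: the slope estimate is the sample covariance of x and y divided by the sample variance of x, and the intercept estimate is ȳ − β̂1·x̄. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
eps = rng.normal(0, 0.5, size=100)
y = 3.0 + 2.0 * x + eps          # true beta0 = 3, beta1 = 2

# Closed-form least-squares estimates:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1 * xbar.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"b0={b0:.2f}, b1={b1:.2f}")  # close to 3 and 2
```

With a modest noise level and 100 observations, the estimates recover the true intercept and slope closely.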

  18. PDF Lecture 9: Linear Regression

    Regression. Technique used for the modeling and analysis of numerical data. Exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other. Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

  19. (PDF) Linear Regression Analyisis 2nd edition[George A.F.Seber,Alan J

    Linear Regression Analysis, 2nd edition (Wiley Series in Probability and Statistics). George A. F. Seber, Alan J. Lee. Year: 2003. Edition: 2. Language: en. Pages: 582.

  20. PDF Linear Regression Comprehensive in Machine Learning: a Survey

    regression. Linear regression is used to find a linear relationship between one or more predictors. Linear regression has two types: simple regression and multiple regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using ...

  21. [PDF] Deep linear networks for regression are implicitly regularized

    This paper shows an implicit regularization towards flat minima: the sharpness of the minimizer is no more than a constant times the lower bound, which depends on the condition number of the data covariance matrix, but not on width or depth. The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we ...

  22. PDF Prediction of Indian GDP using Multiple Linear Regression and ...

    A. Linear Regression. Linear Regression is a supervised learning-based machine learning technique. Variable correlation and forecasting are two of its most common applications. There are a variety of different regression models, each of which has a different number of independent variables and a different correlation between the dependent and ...

  23. Improving Stock Price Prediction using Linear Regression and Long Short

    Accurately predicting stock prices is a challenging task due to the volatility and nonlinearity of financial markets. This study presents research on improving stock price prediction by employing a combined approach of Linear Regression (LR) and Long Short-Term Memory (LSTM) models, referred to as LR-LSTM with comparisons with other effective models.

  24. (PDF) Machine Learning -Regression

    Regression methods are then discussed at fair length, focusing on linear regression. We conclude the research with an application to a real-life regression problem.

  25. Water quality evaluation based on water quality index and multiple

    Based on the monitoring data from January to December 2019, the WQI comprehensive evaluation method was used to conduct multiple linear stepwise regression analysis, extract key indicators, and establish the WQImin model. The results show that according to the WQI comprehensive evaluation method, the WQI values of Hanyuan Lake are all above 90 ...

  26. Papers with Code

    This paper introduces the Bi-linear consensus Alternating Direction Method of Multipliers (Bi-cADMM), aimed at solving large-scale regularized Sparse Machine Learning (SML) problems defined over a network of computational nodes. ... such as sparse linear and logistic regression, sparse softmax regression, and sparse support vector machines ...

  27. (PDF) A Study on Multiple Linear Regression Analysis

    In this study, we used multiple linear regression (Uyanık and Guler, 2013; Yin et al., 2019) to investigate the associations between TCIi and 13 independent variables. The model is defined as ...

  28. A Registration Method of Overlap Aware Point Clouds Based on ...

    Transformer has recently become widely adopted in point cloud registration. Nevertheless, Transformer is unsuitable for handling dense point clouds due to resource constraints and the sheer volume of data. We propose a method for directly regressing the rigid relative transformation of dense point cloud pairs. Specifically, we divide the dense point clouds into blocks according to the down ...

  29. (PDF) Multiple Regression: Methodology and Applications

    This paper presented a multiple linear regression model and a logistic regression model, according to the assumptions of both models. The paper relied on the logistic regression model because the ...