# Multiple Linear Regression

Introduction

The growth of the organic food industry over the last two decades cannot be overstated. For instance, the industry expanded from \$17.9 billion in 2000 to \$81.6 billion in 2015 (Önel et al., 2019). In 2018, global organic food sales amounted to \$95 billion, with the United States accounting for the largest share of retail sales (Wunsch, 2020). US organic food sales were \$50.1 billion in 2020, representing 4.6 percent growth from \$47.9 billion in 2019 (Gelski, 2019; Organic Trade Association, 2021). The growth is attributed to an increase in consumer demand for organic food. Consumers' attraction to organic food is linked to their increasing health and environmental consciousness (Konuk, 2017). For these reasons, organic foods are no longer niche products but are available in multiple outlets convenient to different types of consumers. This implies that consumer demand is heterogeneous and depends on multiple factors, including demographic characteristics, perceived benefits, and price. Thus, examining how these factors drive consumer demand is crucial not only for enabling food companies to develop marketing strategies, but also for predicting industry growth trends.

Regression Analysis

Regression analysis is a statistical technique used to estimate the association between variables. In its simplest form it involves two variables. One is the explained variable, also referred to as the predicted, response, or dependent variable (Wooldridge, 2013). The other is the predictor variable, also known as the independent, explanatory, or regressor variable.

Relationships between variables can be either linear or non-linear, which determines the type of regression analysis that can be performed. Linear regression is the standard form of regression analysis used to estimate the association between a scalar response and one or more predictor variables. The basic form is simple linear regression, a two-variable model that relates Y, the dependent variable, to X, the independent variable. Multiple linear regression (MLR) extends simple linear regression by allowing many factors to influence Y. Both models are estimated using ordinary least squares (OLS) and are therefore based on several assumptions. These include linearity in parameters, random sampling, zero conditional mean of the error term, and homoscedasticity (Wooldridge, 2013). These assumptions can be assessed in statistical analysis.

This paper examines the influence of demographic factors on the annual amount a consumer spends on organic food. The amount spent is expected to depend on the consumer's age, annual income, number of people in the household, and gender. Because more than one factor affects the amount spent on organic food, MLR is the appropriate regression model.

The Regression Output Generated in Excel

The MS Excel package was used to perform MLR. The regression output is presented below.

| Variable | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -1932.11 | 978.54 | -1.97 | 0.051 |
| Age | 14.12 | 11.78 | 1.20 | 0.233 |
| Annual Income | 0.02 | 0.00 | 6.33 | 0.000 |
| X Variable 3 | 2222.51 | 153.25 | 14.50 | 0.000 |
| X Variable 4 | 40.50 | 384.71 | 0.11 | 0.916 |

F (4, 119) = 66.11, p-value = 0.000; R-Squared = 0.6897

Interpretation of the Coefficient of Determination

The coefficient of determination, commonly called R-squared, is a goodness-of-fit measure for OLS regression. It summarizes how well the MLR model fits the data. In other words, it measures how well the independent variables (X) explain the response variable (Y). Wooldridge (2013) defines R-squared as the proportion of variation in the dependent variable that is explained by the predictor variables. For the above regression output, age, annual income, number of people in the household, and gender explain about 69% of the variation in the annual amount spent on organic food. That means that 31% of the variation in the annual amount spent on organic food is left unexplained; that is, it is captured by the error term. Because the R-squared value is above 50%, the regression model can be regarded as having adequate explanatory power. Thus, MLR fits the data reasonably well.
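As a brief sketch of the definition behind this figure, let SSR denote the residual sum of squares and SST the total sum of squares. This notation is an assumption made here, since the Excel output reports only R-squared itself. The unexplained share then follows directly from the reported value:

$$
R^2 = 1 - \frac{\text{SSR}}{\text{SST}}, \qquad 1 - R^2 = 1 - 0.6897 = 0.3103 \approx 31\%
$$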

Interpretation of the F-test

The F-statistic with 4 numerator and 119 denominator degrees of freedom is 66.11. The associated p-value (reported as 0.000, that is, p < 0.001) is significant at the 5% level. Therefore, the null hypothesis that all slope coefficients are equal to zero is rejected; the variables included in the model are jointly significant. Thus, MLR is appropriate for fitting the relationship between the annual amount spent on organic food and the demographic characteristics.
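The reported F-statistic can be cross-checked against the reported R-squared. The sketch below is illustrative only: it assumes the standard relationship between the overall F-statistic and R-squared for a model with an intercept, and the function and variable names are hypothetical rather than part of the Excel output.

```python
def f_from_r_squared(r_squared, k, df_resid):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / df_resid)."""
    return (r_squared / k) / ((1.0 - r_squared) / df_resid)

# Values as reported in the regression output.
r_squared = 0.6897
k = 4            # numerator df: number of slope coefficients
df_resid = 119   # denominator df: n - k - 1

f_stat = f_from_r_squared(r_squared, k, df_resid)
n_obs = df_resid + k + 1  # sample size implied by the degrees of freedom

print(f"F({k}, {df_resid}) = {f_stat:.2f}, implied n = {n_obs}")
```

The computed value agrees with the reported 66.11 up to the rounding of R-squared to four decimal places, and the degrees of freedom imply a sample of 124 observations.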

Interpretation of Slope and Significance of Coefficients

The advantage of MLR over simple linear regression is that it permits a ceteris paribus interpretation: each coefficient measures the effect of one variable while the others are held constant. The coefficient of age (b1 = 14.12) is positive but not statistically significant (p = 0.233), suggesting that age has no detectable influence on the annual amount spent on organic food. The coefficient of annual income (b2 = 0.02) is positive and significant at the 5% level. Holding other factors constant, a \$1 increase in annual income increases spending on organic food by \$0.02. Additionally, the coefficient of number of people in the household (b3 = 2222.51) is positive and significant. Every additional household member increases the amount spent on organic food by about \$2223, holding other factors constant. Like age, the coefficient of gender (b4 = 40.50) is not statistically significant (p = 0.916), meaning that being male or female does not detectably influence the amount spent on organic food. Thus, while age and gender have no significant influence, annual income and number of people in the household positively and significantly affect the annual amount spent on organic food.
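The significance judgments above rest on the t statistics, each of which is the coefficient divided by its standard error. A minimal sketch of that check follows, using the figures from the regression output. The dictionary layout and function name are hypothetical, and Annual Income is omitted because its standard error is rounded to 0.00 in the output and cannot be used for the division.

```python
# (coefficient, standard error, reported t Stat) copied from the regression output.
# Annual Income is left out because its standard error is rounded to 0.00.
reported = {
    "Intercept": (-1932.11, 978.54, -1.97),
    "Age": (14.12, 11.78, 1.20),
    "X Variable 3": (2222.51, 153.25, 14.50),
    "X Variable 4": (40.50, 384.71, 0.11),
}

def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error

t_stats = {name: t_statistic(b, se) for name, (b, se, _) in reported.items()}

for name, (_, _, t_reported) in reported.items():
    print(f"{name}: computed t = {t_stats[name]:.2f}, reported t = {t_reported:.2f}")
```

Each computed ratio matches the reported t Stat to two decimal places, which confirms that the table is internally consistent.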

Substituted Regression Equation

Substituting the coefficients from the Excel regression output into the regression model gives:

Y = -1932.11 + 14.12*Age + 0.02*Annual income + 2222.51*No. of people in household + 40.50*gender

Annual Amount Spent on Organic Food

Amount = -1932.11 + 14.12*48.23 + 0.02*161006.6 + 2222.51*4.31+ 40.50*0.57

= 11571.133

The predicted annual amount spent on organic food is therefore about \$11,571.
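The substitution above can be reproduced with a short script. This is a sketch under stated assumptions: the coefficient and input values are those used in the calculation above, while the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Coefficients from the regression output.
coefficients = {
    "intercept": -1932.11,
    "age": 14.12,
    "annual_income": 0.02,
    "household_size": 2222.51,
    "gender": 40.50,
}

# Predictor values used in the substitution.
inputs = {
    "age": 48.23,
    "annual_income": 161006.6,
    "household_size": 4.31,
    "gender": 0.57,
}

def predict_amount(coefs, values):
    """Intercept plus the sum of coefficient * value over the predictors."""
    return coefs["intercept"] + sum(coefs[name] * x for name, x in values.items())

predicted = predict_amount(coefficients, inputs)
print(f"Predicted annual amount: {predicted:.2f}")
```

Keeping the coefficients and inputs in separate dictionaries makes it easy to re-evaluate the equation for a different consumer profile without retyping the formula.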

Conclusion

The MLR coefficient of the age variable (14.12) is smaller than that of the bivariate regression (26.29). The reduction in effect size changed the significance of age. That is, the age coefficient in MLR is not statistically significant, as it was in simple linear regression. The difference in effect sizes and significance could be attributed to correlation between age and the added variable(s), or between those variables and the amount spent on organic food.