How to interpret interaction dummies of multiple categories and main effect

I have a cross-country panel regression with the following structure ($y$ is the drug addiction rate of a country, $x$ the number of homeless people in the country, and $m$ the HIV infection rate of the country). I group my countries into four world regions, coded as dummies $D_1$, $D_2$, $D_3$, with the fourth region as the reference category:

$y = b_1x + b_2m + b_3D_1m + b_4D_2m + b_5D_3m$ (1)

When I change the base category, every coefficient and significance value except that of $b_1$ changes.

When I change my regression to:

$y = b_1x + b_3D_1m + b_4D_2m + b_5D_3m + b_6D_4m$ (2)

each coefficient in (2) equals $b_2$ from regression (1) estimated with the corresponding region as the reference category, with the same significance level.

Now I don't understand what I am seeing. Is the main-effect coefficient $b_2$ the effect for the reference category rather than the average effect of the HIV infection rate? What does my main-effect coefficient $b_2$ tell me? In regression (1), why do the significance values of $b_3$, $b_4$ and $b_5$ change when I change the reference category, and what does the significance of $b_3$, $b_4$ and $b_5$ mean with respect to the main effect $b_2$? I am completely confused right now.

Best regards,
Rub_n

edited 5 hours ago by StatsStudent

asked 5 hours ago by Rub_n (new contributor)

How are you modelling the error terms? What kind of model is this? Ordinary least squares? Logistic regression?
– StatsStudent
5 hours ago

Do you really have cross-country data or is this supposed to be cross-sectional?
– StatsStudent
5 hours ago

I use an OLS regression with group and time fixed effects. Yes, I have cross-country data.
– Rub_n
4 hours ago

Tags: regression, mean, interpretation, categorical-encoding


1 Answer

Consider a model with only 3 regions and hence two dummies $D_1$ and $D_2$. Assume the data are cross-country panel data, so $i = 1, \dots, n$ indexes countries and $t$ indexes time periods. Let the model equation be

$$y_{it} = b_1 x_{it} + b_2 m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it} + \epsilon_{it}$$

implying that the conditional expected rate of drug addiction is

$$\mathbb{E}[y_{it} \mid \text{data}] = b_1 x_{it} + b_2 m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it}$$

Hence the model allows different regions to have different marginal effects of the HIV infection rate $m$ on the drug addiction rate $y$: the drug addiction rate in each non-reference region may respond differently to a change in the HIV infection rate than it does in the reference region.

For the reference region ($D_1 = D_2 = 0$) the conditional expectation reduces to

$$\mathbb{E}[y_{it} \mid \text{data}] = b_1 x_{it} + b_2 m_{it}$$

Differentiating with respect to $m_{it}$ gives

$$\frac{\partial\, \mathbb{E}[y_{it} \mid \text{data}]}{\partial m_{it}} = b_2$$

which is the marginal effect of the HIV infection rate $m$ on the drug addiction rate $y$ for countries in the reference region: an increase of one unit in the HIV infection rate of a country $i$ from the reference region results in a change of $b_2$ units in the drug addiction rate of country $i$.

For countries from the region defined by $D_1=1$ and $D_2=0$ the conditional expectation is

$$\mathbb{E}[y_{it} \mid \text{data}] = b_1 x_{it} + b_2 m_{it} + b_3 m_{it}$$

and the marginal effect is

$$\frac{\partial\, \mathbb{E}[y_{it} \mid \text{data}]}{\partial m_{it}} = b_2 + b_3$$

Hence $b_3$ is the difference between the marginal effect of the HIV infection rate $m$ on the drug addiction rate $y$ for countries in region $D_1=1$ and the marginal effect for the reference region, which is simply $b_2$. If $b_3$ is positive, countries from region $D_1=1$ have a larger marginal effect: a one-unit increase in the HIV infection rate raises their drug addiction rate by more (or lowers it by less) than in the reference region.
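The same reasoning applies to every non-reference region. As a short derivation, write $D_3 = 1 - D_1 - D_2$ for the dummy of the reference region (a symbol introduced only for this step; it is not a regressor in the model). Because each country belongs to exactly one region, $m_{it} = (D_1 + D_2 + D_3)\, m_{it}$, and the $m$-terms of the model can be regrouped as

$$
\begin{aligned}
b_2 m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it}
&= b_2 (D_1 + D_2 + D_3)\, m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it} \\
&= (b_2 + b_3)\, D_1 m_{it} + (b_2 + b_4)\, D_2 m_{it} + b_2\, D_3 m_{it}.
\end{aligned}
$$

So the region-specific marginal effects are $b_2 + b_3$, $b_2 + b_4$ and $b_2$, and the regrouped form is the three-region analogue of regression (2) in the question: the same model with each region's marginal effect as its own coefficient. The $x$ term is untouched by the regrouping, which is why $b_1$ does not change with the reference category.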

So $b_2$ measures the change in the drug addiction rate resulting from a one-unit increase in the HIV infection rate $m$ for countries in the reference region. The value of $b_3$ changes when you change the reference category because it is the difference in the marginal effect between some region (here $D_1=1$) and the reference region, and of course that difference depends on which region it is compared to. A significant $b_3$ means that you can reject the null hypothesis that countries from region $D_1=1$ have the same marginal effect as countries from the reference region.
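To see this numerically, here is a minimal sketch on simulated data; the data, the slopes and the seed are all hypothetical. It follows the question's four-region setup, labels the regions 1 to 4 so that region $k$ corresponds to $D_k$, and for simplicity omits the intercept and the group and time fixed effects of the asker's actual specification. It fits regression (1) with region 4 and then region 1 as the reference, plus regression (2):

```python
import numpy as np

# Hypothetical simulated data matching equations (1) and (2) as written:
# no intercept and no fixed effects (a simplification of the asker's setup).
rng = np.random.default_rng(0)
n = 400
region = rng.integers(1, 5, size=n)               # four regions labelled 1..4
x = rng.normal(size=n)
m = rng.normal(size=n)
true_slopes = np.array([0.5, 1.0, -0.3, 2.0])     # assumed effect of m per region
y = 0.8 * x + true_slopes[region - 1] * m + rng.normal(scale=0.5, size=n)


def ols(X, target):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


def model1(ref):
    """Regression (1): columns x, m, and D_k * m for every region k except `ref`."""
    others = [k for k in range(1, 5) if k != ref]
    X = np.column_stack([x, m] + [(region == k) * m for k in others])
    return ols(X, y)


def model2():
    """Regression (2): columns x and D_k * m for all four regions, no common m."""
    X = np.column_stack([x] + [(region == k) * m for k in range(1, 5)])
    return ols(X, y)


b_ref4 = model1(ref=4)   # [b1, b2, b3 (D1*m), b4 (D2*m), b5 (D3*m)]
b_ref1 = model1(ref=1)   # [b1, b2, interactions for regions 2, 3, 4]
b_full = model2()        # [b1, slope of region 1, ..., slope of region 4]

print("b1:", b_ref4[0], b_ref1[0], b_full[0])     # identical up to rounding
print("b2:", b_ref4[1], b_ref1[1])                # depends on the reference
print("D2*m coefficient:", b_ref4[3], b_ref1[2])  # also depends on the reference
```

Under both references $b_1$ is the same, $b_2$ equals the slope in (2) of whichever region is the reference, and each interaction coefficient equals that region's slope minus the reference region's slope. Including the same fixed-effect columns in every version should not change these relationships, since the versions still span the same column space.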

In the second model there is no reference category, so the coefficients $b_3$, $b_4$, $b_5$ and $b_6$ are region-specific marginal effects (not differences in marginal effects). This model lets you test whether the HIV infection rate has a significant marginal effect on the drug addiction rate in each region simply by testing the significance of each coefficient. To test for differences between regions in this model you have to test restrictions on the coefficients, for example $H_0: b_3 = b_4$, which can be done with a Wald test. In model (1), by contrast, the comparison between a region and the reference region is performed simply by testing the significance of a single interaction coefficient.
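A minimal sketch of both kinds of test, again on hypothetical simulated data, using classical homoskedastic OLS standard errors and no fixed effects (unlike the asker's specification). It estimates the question's regression (2), computes a $t$-statistic for each region-specific slope and a Wald statistic for $H_0: b_3 = b_4$ (equal slopes in regions 1 and 2), and then checks that the squared $t$-statistic of the $D_1 m$ interaction in regression (1) with region 2 as the reference gives the same number:

```python
import numpy as np

# Hypothetical simulated data for regression (2); assumes homoskedastic errors
# and omits the fixed effects of the asker's actual specification.
rng = np.random.default_rng(1)
n = 400
region = rng.integers(1, 5, size=n)               # regions labelled 1..4
x = rng.normal(size=n)
m = rng.normal(size=n)
true_slopes = np.array([0.5, 1.0, -0.3, 2.0])
y = 0.8 * x + true_slopes[region - 1] * m + rng.normal(scale=0.5, size=n)

# Regression (2): columns x, D1*m, D2*m, D3*m, D4*m  ->  b1, b3, b4, b5, b6.
X = np.column_stack([x] + [(region == k) * m for k in range(1, 5)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS covariance: sigma^2 (X'X)^{-1}.
dof = n - X.shape[1]
resid = y - X @ beta
V = (resid @ resid / dof) * np.linalg.inv(X.T @ X)

# t-statistics: each tests H0 "the effect of m is zero in this region".
t_stats = beta / np.sqrt(np.diag(V))

# Wald test of H0: b3 = b4, written as R @ beta = 0.
R = np.array([[0.0, 1.0, -1.0, 0.0, 0.0]])
diff = R @ beta
W = float(diff @ np.linalg.inv(R @ V @ R.T) @ diff)  # ~ chi^2(1) under H0; 5% critical value 3.84

# Same comparison in regression (1) with region 2 as reference: the
# coefficient on D1*m is then b3 - b4, and its squared t-statistic equals W.
X1 = np.column_stack([x, m] + [(region == k) * m for k in (1, 3, 4)])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid1 = y - X1 @ beta1
V1 = (resid1 @ resid1 / dof) * np.linalg.inv(X1.T @ X1)
t_d1 = beta1[2] / np.sqrt(V1[2, 2])

print("region slopes:", beta[1:])
print("t-statistics:", t_stats[1:])
print("Wald statistic:", W, "squared t from regression (1):", t_d1 ** 2)
```

With clustered or otherwise robust standard errors, `V` would be replaced by the corresponding covariance estimate; the restriction $R\beta = 0$ and the form of the Wald statistic stay the same.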

answered 4 hours ago by Jesper Hybel (edited 4 hours ago)

Oh man, that really helps! So the significance of $b_2$ is the significance of the effect of region 3 on my drug addiction rate?
– Rub_n
4 hours ago

$b_2$ measures the effect on the drug addiction rate of a one-unit increase in the HIV infection rate for countries belonging to the reference region. Its significance means it is significantly different from 0, so you can reject the null hypothesis that the HIV infection rate does not affect the drug addiction rate in countries from this region. (I don't know what you define as region 3.)
– Jesper Hybel
4 hours ago

Perfect, thank you so much! So do I gain anything from using regression (2) instead of running four separate regressions, one per region, apart from a bigger sample size for the effect of $x$?
– Rub_n
4 hours ago

See the edit of my response (last two paragraphs).
– Jesper Hybel
4 hours ago

pls. accept and upvote if you think the answer was helpful :)
– Jesper Hybel
4 hours ago
