How is causation defined mathematically?











What is the mathematical definition of a causal relationship between two random variables?



Given a sample from the joint distribution of two random variables $X$ and $Y$, when would we say $X$ causes $Y$?



For context, I am reading this paper about causal discovery.










Tags: machine-learning, causality

asked 16 hours ago by Jane; edited 8 hours ago
















  • As far as I can see causality is a scientific not mathematical concept. Can you edit to clarify?
    – mdewey
    15 hours ago

  • @mdewey I disagree. Causality can be cashed out in entirely formal terms. See e.g. my answer.
    – Kodiologist
    15 hours ago

















2 Answers
Accepted answer by Carlos Cinelli (answered 8 hours ago, edited 6 hours ago):

What is the mathematical definition of a causal relationship between
two random variables?




Mathematically, a causal model consists of functional relationships between variables. For instance, consider the system of structural equations below:



$$
x = f_x(\epsilon_{x}) \\
y = f_y(x, \epsilon_{y})
$$



This means that $x$ functionally determines the value of $y$ (if you intervene on $x$ this changes the values of $y$) but not the other way around. Graphically, this is usually represented by $x \rightarrow y$, which means that $x$ enters the structural equation of $y$. As an addendum, you can also express a causal model in terms of joint distributions of counterfactual variables, which is mathematically equivalent to functional models.
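
To illustrate the asymmetry, here is a minimal sketch in R (not part of the original answer; the linear form $y = 2x + \epsilon_y$ and the standard normal noises are arbitrary choices made only for the example) simulating this two-equation model with and without interventions:

# simulate x = f_x(eps_x), y = f_y(x, eps_y), optionally replacing an
# equation by an intervention that fixes the variable at a constant
set.seed(1)
n <- 1e5
simulate <- function(do_x = NULL, do_y = NULL) {
  x <- if (is.null(do_x)) rnorm(n) else rep(do_x, n)          # x = f_x(eps_x)
  y <- if (is.null(do_y)) 2 * x + rnorm(n) else rep(do_y, n)  # y = f_y(x, eps_y), illustrative choice
  data.frame(x = x, y = y)                                    # setting y never feeds back into x
}

colMeans(simulate())           # observational means, roughly (0, 0)
colMeans(simulate(do_x = 3))   # setting x shifts the mean of y to about 6
colMeans(simulate(do_y = 3))   # setting y leaves the mean of x at about 0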




Given a sample from the joint distribution of two random variables X
and Y, when would we say X causes Y?




Sometimes (or most of the time) you do not have knowledge about the shape of the structural equations $f_{x}$, $f_y$, nor even whether $x \rightarrow y$ or $y \rightarrow x$. The only information you have is the joint probability distribution $p(y,x)$ (or samples from this distribution).



This leads to your question: when can I recover the direction of causality just from the data? Or, more precisely, when can I recover whether $x$ enters the structural equation of $y$ or vice-versa, just from the data?



Of course, without any fundamentally untestable assumptions about the causal model, this is impossible. The problem is that several different causal models can entail the same joint probability distribution of the observed variables. The most common example is a linear causal system with Gaussian noise, where the models $x \rightarrow y$ and $y \rightarrow x$ can imply exactly the same bivariate normal distribution.
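
To make the non-identifiability concrete, here is a small sketch in R (my own illustration, not from the original answer; the coefficient 0.8 and noise variance 0.5 are arbitrary) of two different linear Gaussian causal models that imply the same joint distribution:

set.seed(2)
n  <- 1e6
b  <- 0.8    # causal coefficient in model A
s2 <- 0.5    # noise variance in model A

# Model A: x -> y
x_a <- rnorm(n)
y_a <- b * x_a + rnorm(n, sd = sqrt(s2))

# Model B: y -> x, with parameters chosen to match the joint implied by model A
vy  <- b^2 + s2                                        # Var(y) under model A
y_b <- rnorm(n, sd = sqrt(vy))
x_b <- (b / vy) * y_b + rnorm(n, sd = sqrt(s2 / vy))

cov(cbind(x_a, y_a))   # approximately [1, 0.8; 0.8, 1.14]
cov(cbind(x_b, y_b))   # essentially the same matrix: same Gaussian, different causal story

Since both models reproduce the same observational distribution, no amount of data from $p(x, y)$ alone can tell them apart.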



But under some causal assumptions, this might be possible, and this is what the causal discovery literature works on. If you have no prior exposure to this topic, you might want to start with Elements of Causal Inference by Peters, Janzing and Schölkopf, as well as chapter 2 of Causality by Judea Pearl. We have a topic here on CV for references on causal discovery, but we don't have that many references listed there yet.



Therefore, there isn't just one answer to your question, since it depends on the assumptions one makes. The paper you mention cites some examples, such as assuming a linear model with non-Gaussian noise. This case is known as LiNGAM (short for Linear Non-Gaussian Acyclic Model); here is an example in R:



library(pcalg)
set.seed(1234)
n <- 500

# non-Gaussian noise terms
eps1 <- sign(rnorm(n)) * sqrt(abs(rnorm(n)))
eps2 <- runif(n) - 0.5

# true model: x2 causes x1
x2 <- 3 + eps2
x1 <- 0.9*x2 + 7 + eps1

# run LiNGAM on the observed data
X <- cbind(x1, x2)
res <- lingam(X)
as(res, "amat")

# Adjacency Matrix 'amat' (2 x 2) of type ‘pag’:
# [,1] [,2]
# [1,] . .
# [2,] TRUE .


Notice that here we have a linear causal model with non-Gaussian noise where $x_2$ causes $x_1$, and LiNGAM correctly recovers the causal direction. However, this depends critically on the LiNGAM assumptions holding.



In the case of the paper you cite, the authors make this specific assumption (see their "postulate"):



If $x \rightarrow y$, the minimal description length of the mechanism mapping X to Y is independent of the value of X, whereas the minimal description length of the mechanism mapping Y to X is dependent on the value of Y.



Note this is an assumption; it is what we would call their "identification condition". Essentially, the postulate imposes restrictions on the joint distribution $p(x,y)$. That is, the postulate says that if $x \rightarrow y$ certain restrictions hold in the data, and if $y \rightarrow x$ other restrictions hold. These types of restrictions have testable implications (they impose constraints on $p(y,x)$), and that is what allows one to recover the direction of causality from observational data.
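
As an illustration of how such restrictions can be exploited, here is a rough sketch in R (my own, not the MDL postulate of the paper; it uses the related additive-noise idea and the $y = x^3 + \epsilon$ example mentioned in the comments, with squared residuals as a crude proxy for a proper independence test):

set.seed(4)
n <- 2000
x <- rnorm(n)
y <- x^3 + rnorm(n)      # additive-noise model in the x -> y direction

fwd <- loess(y ~ x)      # nonparametric regression in the causal direction
bwd <- loess(x ~ y)      # nonparametric regression in the anti-causal direction

# crude dependence check between squared residuals and the regressor:
cor(residuals(fwd)^2, abs(x))   # close to 0: the noise level does not vary with x
cor(residuals(bwd)^2, abs(y))   # noticeably negative: the residual spread varies with y

A proper analysis would use a real independence test (e.g. HSIC), but the asymmetry between the two directions is the kind of testable restriction the paragraph above refers to.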



As a final remark, causal discovery results are still very limited and depend on strong assumptions, so be careful when applying them in real-world contexts.

























  • Is there a chance you augment your answer to somehow include some simple examples with fake data please? For example, having read a bit of Elements of Causal Inference and viewed some of Peters' lectures, a regression framework is commonly used to motivate the need for understanding the problem in detail (I am not even touching on their ICP work). I have the (maybe mistaken) impression that in your effort to move away from the RCM, your answers leave out all the actual tangible modelling machinery.
    – usεr11852
    7 hours ago

  • @usεr11852 I'm not sure I understand the context of your questions, do you want examples of causal discovery? There are several examples in the very paper Jane has provided. Also, I'm not sure I understand what you mean by "avoiding RCM and leaving out actual tangible modeling machinery", what tangible machinery are we missing in the causal discovery context here?
    – Carlos Cinelli
    7 hours ago

  • Apologies for the confusion, I do not care about examples from papers. I can cite other papers myself (for example, Lopez-Paz et al., CVPR 2017, about their neural causation coefficient). What I care about is a simple numerical example with fake data that someone can run in R (or your favourite language) and see what you mean. If you cite for example Peters et al.'s book, they have small code snippets that are hugely helpful (and occasionally use just lm). We cannot all work around the Tuebingen datasets' observational samples to get an idea of causal discovery! :)
    – usεr11852
    7 hours ago

  • @usεr11852 sure, including a fake example is trivial, I can include one using lingam in R. But would you care to explain what you meant by "avoiding RCM and leaving out actual tangible modeling machinery"?
    – Carlos Cinelli
    6 hours ago

  • @usεr11852 ok, thanks for the feedback, I will try to include more code when appropriate. As a final remark, causal discovery results are still very limited, so people need to be very careful when applying these depending on context.
    – Carlos Cinelli
    6 hours ago


















Answer by Kodiologist (answered 15 hours ago, edited 9 hours ago):

There are a variety of approaches to formalizing causality (which is in keeping with substantial philosophical disagreement about causality that has been around for centuries). A popular one is in terms of potential outcomes. The potential-outcomes approach, called the Rubin causal model, supposes that for each causal state of affairs, there's a different random variable. So, $Y_1$ might be the distribution of possible outcomes from a clinical trial if a subject takes the study drug, and $Y_2$ might be the distribution if he takes the placebo. The causal effect is the difference between $Y_1$ and $Y_2$. If in fact $Y_1 = Y_2$, we could say that the treatment has no effect. Otherwise, we could say that the treatment condition causes the outcome.
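
For a concrete toy version of this, here is a minimal sketch in R (my own illustration, not part of the original answer; the outcome means 12 and 10 and the 50/50 randomization are arbitrary choices):

# each unit has two potential outcomes: Y1 if treated, Y2 under placebo;
# only one of the two is ever observed, depending on the assignment z
set.seed(3)
n  <- 1e5
y1 <- rnorm(n, mean = 12)          # potential outcome under the study drug
y2 <- rnorm(n, mean = 10)          # potential outcome under placebo
z  <- rbinom(n, 1, 0.5)            # randomized treatment assignment
y_obs <- ifelse(z == 1, y1, y2)    # the factual, observed outcome

mean(y1 - y2)                              # average causal effect (about 2); uses both potential outcomes
mean(y_obs[z == 1]) - mean(y_obs[z == 0])  # recoverable from observed data because z was randomized

Randomization is doing the identifying work here; with purely observational data the same difference of means need not equal the causal effect.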



Causal relationships between variables can also be represented with directed acyclic graphs, which have a very different flavor but turn out to be mathematically equivalent to the Rubin model (Wasserman, 2004, section 17.8).



Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York, NY: Springer. ISBN 978-0-387-40272-7.





























  • Thank you. What would be a test for it, given a set of samples from the joint distribution?
    – Jane
    15 hours ago

  • I am reading arxiv.org/abs/1804.04622. I haven't read its references. I am trying to understand what one means by causality based on observational data.
    – Jane
    14 hours ago

  • I'm sorry (-1), this is not what is being asked: you don't observe $Y_1$ nor $Y_2$, you observe a sample of factual variables $X$, $Y$. See the paper Jane has linked.
    – Carlos Cinelli
    9 hours ago

  • @Vimal: I understand the case where we have "interventional distributions". We don't have interventional distributions in this setting, and that is what makes it harder to understand. In the motivating example in the paper they give something like $(x, y = x^3 + \epsilon)$. The conditional distribution of $y$ given $x$ is essentially the distribution of the noise $\epsilon$ plus some translation, while that doesn't hold for the conditional distribution of $x$ given $y$. I intuitively understand the example. I am trying to understand what the general definition for observational discovery of causality is.
    – Jane
    8 hours ago

  • @Jane for the observational case (your question), in general you cannot infer the direction of causality purely mathematically, at least for the two-variable case. For more variables, under additional (untestable) assumptions you could make a claim, but the conclusion can still be questioned. This discussion is getting very long in the comments. :)
    – Vimal
    8 hours ago












