Demystifying Dependence and Why it Is Important in Causal Inference and Causal Validation | by Graham Harrison

A step-by-step guide to understanding the concept of dependence and how to apply it to validate directed acyclic graphs using Python

Graham Harrison
Towards Data Science
Photo by Ana Municio on Unsplash

Causal Inference is an emerging branch of data science concerned with determining the cause-and-effect relationships between events and outcomes, and it has the potential to add significantly to the value that machine learning can generate for organisations.

For example, a traditional machine learning algorithm can predict which loan customers are likely to default, thereby enabling proactive intervention with those customers. However, although this algorithm will be useful for reducing loan defaults, it has no concept of why they occurred. Proactive intervention is useful, but knowing the reasons for defaults would enable the underlying causes to be addressed. In that world, proactive intervention may no longer be necessary because the factors that lead to defaulting have been permanently cured.

This is the promise of Causal Inference, and why it can deliver significant impact for organisations that are able to harness it.

There are a number of different approaches, but the most common one starts by augmenting the data with a “Directed Acyclic Graph”, which encapsulates and visualises the causal relationships in the data, and then uses causal inference techniques to ask “what-if” type questions.
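As a rough illustration of what such a graph looks like in code, here is a minimal sketch using only the Python standard library. The node names (income, interest_rate, missed_payments, default) are hypothetical and loosely based on the loan example above; they are not taken from any real dataset. The helper functions is_acyclic and edges are likewise illustrative names.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Hypothetical causal graph for the loan example (illustrative names only).
# Each key maps a node to the set of its direct causes (its parents).
parents = {
    "income": set(),
    "interest_rate": set(),
    "missed_payments": {"income", "interest_rate"},
    "default": {"missed_payments"},
}


def is_acyclic(parent_map):
    """Return True if the directed graph contains no cycles."""
    try:
        # static_order() raises CycleError when a cycle is present.
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False


def edges(parent_map):
    """List the directed edges as sorted (cause, effect) pairs."""
    return sorted((p, child) for child, ps in parent_map.items() for p in ps)


# A causal ordering: every cause appears before its effects.
causal_order = list(TopologicalSorter(parents).static_order())

print("Acyclic:", is_acyclic(parents))
print("Edges:", edges(parents))
print("Causal order:", causal_order)
```

The "directed" part is captured by storing parents per node, and the "acyclic" part is what the topological sort verifies: a variable cannot be its own cause, directly or indirectly.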

The Problem

A Directed Acyclic Graph (DAG) that encapsulates the causal relationships in the data is typically constructed manually (or semi-manually) by data scientists and domain experts working together. Hence the DAG could be wrong, which would invalidate any causal calculations, leading to flawed conclusions and potentially incorrect decisions.

The Opportunity

A range of techniques exist for “Causal Validation” (the process of validating the DAG against the data), and if these techniques work, they can minimise or eliminate errors in the DAG, thereby ensuring that the…
