Bandicoot notebook

bandicoot is an open-source python toolbox to analyze mobile phone metadata. For more information, see: http://cpg.doc.ic.ac.uk/bandicoot/

The source code of the notebook is available as demo.ipynb and a plain Python version as demo.py. You can download them from our repository on Github at https://github.com/computationalprivacy/bandicoot/tree/master/demo



Input files

In [1]:
# Records for the user 'ego'
!head -n 5 data/ego.csv
interaction,direction,correspondent_id,datetime,call_duration,antenna_id
text,in,A,2014-03-02 07:13:30,,1
text,in,E,2014-03-02 07:53:30,,1
text,in,E,2014-03-02 08:22:30,,2
text,out,D,2014-03-02 08:34:30,,3
In [2]:
# GPS locations of cell towers
!head -n 5 data/antennas.csv
antenna_id,latitude,longitude
1,42.366944,-71.083611
2,42.386722,-71.138778
3,42.3604,-71.087374
4,42.353917,-71.105

Loading a user

In [3]:
import bandicoot as bc

U = bc.read_csv('ego', 'data/', 'data/antennas.csv')
[x] 314 records from 2014-03-02 07:13:30 to 2014-04-14 12:04:37
[x] 7 contacts
[ ] No attribute stored
[x] 27 antennas
[ ] No recharges
[x] Has home
[x] Has texts
[x] Has calls
[ ] No network

Visualization

Export and serve an interactive visualization using:

bc.visualization.run(U)

or export only using:

bc.visualization.export(U, 'my-viz-path')
In [4]:
import os
viz_path = os.path.dirname(os.path.realpath(__name__)) + '/viz'

bc.visualization.export(U, viz_path);
Successfully exported the visualization to /tmp/bandicoot/demo/viz
In [5]:
from IPython.display import IFrame
IFrame("/files/viz/index.html", "100%", 700)
Out[5]:

Individual and spatial indicators

Using bandicoot, compute aggregated indicators from bc.individual and bc.spatial:

In [6]:
bc.individual.percent_initiated_conversations(U)
Out[6]:
{
    "allweek": {
        "allday": {
            "callandtext": {
                "mean": 0.3244815064488733,
                "std": 0.09659866096759165
            }
        }
    }
}
In [7]:
bc.spatial.number_of_antennas(U)
Out[7]:
{
    "allweek": {
        "allday": {
            "mean": 5.375,
            "std": 1.8666480653835098
        }
    }
}
In [8]:
bc.spatial.radius_of_gyration(U)
Out[8]:
{
    "allweek": {
        "allday": {
            "mean": 1.4503807789208683,
            "std": 0.8575480642906887
        }
    }
}

Let's play with indicators

The signature of the active_days indicators is:

bc.individual.active_days(user, groupby='week', interaction='callandtext', summary='default', split_week=False, split_day=False, filter_empty=True, datatype=None)

What does that mean?


The ‘groupby’ keyword


In [9]:
bc.individual.active_days(U)
Out[9]:
{
    "allweek": {
        "allday": {
            "callandtext": {
                "mean": 5.5,
                "std": 2.598076211353316
            }
        }
    }
}

The groupby keyword controls the aggregation:

  • groupby='week' to divide by week (by default),
  • groupby='month' to divide by month,
  • groupby=None to aggregate all values.
In [10]:
bc.individual.active_days(U, groupby='week')
Out[10]:
{
    "allweek": {
        "allday": {
            "callandtext": {
                "mean": 5.5,
                "std": 2.598076211353316
            }
        }
    }
}
In [11]:
bc.individual.active_days(U, groupby='month')
Out[11]:
{
    "allweek": {
        "allday": {
            "callandtext": {
                "mean": 22.0,
                "std": 8.0
            }
        }
    }
}
In [12]:
bc.individual.active_days(U, groupby=None)
Out[12]:
{
    "allweek": {
        "allday": {
            "callandtext": 44
        }
    }
}

The ‘summary’ keyword

Some indicators such as active_days returns one number. Others, such as duration_of_calls returns a distribution.

The summary keyword can take three values:

  • summary='default' to return mean and standard deviation,
  • summary='extended' for the second type of indicators, to return mean, sem, median, skewness and std of the distribution,
  • summary=None to return the full distribution.
In [13]:
bc.individual.call_duration(U)
Out[13]:
{
    "allweek": {
        "allday": {
            "call": {
                "mean": {
                    "mean": 3776.7093501036775,
                    "std": 1404.827412706482
                },
                "std": {
                    "mean": 1633.3931770157765,
                    "std": 689.2035500056488
                }
            }
        }
    }
}
In [14]:
bc.individual.call_duration(U, summary='extended')
Out[14]:
{
    "allweek": {
        "allday": {
            "call": {
                "mean": {
                    "mean": 3776.7093501036775,
                    "std": 1404.827412706482
                },
                "std": {
                    "mean": 1633.3931770157765,
                    "std": 689.2035500056488
                },
                "median": {
                    "mean": 3714.714285714286,
                    "std": 1532.9148671064283
                },
                "skewness": {
                    "mean": 0.12925073170191398,
                    "std": 0.48628300355189896
                },
                "kurtosis": {
                    "mean": 1.8063876957023484,
                    "std": 0.8998073161683097
                },
                "min": {
                    "mean": 1330.857142857143,
                    "std": 2200.2680634459994
                },
                "max": {
                    "mean": 6468.857142857143,
                    "std": 519.0475972040188
                }
            }
        }
    }
}
In [15]:
bc.individual.call_duration(U, summary=None)
Out[15]:
{
    "allweek": {
        "allday": {
            "call": [
                [
                    374,
                    1086,
                    1099,
                    1330,
                    2456,
                    3404,
                    4472,
                    5359,
                    5413,
                    6233
                ],
                [
                    594,
                    1927,
                    2072,
                    2258,
                    2854,
                    3286,
                    3552,
                    4202,
                    4689,
                    5142,
                    5689,
                    5752,
                    6429,
                    6891,
                    7082,
                    7123
                ],
                [
                    403,
                    539,
                    2109,
                    2726,
                    2871,
                    3609,
                    3782,
                    4154,
                    4240,
                    4666,
                    5658,
                    6392,
                    6541,
                    6674
                ],
                [
                    154,
                    267,
                    706,
                    1273,
                    1890,
                    3435,
                    3454,
                    3503,
                    3531,
                    3668,
                    4832,
                    4877,
                    4929,
                    5096,
                    5184,
                    6539,
                    7038
                ],
                [
                    140,
                    231,
                    620,
                    1853,
                    1937,
                    5728
                ],
                [
                    969,
                    1309,
                    1355,
                    1999,
                    2626,
                    3210,
                    4002,
                    4146,
                    4227,
                    4451,
                    5804
                ],
                [
                    6682
                ]
            ]
        }
    }
}

Splitting days and weeks

  • split_week divide records by 'all week', 'weekday', and 'weekend'.
  • split_day divide records by 'all day', 'day', and 'night'.
In [16]:
bc.individual.active_days(U, split_week=True, split_day=True)
Out[16]:
{
    "allweek": {
        "allday": {
            "callandtext": {
                "mean": 5.5,
                "std": 2.598076211353316
            }
        },
        "day": {
            "callandtext": {
                "mean": 5.5,
                "std": 2.598076211353316
            }
        },
        "night": {
            "callandtext": {
                "mean": 5.375,
                "std": 2.54644359843292
            }
        }
    },
    "weekday": {
        "allday": {
            "callandtext": {
                "mean": 4.428571428571429,
                "std": 1.3997084244475304
            }
        },
        "day": {
            "callandtext": {
                "mean": 4.428571428571429,
                "std": 1.3997084244475304
            }
        },
        "night": {
            "callandtext": {
                "mean": 4.428571428571429,
                "std": 1.3997084244475304
            }
        }
    },
    "weekend": {
        "allday": {
            "callandtext": {
                "mean": 1.8571428571428572,
                "std": 0.34992710611188266
            }
        },
        "day": {
            "callandtext": {
                "mean": 1.8571428571428572,
                "std": 0.34992710611188266
            }
        },
        "night": {
            "callandtext": {
                "mean": 1.7142857142857142,
                "std": 0.45175395145262565
            }
        }
    }
}

Exporting indicators

The function bc.utils.all computes automatically all indicators for a single user.

You can use the same keywords to group by week/month/all time range, or return extended statistics.

In [17]:
features = bc.utils.all(U, groupby=None)
In [18]:
features
Out[18]:
{
    "name": "ego",
    "reporting": {
        "antennas_path": "data/antennas.csv",
        "attributes_path": None,
        "recharges_path": None,
        "version": "0.6.0",
        "code_signature": "14bd4de76b8d90efdc2acb41e825b9a8914615ae",
        "groupby": None,
        "split_week": false,
        "split_day": false,
        "start_time": "2014-03-02 07:13:30",
        "end_time": "2014-04-14 12:04:37",
        "night_start": "19:00:00",
        "night_end": "07:00:00",
        "weekend": [
            6,
            7
        ],
        "number_of_records": 314,
        "number_of_antennas": 27,
        "number_of_recharges": 0,
        "bins": 1,
        "bins_with_data": 1,
        "bins_without_data": 0,
        "has_call": true,
        "has_text": true,
        "has_home": true,
        "has_recharges": false,
        "has_attributes": false,
        "has_network": false,
        "percent_records_missing_location": 0.0,
        "antennas_missing_locations": 0,
        "percent_outofnetwork_calls": 0,
        "percent_outofnetwork_texts": 0,
        "percent_outofnetwork_contacts": 0,
        "percent_outofnetwork_call_durations": 0,
        "ignored_records": {
            "all": 0,
            "interaction": 0,
            "direction": 0,
            "correspondent_id": 0,
            "datetime": 0,
            "call_duration": 0,
            "location": 0
        }
    },
    "active_days": {
        "allweek": {
            "allday": {
                "callandtext": 44
            }
        }
    },
    "number_of_contacts": {
        "allweek": {
            "allday": {
                "call": 7,
                "text": 7
            }
        }
    },
    "call_duration": {
        "allweek": {
            "allday": {
                "call": {
                    "mean": 3557.2933333333335,
                    "std": 2067.863501448348
                }
            }
        }
    },
    "percent_nocturnal": {
        "allweek": {
            "allday": {
                "call": 0.4266666666666667,
                "text": 0.4393305439330544
            }
        }
    },
    "percent_initiated_conversations": {
        "allweek": {
            "allday": {
                "callandtext": 0.3080357142857143
            }
        }
    },
    "percent_initiated_interactions": {
        "allweek": {
            "allday": {
                "call": 0.41333333333333333
            }
        }
    },
    "response_delay_text": {
        "allweek": {
            "allday": {
                "callandtext": {
                    "mean": 2310.0,
                    "std": 747.5961476626268
                }
            }
        }
    },
    "response_rate_text": {
        "allweek": {
            "allday": {
                "callandtext": 0.025806451612903226
            }
        }
    },
    "entropy_of_contacts": {
        "allweek": {
            "allday": {
                "call": 1.8626732085630935,
                "text": 1.8462551114653172
            }
        }
    },
    "balance_of_contacts": {
        "allweek": {
            "allday": {
                "call": {
                    "mean": 0.05904761904761905,
                    "std": 0.018662778992633737
                },
                "text": {
                    "mean": 0.04363419007770473,
                    "std": 0.01828697972597532
                }
            }
        }
    },
    "interactions_per_contact": {
        "allweek": {
            "allday": {
                "call": {
                    "mean": 10.714285714285714,
                    "std": 4.025429372458677
                },
                "text": {
                    "mean": 34.142857142857146,
                    "std": 15.027865274012724
                }
            }
        }
    },
    "interevent_time": {
        "allweek": {
            "allday": {
                "call": {
                    "mean": 49024.86486486487,
                    "std": 49455.92699110296
                },
                "text": {
                    "mean": 15683.474789915967,
                    "std": 16816.128561460955
                }
            }
        }
    },
    "percent_pareto_interactions": {
        "allweek": {
            "allday": {
                "call": 0.06666666666666667,
                "text": 0.02092050209205021
            }
        }
    },
    "percent_pareto_durations": {
        "allweek": {
            "allday": {
                "call": 0.06666666666666667
            }
        }
    },
    "number_of_interactions": {
        "allweek": {
            "allday": {
                "call": 75,
                "text": 239
            }
        }
    },
    "number_of_interaction_in": {
        "allweek": {
            "allday": {
                "call": 44,
                "text": 166
            }
        }
    },
    "number_of_interaction_out": {
        "allweek": {
            "allday": {
                "call": 31,
                "text": 73
            }
        }
    },
    "number_of_antennas": {
        "allweek": {
            "allday": 10
        }
    },
    "entropy_of_antennas": {
        "allweek": {
            "allday": 1.5002835799997023
        }
    },
    "percent_at_home": {
        "allweek": {
            "allday": 0.4738562091503268
        }
    },
    "radius_of_gyration": {
        "allweek": {
            "allday": 1.806537047440705
        }
    },
    "frequent_antennas": {
        "allweek": {
            "allday": 3
        }
    },
    "churn_rate": {
        "mean": 0.07391866292877006,
        "std": 0.05195576174609365
    }
}

Exporting in CSV and JSON

bandicoot supports exports in CSV and JSON format. Both to_csv and to_json functions require either a single feature dictionnary, or a list of dictionnaries (for multiple users).

In [19]:
bc.to_csv(features, 'demo_export_user.csv')
bc.to_json(features, 'demo_export_user.json')
Successfully exported 1 object(s) to demo_export_user.csv
Successfully exported 1 object(s) to demo_export_user.json
In [20]:
!head demo_export_user.csv

In [21]:
!head -n 15 demo_export_user.json
{
    "ego": {
        "name": "ego",
        "reporting": {
            "antennas_path": "data/antennas.csv",
            "attributes_path": null,
            "recharges_path": null,
            "version": "0.6.0",
            "code_signature": "14bd4de76b8d90efdc2acb41e825b9a8914615ae",
            "groupby": null,
            "split_week": false,
            "split_day": false,
            "start_time": "2014-03-02 07:13:30",
            "end_time": "2014-04-14 12:04:37",
            "night_start": "19:00:00",

Extending bandicoot

You can easily develop your indicator using the @grouping decorator. You only need to write a function taking as input a list of records and returning an integer or a list of integers (for a distribution). The @grouping decorator wraps the function and call it for each group of weeks.

In [22]:
from bandicoot.helper.group import grouping

@grouping(interaction='call')
def shortest_call(records):
    in_durations = (r.call_duration for r in records)
    return min(in_durations)
In [23]:
shortest_call(U)
Out[23]:
{
    "allweek": {
        "allday": {
            "call": {
                "mean": 1330.857142857143,
                "std": 2200.2680634459994
            }
        }
    }
}
In [24]:
shortest_call(U, split_day=True)
Out[24]:
{
    "allweek": {
        "allday": {
            "call": {
                "mean": 1330.857142857143,
                "std": 2200.2680634459994
            }
        },
        "day": {
            "call": {
                "mean": 1395.5714285714287,
                "std": 2186.820187078088
            }
        },
        "night": {
            "call": {
                "mean": 1182.1666666666667,
                "std": 908.7290422464895
            }
        }
    }
}