bandicoot is an open-source Python toolbox to analyze mobile phone metadata. For more information, see: http://cpg.doc.ic.ac.uk/bandicoot/
The source code of the notebook is available as demo.ipynb and a plain Python version as demo.py. You can download them from our repository on GitHub at https://github.com/computationalprivacy/bandicoot/tree/master/demo
# Records for the user 'ego'
!head -n 5 data/ego.csv
# GPS locations of cell towers
!head -n 5 data/antennas.csv
import bandicoot as bc
U = bc.read_csv('ego', 'data/', 'data/antennas.csv')
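To see roughly what a loader like read_csv has to do, here is a minimal standalone sketch that parses records with Python's csv module. It is not bandicoot's reader. It assumes the column layout interaction, direction, correspondent_id, datetime, call_duration, antenna_id and a `YYYY-MM-DD HH:MM:SS` timestamp format, which should be checked against the `!head` output above. The `Record` type, `parse_records` helper, and the two sample rows are made up for illustration.

```python
import csv
import io
from collections import namedtuple
from datetime import datetime

# Hypothetical record type; the field names are an assumed CSV layout.
Record = namedtuple('Record', ['interaction', 'direction', 'correspondent_id',
                               'datetime', 'call_duration', 'antenna_id'])


def parse_records(csv_text):
    """Parse records CSV text into Record tuples (sketch, not bandicoot's reader)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        duration = row['call_duration']
        records.append(Record(
            interaction=row['interaction'],
            direction=row['direction'],
            correspondent_id=row['correspondent_id'],
            datetime=datetime.strptime(row['datetime'], '%Y-%m-%d %H:%M:%S'),
            # Texts have no duration, so an empty field becomes None.
            call_duration=int(duration) if duration else None,
            antenna_id=row['antenna_id'],
        ))
    return records


# Two made-up rows: one incoming call and one outgoing text.
sample = """interaction,direction,correspondent_id,datetime,call_duration,antenna_id
call,in,8f8ab1,2014-03-02 07:13:30,253,1
text,out,1d2e3f,2014-03-02 09:45:00,,2
"""
records = parse_records(sample)
print(len(records), records[0].call_duration, records[1].call_duration)  # prints: 2 253 None
```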
Export and serve an interactive visualization using:
bc.visualization.run(U)
or export only using:
bc.visualization.export(U, 'my-viz-path')
import os
# Export the visualization into a 'viz' folder inside the current working directory
viz_path = os.path.join(os.getcwd(), 'viz')
bc.visualization.export(U, viz_path);
from IPython.display import IFrame
IFrame("/files/viz/index.html", "100%", 700)
Using bandicoot, compute aggregated indicators from bc.individual and bc.spatial:
bc.individual.percent_initiated_conversations(U)
bc.spatial.number_of_antennas(U)
bc.spatial.radius_of_gyration(U)
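The radius of gyration measures how far, on average, a user travels from the center of their visited places. Below is a minimal sketch of the idea, assuming each location is weighted by the number of records made there and that distances are great-circle (haversine) distances in kilometers; bandicoot's own weighting and distance function may differ. The helper names `haversine_km` and `radius_of_gyration` are hypothetical.

```python
import math
from collections import Counter


def haversine_km(p, q, earth_radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def radius_of_gyration(locations):
    """Weighted root-mean-square distance (km) of visited points from their barycenter."""
    counts = Counter(locations)  # weight = number of records at each location
    total = sum(counts.values())
    if total == 0:
        return None
    # Barycenter as a weighted mean of latitudes and longitudes (reasonable for small areas).
    center = (sum(p[0] * w for p, w in counts.items()) / total,
              sum(p[1] * w for p, w in counts.items()) / total)
    mean_sq = sum(w * haversine_km(center, p) ** 2 for p, w in counts.items()) / total
    return math.sqrt(mean_sq)


# Two equally visited points on the equator, 2 degrees of longitude apart.
print(radius_of_gyration([(0.0, 0.0), (0.0, 2.0)]))
```

With two equally weighted points, each one lies at the same distance from the barycenter, so the radius equals that half-distance (about 111 km here).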
The signature of the active_days indicator is:
bc.individual.active_days(user, groupby='week', interaction='callandtext', summary='default', split_week=False, split_day=False, filter_empty=True, datatype=None)
What does that mean?
Weekly aggregation
By default, _bandicoot_ computes the indicators on a weekly basis and returns the average (mean) over all the weeks available and its standard deviation (std) in a nested dictionary.
bc.individual.active_days(U)
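The weekly aggregation can be pictured as two steps: compute the indicator once per week, then summarize the weekly values. The sketch below does this for active days using plain Python timestamps and ISO weeks. It is an illustration, not bandicoot's code; the helper names are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def weekly_active_days(timestamps):
    """Count the distinct active dates in each ISO week, in chronological order."""
    days_per_week = defaultdict(set)
    for ts in timestamps:
        year, week, _ = ts.isocalendar()
        days_per_week[(year, week)].add(ts.date())
    return {week: len(days) for week, days in sorted(days_per_week.items())}


def summarize(values):
    """Mean and (population) standard deviation of the weekly values."""
    values = list(values)
    return {'mean': statistics.mean(values), 'std': statistics.pstdev(values)}


stamps = [
    # Week starting Monday 2014-03-03: active on 2 distinct days.
    datetime(2014, 3, 3, 8), datetime(2014, 3, 3, 20), datetime(2014, 3, 5, 10),
    # Following week: active on 4 distinct days.
    datetime(2014, 3, 10, 9), datetime(2014, 3, 11, 9),
    datetime(2014, 3, 12, 9), datetime(2014, 3, 13, 9),
]
per_week = weekly_active_days(stamps)
print(summarize(per_week.values()))
```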
The groupby keyword controls the aggregation:
- groupby='week' to divide by week (by default),
- groupby='month' to divide by month,
- groupby=None to aggregate all values.
bc.individual.active_days(U, groupby='week')
bc.individual.active_days(U, groupby='month')
bc.individual.active_days(U, groupby=None)
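Conceptually, the three groupby options differ only in the key used to bucket each record. The following sketch maps a timestamp to an ISO week, a calendar month, or one single bucket. The `group_key` and `group_timestamps` helpers are hypothetical and only mirror the behavior described above.

```python
from collections import defaultdict
from datetime import datetime


def group_key(ts, groupby):
    """Map a timestamp to its aggregation bucket (hypothetical helper)."""
    if groupby == 'week':
        year, week, _ = ts.isocalendar()
        return (year, week)
    if groupby == 'month':
        return (ts.year, ts.month)
    if groupby is None:
        return 'all'  # a single bucket spanning the whole time range
    raise ValueError("groupby must be 'week', 'month' or None")


def group_timestamps(timestamps, groupby):
    """Bucket timestamps by the chosen key."""
    groups = defaultdict(list)
    for ts in timestamps:
        groups[group_key(ts, groupby)].append(ts)
    return dict(groups)


stamps = [datetime(2014, 3, 3), datetime(2014, 3, 10), datetime(2014, 4, 1)]
for mode in ('week', 'month', None):
    print(mode, len(group_timestamps(stamps, mode)))
```

Three dates in three different weeks yield three weekly groups, two monthly groups (March and April), and a single group when groupby is None.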
Some indicators, such as active_days, return a single number. Others, such as call_duration, return a distribution.
The summary keyword can take three values:
- summary='default' to return the mean and standard deviation,
- summary='extended' for the second type of indicators, to return the mean, sem, median, skewness and std of the distribution,
- summary=None to return the full distribution.
bc.individual.call_duration(U)
bc.individual.call_duration(U, summary='extended')
bc.individual.call_duration(U, summary=None)
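To make the three summary modes concrete, here is a standalone sketch applied to one made-up distribution of call durations. It is not bandicoot's implementation: the choice of population standard deviation for std, sample standard deviation for sem, and population skewness are assumptions, and `summarize_distribution` is a hypothetical name.

```python
import math
import statistics


def summarize_distribution(values, summary='default'):
    """Sketch of the three summary modes applied to one distribution."""
    values = list(values)
    if summary is None:
        return values  # the full distribution, unsummarized
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if summary == 'default':
        return {'mean': mean, 'std': std}
    if summary == 'extended':
        n = len(values)
        # Population skewness: third central moment divided by std cubed.
        skewness = (sum((v - mean) ** 3 for v in values) / n / std ** 3) if std else 0.0
        # Standard error of the mean, using the sample standard deviation.
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
        return {'mean': mean, 'std': std, 'sem': sem,
                'median': statistics.median(values), 'skewness': skewness}
    raise ValueError("summary must be 'default', 'extended' or None")


durations = [60, 120, 180]  # made-up call durations in seconds
for mode in ('default', 'extended', None):
    print(mode, summarize_distribution(durations, mode))
```

A symmetric distribution like this one has zero skewness; a long right tail (a few very long calls) would make it positive.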
- split_week divides records into 'all week', 'weekday', and 'weekend'.
- split_day divides records into 'all day', 'day', and 'night'.
bc.individual.active_days(U, split_week=True, split_day=True)
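The splits can be read as labels attached to each record: every record counts toward 'all week' and 'all day', plus one week part and one day part. The sketch below assumes the weekend is Saturday and Sunday and that night runs from 19:00 to 07:00; both are assumptions for illustration, not confirmed bandicoot defaults. The helpers `week_part`, `day_part`, and `buckets` are hypothetical.

```python
from datetime import datetime


def week_part(ts):
    """'weekend' for Saturday or Sunday, otherwise 'weekday' (assumed definition)."""
    return 'weekend' if ts.isoweekday() in (6, 7) else 'weekday'


def day_part(ts, night_start=19, night_end=7):
    """'night' from night_start until night_end (hours, wrapping past midnight), else 'day'."""
    if ts.hour >= night_start or ts.hour < night_end:
        return 'night'
    return 'day'


def buckets(ts):
    """Every (week split, day split) pair that a record at time ts is counted in."""
    weeks = ('all week', week_part(ts))
    days = ('all day', day_part(ts))
    return [(w, d) for w in weeks for d in days]


print(buckets(datetime(2014, 3, 8, 22, 30)))  # a Saturday evening
```

With both splits enabled, each indicator is therefore reported for every combination of week part and day part.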
The function bc.utils.all automatically computes all indicators for a single user.
You can use the same keywords to group by week, by month, or over the whole time range, or to return extended statistics.
features = bc.utils.all(U, groupby=None)
features
bandicoot supports exports in CSV and JSON format. Both the to_csv and to_json functions require either a single feature dictionary or a list of dictionaries (for multiple users).
bc.to_csv(features, 'demo_export_user.csv')
bc.to_json(features, 'demo_export_user.json')
!head demo_export_user.csv
!head -n 15 demo_export_user.json
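Writing nested feature dictionaries to CSV requires flattening them into one column per leaf value. The sketch below shows one way to do this with the csv module, joining nested keys with a double underscore. The separator, the helper names `flatten` and `to_csv_text`, and the requirement that all users share the same keys are assumptions of this sketch rather than a description of bc.to_csv.

```python
import csv
import io


def flatten(d, parent_key='', sep='__'):
    """Flatten nested dicts into one level, joining keys with sep (assumed separator)."""
    items = {}
    for key, value in d.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items


def to_csv_text(feature_dicts):
    """Return CSV text with one row per user; accepts a single dict or a list of dicts."""
    if isinstance(feature_dicts, dict):
        feature_dicts = [feature_dicts]
    rows = [flatten(d) for d in feature_dicts]
    out = io.StringIO()
    # Assumes every user has the same set of flattened keys as the first one.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


users = [  # made-up feature dictionaries
    {'name': 'ego', 'active_days': {'mean': 5.0, 'std': 1.0}},
    {'name': 'alter', 'active_days': {'mean': 3.5, 'std': 0.5}},
]
print(to_csv_text(users))
```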
You can easily develop your own indicator using the @grouping decorator. You only need to write a function that takes a list of records as input and returns an integer or a list of integers (for a distribution). The @grouping decorator wraps the function and calls it for each group of records (each week, by default).
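from bandicoot.helper.group import grouping
# Only call records are passed to the decorated function
@grouping(interaction='call')
def shortest_call(records):
    # Durations of all calls in the current group (one week, by default)
    durations = (r.call_duration for r in records)
    return min(durations)
shortest_call(U)
shortest_call(U, split_day=True)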
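To illustrate what a decorator like @grouping does behind the scenes, here is a toy stand-in that filters records by interaction type, splits them by ISO week, applies the indicator to each week, and returns the mean and standard deviation of the weekly values. It is a simplified sketch, not bandicoot's decorator: it takes a plain list of records rather than a user object and ignores groupby, split_week, split_day, and summary. The names `simple_grouping` and `Record` are hypothetical.

```python
import functools
import statistics
from collections import defaultdict, namedtuple
from datetime import datetime

# Hypothetical minimal record type for this sketch.
Record = namedtuple('Record', ['interaction', 'datetime', 'call_duration'])


def simple_grouping(interaction=None):
    """Toy stand-in for @grouping: filter records, split by ISO week, summarize."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(records):
            groups = defaultdict(list)
            for r in records:
                if interaction is None or r.interaction == interaction:
                    year, week, _ = r.datetime.isocalendar()
                    groups[(year, week)].append(r)
            # Apply the indicator to each non-empty week, in chronological order.
            values = [func(groups[k]) for k in sorted(groups)]
            if not values:
                return {'mean': None, 'std': None}
            return {'mean': statistics.mean(values), 'std': statistics.pstdev(values)}
        return wrapper
    return decorator


@simple_grouping(interaction='call')
def shortest_call(records):
    return min(r.call_duration for r in records)


calls = [  # made-up records spanning two weeks
    Record('call', datetime(2014, 3, 3, 9), 90),
    Record('call', datetime(2014, 3, 4, 9), 30),
    Record('text', datetime(2014, 3, 4, 10), None),  # filtered out: not a call
    Record('call', datetime(2014, 3, 11, 9), 50),
]
print(shortest_call(calls))
```

The weekly minima here are 30 and 50 seconds, so the summary is a mean of 40 with a standard deviation of 10. Because only weeks that contain calls form groups, the indicator function never receives an empty list in this sketch.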