Occasionally, records in CDRs and other collected mobile phone metadata are corrupted: wrong format, faulty files, empty periods of time, missing users, etc. bandicoot will not attempt to correct these errors, as doing so might lead to incorrect analysis. It will instead warn about the problems it finds, remove the faulty records, and report how many records were ignored.
By default, read_csv() reports six warnings to the standard output.
bandicoot will automatically remove faulty records and will report the number of ignored records (also available in the User object):
>>> my_user.ignored_records
{'all': 5,
'call_duration': 3,
'correspondent_id': 0,
'datetime': 2,
'direction': 4,
'interaction': 0,
'location': 0}
In this example, five records were removed: three had a faulty call_duration, two a faulty datetime, and four a faulty direction.
Warning
An ignored record with multiple faulty fields will be double counted and reported for each incorrect value. The total number of ignored records is reported in all, here 5.
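The double counting described above can be illustrated with a minimal sketch. It is not bandicoot's implementation: the function tally_ignored and the sample data are hypothetical, chosen only so that the totals match the output shown earlier.

```python
FIELDS = ("call_duration", "correspondent_id", "datetime",
          "direction", "interaction", "location")


def tally_ignored(faulty_fields_per_record):
    """Count ignored records in total and per faulty field.

    Each item is the set of faulty field names found in one record.
    An empty set means the record is valid and is kept.
    """
    counts = {"all": 0}
    counts.update({field: 0 for field in FIELDS})
    for faulty in faulty_fields_per_record:
        if not faulty:
            continue  # valid record, nothing to report
        counts["all"] += 1  # counted once per ignored record
        for field in faulty:
            counts[field] += 1  # counted once per faulty field
    return counts


# Hypothetical data: five faulty records and one valid record.
records = [
    {"call_duration", "direction"},
    {"call_duration", "direction"},
    {"call_duration", "datetime"},
    {"datetime", "direction"},
    {"direction"},
    set(),
]

ignored = tally_ignored(records)
per_field_total = sum(ignored[field] for field in FIELDS)
print(ignored["all"], per_field_total)
```

The per-field counts add up to more than the value under all because four of the five ignored records have two faulty fields each.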
bandicoot also offers the option to remove “duplicated records” (same correspondent, direction, date and time). The option drop_duplicates=True in read_csv() is not activated by default, as a user might legitimately send several text messages within the same minute (or a shorter interval, depending on the granularity of the data set), and these would look like duplicates.
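To see why dropping duplicates can discard legitimate records, here is a small sketch of the idea, assuming records are plain dictionaries with timestamps truncated to the minute. The function drop_duplicate_records and the sample records are hypothetical and do not reproduce bandicoot's internal code.

```python
from datetime import datetime


def drop_duplicate_records(records):
    """Keep the first record for each (correspondent, direction, datetime) key."""
    seen = set()
    unique = []
    for record in records:
        key = (record["correspondent_id"],
               record["direction"],
               record["datetime"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical data set with one-minute granularity: the first two texts
# were both sent at 10:15 to the same contact, so they share the same key.
texts = [
    {"correspondent_id": "A", "direction": "out",
     "datetime": datetime(2014, 3, 2, 10, 15)},
    {"correspondent_id": "A", "direction": "out",
     "datetime": datetime(2014, 3, 2, 10, 15)},
    {"correspondent_id": "B", "direction": "in",
     "datetime": datetime(2014, 3, 2, 10, 15)},
]

deduplicated = drop_duplicate_records(texts)
print(len(texts), len(deduplicated))
```

The second text to contact A is dropped even though it may have been a genuine message, which is the reason the option is left off unless requested through drop_duplicates=True in read_csv().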
The function all() returns a nested dictionary containing all indicators, but also 39 reporting variables: antennas_path, attributes_path, recharges_path; start_time, end_time, night_start, night_end, weekend with a list of days defining a weekend, number_of_records, number_of_antennas, number_of_recharges, …; percent_records_missing_location, antennas_missing_locations, and ignored_records mentioned previously; percent_outofnetwork_calls, percent_outofnetwork_texts, percent_outofnetwork_contacts, percent_outofnetwork_call_durations; groupby, split_week, split_day.
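The reporting variables can be used to screen a result before trusting its indicators. The sketch below works on a hand-built dictionary rather than on real output of all(). It assumes that the reporting variables are grouped under a "reporting" key. The function data_quality_flags, the threshold, and every value are hypothetical.

```python
def data_quality_flags(results, max_missing_location=0.5):
    """Return the names of reporting variables that suggest data problems.

    `results` is assumed to be a nested dictionary whose "reporting"
    entry holds the reporting variables (an assumption of this sketch).
    `max_missing_location` is a hypothetical threshold.
    """
    reporting = results["reporting"]
    flags = []
    if reporting["ignored_records"]["all"] > 0:
        flags.append("ignored_records")
    if reporting["percent_records_missing_location"] > max_missing_location:
        flags.append("percent_records_missing_location")
    return flags


# Hypothetical result, shaped like a small subset of the reporting variables.
results = {
    "name": "ego",
    "reporting": {
        "number_of_records": 314,
        "percent_records_missing_location": 0.0,
        "ignored_records": {"all": 5, "call_duration": 3,
                            "datetime": 2, "direction": 4},
    },
}

flags = data_quality_flags(results)
print(flags)
```

A check of this kind keeps the quality information next to the indicators, so a user with many ignored records can be reviewed or excluded before aggregation.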