Datathon. Overview of the data. Zbigniew Smoreda, Cezary Ziemlicki, SENSE/Orange Labs.



Senegal data
The datasets are based on Call Detail Records (CDR) of phone calls and text exchanges between more than 9 million of Sonatel's customers in Senegal, collected from January 1 to December 31, 2013.

Datasets constructed
1. Antenna-to-antenna traffic for 1,666 antenna towers on an hourly basis
2. Fine-grained mobility data (antenna level) on a rolling 2-week basis for about 300,000 randomly sampled users
3. One year of coarse-grained mobility data at the level of 123 arrondissements for about 150,000 randomly sampled users
Datasets 2 and 3 come with bandicoot behavioral indicators at the individual level.

One year of site-to-site traffic (voice) for 1,666 sites on an hourly basis
The files SET1V_M01.csv through SET1V_M12.csv contain monthly voice traffic between sites and are structured as follows:
timestamp: day and hour considered, in the format YYYY-MM-DD HH (24-hour format)
outgoing_site_id: id of the site the call originated from
incoming_site_id: id of the site receiving the call
number_of_calls: the total number of calls between these two sites during this hour
total_call_duration: the total duration of all calls between these two sites during this hour (in seconds)
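As an illustration of the voice-traffic layout described above, here is a minimal Python sketch that parses rows with those five fields and derives the mean call duration per hour of day. It assumes the files are comma-separated, have no header row, and use integer site ids; none of this is confirmed by the slides. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Field order taken from the slide; the absence of a header row is an assumption.
VOICE_FIELDS = ["timestamp", "outgoing_site_id", "incoming_site_id",
                "number_of_calls", "total_call_duration"]


def read_voice_rows(handle):
    """Yield one typed dict per row of a SET1V-style file."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(VOICE_FIELDS, row))
        yield {
            "timestamp": datetime.strptime(rec["timestamp"], "%Y-%m-%d %H"),
            "outgoing_site_id": int(rec["outgoing_site_id"]),
            "incoming_site_id": int(rec["incoming_site_id"]),
            "number_of_calls": int(rec["number_of_calls"]),
            "total_call_duration": int(rec["total_call_duration"]),  # seconds
        }


def mean_call_duration_by_hour(rows):
    """Return {hour_of_day: mean seconds per call}, pooled over all site pairs."""
    calls = defaultdict(int)
    seconds = defaultdict(int)
    for r in rows:
        hour = r["timestamp"].hour
        calls[hour] += r["number_of_calls"]
        seconds[hour] += r["total_call_duration"]
    return {h: seconds[h] / calls[h] for h in calls if calls[h] > 0}


# Made-up rows, for illustration only.
SAMPLE = """2013-01-01 00,1,2,4,240
2013-01-01 00,2,1,2,60
2013-01-01 13,1,3,5,500
"""

result = mean_call_duration_by_hour(read_voice_rows(io.StringIO(SAMPLE)))
print(result)
```

Dividing total_call_duration by number_of_calls is the only way to get a per-call figure from these files, since individual calls are not listed.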


One year of site-to-site traffic (SMS) for 1,666 sites on an hourly basis
The files SET1S_M01.csv through SET1S_M12.csv contain monthly text traffic between sites and are structured as follows:
timestamp: day and hour considered, in the format YYYY-MM-DD HH (24-hour format)
outgoing_site_id: id of the site the text originated from
incoming_site_id: id of the site receiving the text
number_of_sms: the total number of texts between these two sites during this hour
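Because each row is directed (outgoing site to incoming site), the SMS files can be read as a weighted directed graph. The sketch below sums texts per site pair and computes a net outflow per site. It assumes comma-separated rows without a header; the helper names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def sms_totals_by_pair(handle):
    """Sum number_of_sms over all hours for each (outgoing, incoming) site pair."""
    totals = Counter()
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        _timestamp, out_site, in_site, n_sms = row
        totals[(out_site, in_site)] += int(n_sms)
    return totals


def net_outflow(totals):
    """Texts sent minus texts received, per site."""
    net = Counter()
    for (out_site, in_site), n in totals.items():
        net[out_site] += n
        net[in_site] -= n
    return dict(net)


# Made-up rows, for illustration only.
SAMPLE = """2013-01-01 00,1,2,10
2013-01-01 01,1,2,5
2013-01-01 01,2,1,3
2013-01-01 02,3,3,7
"""

totals = sms_totals_by_pair(io.StringIO(SAMPLE))
net = net_outflow(totals)
print(totals)
print(net)
```

Texts exchanged within a single site (same outgoing and incoming id) cancel out in the net figure.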


Antenna tower coordinates
The SITE_ARR_LATLON.csv file contains the latitude and longitude of each site, along with its arrondissement code.

Fine-grained mobility data (site level) for 1,666 sites
The files SET2_P01.csv through SET2_P25.csv contain detailed records covering 25 two-week periods for 300,000 randomly selected users:
user_id: random id of the selected user
timestamp: 24-hour format YYYY-MM-DD HH:M0:00
site_id: id of the antenna site
Note: The second digit of the minutes and all the seconds of the timestamps have been replaced with zeros. Only users with interactions on more than 75% of the days in the considered period could be selected.
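The 75% selection rule can be checked on the records themselves. The following sketch counts the distinct days on which each user appears and compares the share to the threshold. It assumes comma-separated rows without a header and a 14-day period; the function name and the generated rows are hypothetical, and the organizers' own selection procedure is not shown in the slides.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def active_day_share(handle, period_days):
    """Return {user_id: share of days in the period with at least one record}."""
    days = defaultdict(set)
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        user_id, timestamp, _site_id = row
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        days[user_id].add(day)
    return {user: len(d) / period_days for user, d in days.items()}


# Made-up records: u1 is active on 12 of 14 days, u2 on 7 of 14.
rows = [f"u1,2013-01-{d:02d} 08:10:00,17" for d in range(1, 13)]
rows += [f"u2,2013-01-{d:02d} 19:40:00,42" for d in range(1, 8)]

shares = active_day_share(io.StringIO("\n".join(rows)), period_days=14)
eligible = sorted(user for user, share in shares.items() if share > 0.75)
print(eligible)
```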


Coarse-grained individual mobility data at arrondissement level
One year of coarse-grained mobility data (at the level of 123 arrondissements) for about 150,000 randomly sampled users. Only users with interactions on more than 75% of the days in the year were selected.

Coarse-grained individual mobility data at arrondissement level
The files SET3_M01.csv through SET3_M12.csv contain the data of 150,000 randomly sampled users, month by month:
user_id: random id of the selected user
timestamp: 24-hour format YYYY-MM-DD HH:M0:00
arrondissement_id: arrondissement code
Note: The second digit of the minutes and all the seconds of the timestamps have been replaced with zeros.
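To make the timestamp note concrete, the sketch below reproduces the coarsening (second digit of the minutes and all seconds set to zero) on an ordinary datetime, then lists the arrondissements each user was seen in. It assumes comma-separated rows without a header and integer arrondissement codes; the helper names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def coarsen(ts):
    """Zero the second digit of the minutes and the seconds, e.g. 14:37:52 -> 14:30:00."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def arrondissements_visited(handle):
    """Return {user_id: sorted list of distinct arrondissement codes}."""
    visited = defaultdict(set)
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        user_id, _timestamp, arr_id = row
        visited[user_id].add(int(arr_id))
    return {user: sorted(codes) for user, codes in visited.items()}


# Made-up rows, for illustration only.
SAMPLE = """u1,2013-03-05 14:30:00,3
u1,2013-03-05 18:00:00,1
u1,2013-03-06 09:20:00,3
u2,2013-03-05 07:50:00,4
"""

coarse = coarsen(datetime(2013, 3, 5, 14, 37, 52))
visited = arrondissements_visited(io.StringIO(SAMPLE))
print(coarse, visited)
```

Since the published timestamps are already coarsened, two events less than ten minutes apart can carry the same timestamp.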


Geographic information
The arrondissement shapefile is provided (SHAPEFILE_SENEGAL.zip), as well as a summary table (senegal_arr_centroids.csv) giving the coordinates of each arrondissement's geometric center, its code, and the names of the region, department, municipality/city and arrondissement:
Centroid_Lon,Centroid_Lat,arr_id,reg,dept,cav,arr
-17.446025848,14.747966068,1,DAKAR,DAKAR,VILLE DE DAKAR,PARCELLES ASSAINIES
-17.485963486,14.736288483,2,DAKAR,DAKAR,VILLE DE DAKAR,ALMADIES
-17.440683318,14.714470627,3,DAKAR,DAKAR,VILLE DE DAKAR,GRAND DAKAR
-17.449171084,14.681392965,4,DAKAR,DAKAR,VILLE DE DAKAR,DAKAR PLATEAU
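The centroid table can be used to estimate distances between arrondissements. The sketch below loads the four sample rows shown above and computes a great-circle (haversine) distance between two centroids. The Earth radius of 6371 km and the helper names are assumptions of this example, and centroid-to-centroid distance is only a rough proxy for how far a user actually moved.

```python
import csv
import io
import math

# The header and four rows quoted on the slide.
CENTROIDS_SAMPLE = """Centroid_Lon,Centroid_Lat,arr_id,reg,dept,cav,arr
-17.446025848,14.747966068,1,DAKAR,DAKAR,VILLE DE DAKAR,PARCELLES ASSAINIES
-17.485963486,14.736288483,2,DAKAR,DAKAR,VILLE DE DAKAR,ALMADIES
-17.440683318,14.714470627,3,DAKAR,DAKAR,VILLE DE DAKAR,GRAND DAKAR
-17.449171084,14.681392965,4,DAKAR,DAKAR,VILLE DE DAKAR,DAKAR PLATEAU
"""


def load_centroids(handle):
    """Return {arr_id: (lat, lon, arrondissement name)}."""
    centroids = {}
    for row in csv.DictReader(handle):
        centroids[int(row["arr_id"])] = (
            float(row["Centroid_Lat"]),
            float(row["Centroid_Lon"]),
            row["arr"],
        )
    return centroids


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


centroids = load_centroids(io.StringIO(CENTROIDS_SAMPLE))
lat_a, lon_a, name_a = centroids[1]
lat_b, lon_b, name_b = centroids[4]
distance = haversine_km(lat_a, lon_a, lat_b, lon_b)
print(f"{name_a} to {name_b}: {distance:.1f} km")
```

Note that the table lists longitude before latitude, which is easy to swap by accident.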

Supplementary individual indicators
Mobility datasets (2 & 3) are supplemented with behavioral indicators computed from CDRs using the bandicoot toolbox. The indicators are computed for every user:
for fine-grained mobility data, over the course of each two-week period: INDICATORS_SET2_P01.csv through INDICATORS_SET2_P25.csv
for coarse-grained mobility data, on a monthly basis: INDICATORS_SET3_M01.csv through INDICATORS_SET3_M12.csv

Behavioral indicators (http://bandicoot.mit.edu/docs/)
active_days_callandtext_mean (user's active days, mean)
active_days_callandtext_sem (user's active days, standard error of the mean)
duration_of_calls_mean_mean
duration_of_calls_mean_sem
entropy_of_contacts_call_mean
entropy_of_contacts_call_sem
entropy_of_contacts_text_mean
entropy_of_contacts_text_sem
entropy_of_contacts_callandtext_mean
entropy_of_contacts_callandtext_sem
entropy_places_callandtext_mean
entropy_places_callandtext_sem
interactions_per_contact_callandtext_mean_mean
interactions_per_contact_callandtext_mean_sem
interactions_per_contact_call_mean_mean
interactions_per_contact_call_mean_sem
interevents_callandtext_mean_mean
interevents_callandtext_mean_sem
interevents_call_mean_mean
interevents_call_mean_sem
interevents_text_mean_mean
interevents_text_mean_sem
...
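To give a feel for what the entropy indicators and the _mean / _sem suffixes measure, here is a small sketch of a Shannon entropy over a user's contacts, followed by a mean and standard error across several periods. This is not bandicoot's code: the natural logarithm, the use of the sample standard deviation, and the per-period values are all assumptions of this example, and bandicoot's exact conventions should be checked in its documentation.

```python
import math
from collections import Counter
from statistics import mean, stdev


def shannon_entropy(items):
    """Shannon entropy (natural log) of the frequency distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_and_sem(values):
    """Mean and standard error of the mean (sample std / sqrt(n))."""
    m = mean(values)
    if len(values) < 2:
        return m, 0.0
    return m, stdev(values) / math.sqrt(len(values))


# One made-up user: the contact reached by each call in a single period.
calls = ["alice", "alice", "bob", "bob"]
entropy = shannon_entropy(calls)  # two equally likely contacts

# Made-up per-period values of some indicator for the same user.
avg, sem = mean_and_sem([2.0, 4.0, 6.0])
print(entropy, avg, sem)
```

A user who always calls the same contact has entropy 0; the value grows as interactions are spread more evenly over more contacts.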

Virtual machines
Note: the data files are compressed to save workspace.
    datathon2@vc1-vm1:~$ ll /data/
    total 24
    drwxr-xr-x  6 root root 4096 Mar 27 15:37 ./
    drwxr-xr-x 23 root root 4096 Mar 23 10:54 ../
    drwxr-xr-x  2 root root 4096 Mar 23 10:55 ContextData/
    drwxr-xr-x  2 root root 4096 Mar 27 12:23 SET1/
    drwxr-xr-x  2 root root 4096 Mar 27 15:41 SET2/
    drwxr-xr-x  2 root root 4096 Mar 27 12:43 SET3/
    datathon2@vc1-vm1:~$ ll -h /data/contextdata/
    total 1.4M
    drwxr-xr-x 2 root root 4.0K Mar 23 10:55 ./
    drwxr-xr-x 6 root root 4.0K Mar 27 15:37 ../
    -rw-r--r-- 1 root root 150K Mar 23 10:55 bandicoot_v01.pdf
    -rw-r--r-- 1 root root 200K Mar 23 10:55 D4D_Senegal.pdf
    -rw-r--r-- 1 root root 9.9K Mar 23 10:55 senegal_arr_centroids.csv
    -rw-r--r-- 1 root root 3.6K Mar 23 10:55 SENEGAL_ARR_V2.csv
    -rw-r--r-- 1 root root 994K Mar 23 10:55 Shapefile_Senegal_V2.zip
    -rw-r--r-- 1 root root  46K Mar 23 10:55 SITE_ARR_LONLAT.CSV
See also: http://www.d4d.orange.com/en/partners-resources/resources

Virtual machines
    datathon2@vc1-vm1:~$ ll -h /data/set1
    total 8.8G
    drwxr-xr-x 2 root root 4.0K Mar 27 12:23 ./
    drwxr-xr-x 6 root root 4.0K Mar 27 15:37 ../
    -rw-r--r-- 1 root root 170M Mar 23 10:56 SET1S_01.CSV.gz
    -rw-r--r-- 1 root root 144M Mar 23 10:56 SET1S_02.CSV.gz
    -rw-r--r-- 1 root root 153M Mar 23 10:56 SET1S_03.CSV.gz
    -rw-r--r-- 1 root root 513M Mar 23 11:01 SET1V_01.CSV.gz
    -rw-r--r-- 1 root root 471M Mar 23 11:01 SET1V_02.CSV.gz
    -rw-r--r-- 1 root root 521M Mar 23 11:02 SET1V_03.CSV.gz
    ...
    datathon2@vc1-vm1:~$ ll -h /data/set2
    total 3.6G
    drwxr-xr-x 2 root root 4.0K Mar 27 15:41 ./
    drwxr-xr-x 6 root root 4.0K Mar 27 15:37 ../
    -rw-r--r-- 1 root root  936 Mar 27 15:37 INDICATORS_SET2_HEADERS.CSV
    -rw-r--r-- 1 root root  20M Mar 27 15:37 INDICATORS_SET2_P01.CSV.gz
    -rw-r--r-- 1 root root  20M Mar 27 15:37 INDICATORS_SET2_P02.CSV.gz
    -rw-r--r-- 1 root root  20M Mar 27 15:37 INDICATORS_SET2_P03.CSV.gz
    -rw-r--r-- 1 root root 121M Mar 27 15:38 SET2_P01.CSV.gz
    -rw-r--r-- 1 root root 117M Mar 27 15:38 SET2_P02.CSV.gz
    -rw-r--r-- 1 root root 120M Mar 27 15:38 SET2_P03.CSV.gz
    ...

Virtual machines
    datathon2@vc1-vm1:~$ ll -h /data/set3
    total 2.1G
    drwxr-xr-x 2 root root 4.0K Mar 27 12:43 ./
    drwxr-xr-x 6 root root 4.0K Mar 27 15:37 ../
    -rw-r--r-- 1 root root  287 Mar 23 11:06 INDICATORS_SET3_HEADERS.CSV.gz
    -rw-r--r-- 1 root root 9.8M Mar 23 11:06 INDICATORS_SET3_M01.CSV.gz
    -rw-r--r-- 1 root root 9.7M Mar 23 11:06 INDICATORS_SET3_M02.CSV.gz
    -rw-r--r-- 1 root root 9.8M Mar 23 11:06 INDICATORS_SET3_M03.CSV.gz
    -rw-r--r-- 1 root root 152M Mar 23 11:08 SET3_M01.CSV.gz
    -rw-r--r-- 1 root root 135M Mar 23 11:08 SET3_M02.CSV.gz
    -rw-r--r-- 1 root root 153M Mar 23 11:09 SET3_M03.CSV.gz
    -rw-r--r-- 1 root root 150M Mar 23 11:09 SET3_M04.CSV.gz
    -rw-r--r-- 1 root root 162M Mar 23 11:09 SET3_M05.CSV.gz
    -rw-r--r-- 1 root root 157M Mar 23 11:10 SET3_M06.CSV.gz
Note: the data files are compressed to save workspace.
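Since the files on the virtual machines are gzip-compressed, they can be streamed row by row without unpacking them to disk first. The sketch below does this with Python's standard gzip and csv modules. UTF-8 encoding and comma-separated rows are assumptions, and the demo file is a temporary stand-in rather than one of the real data files.

```python
import csv
import gzip
import os
import tempfile


def iter_gz_csv(path):
    """Yield rows from a gzip-compressed CSV file, decompressing on the fly."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle)


# Build a tiny stand-in file; on the VM, path would point into /data/ instead.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "DEMO.CSV.gz")
with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as out:
    out.write("2013-01-01 00,1,2,4,240\n")
    out.write("2013-01-01 01,2,1,2,60\n")

rows = list(iter_gz_csv(demo_path))
print(rows)
```

Streaming keeps memory use flat, which matters for files of several hundred megabytes compressed; in real use, iterate over iter_gz_csv(path) instead of collecting everything into a list.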

Thank you