Introduction to Business Intelligence with Computational Intelligence Techniques

Introduction to Business Intelligence with Computational Intelligence Techniques. Richard Weber, Departamento de Ingeniería Industrial, Universidad de Chile, rweber@dii.uchile.cl

Presentation contents: Introduction and motivation. Business Intelligence: definition and overview. Computational Intelligence: the main techniques. Data mining applications with computational intelligence. Data mining tools. Conclusions and outlook.

The Business Intelligence vertigo (a flood of terms):
CRM: Customer Relationship Management
CMR: ???
Data Warehouse / Data Mart
Business Intelligence
OLAP: Online Analytical Processing
Data Mining
Knowledge Management
Artificial Intelligence
Balanced Scorecard
KPI: Key Performance Indicators

KPI: Key Performance Indicators. Definition: "KPIs, or key performance indicators, help organizations achieve organizational goals through the definition and measurement of progress. The key indicators are agreed upon by an organization and are indicators which can be measured that will reflect success factors. The KPIs selected must reflect the organization's goals, they must be key to its success, and they must be measurable. Key performance indicators usually are long-term considerations for an organization." Source: http://www.webopedia.com/term/k/kpi.html

Balanced Scorecard. Definition: "The balanced scorecard is a strategic management system used to drive performance and accountability throughout the organization. The scorecard balances traditional performance measures with more forward-looking indicators in four key dimensions: Financial; Integration/Operational Excellence; Employees; Customers." "The Balanced Scorecard is an organizational framework for implementing and managing strategy at all levels of an enterprise by linking objectives, initiatives, and measures to an organization's strategy. The scorecard provides an enterprise view of an organization's overall performance. It integrates financial measures with other key performance indicators around customer perspectives, internal business processes, and organizational growth, learning, and innovation." Source: http://www.leanadvisors.com/lean/glossary/definition.cfm/word/balanced%20scorecard.cfm

Artificial Intelligence. Definition: "The branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Massachusetts Institute of Technology. Artificial intelligence includes:
games playing: programming computers to play games such as chess and checkers;
expert systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms);
natural language: programming computers to understand natural human languages;
neural networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains;
robotics: programming computers to see and hear and react to other sensory stimuli."
Source: http://www.webopedia.com/term/a/artificial_intelligence.html

Knowledge Management. Definition: "Knowledge Management is the explicit and systematic management of vital knowledge - and its associated processes of creation, organization, diffusion, use and exploitation."
Explicit: surfacing assumptions; codifying that which is known.
Systematic: leaving things to serendipity will not achieve the benefits.
Vital knowledge: you need to focus; you don't have unlimited resources.
Processes: knowledge management is a set of activities with its own tools and techniques.
Source: http://www.skyrme.com/resource/kmbasics.htm

CRM. Definition: "Short for customer relationship management. CRM entails all aspects of interaction a company has with its customer, whether it be sales or service related. Computerization has changed the way companies are approaching their CRM strategies because it has also changed consumer buying behavior. With each new advance in technology, especially the proliferation of self-service channels like the Web and WAP phones, more of the relationship is being managed electronically. Organizations are therefore looking for ways to personalize online experiences (a process also referred to as mass customization) through tools such as help-desk software, e-mail organizers and Web development apps." Source: http://www.webopedia.com/term/c/crm.html

Business Intelligence. Definition: "The term Business Intelligence (BI) represents the tools and systems that play a key role in the strategic planning process of the corporation. These systems allow a company to gather, store, access and analyze corporate data to aid in decision-making. Generally these systems will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, and inventory and distribution analysis to name a few." Source: http://www.webopedia.com/term/b/business_intelligence.html

Data mining for Business Intelligence: motivation. Cost of storing data. [Figure: hard-disk cost (US$) per unit of capacity (MB), 1990 to 2002.] Source: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

Data mining for Business Intelligence: motivation. Availability of data. [Figure: capacity of new hard disks (PB), 1995 to 2003.] Source: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

Data generation. "The World Wide Web contains about 170 terabytes of information on its surface; in volume this is seventeen times the size of the Library of Congress print collections. Instant messaging generates five billion messages a day (750GB), or 274 Terabytes a year. Email generates about 400,000 terabytes of new information each year worldwide." Source: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
Further data sources: barcodes; RFID (Radio Frequency Identification).

Data Warehouse. Definition: "Abbreviated DW, a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data. The term data warehousing generally refers to the combination of many different databases across an entire enterprise. Contrast with data mart." Source: http://www.webopedia.com/term/d/data_warehouse.html

Data warehouse architecture. [Figure: flow from data to information to decision. Labels: operational data, external data, detailed information, summary, metadata, OLAP tool, data mining tools.] Source: Anahory, Murray (1997): Data Warehousing in the Real World.

OLAP. Definition: "Short for Online Analytical Processing, a category of software tools that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data. For example, it provides time series and trend analysis views. OLAP often is used in data mining. The chief component of OLAP is the OLAP server, which sits between a client and a database management systems (DBMS). The OLAP server understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major database systems." Source: http://www.webopedia.com/term/o/olap.html

Navigating an OLAP cube. Drill down: going deeper along one dimension. [Figure: cube with the dimensions product, time and location; marked elements P1 and U1.]

Motivations for storing data.
Initial reasons:
Telecommunications: call billing
Supermarkets: inventory management
Banks: account management
Manufacturing companies: process control
Potential uses:
Telecommunications: fraud detection
Supermarkets: sales association (which products sell together)
Banks: customer segmentation
Manufacturing companies: preventive maintenance

Basic idea and potential of data mining. Companies and organizations have large amounts of stored data. The available data contain important information. That information is hidden in the data. Data mining can find new and potentially useful information in the data.

The KDD process (Knowledge Discovery in Databases). [Figure: Data -> Selection -> selected data -> Preprocessing -> preprocessed data -> Transformation -> transformed data -> Data Mining -> patterns -> Interpretation and Evaluation.] KDD is the non-trivial process of identifying previously unknown, valid, novel, potentially useful and understandable patterns in data.

Potential of data mining (1)

Potential of data mining (2)

Applications of data mining.
Customer Relationship Management: customer segmentation
Database marketing: purchase prediction
Customer retention: churn prediction
Fraud detection: credit cards; (mobile) phone usage
Time series prediction

Computational Intelligence: the main techniques. Neural networks. Fuzzy logic. Genetic algorithms.

Computational Intelligence: timeline.
1943: Neural Networks (McCulloch, Pitts)
1965: Fuzzy Logic (Zadeh)
1974: GA (Schwefel)
1993: First IEEE Conference joining FL, NN and GA (USA)
1993: First EUFIT Conference (Europe)
1996: First Online Workshop on Soft Computing
2001: First International Workshop on Hybrid Intelligent Systems (HIS'01)
2002: First International NAISO Congress on Neuro Fuzzy Technologies

Data mining methods. Statistics: clustering; discriminant analysis. Neural networks. Decision trees. Association rules. Bayesian (belief) networks. Support Vector Machines (SVM).

Neural networks: natural and artificial. [Figure labels: neuron; weighted connections.]

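Artificial neurons. Real neurons (figure labels): synapse, nucleus, dendrites, axon, cell body. Artificial neuron: the inputs x_1(t), x_2(t), ..., x_n(t) are combined with the weights w_0, w_1, w_2, ..., w_n into the activation a(t); the output is y = f(a), available as o(t+1).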

Perceptron (1962). Generalization and formalization of neural networks. Inputs x_1, x_2, x_3, ..., x_n; outputs o_1, o_2, ..., o_p:
\[ o_i = f(a_i) = f\left(\sum_{k=0}^{n} w_{ik}\, x_k\right), \qquad i = 1, \dots, p \]

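Perceptron: the failure. The XOR (exclusive or) function:
x_1 | x_2 | y
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
[Figure: the four input points plotted in the (x_1, x_2) plane.] A single perceptron cannot represent XOR, because no straight line in that plane separates the inputs with y = 1 from those with y = 0. Minsky, Papert (1969).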
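The limitation can be checked with a few lines of code. The sketch below is not from the slides: it assumes a threshold transfer function and the classic perceptron learning rule, and the helper names (`train_perceptron`, `accuracy`) are made up for illustration. It trains one perceptron on AND, which is linearly separable, and one on XOR.

```python
def step(a):
    """Threshold transfer function f(a) (an assumption; the slides leave f open)."""
    return 1 if a >= 0 else 0


def train_perceptron(samples, epochs=50, eta=1.0):
    """Classic perceptron learning rule; w[0] is the weight of the constant input x_0 = 1."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            x = (1.0, x1, x2)
            out = step(sum(wk * xk for wk, xk in zip(w, x)))
            if out != target:
                errors += 1
                # Move the weights toward the target for this misclassified sample.
                w = [wk + eta * (target - out) * xk for wk, xk in zip(w, x)]
        if errors == 0:  # every sample classified correctly: converged
            break
    return w


def accuracy(w, samples):
    """Fraction of samples the perceptron with weights w classifies correctly."""
    hits = sum(step(w[0] + w[1] * x1 + w[2] * x2) == t for (x1, x2), t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

acc_and = accuracy(train_perceptron(AND), AND)
acc_xor = accuracy(train_perceptron(XOR), XOR)
print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

On AND the rule converges and classifies all four points. On XOR it can never get more than three of the four points right, whatever the weights.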

Multilayer Perceptron (MLP). Most applications of neural networks use MLPs.
\[ o_i = f\left(\sum_{j=0}^{n} W_{ij}\, f\left(\sum_{k=0}^{n} w_{jk}\, x_k\right)\right) \]
The output is a nonlinear function of a linear combination of nonlinear functions of linear combinations of the input data, which allows nonlinear classification and regression. For a network with 3 inputs, 2 hidden neurons and 1 output:
\[ f(x) = G\left(\sum_{j=1}^{2} w'_{1j}\, G\left(\sum_{i=1}^{3} w_{ji}\, x_i + b_j\right) + b'_1\right) \]
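To make the nested formula concrete, here is a minimal forward-pass sketch. It assumes a sigmoid for G and, for brevity, a network with 2 inputs rather than the 3 of the slide. The weights are hand-picked for illustration (not learned and not from the slides) so that the hidden neurons approximate OR and NAND and the output neuron approximates AND, which together give XOR.

```python
import math


def G(a):
    """Sigmoid transfer function (an assumption; the slides only call it G)."""
    return 1.0 / (1.0 + math.exp(-a))


# Hand-picked illustrative weights: w[j][i] and b[j] for the hidden layer,
# w_out[j] and b_out for the single output neuron.
w = [[20.0, 20.0],    # hidden neuron 1 behaves like OR
     [-20.0, -20.0]]  # hidden neuron 2 behaves like NAND
b = [-10.0, 30.0]
w_out = [20.0, 20.0]  # output neuron behaves like AND of the two hidden neurons
b_out = -30.0


def mlp(x):
    """f(x) = G(sum_j w'_1j * G(sum_i w_ji * x_i + b_j) + b'_1)."""
    hidden = [G(sum(w[j][i] * x[i] for i in range(2)) + b[j]) for j in range(2)]
    return G(sum(w_out[j] * hidden[j] for j in range(2)) + b_out)


outputs = {x: round(mlp(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```

One hidden layer is enough to represent the function that a single perceptron cannot.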

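Backpropagation: an example. Network with r = 3 inputs, n = 2 hidden neurons and s = 1 output neuron. For a training pattern p with inputs x_i, target y_p and network output o_p:
\[ o_p = G\left(\sum_{j=1}^{2} w'_{1j}\, G\left(\sum_{i=1}^{3} w_{ji}\, x_i + b_j\right) + b'_1\right) \]
Error term and weight update for the output layer:
\[ \delta_p = (y_p - o_p)\, G'\left(\sum_{j=1}^{2} w'_{1j}\, G\left(\sum_{i=1}^{3} w_{ji}\, x_i + b_j\right) + b'_1\right), \qquad \Delta_p w'_{1j} = \eta\, G\left(\sum_{i=1}^{3} w_{ji}\, x_i + b_j\right) \delta_p \]
Error term and weight update for the hidden layer:
\[ \delta_{pj} = G'\left(\sum_{i=1}^{3} w_{ji}\, x_i + b_j\right) \delta_p\, w'_{1j}, \qquad \Delta_p w_{ji} = \eta\, x_i\, \delta_{pj} \]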
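The update rules can be turned into a short training loop for the same 3-2-1 network. This is a sketch under assumptions the slides do not state: G is a sigmoid, so G'(a) = G(a)(1 - G(a)); the biases are updated with the same error terms as the weights; and the toy task (predict the first input bit), the learning rate and the seed are arbitrary choices.

```python
import math
import random


def G(a):
    """Sigmoid transfer function (assumed)."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w, b, w_out, b_out):
    """Return the hidden activations and the output o_p of the 3-2-1 network."""
    hidden = [G(sum(w[j][i] * x[i] for i in range(3)) + b[j]) for j in range(2)]
    o = G(sum(w_out[j] * hidden[j] for j in range(2)) + b_out)
    return hidden, o


def train(patterns, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    b = [0.0, 0.0]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    b_out = 0.0
    for _ in range(epochs):
        for x, y in patterns:
            hidden, o = forward(x, w, b, w_out, b_out)
            # delta_p = (y_p - o_p) * G'(output activation)
            delta_p = (y - o) * o * (1.0 - o)
            # delta_pj = G'(hidden activation j) * delta_p * w'_1j (old output weights)
            delta_h = [hidden[j] * (1.0 - hidden[j]) * delta_p * w_out[j] for j in range(2)]
            for j in range(2):
                w_out[j] += eta * hidden[j] * delta_p   # Delta_p w'_1j
                b[j] += eta * delta_h[j]                # bias update (assumption)
                for i in range(3):
                    w[j][i] += eta * x[i] * delta_h[j]  # Delta_p w_ji
            b_out += eta * delta_p                      # bias update (assumption)
    return w, b, w_out, b_out


def mse(patterns, params):
    """Mean squared error of the network over all patterns."""
    w, b, w_out, b_out = params
    return sum((y - forward(x, w, b, w_out, b_out)[1]) ** 2 for x, y in patterns) / len(patterns)


# Toy task: all 8 binary input patterns, target equal to the first input bit.
patterns = [((a, b_, c), float(a)) for a in (0, 1) for b_ in (0, 1) for c in (0, 1)]
error_before = mse(patterns, train(patterns, epochs=0))
error_after = mse(patterns, train(patterns))
print(error_before, error_after)
```

Calling `train` with `epochs=0` returns the untrained network for the same seed, so the two error values can be compared directly.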

Basics of fuzzy logic. Linguistic variable: age; fuzzy set: "young customer". [Figure: membership function μ(A), with values up to 1, plotted over age; axis marks at 30, 36 and 42.]
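A membership function of this kind could look as follows. The shape is an assumption: the slide only shows the axis marks 30, 36 and 42, so this sketch treats customers up to 30 as fully "young" and lets membership fall linearly to 0 at 42. The function name `mu_young` is hypothetical.

```python
def mu_young(age, full_until=30.0, zero_from=42.0):
    """Degree of membership of `age` in the fuzzy set "young customer".

    Assumed shape: 1 up to `full_until`, linear decrease, 0 from `zero_from`.
    """
    if age <= full_until:
        return 1.0
    if age >= zero_from:
        return 0.0
    return (zero_from - age) / (zero_from - full_until)


for age in (25, 30, 36, 42, 50):
    print(age, mu_young(age))
```

Under this assumed shape a 36-year-old customer is "young" to degree 0.5, instead of being either young or not young as in a crisp set.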

Clustering with fuzzy logic: the butterfly example. [Figure: 15 objects x1, ..., x15 arranged in a butterfly shape, with two cluster centres. With crisp groups every object has membership 0 or 1 in each group. With fuzzy group 1 and fuzzy group 2 the memberships are graded, taking values such as .99, .97, .94, .86, .50, .14, .06, .03 and .01.]

Clustering with fuzzy logic. Algorithm: Fuzzy c-means (FCM). n objects, c classes. u_{i,j} = degree of membership of object i in class j (i = 1, ..., n; j = 1, ..., c). U = (u_{i,j})_{i,j} with
\[ u_{i,j} \in [0,1]; \qquad \sum_{j=1}^{c} u_{i,j} = 1, \quad i = 1, \dots, n \]
Objective function:
\[ \min \sum_{i=1}^{n} \sum_{j=1}^{c} (u_{i,j})^m \, d^2(x_i, c_j) \]
x_i: object i; c_j: centre of class j; d^2(x_i, c_j): distance between x_i and c_j; m: fuzzifier parameter (1 < m < ∞).

Algorithm: Fuzzy c-means (FCM).
1. Choose an initial matrix U with
\[ u_{i,j} \in [0,1]; \qquad \sum_{j=1}^{c} u_{i,j} = 1 \]
2. Compute the class centres:
\[ c_j = \frac{\sum_{i=1}^{n} (u_{i,j})^m \, x_i}{\sum_{i=1}^{n} (u_{i,j})^m} \]
3. Update the membership degrees:
\[ u_{i,j} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d(x_i, c_j)}{d(x_i, c_k)} \right)^{\frac{2}{m-1}}} \]
4. Stopping criterion, where U^k is the matrix in iteration k:
\[ \lVert U^{k+1} - U^{k} \rVert < \varepsilon \]
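The four steps can be sketched in plain Python for one-dimensional data. Several details are assumptions rather than part of the slides: the deterministic initial matrix, the use of the largest absolute change in U as the norm in step 4, and the toy data set.

```python
def fcm(points, c=2, m=2.0, eps=1e-6, max_iter=200):
    """Fuzzy c-means on 1-D points (c >= 2). Returns (centres, membership matrix U)."""
    n = len(points)
    # Step 1: initial U with entries in [0, 1] and rows summing to 1 (assumed pattern).
    u = [[0.6 if j == i % c else 0.4 / (c - 1) for j in range(c)] for i in range(n)]
    centres = []
    for _ in range(max_iter):
        # Step 2: centres as means weighted by u^m.
        centres = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            centres.append(sum(wt * x for wt, x in zip(weights, points)) / sum(weights))
        # Step 3: update memberships from the squared distances to the centres.
        new_u = []
        for x in points:
            d2 = [(x - cj) ** 2 for cj in centres]
            if 0.0 in d2:
                # The point coincides with a centre: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # (d_ij / d_ik)^(2/(m-1)) equals (d2_ij / d2_ik)^(1/(m-1)).
                row = [1.0 / sum((d2[j] / d2[k]) ** (1.0 / (m - 1.0)) for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        # Step 4: stop when U barely changes (max-norm, an assumption).
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break
    return centres, u


points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]  # toy data: two obvious groups
centres, u = fcm(points)
print(sorted(centres))
```

Each row of the returned matrix gives one object's degrees of membership in the two classes.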

Genetic Algorithms. Description: inspired by evolution (Darwin). Genes represent possible solutions to a problem. Genetic algorithms generate a population of genes (possible solutions) and make them evolve to obtain better genes (better solutions). Based on the principle of survival of the fittest.
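A minimal sketch of this idea follows. The problem (maximise the number of 1-bits in a bit string, often called OneMax), the operators (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are illustrative choices, not something the slides prescribe.

```python
import random


def fitness(genes):
    """Toy fitness: number of 1-bits in the candidate solution."""
    return sum(genes)


def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]      # survival of the fittest half
        children = [pop[0][:]]             # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(history[0], "->", history[-1])
```

Because the best individual is always copied into the next generation, the best fitness can only stay the same or improve from one generation to the next.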

Customer segmentation. [Figure: a bank with products 1, ..., n on one side and customers with their requirements on the other.] Which product for which customer?

Main application areas of Web Usage Mining

Identifying web usage behavior of bank customers Sandro Araya 1), Mariano Silva 2), Richard Weber 3) 1) BCI Bank, Santiago, Chile 2) webmining.cl, Santiago, Chile 3) Department of Industrial Engineering, Universidad de Chile, Santiago, Chile Araya, S., Silva, M., Weber, R. (2004): A Methodology for Web Usage Mining and its Application to Target Group Identification. Fuzzy Sets and Systems 148, No. 1, 139-152

BCI - Banco de Crédito e Inversiones (www.bci.cl) Founded in 1937 Started Virtual Bank in 1996 10,000+ Internet transactions daily

Process of knowledge discovery in databases (KDD). [Figure: Data -> Selection -> selected data -> Pre-processing -> pre-processed data -> Transformation -> transformed data -> Data Mining -> patterns -> Interpretation / Evaluation.]

Methodology of Web Mining. Combination of the KDD process and Web traffic analysis. Steps: Selection, Preprocessing, Transformation, Data Mining, Interpretation. [Figure labels for the data at each stage: log files, clean logs, sessions, integrated data, transformed data, patterns (rules, clusters).]

Current situation. Registered visitors of the Virtual Bank: 41,563 navigating customers. (Traditional) bank customers: 142,133 customers who are not yet visitors of the web site.

Questions.
Virtual bank: How do my navigating customers behave? Are there segments of typical visitors? Is it possible to identify heavy users?
Traditional bank: Are there customers that look like heavy users? How can I convert these twins of heavy users to users of my web site?

Two step approach.
Virtual bank: clustering of navigating customers; determine the profile of heavy users => Fuzzy Clustering.
Traditional bank: search for (traditional) customers that have a profile similar to that of heavy users; marketing campaign directed to these twins of heavy users => Neural Network.

Results of Segmentation
Class | Age (years) | Trx Web | N of cases | % Cases
L1 | 38 | 25 | 9,130 | 22.0%
L2 | 29 | 26 | 4,277 | 10.3%
M1 | 58 | 31 | 4,599 | 11.1%
M2 | 47 | 32 | 11,829 | 28.5%
H | 34 | 141 | 11,728 | 28.2%
TOTAL | | | 41,563 | 100.0%

Neural networks (Multilayer Perceptron). [Figure: neurons joined by weighted connections, arranged in an input layer, a hidden layer and an output layer.]

DataEngine (www.dataengine.de). Components: data acquisition; visualisation; data preprocessing; graphical macro language; fuzzy and neural modelling; DataEngine ADL. ADL example: for (int i=0; i<=99; i++) { a[i]=0.0; b[i]=0.0; c[i]=0.0; }

Identification of twins with neural networks. Architecture of the Multilayer Perceptron:
Number of input neurons: 6, corresponding to the attributes age, gender, marital status, education, income, and seniority as a customer.
Number of neurons in the hidden layer: 12 (transfer function: sigmoid).
Number of output neurons: 5, corresponding to the 5 classes of customers: H, L1, L2, M1 and M2.
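The shape of such a 6-12-5 network can be sketched as follows. Only the layer sizes and the sigmoid in the hidden layer come from the slide. The sigmoid on the output layer, the random (untrained) weights, the encoding of the six attributes as numbers in [0, 1] and the example customer are all assumptions, so the predicted class here carries no meaning.

```python
import math
import random

CLASSES = ["H", "L1", "L2", "M1", "M2"]
N_INPUT, N_HIDDEN, N_OUTPUT = 6, 12, 5


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Apply one fully connected sigmoid layer (a constant 1.0 feeds the bias)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs + [1.0]))) for row in weights]


def predict(x, hidden_w, output_w):
    """Return the class with the highest output score, plus all five scores."""
    scores = layer_forward(output_w, layer_forward(hidden_w, x))
    return CLASSES[scores.index(max(scores))], scores


rng = random.Random(0)
hidden_w = init_layer(N_INPUT, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUT, rng)
n_params = sum(len(r) for r in hidden_w) + sum(len(r) for r in output_w)

# Hypothetical customer: age, gender, marital status, education, income, seniority.
customer = [0.35, 1.0, 0.0, 0.5, 0.4, 0.2]
label, scores = predict(customer, hidden_w, output_w)
print(n_params, label)
```

With biases included, this architecture has 12 x 7 + 5 x 13 = 149 adjustable parameters.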

Neural Network Results
Class | Selected cases | % Cases
L1 | 32,602 | 22.9%
L2 | 25,216 | 17.7%
M1 | 35,805 | 25.2%
M2 | 18,608 | 13.1%
H | 29,902 | 21.0%
TOTAL | 142,133 | 100.0%

Marketing Campaign
Group | Received mailing | Did not receive mailing | Total
Customers from class H | 11,567 | 18,335 | 29,902
Other customers | 15,806 | 96,425 | 112,231
Total | 27,373 | 114,760 | 142,133

Gains Chart. [Figure: percentage of new customers (up to 100%) plotted against percentage of total customers (up to 100%), comparing advanced selection with random selection.]

Marketing Campaign: new visitors from class H
Week | Received the mailing | Did not receive the mailing | Total
13 | 737 | 256 | 993
14 | 153 | 264 | 417
15 | 114 | 212 | 326
16 | 101 | 204 | 305

Marketing Campaign Results: new visitors from class H that received the mailing
Week | New visitors
13 | 737
14 | 153
15 | 114
16 | 101
TOTAL | 1,105
Response rate of twins = 1,105 / 11,567 ≈ 10%

Marketing Campaign Results: new visitors from class H that did not receive the mailing
Week | New visitors
13 | 256
14 | 264
15 | 212
16 | 204
TOTAL | 936
Connection rate of twins of heavy users without mailing = 936 / 18,300 ≈ 5%

Conclusion.
Natural connecting rate: about 1,050 new customers per month, about 2% of web site users.
Response rate after mailing to twins of heavy users: 10%.
Natural connecting rate of twins of heavy users (i.e. without receiving the mailing): 5%.
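The two rates can be recomputed from the campaign tables. This sketch uses the weekly counts of new class H visitors and the group sizes 11,567 and 18,335 from the marketing campaign table. The ratio between the two rates (`lift`) is an extra figure that the slides do not report.

```python
received_mailing = 11567      # class H customers who received the mailing
no_mailing = 18335            # class H customers who did not

new_with_mailing = [737, 153, 114, 101]     # weeks 13 to 16
new_without_mailing = [256, 264, 212, 204]  # weeks 13 to 16

response_rate = sum(new_with_mailing) / received_mailing
natural_rate = sum(new_without_mailing) / no_mailing
lift = response_rate / natural_rate

print(f"with mailing:    {sum(new_with_mailing)} new visitors, {response_rate:.1%}")
print(f"without mailing: {sum(new_without_mailing)} new visitors, {natural_rate:.1%}")
print(f"ratio of the two rates: {lift:.2f}")
```

Rounded to whole percentages, these are the 10% and 5% quoted in the conclusion.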

Sales Forecasting System Using Neural Networks and Its Application in a Supermarket Supply Chain

Problem motivation. Which products do I order? How much do I order? There is a need for short-term sales forecasts.

Problem motivation. What do sales depend on? Past sales; prices; advertising campaigns; seasonality; holidays; weather; sales of similar products. [Figure: daily sales amount ($), from 0 to 30,000, for the days from 01-01-1998 to 30-04-1998.]

Problem motivation. How should inventory be managed? Too little: stockouts and dissatisfied customers. Too much: capital costs. Goal: develop better forecasting techniques and manage the inventory accordingly. There are successful applications of neural networks (ICA Handlarna, Sweden; a pharmacy chain in the USA). The technologies and the knowledge exist.

Project scope. The study is restricted to:
The La Pintana store: a traditional supermarket with 4,500 m².
A subset of products: the 50 best-selling PLUs in the store (they represent 23.18% of sales).
Data from 12/09/2000 to 31/07/2001.

Knowledge Discovery in Databases (KDD). [Figure: 1. Data consolidation: data sources -> consolidated data in the data warehouse (DW). 2. Selection and preprocessing -> prepared data. 3. Data mining -> patterns and models (for example p(x) = 0.02). 4. Interpretation and evaluation -> knowledge.]

1. Data consolidation. Data from different sources:
ORION: units sold in the La Pintana store from 01/07/00 to 31/07/01 for the 50 best-selling PLUs.
AC Nielsen: weekly prices of the products in the store under study and at its competitors in the micromarket (Santa Isabel, Ekono and Lider).

1. Data consolidation: Coffee 170 g (118389). [Figure: daily series from 01-JUL-2000 to 16-JUL-2001, values from 0 to 120; annotation: summer.]

1. Data consolidation: Beer 1 L (114464). [Figure: daily series from 01-JUL-2000 to 16-JUL-2001, values from 0 to 2000; annotations: Christmas, New Year, 18 September, summer.]

1. Data consolidation: characteristics of the day. Binary variables (0/1):
pago: end-of-month pay days
quincena: mid-month pay days
prefest: days before public holidays
feriado: public holidays
patrias: national holidays (Fiestas Patrias)
santa: Holy Week days
vacation: vacation days (January and February)
verano: days in the summer months (from 01/10 to 31/03)
a_nuevo: 1 January, the only day of the year on which supermarkets do not sell
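Three of these flags are defined purely by the calendar and can be derived from the date alone. The sketch below covers only those three; the others depend on pay-day and holiday calendars that the slides do not spell out. The function name `day_flags` is hypothetical.

```python
from datetime import date


def day_flags(d):
    """Calendar-only binary variables for a given day (0 or 1 each)."""
    return {
        "vacation": int(d.month in (1, 2)),          # January and February
        "verano": int(d.month >= 10 or d.month <= 3),  # 01/10 to 31/03
        "a_nuevo": int(d.month == 1 and d.day == 1),   # 1 January
    }


print(day_flags(date(2001, 1, 1)))
print(day_flags(date(2000, 10, 1)))
print(day_flags(date(2001, 7, 15)))
```

Each flag becomes one additional input column for the forecasting models.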

2. Selection and preprocessing. In real life, data do not come in the shape we would like. Of the 50 original PLUs, 3 correspond to promotions. Of the remaining 47 PLUs, 9 are missing more than 25% of the data in their time series. Data cleaning is needed.

2. Preprocessing. Sales were scaled to the range between 0 and 1. The following variables are created from the prices:
\[ PA(\text{PLU}) = \text{price}_{\text{PLU, Economax}} \]
\[ PB(\text{PLU}) = \frac{\text{price}_{\text{PLU, Economax}}}{\max(\text{price}_{\text{PLU, micromarket}})} \]
\[ PC(\text{PLU}) = \frac{\text{price}_{\text{PLU, Economax}}}{\min(\text{price}_{\text{PLU, micromarket}})} \]
These variables are also rescaled to the range between 0 and 1.
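Both preprocessing steps are short to write down. The sketch assumes min-max scaling for "scaled between 0 and 1" and reads PB and PC as ratios of the store's own price to the highest and lowest micromarket price. The function names and the sample prices are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def price_features(own_price, micromarket_prices):
    """Return (PA, PB, PC) for one product and one week."""
    pa = own_price
    pb = own_price / max(micromarket_prices)
    pc = own_price / min(micromarket_prices)
    return pa, pb, pc


print(min_max_scale([10, 20, 30]))
print(price_features(100, [80, 100, 125]))
```

A PB below 1 means the store is cheaper than the most expensive store in the micromarket. A PC above 1 means it is more expensive than the cheapest one.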

3. Data mining: solution approaches. Naive models (the current approach). Box-Jenkins models: SARIMAX (p,d,q) (sp,sd,sq) Y. Neural networks: Multilayer Perceptron (MLP).

Time series analysis. Box, Jenkins (1976).
MA(q) (FIR):
\[ X_t = \sum_{n=1}^{q} b_n \, e_{t-n} = b_1 e_{t-1} + \dots + b_q e_{t-q} \]
AR(p) (IIR):
\[ X_t = \sum_{i=1}^{p} a_i \, x_{t-i} + e_t \]
ARMA(p,q):
\[ X_t = \sum_{i=1}^{p} a_i \, x_{t-i} + \sum_{n=1}^{q} b_n \, e_{t-n} + e_t \]
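Given fitted coefficients, the ARMA equation yields a one-step forecast by setting the unknown future error e_t to its expected value of zero. The sketch below shows only that forecasting step. Estimating the coefficients a_i and b_n is a separate task not shown here, and the numbers in the example are made up.

```python
def arma_forecast(x_hist, e_hist, a, b):
    """One-step ARMA(p, q) forecast of X_t with E[e_t] = 0.

    x_hist: past observations, most recent last (at least p values)
    e_hist: past forecast errors, most recent last (at least q values)
    a:      AR coefficients a_1 ... a_p
    b:      MA coefficients b_1 ... b_q
    """
    ar_part = sum(a[i] * x_hist[-1 - i] for i in range(len(a)))
    ma_part = sum(b[n] * e_hist[-1 - n] for n in range(len(b)))
    return ar_part + ma_part


# ARMA(2, 1) with illustrative coefficients:
# 0.5 * 3.0 + 0.25 * 2.0 + 0.4 * (-0.2)
forecast = arma_forecast([1.0, 2.0, 3.0], [0.1, -0.2], a=[0.5, 0.25], b=[0.4])
print(forecast)
```

Passing an empty list for `b` gives a pure AR(p) forecast, and an empty list for `a` gives a pure MA(q) forecast.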

Box-Jenkins models: requirements of ARMA. At least 50 observations. The series must be stationary.

Box-Jenkins models. To convert a non-stationary series into a stationary one, you can: apply logarithmic transformations; difference the series (X_t - X_{t-1}).
ARIMA(p,d,q): d is the number of times the series is differenced.
Seasonal ARIMA: SARIMA (p,d,q) (sp,sd,sq).
SARIMAX: with X external variables (regressors).

Box-Jenkins models. Time series transformation (from non-stationary to stationary): apply logarithmic transformations; difference the series (X_t - X_{t-1}). [Figure: X(t), X(t+1) and the difference X(t+1) - X(t) plotted over about 160 time steps.]
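Both transformations are one-liners. This sketch is illustrative: the function names are made up, and the sample series is chosen so that the effect of differencing once and twice is easy to see.

```python
import math


def difference(series, d=1):
    """Difference a series d times: each pass replaces it by X_t - X_(t-1)."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series


def log_transform(series):
    """Natural logarithm of every value (all values must be positive)."""
    return [math.log(v) for v in series]


trend = [1, 4, 9, 16, 25]        # quadratic trend, clearly non-stationary
print(difference(trend))         # first differences still trend upward
print(difference(trend, d=2))    # second differences are constant
```

Each differencing pass shortens the series by one value, and d in ARIMA(p,d,q) counts the passes.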

Neural networks. Connectionist models. They solve problems of: pattern classification; function approximation; clustering; optimization; associative memory; prediction or forecasting.
\[ y_k = f\left(\sum_{i=0}^{n} w_{ik}\, x_i\right) \]

Applications of neural networks.
Classification: fraud detection; customer churn prediction; prediction of product purchases (direct marketing).
Regression: estimation of customer risk (scoring); forecasting of financial and stock market indices (currencies, metals, stock markets, bonds, etc.).

MLP for forecasting

Overfitting. The network fits the data of the problem rather than the problem itself. Training and test sets.

ARIMA vs. MLP.
Statistical model (ARIMA):
Linear model: assumes a behavior of the series a priori.
Modelling requires the series to be stationary.
Requires knowledge of statistics and interaction with the user during modelling.
The model provides knowledge and information through its parameters.
Low risk of overfitting the model.
Neural networks (MLP):
Nonlinear model: more degrees of freedom for the model.
Imposes no statistical requirements on the time series to be analyzed.
Requires less interaction with the user.
The model is hard to read (black box).
The model is easy to overfit to the data.

Forecast performance: error measures.
Percentage error (mean absolute percentage error):
\[ \frac{1}{N} \sum_{k} \frac{\lvert y(k) - \hat{y}(k) \rvert}{y(k)} \]
Normalized error (normalized mean squared error):
\[ \frac{\sum_{k} \left( y(k) - \hat{y}(k) \right)^2}{\sum_{k} \left( y(k) - \bar{y} \right)^2} = \frac{1}{N \sigma^2} \sum_{k} \left( y(k) - \hat{y}(k) \right)^2 \]
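The two error measures can be written as small functions. This sketch assumes strictly positive observations for the percentage error and uses made-up numbers for the example.

```python
def mape(y, y_hat):
    """Mean absolute percentage error (as a fraction; multiply by 100 for %)."""
    return sum(abs(actual - pred) / abs(actual) for actual, pred in zip(y, y_hat)) / len(y)


def nmse(y, y_hat):
    """Squared forecast error divided by the squared deviation from the mean."""
    mean = sum(y) / len(y)
    numerator = sum((actual - pred) ** 2 for actual, pred in zip(y, y_hat))
    denominator = sum((actual - mean) ** 2 for actual in y)
    return numerator / denominator


y = [100.0, 200.0, 300.0]
y_hat = [110.0, 180.0, 300.0]
print(mape(y, y_hat))                   # (0.1 + 0.1 + 0.0) / 3
print(nmse(y, y_hat))                   # 500 / 20000
print(nmse(y, [200.0, 200.0, 200.0]))   # forecasting the mean gives exactly 1
```

A normalized error of 1 corresponds to always forecasting the mean of the series, so values above 1 indicate a forecast that does worse than that baseline.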

Application to PLU 100595 (Vegetable Oil 1 L). [Figure: daily series for PLU 100595 over 56 days (day 1, a Saturday, to day 56, a Monday), values from 0 to 400.]

Traditional models and MLP, PLU 100595.

Traditional models:
Model | Training set: percentage error | Training set: normalized error | Test set: percentage error | Test set: normalized error
ARIMA | 36.21% | 0.3301 | 40.49% | 0.6090
Naive | 44.28% | 0.6972 | 56.83% | 1.2481
Seasonal naive | 64.67% | 1.2212 | 45.75% | 1.9217
Unconditional mean | 59.98% | 0.7759 | 48.54% | 0.9689

MLP models:
Model | Training set: percentage error | Training set: normalized error | Test set: percentage error | Test set: normalized error
MLPtw21 | 32.93% | 0.4633 | 31.85% | 0.4973
MLPtw14 | 31.15% | 0.3115 | 34.64% | 0.5703
MLPtw7 | 30.00% | 0.3092 | 35.44% | 0.5490
MLPtw6 | 32.45% | 0.3761 | 33.53% | 0.5112
MLPtw5 | 30.26% | 0.3526 | 35.61% | 0.5540
MLPtw3 | 29.61% | 0.3002 | 34.36% | 0.5281
MLPtw1 | 30.00% | 0.3405 | 35.31% | 0.5340
MLPtw0 | 34.12% | 0.4760 | 31.80% | 0.6244

In summary. Tests were run with five other products, with these findings:
ARIMA improves on the forecasts obtained with naive methods.
Neural networks (NN) generally give better results than ARIMA.
ARIMA delivers an understandable model and good results, but at a non-negligible cost (statistical requirements and user knowledge).
NN obtain the best results in a more automatic way, but with a black-box model.

Periodic replenishment system. Replenishment every P days, with a delivery lead time of L days. Target inventory T:
\[ T = m + z\sigma \]
m: average demand during P + L days (from the forecasting system).
zσ: safety stock (service-level factor times the standard deviation of sales).
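The target inventory is simple to compute once z is fixed. The sketch below makes two assumptions that the slides do not state: z is taken as the standard normal quantile of the desired service level, and the order quantity is the gap between the target and the stock on hand. All numbers are made up.

```python
from statistics import NormalDist


def target_inventory(mean_demand, sigma, service_level=0.95):
    """T = m + z * sigma, with z the standard normal quantile of the service level (assumed)."""
    z = NormalDist().inv_cdf(service_level)
    return mean_demand + z * sigma


def order_quantity(target, on_hand):
    """Order up to the target level; never order a negative amount (assumed policy)."""
    return max(0.0, target - on_hand)


T = target_inventory(mean_demand=100.0, sigma=10.0, service_level=0.975)
print(round(T, 1))                      # z is about 1.96, so T is about 119.6
print(round(order_quantity(T, 50.0), 1))
```

Here `mean_demand` stands for m, the forecast demand over P + L days, and `sigma` for the standard deviation of sales over that period.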

Inventory Replenishment
[Figure: daily inventory level for PLU 100595, in units, from 19/09/00 to 24/07/01; series: inventory level and target inventory; vertical axis from 0 to 800.]
Stockouts: 1% with 5 days of inventory coverage (before: 6% stockouts with 30 days of coverage).
Aburto, L., Weber, R. (2006): Improved Supply Chain Management based on Hybrid Demand Forecasts. Applied Soft Computing, Elsevier, in press

Neural networks: Kohonen's self-organizing feature maps
Application: clustering. Tool: DataEngine.
[Figure: Kohonen feature map; input vectors (x1, x2, ..., xm) are connected to the neurons of the map through weights w1,1, w1,2, ..., w1,m, ..., wn,1, wn,2, ..., wn,m.]

Web Intelligence in a bank
www.tbanc.cl: first Chilean virtual bank.
- Written in Spanish.
- 217 static web pages.
- Approximately eight million web log records from the period January to March 2003.

Visitor browsing behavior

Visitor Behavior - Basic statistics
Only 16% of the visitors visit 10 or more pages, and 18% visit fewer than 4. The average number of visited pages is 6. After applying various filters, approximately 400,000 vectors were identified.

Visitor browsing behavior
Three variables are considered: the path of each web page, its content, and the time the visitor spends on it. A visitor behavior vector is defined and a similarity measure between visitor sessions is introduced:
\[ \vec{v} = [(p_1, t_1), \ldots, (p_n, t_n)] \]
where (p_i, t_i) is a component that represents the content and path of the i-th page visited and the percentage of time spent on it. The vector maintains the order in which the pages were visited.

Comparing browsing behavior
The similarity measure between two visitor behavior vectors α and β is then:
\[ sm(\vec{\alpha}, \vec{\beta}) = dg(p^h_{\alpha}, p^h_{\beta}) \cdot \frac{1}{L} \sum_{k=1}^{L} \min\left(\frac{t^{\alpha}_k}{t^{\beta}_k}, \frac{t^{\beta}_k}{t^{\alpha}_k}\right) dp(p^c_{\alpha,k}, p^c_{\beta,k}) \]
where
- \min(t^{\alpha}_k / t^{\beta}_k, t^{\beta}_k / t^{\alpha}_k) is an indicator of visitor interest,
- dg is a graph distance, i.e., how similar the paths of the two sessions are,
- dp(p^c_{\alpha,k}, p^c_{\beta,k}) is a page distance between the contents of the visited pages.
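The way the three ingredients combine can be sketched as follows. This is a minimal illustration, assuming the measure is the path similarity multiplied by the average, over the L compared page visits, of the time-ratio indicator times the content similarity. The function names, the argument layout, and the numbers are hypothetical; the content similarities and the path similarity are passed in precomputed, and all time shares are assumed to be positive.

```python
def visitor_interest(t_a, t_b):
    """Time-ratio indicator: 1.0 when both visitors spend the same share of time."""
    return min(t_a / t_b, t_b / t_a)


def session_similarity(times_a, times_b, content_sims, path_sim):
    """Combine time, content and path similarity of two sessions.

    times_a, times_b: share of session time spent on the k-th page (positive numbers)
    content_sims:     precomputed content similarity dp for the k-th pair of pages
    path_sim:         precomputed path (graph) similarity dg of the two sessions
    """
    if not (len(times_a) == len(times_b) == len(content_sims)):
        raise ValueError("all per-page sequences must have the same length")
    length = len(content_sims)
    total = sum(
        visitor_interest(t_a, t_b) * dp
        for t_a, t_b, dp in zip(times_a, times_b, content_sims)
    )
    return path_sim * total / length


# Hypothetical two-page sessions.
print(session_similarity([0.5, 0.5], [0.25, 0.75], [1.0, 0.5], 0.5))
```

Because the three factors are multiplied, two sessions score high only when they agree on path, content and time at once.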

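Vector space model
\[ M = (m_{ij}), \qquad m_{ij} = f_{ij} \left(1 + \frac{sw_i}{TR}\right) \log\left(\frac{Q}{n_i}\right) \]
- sw: special words array
- TR: total special words
Two pages are represented by the vectors p_i → (m_{1i}, ..., m_{Ri}) and p_j → (m_{1j}, ..., m_{Rj}), and the page distance is the cosine of the angle θ between them:
\[ dp(p_i, p_j) = \cos\theta = \frac{\sum_{k=1}^{R} m_{ki} m_{kj}}{\sqrt{\sum_{k=1}^{R} (m_{ki})^2} \, \sqrt{\sum_{k=1}^{R} (m_{kj})^2}} \]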
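The weighting and the cosine measure can be sketched as below. Several points are assumptions rather than facts from the slide: f, Q and n are read in the usual tf-idf sense (frequency of the word in the page, number of pages, number of pages containing the word), the logarithm is taken as natural, and the function names and numbers are hypothetical.

```python
import math


def term_weight(f, sw, total_special, q, n):
    """Weight m = f * (1 + sw / TR) * log(Q / n).

    f:             frequency of the word in the page (assumed meaning)
    sw:            special-word value of the word
    total_special: total number of special words (TR)
    q, n:          number of pages and number of pages containing the word (assumed)
    The natural logarithm is an assumption; the slide does not give the base.
    """
    return f * (1 + sw / total_special) * math.log(q / n)


def cosine_similarity(u, v):
    """Cosine of the angle between two page vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a page with no weighted words shares nothing with any other
    return dot / (norm_u * norm_v)


# Hypothetical page vectors over three words.
print(cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))
```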

Comparing sequences
The sequence of a navigation can be represented by a graph. Each page is identified by an identification number.
[Figure: site graph with pages numbered 1 to 8.]
S_1 = (1, 2, 6, 5, 8), S_2 = (1, 3, 6, 7)
G_1: nodes {1, 2, 6, 5, 8}, edges {1→2, 2→6, 2→5, 5→8}
G_2: nodes {1, 3, 6, 7}, edges {1→3, 3→6, 3→7}
We need to know how similar or different the two sequences are:
\[ dg(G_1, G_2) = \frac{2 \, |E(G_1) \cap E(G_2)|}{|E(G_1)| + |E(G_2)|} \]
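The graph comparison can be sketched with plain Python sets. This assumes that E(G) denotes the set of directed edges of a session graph, which the slide does not state explicitly; the function name and the tuple encoding of edges are hypothetical. The first pair of edge sets is the one the slide appears to list.

```python
def graph_similarity(edges_a, edges_b):
    """Twice the number of shared edges divided by the total number of edges."""
    total = len(edges_a) + len(edges_b)
    if total == 0:
        return 0.0  # two empty graphs: nothing to compare
    return 2 * len(edges_a & edges_b) / total


# Edge sets as they appear to read on the slide; (1, 2) stands for the edge 1 -> 2.
edges_g1 = {(1, 2), (2, 6), (2, 5), (5, 8)}
edges_g2 = {(1, 3), (3, 6), (3, 7)}
print(graph_similarity(edges_g1, edges_g2))

# Hypothetical sessions that share the first click.
print(graph_similarity({(1, 2), (2, 3)}, {(1, 2), (2, 4)}))
```

Under this reading the measure ranges from 0 (no shared edges) to 1 (identical edge sets).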

Applying neural networks for clustering
For example, Self-Organizing Feature Maps (SOFM). Schematically, a SOFM is a two-dimensional array whose positions hold the neurons. Each neuron consists of an n-dimensional vector whose components are the synaptic weights. Different notions of neighborhood among the neurons give rise to different topologies. In this case a toroidal topology was used, which means that the neurons on the upper edge are neighbors of those on the lower edge, and likewise for the lateral edges.
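The wrap-around neighborhood of a toroidal map can be illustrated with a grid distance. This is only a sketch: the 32 by 32 grid size and the choice of a Chebyshev (square neighborhood) distance are assumptions made for the example, and the function name is hypothetical.

```python
def toroidal_distance(a, b, rows, cols):
    """Grid distance between neurons a and b when opposite edges are joined.

    a, b: (row, column) positions on a rows x cols map.
    Uses the Chebyshev distance, i.e. square neighborhoods (an assumption).
    """
    d_row = abs(a[0] - b[0])
    d_col = abs(a[1] - b[1])
    # On a torus the shorter way around may cross an edge of the map.
    d_row = min(d_row, rows - d_row)
    d_col = min(d_col, cols - d_col)
    return max(d_row, d_col)


# On a hypothetical 32 x 32 map, the top and bottom rows are direct neighbors.
print(toroidal_distance((0, 0), (31, 0), 32, 32))
```

With this distance no neuron sits on a border, so every neuron has the same number of neighbors during training.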

Results by Neural Network
[Figure: winner frequency of the neurons (vertical axis, 0 to 5) over the map, with neuron axes i and j running from 0 to 30.]

Results from a business point of view

Web site recommendations
Offline:
- Structure: changes to the internal (within the same site) and external (outside the site) links.
- Content: mainly words that make the page more useful for the visitor.
Online: mainly recommendations of pages the visitor would be interested in visiting.

Conclusions
Based on our results we propose changes to the web site, e.g.:
- Direct links from cluster 2 to cluster 4 (as can be seen, both are interested in Remote Services)
- Improved links inside each cluster
Future work: analysis of the effectiveness of the changes, e.g., by:
- Increased average number of clicks in visitor sessions
- Increased total time a visitor spends on the web site
- Increased revenue

Joint research with The University of Tokyo
Velásquez, J. D., Yasuda, H., Aoki, T., Weber, R. (2003): Using the KDD Process to Support Web Site Reconfigurations. The 2003 IEEE/WIC International Conference on Web Intelligence, October 13-16, 2003, Halifax, Canada, 511-515
Velásquez, J. D., Yasuda, H., Aoki, T., Weber, R. (2004): A new similarity measure to understand the visitor behavior in a web site. IEICE Transactions on Information and Systems, Vol. E87-D, No. 2, February, 389-396

Data Mining Tools
- Clementine
- Darwin
- DataEngine
- Decisionhouse
- IBM Intelligent Miner
- KnowledgeSEEKER
- SAS Enterprise Miner
- ...
More information: www.kdnuggets.com

Experiences 1/2
- Time: projects take longer than estimated
- Data quality: very important for obtaining valid results
- Data quantity: in general a lot of data is available, but not always in a form that supports decision making (transactional database / data warehouses)