Skip to content

Final project for my postgraduate studies. In a complex, multistep process of loading CAR, how to visualize and measure impact of a single change to this process?

Notifications You must be signed in to change notification settings

emkaminsk/CAR-Postgraduate-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

The code was created as solution to a problem and as a final project for postgraduate course on University of Technology in Poznań in 2017.

The business problem is: in a complex, multistep process of loading CAR, how to visualize and measure impact of a single change to this process?

Basic terms:

  • CAR - Customer Analytical Repository. Corporate Datawarehouse DataMart, system that is loaded via a process composed of circa 850 SQL queries. The system of loading CAR is an acyclical chain of phases. The purpose of CAR is to deliver a business multilevel description and image describing customer via his: products, trends, behaviors, preferences, potencials and activities. Examples: average account inflows, number of internet banking successful logins.

  • CAR Area - a business area relating to a specific concept in banking, also a data table in CAR containing fields describing the concept per ID (per customer, account etc)

  • SQL Part - a single SQL query

  • CMT - Customer Mapping Table - SQL documentation and at the same time technically a formal description that is processed in the loading phase of ETL in order to extract the instructions for the software

  • characteristic - a unique metric defined in CAR

    The tool solves the problem in the following way:

    • step 1: parser of CMT files that extracts key parts of SQL queries (such as SELECT fields, FROM tables, fields in WHERE and GROUPBY conditions) and exports them.
    • step 2: create a graph database containing all the SQL elements extracted before with edges symbolizing useful connections (type of relation between field and table and relating tables to tables)
    • step 3: a mechanism that shows consequences / impact a given change in SQL query may have on other The scripts use among others: openpyxl, pathlib, networkx, re, matplotlib and pyqt5 libraries.

The code implements an original solution to the problem based on research and experiences of the author.

About

Final project for my postgraduate studies. In a complex, multistep process of loading CAR, how to visualize and measure impact of a single change to this process?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages