A minimal framework for managing migrations
TransPyData implements a generic pipeline to perform migrations. It has two main components. The first is the `TransPy` class, which executes the migration pipeline according to a configuration. The second is the data service implementations (`IDataInput`, `IDataProcess` and `IDataOutput`); these services manage how data is gathered, processed and sent to the new destination.
The `TransPy` class manages the migration pipeline. It needs to be provided with an instance of:

- `IDataInput`: Manages the gathering of source data.
- `IDataProcess`: Manages data transformation and filtering before passing it to the data output.
- `IDataOutput`: Manages sending data to the new destination.
NOTE: Data services overview below
Apart from the data services there are other optional configurations:
```python
trans_py = TransPy()

config = {
    'datainput_source': [],       # If working with a single record pipeline this should be an iterable of data to feed IDataInput
    'datainput_by_one': False,    # Enable single record pipeline on input
    'dataprocess_by_one': False,  # Enable single record pipeline on processing
    'dataoutput_by_one': False,   # Enable single record pipeline on output
}

trans_py.configure(config)
```
The values in the snippet are the defaults, so by default the migration moves all the data through the pipeline at once.
When all data services have the `by_one` flag set to `False`, the migration moves all the data through the pipeline at once. The `TransPy` instance calls the `get_all` method of the configured `IDataInput` to gather all the input data, passes the result to `process_all` of `IDataProcess`, and passes the `IDataProcess` result to `send_all` of `IDataOutput`. Finally, a list with the `IDataOutput` results is returned by `TransPy`.
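Conceptually, with every `by_one` flag left at `False`, a run reduces to something like the following sketch. This is simplified pseudocode of the flow described above, not the actual `TransPy` internals:

```python
def run_all_at_once(datainput, dataprocess, dataoutput):
    """Simplified sketch of the all-at-once flow (not the real TransPy.run())."""
    records = datainput.get_all()                 # IDataInput gathers all source data
    processed = dataprocess.process_all(records)  # IDataProcess transforms and filters it
    return dataoutput.send_all(processed)         # IDataOutput returns the list of results
```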
If "by_one" flags are True
the data are "queried" by one and moved through all the pipeline. The IDataOutput
return are accumulated and returned as list at the end of the processing, so the TransPy
return type is the same.
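The per-record flow can be pictured with a similar sketch. The `get_one`/`process_one`/`send_one` names below are assumptions mirroring the documented `*_all` methods (check the interface definitions for the exact signatures); the point is that each record travels the whole pipeline and the outputs are accumulated:

```python
def run_one_by_one(datainput_source, datainput, dataprocess, dataoutput):
    """Sketch of the per-record flow; the *_one method names are assumed, not verified."""
    results = []
    for source_item in datainput_source:           # the iterable configured as 'datainput_source'
        record = datainput.get_one(source_item)    # fetch a single record
        processed = dataprocess.process_one(record)
        results.append(dataoutput.send_one(processed))
    return results                                 # same list-of-results return type as run_all_at_once
```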
There are some additional cases. What if `datainput` and `dataprocess` are in `by_one` mode but `dataoutput` is not? In that case the data is gathered and processed one record at a time; at the end of the processing (`IDataProcess`) the results are accumulated and `IDataOutput` is called with all the data. The case where `dataprocess` and `dataoutput` are in `by_one` mode is similar: the data is gathered all at once and then piped one record at a time through `IDataProcess` and `IDataOutput`.
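For example, the first mixed case described above (input and processing per record, output all at once) would be configured roughly like this. The `source_ids` iterable is a hypothetical placeholder for whatever feeds `IDataInput`:

```python
# Hypothetical iterable of source items to feed IDataInput one at a time
source_ids = [1, 2, 3]

trans_py.configure({
    'datainput_source': source_ids,
    'datainput_by_one': True,    # gather records one by one
    'dataprocess_by_one': True,  # process them one by one
    'dataoutput_by_one': False   # accumulate and send everything in a single call
})
```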
*(under construction)*
To start a migration, create an instance of `TransPy` and configure it. At least instances of `IDataInput`, `IDataProcess` and `IDataOutput` need to be provided. Before starting the migration, the data services might need to be configured too. Here is a code example:
```python
import json

from transpydata import TransPy
from transpydata.config.datainput.MysqlDataInput import MysqlDataInput
from transpydata.config.dataprocess.NoneDataProcess import NoneDataProcess
from transpydata.config.dataoutput.RequestDataOutput import RequestDataOutput


def main():
    # Configure input
    mysql_input = MysqlDataInput()
    config = {
        'db_config': {
            'user': 'root',
            'password': 'TryingTh1ngs',
            'host': 'localhost',
            'port': '3306',
            'database': 'migration'
        },
        'get_one_query': None,  # We'll go with the "all" query
        'get_all_query': """
            SELECT s.staff_Id, s.staff_name, s.staff_grade, m.module_Id, m.module_name
            FROM staff s
            LEFT JOIN teaches t ON s.staff_Id = t.staff_Id
            LEFT JOIN module m ON t.module_Id = m.module_Id
        """,
        'all_query_params': {}  # No WHERE clause, no interpolation
    }
    mysql_input.configure(config)

    # Configure process
    none_process = NoneDataProcess()

    # Configure output
    request_output = RequestDataOutput()
    request_output.configure({
        'url': 'http://localhost:8008',
        'req_verb': 'POST',
        'headers': {
            'content-type': 'application/json',
            'accept-encoding': 'application/json',
            'x-app-id': 'MT1'
        },
        'encode_json': True,
        'json_response': True
    })

    # Configure TransPy
    trans_py = TransPy()
    trans_py.datainput = mysql_input
    trans_py.dataprocess = none_process
    trans_py.dataoutput = request_output

    res = trans_py.run()
    print(json.dumps(res))


if __name__ == '__main__':
    main()
```
A full working example can be found at `examples/mysql_to_http/`; it includes a docker-compose setup to launch a MySQL instance and a webserver.
For now you can check the interfaces `IDataInput`, `IDataProcess` and `IDataOutput` to see what needs to be implemented in a custom data service.
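As a starting point, a custom processing step might look roughly like the sketch below. The import path and the exact methods required by `IDataProcess` are assumptions here (only `process_all` is mentioned above); the real interface definitions are the reference:

```python
# Hypothetical sketch of a custom IDataProcess implementation.
# The import path and required methods are assumptions; check the interface definition.
from transpydata.config.dataprocess.IDataProcess import IDataProcess


class UppercaseNamesProcess(IDataProcess):
    def configure(self, config: dict):
        # 'name_field' is a made-up config key for this example
        self._name_field = config.get('name_field', 'staff_name')

    def process_one(self, record: dict) -> dict:
        record[self._name_field] = record[self._name_field].upper()
        return record

    def process_all(self, records: list) -> list:
        return [self.process_one(r) for r in records]
```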
(I'll improve this section in the future)