Skip to content

Populate all MySQL database tables in a single step.

License

Notifications You must be signed in to change notification settings

Tinram/MySQL-Filler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MySQL Filler

Fill all MySQL database tables.

Purpose

Populate all the tables of a MySQL database with a specified number of rows of junk data, and optionally jumble the foreign keys to create sufficient fake key relationships between tables for SQL joins.

Background

Some tasks require populating databases for testing. Database dumps are huge and the data usually verboten without anonymisation (i.e. contains Personally Identifiable Information). My old script fills database tables – useful for schema design but not queries.

Example Databases

database support notes
basketball ✔️ substantial intersection table
classicmodels ✔️
employees (old) ✔️
Joomla (old) ✔️ 60 tables
Northwind ✔️
Sakila ✖️
tpcc ✖️ partial with option fudges
world ✔️
WordPress (old) ✔️

Sakila uses sophisticated spatial data types.

Old schemas reference.

Requirements

  • An empty or truncated or throwaway database schema already present in the MySQL server.
  • Sufficient privileges granted for the connecting user.

Limitations

  • No support of MySQL spatial data types.
  • Capped length of longer data types.
  • Composite primary keys can be troublesome.

Usage

Import a database schema-only file into MySQL or use an existing 'throwaway' database already active within MySQL.

Do not use this package on a database that you care about: MySQL Filler will so surely trash it.

Ensure SELECT, INSERT, UPDATE, DROP privileges for the connecting user (DROP is required for table wiping.)

In config.py, edit the database credentials, and the MySQL Filler options required, then:

python3 main.py

For multiprocessing support (PROCS = <num_cpu_cores>) and a significant speed increase, copy the config.py file imports and global variables into src/mysql_filler.py and run as a standalone script.
It's un-pythonic and ugly, but executes multiprocessing more reliably.

Options

NUM_ROWS = 10                     # number of rows to add to all database tables
PROCS = 1                         # number of processes to spawn

JUMBLE_FKS = True                 # toggle random jumbling of foreign keys for joins
FK_PCT_REPLACE = 25               # percentage of NUM_ROWS of foreign keys to jumble

STRICT_INSERT = False             # toggle INSERT IGNOREs for duplicate hits / bypass strict SQL mode (warnings versus errors)

PROCESS_INT_FKS = True            # process (True) or skip (False) integer foreign keys (TPCC schema with tinyint PKs)
COMPOSITE_PK_INCREMENT = False    # skip (False) or increment (True) composite primary keys (TPCC schema)

COMPLEX_JSON = False              # False for simple fixed JSON data; True to generate variable JSON data

BYTES_DECODE = 'utf-8'            # character set used for byte data type conversion
MAX_PACKET = False                # True maximises the packet size (root user only)

DEBUG = False                     # debug output toggle
EXTENDED_DEBUG = False            # verbose debug output toggle

TRUNCATE_TABLES = False           # toggle truncation of all database tables (instead of populating)

Example Run

Using the simple MySQL world database,
import the database from the compressed file:

$ tar -xzOf world-db.tar.gz | mysql -h localhost -u root -p

Allocate a user with the required privileges:

mysql> GRANT SELECT, INSERT, UPDATE, DROP ON world.* TO 'general'@'localhost' IDENTIFIED BY 'P@55w0rd';
mysql> FLUSH PRIVILEGES;

Edit the config.py file and wipe all data that the world database ships with by setting TRUNCATE_TABLES to True:

TRUNCATE_TABLES = True
python3 main.py
Truncating all tables of `world` database ...

Change TRUNCATE_TABLES = False and execute main.py again:

python3 main.py
localhost
world
+10 rows ...
mysql> USE world;
mysql> SELECT ID, CountryCode, country.Code, Code2 FROM city INNER JOIN country ON country.Code = city.CountryCode;

Unlikely to be any results returned by adding just 10 rows; perhaps one 'lucky' row returned for this database in 100 rows added.

NUM_ROWS = 200
...
JUMBLE_FKS = True

Execute main.py

localhost
world
+200 rows

`city`
`country`
`countrylanguage`

foreign keys jumbled

A few results now returned from the previous query, and a CountryCode can be selected:

SELECT ID, CountryCode, country.Code FROM city INNER JOIN country ON country.Code = city.CountryCode WHERE CountryCode = 'ewu';
+----+-------------+------+
| ID | CountryCode | Code |
+----+-------------+------+
| 74 | ewu         | EWU  |
+----+-------------+------+
1 row in set (0.00 sec)

Meaningless data, but the foreign keys are starting to gain relationships, and so SQL joins between tables are now realised. More rows can be added to increase the number of results.

Speed

Speed with multiprocessing (combining config.py into one script) is okay. Speed never was on the agenda.

For serious speed, there's Percona's Go-based mysql_random_data_load. Currently, this tool fills one table at a time – fast – yet somewhat laborious for databases with lots of tables, whereas I wanted all database tables populated with one command.

MariaDB

MariaDB has limited support. It has more restrictions on key constraints and is less forgiving than MySQL 5.7 or 8.0

Other

Tested using MySQL 5.7 and 8.0, and MariaDB 10.4

This package cannot hope to support all variations (good and bad) of MySQL schemas.

For example, adding 1,000 rows to the following real-world table is not going to run smoothly:

+----------------+------------+------+-----+---------+
| Field          | Type       | Null | Key | Default |
+----------------+------------+------+-----+---------+
| google_channel | char(3)    | YES  |     | NULL    |
| locale_id      | tinyint(3) | YES  | UNI | NULL    |
+----------------+------------+------+-----+---------+

– because of the restricted range of TINYINT values, the number of rows, and the unique key.

Composite primary keys can cause trouble. However, the config.py options allow overrides to enable at least basic table population.

License

MySQL Filler is released under the GPL v.3.

Releases

No releases published

Packages

No packages published

Languages