Import and export data from SilverStripe in various forms, including CSV. This module serves as a replacement/overhaul of BulkLoader functionality found in the framework.
- Raw data is retrieved from a source (BulkLoaderSource).
- Data is provided as iterable rows (each row is a heading->value mapped array).
- Rows are mapped to a standardised format, based on a user/developer provided mapping.
- Data is set/linked/transformed onto a placeholder DataObject.
- Existing record replaces placeholder, or placeholder becomes the brand new DataObject.
- DataObject is validated and saved.
- All results are stored in BulkLoader_Result.
Users can choose which columns map to DataObject fields. This removes the need to define headings, or to make headings match a given schema. Users can state whether the first line of data is a heading row. Mappings are saved for the next time an import is done on the same GridField.
This is a grid field component that lets users select a CSV file and map its columns to data fields.
$importer = new GridFieldImporter('before');
$gridConfig->addComponent($importer);
The importer makes use of the CSVFieldMapper, which displays the beginning content of a CSV.
A BulkLoaderSource provides an iterator to get record data from. Data could come from anywhere, such as a CSV file or a web API.
It can be used independently from the BulkLoader to obtain data.
$source = new CsvBulkLoaderSource();
$source->setFilePath("files/myfile.csv")
->setHasHeader(true)
->setFieldDelimiter(",")
->setFieldEnclosure("'");
foreach($source->getIterator() as $record){
//do stuff
}
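To illustrate the row shape a source is described as yielding (each row a heading->value array), here is a minimal plain-PHP sketch. It does not use the module; `csvRows()` is a hypothetical helper written for this example, and its delimiter and enclosure parameters merely mirror the options shown above.

```php
<?php
// Hypothetical helper, not part of the module: turns CSV text into
// heading => value rows, similar in shape to what a source iterator yields.
function csvRows(string $csv, string $delimiter = ",", string $enclosure = "'"): Generator
{
    $lines = preg_split('/\R/', trim($csv));
    // The first line is treated as the heading row.
    $headings = str_getcsv(array_shift($lines), $delimiter, $enclosure, "\\");
    foreach ($lines as $line) {
        $values = str_getcsv($line, $delimiter, $enclosure, "\\");
        // Skip rows whose column count does not match the headings.
        if (count($values) !== count($headings)) {
            continue;
        }
        yield array_combine($headings, $values);
    }
}

$rows = iterator_to_array(csvRows("Title,Price\n'Red Shirt',10\n'Blue Hat',5"), false);
```

Each element of `$rows` is keyed by heading, for example `$rows[0]['Title']`, which is the form the later mapping steps work on.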
- Takes data from a particular source and persists it to the database via the ORM.
- Determines which fields can be mapped to, either scaffolded from the model, provided by configuration, or both.
- Detects existing records, and either skips or updates them, based on criteria.
- Maps the source data to new/existing dataobjects, based on a given mapping.
- Finds, creates, and connects relation objects to objects.
- Can clear all records prior to processing.
$source = new CsvBulkLoaderSource();
$source->setFilePath("files/myfile.csv");
$loader = new BetterBulkLoader("Product");
$loader->setSource($source);
$loader->addNewRecords = false; // an option to skip new records
$result = $loader->load();
Often you'll want to confine bulk loading to a specific DataList. The ListBulkLoader is a variation of BulkLoader that adds and removes records from a given DataList. Of course DataList itself doesn't have an add method implemented, so you'll probably find it more useful for a HasManyList.
$category = ProductCategory::get()->first();
$source = new CsvBulkLoaderSource();
$source->setFilePath("productlist.csv");
$loader = new ListBulkLoader($category->Products());
$loader->setSource($source);
$result = $loader->load();
You can provide a columnMap to map incoming records to a standard format.
$loader->columnMap = array(
'first name' => 'FirstName',
'Name' => 'FirstName',
'bio' => 'Biography',
'bday' => 'Birthday',
'teamtitle' => 'Team.Title',
'teamsize' => 'Team.TeamSize',
'salary' => 'Contract.Amount'
);
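Conceptually, a column map renames the keys of each incoming row. The sketch below shows that idea in plain PHP; `applyColumnMap()` is a hypothetical helper, not the module's implementation, and dropping unmapped columns is an assumption made for the example.

```php
<?php
// Hypothetical helper, not the module's implementation: renames the keys of
// one incoming row according to a column map, dropping unmapped columns.
function applyColumnMap(array $row, array $columnMap): array
{
    $mapped = array();
    foreach ($row as $heading => $value) {
        if (isset($columnMap[$heading])) {
            $mapped[$columnMap[$heading]] = $value;
        }
    }
    return $mapped;
}

$columnMap = array(
    'first name' => 'FirstName',
    'bio'        => 'Biography',
    'teamtitle'  => 'Team.Title',
);
$row = array('first name' => 'Jane', 'bio' => 'Coach', 'teamtitle' => 'Tigers', 'notes' => 'n/a');
$record = applyColumnMap($row, $columnMap);
```

Here `$record` carries the keys `FirstName`, `Biography` and `Team.Title`; the `notes` column has no mapping and is left out.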
This column map is generated by the CSVFieldMapper control inside the GridFieldImporter component.
Mappable fields will be scaffolded if you do not define them yourself. This includes fields that are on relations, so that relations can be linked up.
It is likely there will be fields you don't want mapped, in which case you should specify a mappableFields array on your loader:
$loader->mappableFields = array(
'FirstName' => 'First Name',
'Surname' => 'Last Name',
'Biography' => 'Biography',
'Birthday' => 'Birthday',
'Team.Title' => 'Team'
);
You may want to perform some transformations to incoming record data. This can be done by specifying a callback against the record field name.
$loader->transforms = array(
'Code' => array(
'callback' => function($value, $placeholder) {
//capitalize course codes
return strtoupper($value);
}
)
);
Incoming records without required data will be skipped.
$loader->transforms = array(
'Title' => array(
'required' => true
)
);
Note that empty records are skipped by default.
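The effect of the 'callback' and 'required' options can be pictured with a small plain-PHP sketch. `applyTransforms()` is a hypothetical helper written for this example, not the module's code, and it passes null where the module would pass the placeholder object.

```php
<?php
// Hypothetical helper, not the module's code: applies 'callback' and
// 'required' options to one mapped record. Returns null when the record
// should be skipped.
function applyTransforms(array $record, array $transforms): ?array
{
    foreach ($transforms as $field => $options) {
        $value = isset($record[$field]) ? $record[$field] : null;
        if (!empty($options['required']) && ($value === null || $value === '')) {
            return null; // required data is missing, so skip the record
        }
        if (isset($options['callback']) && $value !== null) {
            // The module's callbacks also receive a placeholder object;
            // this sketch passes null in its place.
            $record[$field] = call_user_func($options['callback'], $value, null);
        }
    }
    return $record;
}

$transforms = array(
    'Code'  => array('callback' => function ($value, $placeholder) {
        return strtoupper($value);
    }),
    'Title' => array('required' => true),
);
$kept = applyTransforms(array('Code' => 'abc101', 'Title' => 'Algebra'), $transforms);
$skipped = applyTransforms(array('Code' => 'xyz'), $transforms);
```

In this sketch `$kept` has its code upper-cased, while `$skipped` is null because the required Title is missing.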
The bulk loader can handle linking and creating has_one relationship objects, by either providing a callback, or using the Relation.FieldName style "dot notation". Relationship handling is also performed in the transformations array.
You can specify at the BulkLoader level whether relation records will be created and linked, and you can also specify the behaviour for each field. The default behaviour is to both link and create relation objects.
Here are some configuration examples:
$loader->transforms = array(
//link and create courses
'Course.Title' => array(
'link' => true,
'create' => true
),
//only link to existing tutors
'Tutor.Name' => array(
'link' => true,
'create' => false
),
//custom way to find parent courses
'Parent' => array(
'callback' => function($value, $placeholder) use ($self){
return Course::get()
->filter("Title", $value)
->first();
}
)
);
Note that $placeholder in the above example refers to a dummy DataObject that is populated in order to then be saved, or checked against for duplicates. You should not call $placeholder->write() in your callback.
In the same way that you may use a ListBulkLoader to constrain records to a given DataList, you may also want to constrain the relation records to a List.
$loader->transforms = array(
//link and create courses
'Course.Title' => array(
'list' => $self->Courses()
)
);
Duplicate checks are performed on record data, mapped into the standardised form.
You can perform duplicate checking on data fields:
//course is a duplicate when title is the same
$loader->duplicateChecks = array(
"Title"
);
Or on a relation:
//course selection is a duplicate when course is the same
$loader->duplicateChecks = array(
"Course.Title"
);
Duplicates can also be found using a callback function:
$loader->duplicateChecks = array(
"FooBar" => array(
"callback" => function($fieldName, $record) {
if(!isset($record["FirstName"]) || !isset($record["LastName"])){
return null;
}
return Person::get()
->filter("FirstName", $record['FirstName'])
->filter("LastName", $record['LastName'])
->first();
}
)
);
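The two forms of duplicate check (a field name, or a callback receiving the field name and the mapped record) can be sketched in plain PHP as follows. `findDuplicate()` is a hypothetical helper, and plain arrays stand in for DataObjects; the module's own lookup goes through the ORM and may behave differently.

```php
<?php
// Hypothetical helper, not the module's code: finds an existing record
// (plain arrays here, standing in for DataObjects) matching any check.
function findDuplicate(array $record, array $existing, array $duplicateChecks): ?array
{
    foreach ($duplicateChecks as $key => $check) {
        if (is_array($check) && isset($check['callback'])) {
            // Callback form: receives the field name and the mapped record.
            $match = call_user_func($check['callback'], $key, $record);
            if ($match !== null) {
                return $match;
            }
            continue;
        }
        // Plain form: the check is a field name compared by value.
        if (!isset($record[$check])) {
            continue;
        }
        foreach ($existing as $candidate) {
            if (isset($candidate[$check]) && $candidate[$check] === $record[$check]) {
                return $candidate;
            }
        }
    }
    return null;
}

$existing = array(
    array('ID' => 1, 'Title' => 'Algebra'),
    array('ID' => 2, 'Title' => 'Biology'),
);
$duplicate = findDuplicate(array('Title' => 'Biology'), $existing, array('Title'));
$none = findDuplicate(array('Title' => 'Chemistry'), $existing, array('Title'));
```

A non-null result means the incoming record matches an existing one, which the loader would then skip or update; null means it is treated as new.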
If you are importing instances of SiteTree, you can have those pages automatically published using this configuration:
$loader->setPublishPages(true);
Some simple YAML config options to help with swapping out all the importer functionality:
ModelAdmin:
removelegacyimporters: true
addbetterimporters: true
Remove only the scaffolded (non-custom) importers:
ModelAdmin:
removelegacyimporters: scaffolded
If you are writing relation objects during loading, and they fail validation, the loader will simply ignore that relation object.
If you have mapped multiple fields to the same relation, you may get situations where the incorrect existing relation object is joined. The first field that is mapped is the field used to find the relation. For example, you'll likely want a Title to be used to find/create a relation, and then an Amount to be added to that same relation, rather than finding/creating a relation by an Amount and then setting the Title.
Define the correct ordering using the mappableFields array to fix this.
Please do contribute whatever you can to this module. Check out the issues and milestones to see what needs to be done.
MIT
Jeremy Shipman (http://jeremyshipman.com)