data inconsistency (served_by, operated_by) #1263
Comments
Hi @slvlirnoff. Hmm. I know that transitland might keep around old stop served_by route relationships, but I haven't seen it fail to create new ones. If this is the case, it may be caused by weirdness with the gtfs_id: the import process does try to match entities based on whether the gtfs_id was seen in the previous import, assuming some level of ID stability. I will take a look and see if I can find the culprit. Alternatively, I will provide a little script that can rebuild the served_by relations based on the current schedule_stop_pairs.
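A rebuild along those lines could look roughly like the sketch below. It is not the script mentioned above, only an illustration in Python that works on plain dictionaries; it assumes each schedule_stop_pair record carries `route_onestop_id`, `origin_onestop_id` and `destination_onestop_id` keys, and the function name and sample IDs are made up.

```python
from collections import defaultdict


def rebuild_served_by(schedule_stop_pairs):
    """Map each stop to the set of routes that call at it.

    Assumption: every record is a dict with the keys route_onestop_id,
    origin_onestop_id and destination_onestop_id.
    """
    served_by = defaultdict(set)
    for ssp in schedule_stop_pairs:
        route = ssp["route_onestop_id"]
        # Both ends of a pair are served by the pair's route.
        served_by[ssp["origin_onestop_id"]].add(route)
        served_by[ssp["destination_onestop_id"]].add(route)
    return dict(served_by)


# Hypothetical sample data, for illustration only.
ssps = [
    {"route_onestop_id": "r-u0q-5", "origin_onestop_id": "s-a", "destination_onestop_id": "s-b"},
    {"route_onestop_id": "r-u0q-5", "origin_onestop_id": "s-b", "destination_onestop_id": "s-c"},
    {"route_onestop_id": "r-u0q-7", "origin_onestop_id": "s-b", "destination_onestop_id": "s-d"},
]
served_by = rebuild_served_by(ssps)
```

Because the mapping is derived only from the current schedule_stop_pairs, any relationship left over from an earlier feed version simply does not appear in the result.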
Yes, that could be it. Switzerland's official public transport feed is likely to have a route that suddenly takes the gtfs_id of another route from a previous version. The other aspect of this feed (shared with the Flixbus feed, where I have seen similar issues) is that many routes have similar names and potentially similar geographic bounding boxes (possibly resulting in an identical onestop_id).
Update: it seems not to be enough; after a new gtfs feed version is imported, it starts to happen again. I'm pretty sure it's in the right direction. I guess the matching to existing entities is too lax, but I'm not sure where to look in the code.
Hello @irees, I've finally narrowed it down! It happens when several routes have the same name (in the same feed or across feeds) in the same area. For instance, in Switzerland we have buses and trains that might share a name ("5", for instance): several cities each have a different route "5", and there may also be a train "5" running across these cities, all in the same feed (or across different feeds). I don't know exactly how it happens, but eventually these routes all end up under a very generic geohash (such as the 3-letter 'u0q', which would be a bounding box across Switzerland) and so have the same ID, "r-u0q-5" for instance. Then I start to see bus routes that have the wrong transport type or that serve stops from other routes. To fix it in my setup, I've put the gtfs id of the route within the name, but I guess a proper fix would be in how the geohash is computed. I've looked into the code but couldn't find the problem. Potentially the train is integrated first, then all the bus routes are 'within' the same geohash and are detected as a different route pattern instead of a new route.
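The collision can be illustrated with a small Python sketch. It is a deliberate simplification, not transitland's actual ID generation: the helper names, the sample routes, and the `~` separator used to append the gtfs id are all assumptions. It only shows that an ID built from a shared geohash and a shared name cannot tell the routes apart, and that adding the gtfs id to the name separates them.

```python
from collections import defaultdict


def make_route_onestop_id(geohash, name):
    # Simplified, hypothetical stand-in for a route onestop_id: r-<geohash>-<name>.
    return f"r-{geohash}-{name}"


# Hypothetical routes: two city buses and one train, all named "5",
# all falling under the same broad geohash 'u0q'.
routes = [
    {"gtfs_id": "101", "name": "5", "mode": "bus", "geohash": "u0q"},
    {"gtfs_id": "202", "name": "5", "mode": "bus", "geohash": "u0q"},
    {"gtfs_id": "303", "name": "5", "mode": "rail", "geohash": "u0q"},
]


def group_by_onestop_id(routes, with_gtfs_id=False):
    """Group gtfs ids by the onestop_id each route would receive."""
    groups = defaultdict(list)
    for route in routes:
        name = route["name"]
        if with_gtfs_id:
            # Workaround described above: make the name unique per gtfs id.
            name = f"{name}~{route['gtfs_id']}"
        groups[make_route_onestop_id(route["geohash"], name)].append(route["gtfs_id"])
    return dict(groups)


collisions = group_by_onestop_id(routes)
separated = group_by_onestop_id(routes, with_gtfs_id=True)
```

Here `collisions` holds a single key, "r-u0q-5", shared by all three routes, while `separated` holds one key per route.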
After further digging into this particular case (the Switzerland feed), it seems that the feed provider generates route ids in some kind of sequential manner and reuses the same ids for different routes over time. The graph importer eventually gets confused in find_by_eiff, returning the wrong routes to update. For now I've addressed it by adapting the graph and schedule importers to generate a new gtfs route_id that is relatively unique and stable for that feed. I guess a better fix would be some kind of 'tags' on the feed that prevent it from reusing old eiff and that systematically delete previous routes imported from the feed (as is done for rsp). (I have also been playing with the Germany feed (https://gtfs.de/); it's even worse there, as the agency ids change on each feed version.)
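One way to derive such an id is sketched below in Python. This is not the change made to the importers, only an illustration of the idea: the function name, the choice of attributes, and the use of a truncated SHA-1 digest are assumptions. The id is computed from attributes that describe the route rather than from the provider's route_id, and it uses the agency name instead of the agency id because agency ids may change between feed versions.

```python
import hashlib


def stable_route_id(agency_name, route_short_name, route_long_name, route_type):
    """Derive a route id from descriptive attributes (hypothetical scheme).

    The provider's own route_id is deliberately ignored, so a recycled
    route_id does not make two different routes look identical.
    """
    key = "|".join([agency_name, route_short_name, route_long_name, str(route_type)])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]
    return f"{route_short_name or 'route'}-{digest}"


# Hypothetical routes: same short name, different mode and description.
bus_id = stable_route_id("City Transit", "5", "Station - Hospital", 3)
bus_id_next_version = stable_route_id("City Transit", "5", "Station - Hospital", 3)
train_id = stable_route_id("National Rail", "5", "City A - City B", 2)
```

The trade-off is that a route whose name or type is edited between feed versions would receive a new id, so this only helps when those attributes are more stable than the provider's ids.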
Hi all,
I have had several occurrences of data inconsistency in routes and stops, in particular in the served_by and operated_by relationships for stops and routes.
This disrupts the valhalla fetch transit tool, because some routes aren't included in the bounding box. The main cause is that the relation route -> serves -> stops isn't correct: the route doesn't serve all the stops that have a schedule_stop_pair associated with the route.
It seems to happen mostly after several updates of a feed with new feed versions.
I have seen both routes that are missing some served stops and routes that claim to serve stops they don't serve. My guess is that the inconsistencies are linked to routes that have a similar route name to, or the same gtfs_id as, another route from a previous import.
Any ideas where to dig to fix this? Or potentially to rebuild the relationship based on schedule_stop_pairs origin/destination? Is the relationship using gtfs_id/name in any way that could cause an issue due to previous feed_versions?
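To make the two kinds of inconsistency concrete, here is a minimal Python sketch that compares a stored route -> stops relation with the one implied by the schedule_stop_pairs. The function name, the dictionary layout, and the sample IDs are assumptions for illustration; it does not call the transitland API.

```python
def diff_route_stops(stored_serves, schedule_stop_pairs):
    """Report stops missing from, or wrongly present in, each route's relation.

    stored_serves: dict mapping a route id to the stop ids it is recorded as serving.
    schedule_stop_pairs: dicts with route_onestop_id, origin_onestop_id and
    destination_onestop_id keys (assumed field names).
    """
    expected = {}
    for ssp in schedule_stop_pairs:
        stops = expected.setdefault(ssp["route_onestop_id"], set())
        stops.update((ssp["origin_onestop_id"], ssp["destination_onestop_id"]))

    report = {}
    for route in set(stored_serves) | set(expected):
        stored = set(stored_serves.get(route, ()))
        wanted = expected.get(route, set())
        missing = wanted - stored  # in the schedule, absent from the relation
        extra = stored - wanted    # in the relation, absent from the schedule
        if missing or extra:
            report[route] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report


# Hypothetical sample data.
stored = {"r-u0q-5": ["s-a", "s-x"]}
pairs = [
    {"route_onestop_id": "r-u0q-5", "origin_onestop_id": "s-a", "destination_onestop_id": "s-b"},
]
report = diff_route_stops(stored, pairs)
```

In this sample, the route is reported as missing "s-b" and as wrongly serving "s-x".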
Best,
Cyprien