data inconsistency (served_by, operated_by) #1263

Open
slvlirnoff opened this issue Oct 30, 2018 · 4 comments
@slvlirnoff

Hi all,

I have run into several occurrences of data inconsistency in routes and stops, in particular in the served_by and operated_by relationships between stops and routes.

This disrupts the valhalla fetch transit tool, because some routes are not included in the bounding box. The main cause is that the route -> serves -> stops relation is incorrect: the route does not serve all the stops that have a schedule_stop_pair associated with the route.

It seems to happen mostly after a feed has been updated several times with new feed versions.

I have seen both routes that are missing some served stops and routes that claim to serve stops they don't actually serve. My guess is that the inconsistencies are linked to routes that have a similar route name, or an identical gtfs_id, to another route from a previous import.

Any ideas where to dig to fix this? Or, potentially, how to rebuild the relationship based on the schedule_stop_pairs origin/destination? Does the relationship use gtfs_id/name in any way that could cause an issue due to previous feed_versions?

Best,
Cyprien

@irees
Member

irees commented Oct 31, 2018

Hi @slvlirnoff. Hmm. I know that transitland might keep around old stop served_by route relationships, but I haven't seen it fail to create new ones. If that is the case, it may be caused by weirdness with the gtfs_id: the import process does try to match entities based on whether the gtfs_id was seen in the previous import, assuming some level of ID stability. I will take a look and see if I can find the culprit. Alternatively, I will provide a little script that can rebuild the served_by relations based on the current schedule_stop_pairs.
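As a rough illustration of what such a rebuild could look like, here is a minimal sketch (not the script mentioned above, and not Transitland's own code). It assumes each schedule_stop_pair is available as a plain dict with `route_onestop_id`, `origin_onestop_id` and `destination_onestop_id` keys; the helper names `rebuild_served_stops` and `diff_served_stops` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def rebuild_served_stops(schedule_stop_pairs):
    """Map each route to the set of stops that appear as an SSP origin or destination."""
    served = defaultdict(set)
    for ssp in schedule_stop_pairs:
        route = ssp["route_onestop_id"]
        served[route].add(ssp["origin_onestop_id"])
        served[route].add(ssp["destination_onestop_id"])
    return dict(served)


def diff_served_stops(recorded, rebuilt):
    """Return {route: (missing, extra)} for every route whose recorded stops disagree."""
    report = {}
    for route in set(recorded) | set(rebuilt):
        have = set(recorded.get(route, ()))
        want = set(rebuilt.get(route, ()))
        missing, extra = want - have, have - want
        if missing or extra:
            report[route] = (missing, extra)
    return report


# Made-up sample data: the stored relation lacks "s-c" and wrongly lists "s-x".
ssps = [
    {"route_onestop_id": "r-u0q-5", "origin_onestop_id": "s-a", "destination_onestop_id": "s-b"},
    {"route_onestop_id": "r-u0q-5", "origin_onestop_id": "s-b", "destination_onestop_id": "s-c"},
]
recorded = {"r-u0q-5": {"s-a", "s-b", "s-x"}}

rebuilt = rebuild_served_stops(ssps)
report = diff_served_stops(recorded, rebuilt)
```

The diff covers both symptoms described in the issue: stops a route should serve but does not (`missing`), and stops it claims to serve without any schedule_stop_pair backing them (`extra`).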

@irees irees self-assigned this Oct 31, 2018
@slvlirnoff
Author

Yes, that could be it. Switzerland's official public transport feed is likely to have a route that suddenly takes the gtfs_id of another route from a previous version.

The other thing about this feed (which it has in common with the flixbus feed, where I have seen similar issues) is that many routes have a similar name and potentially a similar geographic bounding box (maybe resulting in an identical onestop_id).

@slvlirnoff
Author

slvlirnoff commented Mar 6, 2020

Update: it seems that this is not enough; after a new GTFS feed version is imported, it starts to happen again. I'm pretty sure it's in the right direction, though. I guess the matching to existing entities is too lax, but I'm not sure where to look in the code.

Hello @irees, I've finally narrowed it down! It happens when several routes have the same name (in the same feed or across feeds) in the same area.

For instance, in Switzerland we have buses and trains that might share a name ("5", for instance): several cities each have a different route "5", and there may also be a train "5" running across these cities, all in the same feed (or across different feeds).

I don't know exactly how it happens, but eventually these routes all end up under a very generic geohash (like the 3-letter 'u0q', which would be a bounding box across Switzerland), so they share the same id, "r-u0q-5" for instance. Then I start to get bus routes that have the wrong transport type or that serve stops from other routes.
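To make the collision concrete, here is a small sketch of how distinct routes can collapse onto one ID once they share a geohash and a name. It only mirrors the "r-<geohash>-<name>" shape of the ID quoted above; the name normalisation is simplified, and the function names, the `gtfs_id` values and the sample routes are made up for illustration.

```python
from collections import defaultdict


def route_onestop_id(geohash, name):
    """Build an ID shaped like "r-u0q-5" (simplified: the name is only lowercased)."""
    return f"r-{geohash}-{name.lower()}"


def find_collisions(routes):
    """Group routes by generated ID and keep the IDs shared by more than one route."""
    by_id = defaultdict(list)
    for route in routes:
        by_id[route_onestop_id(route["geohash"], route["name"])].append(route["gtfs_id"])
    return {oid: ids for oid, ids in by_id.items() if len(ids) > 1}


# Hypothetical routes: two city buses and a train, all named "5" and all
# assigned the same coarse 3-character geohash.
routes = [
    {"gtfs_id": "bus-a-5", "name": "5", "geohash": "u0q"},
    {"gtfs_id": "bus-b-5", "name": "5", "geohash": "u0q"},
    {"gtfs_id": "train-5", "name": "5", "geohash": "u0q"},
    {"gtfs_id": "bus-a-6", "name": "6", "geohash": "u0q"},
]

collisions = find_collisions(routes)
```

Running this kind of grouping over the routes of an imported feed is one way to list which names are at risk before looking at the stops they serve.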

To fix it, I've included the gtfs id of the route in the name in my setup, but I guess a proper fix would be in how the geohash is computed. I've looked into the code but couldn't find the problem.

Potentially the train is imported first; then all the bus routes are 'within' the same geohash and are detected as a different route pattern instead of a new route.

@slvlirnoff
Author

After further digging into this particular case (the Switzerland feed), it seems that the feed provider generates the route ids in some kind of sequential manner and reuses the same ids for different routes over time. The graph importer eventually gets confused in find_by_eiff and returns the wrong routes to update.
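One way to check a feed for this kind of ID reuse is to compare two feed versions directly. The sketch below assumes each version has already been reduced to a dict mapping the GTFS `route_id` to a `(route_short_name, route_type)` tuple; the function name and the sample values are hypothetical, and it does not touch the importer itself.

```python
def reused_route_ids(previous, current):
    """Return route_ids present in both versions whose attributes differ."""
    shared = previous.keys() & current.keys()
    return sorted(rid for rid in shared if previous[rid] != current[rid])


# Made-up example: id "1" was bus "5" (route_type 3) in the old version
# and is rail line "S5" (route_type 2) in the new one.
previous = {"1": ("5", 3), "2": ("S5", 2)}
current = {"1": ("S5", 2), "2": ("S5", 2), "3": ("5", 3)}

suspicious = reused_route_ids(previous, current)
```

Any ID in `suspicious` is a candidate for being matched against the wrong existing route on the next import.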

For now I've addressed it by adapting the graph and schedule importers to generate a new gtfs route_id which is relatively unique and stable for that feed. I guess a better fix would be some kind of 'tags' on the feed that prevent it from re-using old eiff and that systematically delete the routes previously imported from the feed (like it's done for rsp).
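The exact scheme used in that workaround is not shown here. As one possible approach, a replacement ID can be derived by hashing attributes that are expected to stay the same across feed versions. In this sketch the choice of attributes, the separator, the 12-character truncation and the function name `stable_route_id` are all assumptions.

```python
import hashlib


def stable_route_id(agency_name, short_name, long_name, route_type):
    """Derive a repeatable route_id from descriptive attributes instead of the feed's own id."""
    key = "|".join([agency_name, short_name, long_name, str(route_type)])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return f"{short_name or 'route'}-{digest}"


# Made-up routes: a bus "5" and a train "5" run by the same hypothetical agency.
bus_id = stable_route_id("Example Transit", "5", "Station - Airport", 3)
train_id = stable_route_id("Example Transit", "5", "Station - Airport", 2)
```

Because the ID depends only on the attributes, re-importing an unchanged route yields the same value, while a bus and a train that share a name get different ones. The trade-off is that renaming a route changes its ID.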

(Also, playing with the Germany feed (https://gtfs.de/), it's even worse there: the agency ids change on each feed version.)
