Duplicated ids found in: fare_products - The returned object is not a tidygtfs object #39

Open
coding-to-music opened this issue Jul 15, 2023 · 2 comments

Comments

@coding-to-music

Hello,
I was running this project:

https://github.com/coding-to-music/r-stringlines-nyc-mta-gtfs-train-visualization

and I saw an error when using tidytransit with the MBTA GTFS feed. A new file in the feed, fare_products.txt (#34), produces this error when running the R program:

# This may be unrelated, not sure:

Error in UseMethod("group_by") :
  no applicable method for 'group_by' applied to an object of class "NULL"
In addition: Warning message:

# This is the actual error:

In gtfs_to_tidygtfs(g, files = files) :
  Duplicated ids found in: fare_products
The returned object is not a tidygtfs object, you can use as_tidygtfs() after fixing the issue.

To work around this, back up the zip file so you have an original copy:

cp MBTA_GTFS.zip MBTA_GTFS_original.zip

Now remove the offending file from the zip file:

zip -d MBTA_GTFS.zip fare_products.txt

Now the feed can be read as normal.

After fare_products.txt was removed, I was able to produce many stringlines:
https://github.com/coding-to-music/r-stringlines-nyc-mta-gtfs-train-visualization/tree/main/stringlines

@rymarczy
Contributor

rymarczy commented Jul 16, 2023

This appears to be an error within the tidytransit package.

tidytransit treats the fare_product_id column alone as the primary key of the fare_products.txt table:

  # fare_products
  m$fare_products <- spec_setup_fields(
    c("fare_product_id", "fare_product_name", "amount",
      "currency"),
    c("req", "opt", "req", "req"),
    c("character", "character", "numeric", "numeric"), # TODO currency should be handled with integers
    "opt",
    "fare_product_id") ## primary_key ##

Per GTFS documentation, https://gtfs.org/schedule/reference/#fare_productstxt, the primary key is a combination of fare_product_id and fare_media_id:

fare_products.txt

File: Optional

Primary Key (fare_product_id, fare_media_id)

fare_media_id can also be NULL in the fare_products.txt table, so tidytransit would also have to handle that.
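
To make the difference between the two key definitions concrete, here is a minimal shell sketch that counts repeated keys in a fare_products.txt table. The helper name dup_keys and the sample rows are made up for illustration (they are not real MBTA values), and the script assumes no field contains a quoted comma, which the CSV format does not guarantee. An empty fare_media_id is treated as an ordinary (empty) key component.

```shell
#!/bin/sh
# dup_keys FILE COLUMN [COLUMN...]
# Hypothetical helper: prints each key value that occurs more than once,
# followed by a tab and its count. Assumes fields have no quoted commas.
dup_keys() {
  file=$1
  shift
  # Join the requested column names with commas so awk can split them again.
  cols=$(IFS=,; printf '%s' "$*")
  awk -F',' -v cols="$cols" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {
      # Map header names to column positions.
      n = split(cols, want, ",")
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (k = 1; k <= n; k++) {
        if (!(want[k] in pos)) {
          print "missing column: " want[k] > "/dev/stderr"
          exit 2
        }
        idx[k] = pos[want[k]]
      }
      next
    }
    {
      # Build the (possibly composite) key; empty fields stay empty.
      key = $(idx[1])
      for (k = 2; k <= n; k++) key = key "|" $(idx[k])
      seen[key]++
    }
    END { for (key in seen) if (seen[key] > 1) print key "\t" seen[key] }
  ' "$file" | sort
}

# Made-up sample rows: one product sold on two fare media, one with no media.
sample=$(mktemp)
cat > "$sample" <<'EOF'
fare_product_id,fare_product_name,fare_media_id,amount,currency
prod_one_way,One Way,cash,2.40,USD
prod_one_way,One Way,card,2.40,USD
prod_day_pass,Day Pass,,11.00,USD
EOF

# Key used by tidytransit: fare_product_id alone is duplicated.
dup_keys "$sample" fare_product_id                  # prints: prod_one_way<TAB>2

# Key from the GTFS reference: the combination is unique, so nothing is printed.
dup_keys "$sample" fare_product_id fare_media_id

rm -f "$sample"
```

To run the same check on a real feed, the table can be extracted first, for example with `unzip -p MBTA_GTFS.zip fare_products.txt > fare_products.txt`, and the resulting file passed to dup_keys.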

@coding-to-music
Author

It's an interesting question. MBTA is not responsible for tidytransit, and tidytransit is trying to be compatible with all the transit systems in the world. I'm not sure whether the files are expected to be automatically importable: tidytransit can't read documentation, so the files are expected to be importable on their own. A unique sequence id column used as an index could solve the problem. Otherwise people will not be able to use tidytransit with the MBTA feed and will spend hours or days trying to figure out the problem. But technically it's not MBTA's problem if a third party can't read the files.
It is interesting that all the other files in the MBTA zip can be read. Anyway, just an FYI about this issue. Thanks.
