Related work
- http://linkeduniversities.org/lu/index.php/datasets-and-endpoints/
- http://data.linkededucation.org/linkedup/catalog/browse/
Analytics data of data.gov.uk: http://data.gov.uk/data/site-usage. They collect similar data with Google Analytics for the Swiss open data portal, but currently it's not clear whether this is going to be published and how. Additionally, they set up logging on Amazon S3 for all the primary data that is hosted there. They think it's important to look not just at the site statistics, but also at the downloads of the primary data. Typically, a developer of an app or visualization uses the portal once to find the link to the interesting data. From that point on, she might only use the direct link, and so no longer shows up in the portal's logs. Stefan Oderbolz, 22 Jan, ckan-discuss
http://thomaslevine.com/open-data/
http://wiki.lib.sun.ac.za/index.php/OpenData
"Using http://data.police.uk/about/ as an example of how to do a really good statement about open data quality", JeniT
"the web is powered by feedback loops between people and information", Jon Kleinberg
The Amsterdam Manifesto on Data Citation Principles
There have been some discussions about the semantic web reaching the "slope of enlightenment" on the lifesci list.
http://semwebquality.org has some very nice materials regarding data quality.
http://qualitywebdata.org/ by Michael Hausenblas, but hasn't been active for a while.
A very nice taxonomy of quality aspects by Hartig and Flemming.
The Pedantic Web Group works to get errors and bad practices fixed by engaging the data publishers.
Helena Deus announced her Survey of Linked Data Quality Metrics
Bernard Vatant's discussion about unresolvable vocabulary in LOD.
Survey of ontology libraries by Natasha Noy.
Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel Polleres, Stefan Decker "An empirical survey of Linked Data conformance" Journal of Web Semantics 2012 (to appear). http://sw.deri.org/~aidanh/docs/ldstudy12.pdf http://www.sciencedirect.com/science/article/pii/S1570826812000352?v=s5
http://slidewiki.org/application/questionnaire.php
http://en.wikipedia.org/wiki/OntoClean
http://blog.hubjects.com/2012/03/lov-stories-part-2-gardeners-and.html
"There should be official sem web tests, badges and achievements based on passing things like this. Described in linked data of course!" - Melvin Carvalho 4 Jan 2013 on public-lod
http://sindice.com/developers/publishing
http://nanopub.org/wordpress/?page_id=57
Clinical Quality Linked Data release from HDI 2011 mentioned at hhs challenge
survey of ontology use by Paul Warren Knowledge Media Institute (announcement)
Stian's responses to Sarven's provenance dataset: http://www.w3.org/mid/51364C90.4060501@csarven.ca
Why Linked Data is Not Enough for Scientists
http://sites.tufts.edu/liam/deliverables/prospectus-for-linked-archival-metadata-a-guidebook/
http://www.dbis.informatik.hu-berlin.de/fileadmin/research/papers/conferences/2009-ldow-hartig.pdf
Nine simple ways to make it easier to (re)use your data
http://www.data.gov/blog/under-hood-open-data-engine
Beyond Data: Building a Web of Needs
Jul 15 2013
Dear Timothy Lebo,
With its increased rate of adoption, Linked Data is becoming a valuable commodity in numerous domains across the web. But, how valuable is Linked Data after all? How much did it cost to create and publish a dataset as RDF? What is the value of a dataset? To gather information on details of the creation of datasets and then estimate the value of Linked Open Data in terms of time and money, we are conducting a survey. We came across your dataset at [1]. Thus, we would like to ask you to fill out the survey at: http://goo.gl/dLAl8.
This survey contains 23 questions and will take about 10-15 minutes to complete. The results of this survey will be summarized and used to estimate the value of Linked Data and will be made accessible to the survey participants as well as the general public. Please note: if you have more than one dataset that you have published, please fill the questionnaire separately for each of the datasets.
Thank you very much for your time.
[1] http://datahub.io/dataset/twc-ieeevis
Regards,
Ms. Amrapali Zaveri
University of Leipzig - Department of Computer Science
Paulinium 618, Augustusplatz 10, 04109 Leipzig, Germany
http://aksw.org/AmrapaliZaveri
Helena's survey
A lot of open data isn't openly licensed: http://thomaslevine.com/!/open-data-licensing/
QALD-4, the fourth in a series of evaluation campaigns on multilingual question answering over linked data http://sourceforge.net/mailarchive/message.php?msg_id=31925311
http://linter.structured-data.org/
http://vmwebsrv01.deri.ie/sites/default/files/publications/paperiswc.pdf
SPARQL Endpoint Status (sparql-es)
- http://sparqles.okfn.org/iswc2013/ data from paper
- http://sparqles.okfn.org
http://graphite.ecs.soton.ac.uk/prov/
http://mappings.dbpedia.org/server/statistics/en/?show=100000
OOPS! OntOlogy Pitfall Scanner http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp
http://wiki.publicdata.eu/wiki/CSV2RDF_Application
https://github.com/cbaillie/QualityAssessmentFramework
- Tomas Knap presented a poster on ODCleanStore at ISWC 2012. Some more documentation is here.
Mondeca hosts a dashboard showing SPARQL endpoint status for all SPARQL endpoints mentioned in http://thedatahub.org/group/lodcloud (Dr. Pierre-Yves Vandenbussche).
SEALS is a rather complete infrastructure, but focuses on tools not data.
García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.
Jiao's validator
LODStats is a Python-based triple streaming processor. Does it give consumers a voice, or are they just centralized?
- Jan Demter, Sören Auer, Michael Martin, Jens Lehmann: LODStats – An Extensible Framework for High-performance Dataset Analytics, submitted to ESWC2012
Linked Open Vocabularies http://labs.mondeca.com/dataset/lov/details/vocabulary_geosp.html accepts new vocabularies to evaluate at http://labs.mondeca.com/dataset/lov/suggest/ and has documentation for how to publish vocab.
http://ckan.org/2012/01/09/qa-on-thedatahub/
http://lod2.eu/Project/WIQA.html
http://swse.deri.org/RDFAlerts/ (Aidan Hogan) superseded by http://inspector.sindice.com/
https://github.com/cygri/make-void died in 2010; it is limited to files that fit in memory.
RDFStats http://rdfstats.sourceforge.net/ died in 2012
http://www.cs.ox.ac.uk/isg/tools/LogMap/ matches two given ontologies.
http://code.google.com/p/py-triple-simple/ by Janos Hajagos paper, slides cites Joslyn BTC 2010, which does "predicate bigrams".
https://github.com/kwijibo/void-import-to-thedatahub at http://keithalexander.co.uk/void-import-to-thedatahub/ imports void into ckan.
http://www.w3.org/2009/sparql/sdvalidator and http://validator.linkeddata.org/vapour give EARL with conneg.
http://hcls.sindicetech.com/explore/
http://linkeddata.informatik.hu-berlin.de/uridbg/
LOV:
State of LOD: http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms links into ckan.
http://ckan.org/2011/07/05/google-refine-extension-for-ckan/
CKAN already supports describing datasets in RDFa by using some commonly used vocabularies (e.g. DCat [1]). See an example by taking the URL of any dataset in CKAN and pasting it into the RDFa Distiller [2]. Given that the European Commission ADMS Working Group has recently published the related Repository, Asset, Distribution (RADion) vocabulary [3][4], I wonder if CKAN should support this vocabulary in RDFa as well. What do you think about RADion? Were any of you involved in its development? [1] http://www.w3.org/TR/vocab-dcat/ [2] http://www.w3.org/2007/08/pyRdfa/ [3] https://joinup.ec.europa.eu/asset/radion/home [4] http://www.w3.org/ns/radion Augusto Herrmann, Open Data Team - dados.gov.br
query by temporal factors later, for example: 1) how the graph changed between 20th October 2012 to 30th October 2012. I want to see all updates. 2) Snapshot of a particular node on 20th July 2012, 25th July 2012, etc.:
- http://mementoweb.org
- http://arxiv.org/abs/1003.3661
- https://datatracker.ietf.org/doc/draft-vandesompel-memento/ (There is a DBpedia archive accessible via the Memento protocol: http://mementoweb.org/depot/native/dbpedia/ )
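The temporal queries listed above (the state of a resource on a given date) correspond to Memento's datetime negotiation, in which a client sends an `Accept-Datetime` header to a TimeGate. The following is only a minimal sketch: the TimeGate address `http://example.org/timegate/` and the helper names are assumptions for illustration, and the code builds the request without sending it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate prefix; a real archive documents its own address.
TIMEGATE = "http://example.org/timegate/"


def accept_datetime_header(when):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    when_utc = when.astimezone(timezone.utc)
    return format_datetime(when_utc, usegmt=True)


def build_memento_request(resource_uri, when):
    """Build (but do not send) a request for resource_uri as of `when`."""
    headers = {"Accept-Datetime": accept_datetime_header(when)}
    return Request(TIMEGATE + resource_uri, headers=headers)


# Example: ask for a resource as it was on 20 October 2012.
snapshot_time = datetime(2012, 10, 20, tzinfo=timezone.utc)
request = build_memento_request("http://dbpedia.org/resource/Berlin", snapshot_time)
# urllib stores header names capitalized as "Accept-datetime";
# HTTP header names are case-insensitive, so this is equivalent.
print(request.get_header("Accept-datetime"))
```

Comparing two dates (for example 20 and 30 October 2012) would mean building one such request per date and diffing the two responses on the client side.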
EARL [2] ReSpec HTML+RDFa rollup reports
For help in publishing great data, be sure you visit the Sindice Web Data Inspector. The Web Data Inspector will assist you by providing interactive data visualization and validation services.
http://www.bioontology.org/wiki/index.php/Ontology_Metrics?pop=true
http://aers.data2semantics.org/yasgui/ pulled the list of LOD SPARQL endpoints from http://semantic.ckan.net/sparql (which is no longer supported).
ckanext-qa
http://purl.org/openorg/corrections
http://lod.openlinksw.com and http://lod.openlinksw.com/sparql hosts 51 Billion+ Triples culled from across the LOD Cloud. Basically, all the datasets that OpenLink can get our hands on. -Kingsley Jul 2013
There are quite a few testing frameworks available, but I think only two major ones make sense for CKAN. The first is Twill and the second is WebTest.
Twill (docs: http://twill.idyll.org/)
Pros:
- more straightforward language
- record sequences of actions
Cons:
- uses only Beautiful Soup
- poor docs
- not actively maintained (however, there is a retwill fork on GitHub)
WebTest (docs: http://webtest.pythonpaste.org/en/latest/index.html)
Pros:
- actively maintained
- integrated into major Python web frameworks, recommended for Pylons
- you can choose between lxml html, lxml xml, Beautiful Soup, pyquery and json
- good documentation
Cons:
- no real web testing (in an actual browser), since JavaScript is ignored
- sometimes a little difficult to understand how to select links/forms
Overall, I think WebTest is the way to go, which is why I added a few quick examples that demonstrate how to use forms, click and xpaths/pyquery. Pull request: https://github.com/okfn/ckan/pull/130
The ticket for the whole thing is here: http://trac.ckan.org/ticket/2934
So I was quite happy with the way I was writing UI tests for ckanext-cmap: using paste.fixture.TestApp to request pages, then using BeautifulSoup to parse the results. Not the most concise thing in terms of saving on typing, but simple enough.
WebTest is based on paste.fixture.TestApp, but apparently parts of it have been rewritten to use WebOb. As far as I can see, the interfaces of the TestApp and Response objects are pretty much the same as those from paste. The documentation looks better, or at least easier to find. The response object has built-in convenience methods for getting a BeautifulSoup, ElementTree, LXML, or PyQuery parsed copy of the body, which as far as I know paste's Response object didn't have, but it was only one line of code to get it yourself anyway.
We would have to add webtest and I guess one of BeautifulSoup, ElementTree or LXML to pip-requirements-test.txt (anyone have a preference?)
So WebTest looks like just the thing to me.
I think it would probably be worthwhile for Dominik to write WebTest tests for a couple of parts of CKAN and get them reviewed and merged into master, and then copy a couple of examples from them and paste them into a new section in the CKAN Coding Standards, explaining what our best practice is for doing UI tests. I think that's important because using the right tool won't stop us from writing terrible tests with it.
P.S. For testing JavaScript, which WebTest doesn't do, there is actually a JavaScript test framework built into CKAN now, which came with the big demo merge.
P.P.S. Dominik, I see that you're calling these "integration" tests. In CKAN, these kinds of UI tests (that use paste.fixture.TestApp) are currently called "functional" tests (see ckan/tests/functional). So maybe you just want to call yours functional tests, or maybe there's a distinction to be made between functional tests that test the contents of individual pages in detail and process or integration tests that test clicking through multiple pages without checking the contents of each page in thorough detail.
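The functional-test pattern discussed above (request a page in-process, then parse the response body) can be sketched roughly as follows. This sketch deliberately uses only the Python standard library rather than WebTest or paste.fixture, so it runs without extra dependencies. `demo_app`, `get`, and `LinkCollector` are hypothetical names for illustration, not CKAN or WebTest APIs. With WebTest, the hand-written `get` helper and the parser would be replaced by its `TestApp` and response objects.

```python
from html.parser import HTMLParser
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Hypothetical stand-in for a CKAN page: a heading and one link."""
    body = (b"<html><body><h1>Datasets</h1>"
            b'<a href="/dataset/new">Add dataset</a></body></html>')
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def get(app, path):
    """Call a WSGI app in-process and return (status, decoded body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


def test_dataset_page_links_to_new_form():
    """Functional test: check the contents of one page in detail."""
    status, html = get(demo_app, "/dataset")
    assert status.startswith("200")
    parser = LinkCollector()
    parser.feed(html)
    assert "/dataset/new" in parser.hrefs


test_dataset_page_links_to_new_form()
```

A process or integration test in the sense described above would chain several `get` calls, following the collected links from page to page, and assert only on status codes rather than on page contents.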
http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data
http://dataweb.medialab.ntua.gr/
http://www.w3.org/2013/04/odw/
http://www.iaria.org/conferences2013/ICIW13.html
http://data.semanticweb.org/usewod/2013/
http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2013_01_17
ISWC 2012:
http://www.iaria.org/conferences2013/CfPWEB13.html
18th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2012) National University of Ireland Galway Quadrangle October 8-12, 2012. http://ekaw2012.ekaw.org
1st International Workshop on Learning Analytics and Linked Data (#LALD2012) in conjunction with the 2nd Conference on Learning Analytics and Knowledge (LAK’12)
- Workshop website: http://lald.linkededucation.org/
I-SEMANTICS 2012
- Quality of Semantic Data on the Web
- Provenance information for the Web of Data
- Large scale ontology inspection and repair
- Co-reference detection and dataset reconciliation
- Maintenance of Linked Data models
- Trust, privacy and security in Semantic Web applications
Linked Data on the Web (LDOW2012) at WWW http://events.linkeddata.org/ldow2012/
LDOW2012 workshop
- evaluating quality and trustworthiness of Linked Data
International Conference on Dublin Core and Metadata Applications 2012
- Metadata quality (methods, tools, and practices)
SePublica2012 an ESWC2012 Workshop
- Provenance, quality, privacy and trust of scientific information
Journal of Web Semantics Special Issue on Evaluation of Semantic Technologies. Special Issue on Visualisation of and Interaction with Semantic Web Data. Special issue of the International Journal on Semantic Web and Information Systems http://www.ijswis.org/?q=node%2F41 Editors: Matthew Rowe, Aba-Sah Dadzie
http://ontologymatching.org/publications.html
- Wang & Strong (1996, "Beyond Accuracy"). Recommended by Helena
chimaera papers - KSL-99-17 has more details on the tests.
KSL-00-08 http://ksl.stanford.edu/KSL_Abstracts/KSL-00-08.html McGuinness, D. L.; Fikes, R.; Rice, J.; & Wilder, S. An Environment for Merging and Testing Large Ontologies. Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR2000), Breckenridge, Colorado, April, 2000.
KSL-00-09 http://ksl.stanford.edu/KSL_Abstracts/KSL-00-09.html McGuinness, D. L.; Fikes, R.; Rice, J.; & Wilder, S. The Chimaera Ontology Environment. Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI 2000), July 30-August 3, 2000.
KSL-99-17 http://ksl.stanford.edu/KSL_Abstracts/KSL-99-17.html Fikes, R. & Rice, J. The Stanford KSL Knowledge Base Merging Critical Component Experiment. Knowledge Systems Laboratory, October, 1999.
http://openorg.ecs.soton.ac.uk/wiki/Namespace#Linked_Open_Data for referring to TimBL's 5-star scheme; this came up in Edinburgh last May.
We stubbed in something http://logd.tw.rpi.edu/lab/project/logd_internaltional_ogd_catalog/metadata_design
http://eprints.soton.ac.uk/340068/
Ontology Support for Influenza Research and Surveillance, Joanne Luciano, PhD, Lynette Hirschman, PhD, Marc Colosimo, PhD. Approved for Public Release; Distribution Unlimited. 28 April 2008 Case Number 08-0738 http://www.ebi.ac.uk/industry/Documents/workshop-materials/DiseaseOntologiesAndInformation190608/The%20Influenza%20Infectious%20Disease%20Ontology%20(I-IDO)%20-%20Joanne%20Luciano.pdf
THE EVALUATION OF ONTOLOGIES: Toward Improved Semantic Interoperability Leo Obrst, Werner Ceusters, Inderjeet Mani, Steve Ray, Barry Smith in C. Baker and K.-H. Cheung, ed., Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, New York: Springer Verlag, 2006, 139-158. Chapter 6 http://ontology.buffalo.edu/smith/articles/EvaluationOfOntologies.pdf
A SURVEY OF ONTOLOGY EVALUATION TECHNIQUES Janez Brank, Marko Grobelnik, Dunja Mladenić, http://eprints.pascal-network.org/archive/00001198/01/BrankEvaluationSiKDD2005.pdf