Data Discovery/Metadata


This subgroup works on projects related to data discovery and metadata, in particular exploring the possible use of open-source data catalog code to create a BRAIN Initiative Data Discovery Portal.

Subgroup Lead: Alisa Surkis

NIH Liaisons: Susan Wright (NIDA), Elizabeth Powell (NIAAA) (elizabeth.powell3@nih.gov)

Email list: alisa.surkis@nyulangone.org; susan.wright@nih.gov; elizabeth.powell3@nih.gov; kevin.read@nyulangone.org; petersen.peter@gmail.com; arokem@gmail.com

 

April 13, 2019 PI Meeting


Initial meeting 2/1/2019

Meeting summary:

  • Hosting: Ideally NIH would host the Data Catalog; if that is not possible, an institution could likely host it.
  • Scope of Data Catalog: It makes sense to start with the U19 Data Science Consortium, a smaller group that is already working together, rather than with the more limited scope of NWB or the larger BRAIN Initiative.
  • Metadata customization: This could be addressed by each of the U19 subgroups, also looking to the NIH-funded archives.
  • Questions were raised as to whether the Data Catalog would be an interim tool, and whether it is too simple, such that we should set our sights on building other tools with more functionality.
  • A point was made that people don't share data easily, so getting good metadata into the Data Catalog would take a lot of work. This would be different for cases like NWB, where the Data Catalog could scrape the metadata.
  • Discussion of whether NIH might fund the Data Catalog, since it would be a great service to the U19s; the question was also raised as to whether it would add sufficient value.
  • Discussion of other archives as alternatives to the Data Catalog, e.g., Flywheel and OpenNeuro: tools that provide analysis platforms, which gives people a motivation to use them and makes sharing trivial in the end.
  • No single platform will emerge as the only platform, so the Data Catalog would serve a purpose as one place that points to all of these other platforms. It is also already encoded so that its datasets are findable through Google Dataset Search.
  • Question as to whether the Data Catalog assigns UIDs to entries. It does not, since it does not host data. The concern is confusion: assigning UIDs to both the data and the metadata record would be confusing, and assigning a UID only to the metadata record could create the impression that the UID identifies the data.
  • Discussion of whether the Data Catalog can do data validation. It cannot, because it does not hold data. There was further discussion of whether archives perform validation, and of how the Data Catalog could reflect whether a dataset is held in an archive that does data validation by including that information in the dataset's metadata.