CADAC/DesignDiscussion07AUG07

Design Discussion

This was a discussion that included some thoughts on future implementations, so I didn't want to lose it. There were a couple of follow-up emails that I should probably add as well. Part of this discussion was just making sure we were all on the same page.

Data Publication

First, a general comment on the idea of publishing to the CADAC archive. Right now, the role of the archive is to help users share data at the workshop. Publishing datasets that are linked to publications is a long-term goal; before we get there, we need to make sure the community is on board. And if the LRAC doesn't get approved, we won't have an archive to publish from, so we need to be careful about promising people that the CADAC is an eternal resource.

The ideas about the data in your email make this sound very formal, like a peer-reviewed publication process. I think this aspect of the CADAC is for the future, when the data is tied to a publication. When that need is there, we can come up with a process to handle it.

Goals

Here are my thoughts on what the current goals are.

First Stars

  • Get users adding data.
  • Get users logging in and using the data.

This Year

  • Get the LRAC approved.
  • Build community support.

Next Year

  • Evaluate community support.
  • Build a publication mechanism.
  • Get more funding.

Naming Scheme

astro-ph Style

Alex Lazarian suggested attaching some kind of archival number to every item in the data collection, so that when people publish papers that made use of the CADAC, they can easily acknowledge not only the CADAC in general but specifically the items in the data collection they used. He thinks this is important because people who publish data very early may want to be properly acknowledged, or even be made coauthors, when someone uses their data in a publication. It would be great to have a simple archive number following the date of submission, like astro-ph. Does SRB create something of that sort?

Not as far as I know, and this doesn't really follow the process that will occur with data in the CADAC. With arXiv.org, users submit their work when everything is complete. With the CADAC, users will first add data for analysis and sharing, and then choose some of it for publication. If we get to the point of supporting publication, there will probably be an area set aside for linking datasets into. In fact, if we wanted, users could reuse the astro-ph number in the CADAC.

Suppose you had a directory (MHDturb) with some data in the SRB, and this data supported some mythical paper, astro-ph/0808.0626:

 /home/padoan.cadac/CADAC/MHDturb

Now, suppose we had set up an /astro-ph directory in the CADAC for people to link datasets to. Then you could link your dataset into this directory:

 $ Sln /home/padoan.cadac/CADAC/MHDturb /astro-ph/0808.0626

Further suppose (this is starting to sound like a math proof) that we got the support to build the VOSpace interface to the SRB. We would then be able to provide a URL to the data:

 vos://cadac.sdsc.edu/astro-ph/0808.0626
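To make the mapping concrete, here is a minimal Python sketch of how an astro-ph number could be turned into the link path, the Sln command, and the VOSpace URL from the example above. It is only an illustration under assumptions: the /astro-ph directory and the cadac.sdsc.edu VOSpace endpoint do not exist yet, and the function names (link_path, sln_command, vos_url) are made up for this sketch.

```python
import re

# Assumptions for illustration only: the /astro-ph link directory and the
# cadac.sdsc.edu VOSpace host are taken from the example in this discussion;
# neither has been set up.
LINK_ROOT = "/astro-ph"
VOS_HOST = "cadac.sdsc.edu"

# New-style arXiv identifiers: four digits (YYMM), a dot, then four or five digits.
_ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}")


def link_path(arxiv_id):
    """Return the hypothetical CADAC link path for an astro-ph number."""
    if not _ARXIV_ID.fullmatch(arxiv_id):
        raise ValueError("not a new-style arXiv identifier: %r" % arxiv_id)
    return "%s/%s" % (LINK_ROOT, arxiv_id)


def sln_command(dataset, arxiv_id):
    """Build (but do not run) the Sln command that links a dataset into place."""
    return "Sln %s %s" % (dataset, link_path(arxiv_id))


def vos_url(arxiv_id):
    """Return the hypothetical VOSpace URL for the linked dataset."""
    return "vos://%s%s" % (VOS_HOST, link_path(arxiv_id))


if __name__ == "__main__":
    print(sln_command("/home/padoan.cadac/CADAC/MHDturb", "0808.0626"))
    print(vos_url("0808.0626"))
```

The point of the sketch is that the publication name is just a second, derived name for data that already lives under its natural path, so nothing about the natural naming scheme has to change to support it later.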

The answer is, finally, that first we need to support a natural naming scheme, and if we need to, we can do something with publications in the future.

Notifications

Finally, when a new item is added to the data collection, there should be an automatic email notification to all users.

I don't think this is the right thing to do at the start. Users may add and rearrange data quite a bit at the beginning, and a flurry of emails about every file would get old. Now, in line with the thoughts about the future, if we do have a formal publication process for data, we could probably add on an RSS feed, email notifications, and other fancy stuff.

Feature Creep

Tomorrow I will have a phone conversation with Bob Fisher who apparently has a lot of experience with similar initiatives from other fields, including CFD, and told me he has many suggestions about things that work and do not.

OK, but remember, if we start to experience feature creep before we have any users, we're in trouble.

Follow Up

I will email you what I learn from him as soon as possible. The number of users is slowly growing. I will send a first list to Mahidhar as soon as I have 20 names, hopefully tomorrow.

This is good. I have the information about the mailing list, so we should be able to start using it tomorrow. Adam at SDSC is going to get the GID straightened out for the project directory, so we can start building the software and reading data.

How about a phone call on Thursday? By then I can give you your usernames and passwords for the SRB, and point you to some documentation.