Decadal Review Tea-Leaf-Reading

Monday, March 1, 2010 on 11:40 pm | By Peter | Tags: | No Comments

I was a little surprised to find out from Casey Law that Nature recently ran a news piece about the astronomy Decadal Review process — and they didn’t just report on it, they actually got a group of prominent astronomers together for dinner to discuss their impressions of how well the review process works and what its outcome might be this time around. Unfortunately, the article’s behind a paywall, so you probably can’t read it, but here’s a link to it if you have a university Internet connection.

I’m not the most gung-ho SKA supporter on the planet, but I was surprised to see how low it ranked in the Nature group’s list of priorities. As far as I can tell it was never mentioned during the hour-long panel discussion (paywalled transcript link) and the group ranked it last out of seven projects that they’d recommend for funding. Ouch.

You can rationalize a bit by saying that the SKA is more than a decade off, and the group even talked about how major projects have recently taken two iterations of the Decadal Review to get significant momentum behind them, while this is the first go-round for the SKA. Still, maybe this is a sign that the SKA community needs to get more aggressive about its PR. On the more personal front, this is a little scary for hopes that the Decadal Review report will push funding for ATA expansion as preliminary SKA work.

Another thing that disappointed me was that the group didn’t spend any time discussing “state of the profession” issues like the ones discussed in the whitepaper I helped write. This didn’t surprise me at all, but I still wish more people out there (especially the leaders of the community) would be more engaged in thinking about, well, the state of the profession. We can certainly continue to limp along as we have for a long time, but I feel like there’s tremendous room to improve the system, if only a bit more money, time and effort would be put in that direction.

A Nice Null Result

Tuesday, February 23, 2010 on 11:31 pm | By Peter | Tags: | Comments Off

OkCupid is a dating site that takes a pleasantly data-driven approach to the online dating game. If you read their blog, you run into a lot of interesting and surprising facts that the OkCupid staff have pulled out of their databases.

Like many dating sites, OkCupid gives its users a list of questions to answer about themselves. For each question, however, it also lets you specify what your ideal partner’s response would be, and how important that response is to you. (You might not always want your partner to have the same response as you — for instance, “Are you sexually dominant?”) This lets OkCupid rate the compatibility between two members according to their personal standards, not just according to what the employees at OkCupid think makes for a good couple.

You can then do interesting aggregate statistics by breaking the users down into groups and then looking at the average compatibility between different groups. On this post on the OkCupid blog, they did various breakdowns and visualized the results on grids like the one below. Each group has its own row and column, and the intersection of a row and column gives the average compatibility between the two groups. A greener color means above-average compatibility, and a redder color means below-average compatibility. For instance, here are the compatibilities of racial groups:

A dating compatibility graphic

Dating compatibility between racial groups, taken from blog.okcupid.com

This is somewhat heartening: in theory, people of all races should be able to get along pretty well in relationships. (Unsurprisingly, in practice, this is untrue. Different racial pairings on OkCupid reply to each other’s messages at rates that vary wildly from what their compatibility scores would imply, indicating that people’s personal attitudes affect things strongly. See the blog post for more info.)
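The averaging behind grids like the one above can be sketched in a few lines of Python. This is not OkCupid's code: the input layout (one `(group_a, group_b, score)` tuple per pair of users), the function names, and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict

def compatibility_grid(pair_scores):
    """Average compatibility for every (row group, column group) cell.

    pair_scores: iterable of (group_a, group_b, score) tuples, one per
    pair of users. This layout is hypothetical, chosen for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group_a, group_b, score in pair_scores:
        # The grid is symmetric, so record each pair in both cells
        # (the set collapses to one cell when both groups are the same).
        for cell in {(group_a, group_b), (group_b, group_a)}:
            totals[cell] += score
            counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

def deviations_from_mean(grid):
    """How far each cell sits above (greener) or below (redder) the grid mean."""
    mean = sum(grid.values()) / len(grid)
    return {cell: value - mean for cell, value in grid.items()}

# Made-up scores, purely to exercise the functions.
scores = [("A", "A", 0.60), ("A", "B", 0.50), ("B", "B", 0.70), ("A", "B", 0.54)]
grid = compatibility_grid(scores)
deviations = deviations_from_mean(grid)
```

The null result further down would correspond to every entry of `deviations` being tiny compared with the mean.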

Anyway, here’s that null result that I referenced in the title:

A dating compatibility graphic

Dating compatibility, taken from blog.okcupid.com

From a sample of 500,000 users, 144 comparisons, all within 0.5% of the mean value. That’s a null hypothesis that I can get behind.

(As a side note, the fact that OkCupid lets people rate the personal importance of others’ answers to various questions allows them to easily discover which questions are the most effective for testing compatibility. Apparently “How often do you shower?” is one of the best.)

For Twitter or for Worse

Monday, January 18, 2010 on 7:20 pm | By Peter | Tags: , | Comments Off

Well, I’ve started using Twitter under the username pkgw, and I’ve been kind of enjoying it. Right now my timeline is kind of a grab-bag of mostly personal items with some work-related things; I haven’t seen any obvious examples of good ways to keep the two somewhat separate. (As a sidenote, “timeline” seems like sort of an odd word choice to me compared to “feed” or something like that.)

The main thing that I don’t like about Twitter is that its associated vocabulary is so infantile; I feel humiliated whenever I have to say the word “tweet” aloud. But “blog” has started to feel like a real word, so maybe “tweet” isn’t far behind.

The other thing is that, of course, Twitter is a closed-source web service and I have no idea what their user data policies are. Like it or not, most people who microblog use Twitter, and the social aspects of the service are important, so I’m willing to put up with that. Unlike my email or this blog, I don’t think that I’d be brokenhearted to lose all of my tweets (they are mostly ephemera) so this doesn’t bother me as much as it might for other services.

Broadband Spectra Paper Out!

Tuesday, December 1, 2009 on 5:13 pm | By Peter | Tags: | Comments Off

After two and a half years, my paper on the FIR/radio correlation and broadband spectra of galaxies with the ATA has been accepted to the Astrophysical Journal and posted to arxiv! If you’ve been wondering what I’ve been doing for the past 30 months, you can find out here.

The Python Standard Libraries: Not Very Good

Saturday, November 28, 2009 on 3:33 pm | By Peter | Tags: | 2 Comments

Thomas Vander Stichele, whom I have never met but who works in some of the areas of Linux that I used to be involved with, writes about some ugly code he found in the Python standard libraries, and says

I usually tend to think of Python as the discerning gentleman’s programming language: well-behaved, well-documented, people take care of the code written. I like the batteries-included approach and assume that the battery code in the standard library is well-written…

I have to say that I have no idea what he’s talking about. The Python standard libraries are terrible. Everywhere you look there are inconsistent APIs and coding styles, redundant or missing functionality, widely ranging quality in the documentation, and poorly-engineered solutions. Consider the description of the subprocess module:

… This module intends to replace several other, older modules and functions like:

  • os.system
  • os.spawn*
  • os.popen*
  • popen2.*
  • commands.*

That is, the Python standard libraries contain at least four different APIs for invoking subprograms, and I have to say that I still don’t think the latest iteration is great. The os module, which provides core functionality, just exposes the standard POSIX API without making any efforts to map it into the language helpfully or abstract very well for non-POSIX systems. Google indicates that urllib is pretty widely hated. StringIO vs. cStringIO, pickle vs. cPickle, about a dozen different database and XML modules … it’s a mess.
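For concreteness, here is a small sketch of what invoking a subprogram through the subprocess module looks like, next to the os.system style it is meant to replace. It assumes Python 3.7 or newer (`subprocess.run` and its `capture_output` and `text` arguments arrived well after this post was written), and the child command is just a placeholder chosen so the example runs anywhere.

```python
import subprocess
import sys

# Placeholder child command: the running interpreter printing one line.
child = [sys.executable, "-c", "print('hello from the child')"]

# Old style, shown only for contrast: os.system takes a single shell
# string and returns just an exit status.
#     status = os.system("echo hello")

# subprocess style: arguments are passed as a list (no shell quoting),
# and the output and return code come back on one result object.
result = subprocess.run(child, capture_output=True, text=True, check=True)
output = result.stdout.strip()
print(result.returncode, output)
```

Whether this is a pleasant API is a matter of taste, but it does at least put the command, the captured output, and the exit status in one place.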

Now, of course, this is all just saying that the Python standard libraries are what they are: they were written by many different people in many different styles at different times, and they were developed quickly to get functionality out there so people could use it. I think that is very Pythonic: Get Stuff Done and if not everything is perfect, so be it. But I have to say, it’d be nice if the standard libraries were better. They’re bread-and-butter APIs, and it would, you know, be nice if they were carefully and thoughtfully designed. The fact that they aren’t turns out not to be a deal-breaker, but it remains unfortunate.

Status Update

Tuesday, November 10, 2009 on 5:02 pm | By Peter | Tags: , | Comments Off

Emotions that RFI currently inspires in Peter:

  • sadness

SKAremongering

Saturday, October 17, 2009 on 2:13 pm | By Peter | Tags: , | Comments Off

A while back an interesting paper appeared on astro-ph, the astrophysics preprint server: “Large Instrument Development for Radio Astronomy”, by J. R. Fisher and others who appear to be radio engineers at the National Radio Astronomy Observatory. It’s a whitepaper submitted to the Astro2010 decadal review and I think it’s fair to summarize it as a shot at the Square Kilometer Array concept.

The whitepaper doesn’t explicitly name the SKA but that’s clearly what it’s about. The basic argument is of the “let’s not be hasty” form — it takes time to develop new technologies, combining multiple scientific goals in one observatory is difficult, the costs of complex designs can quickly get out of hand, and so on. The SKA concept, which envisions serious progress in many areas of radio astronomical engineering and pretty much aspires to be the Greatest Radio Telescope For Everyone Evar as well as By Far The Most Expensive Radio Telescope Evar certainly calls for lots of new technologies and a complex design.

I’m sympathetic to pretty much all of their arguments. Building the largest radio telescope ever with new, incompletely-understood technology would, I think, be a really bad idea. From what I’ve seen, “all-in-wonder” designs for any technological system are usually a red flag — flexibility almost always comes at the cost of clean, simple, and correct functionality for any particular purpose. World-class hardware and software engineering is hard and takes time.

That being said, their arguments are the same ones that are always used to discourage innovations. “Oh, it’s risky, it’ll take a long time, we understand the existing stuff so much better.” These arguments are often valid, but technology wouldn’t be much fun if they were always heeded. Of course, “fun” shouldn’t be the operative word when you’re talking about a multi-billion-dollar investment. But in the case of the SKA, there’s no way to build the telescope without requiring some innovation. In the terminology of the whitepaper, it’s just a question of how much risk you’ve retired before you start building it.

In the particular case of the SKA, I’m not sure what I think. One thing is the fact that there’s a long development path leading to the SKA itself — pathfinders and prototypes and precursors, oh my. A lot of work is already happening to build and test the kinds of systems that would be involved in the SKA. It’s not as if ground is going to be broken on the final thing before a detailed design has been worked out and thought through. There won’t be any prototype fully-functioning observatories with thousands of antennas, but I think the basic issues involved in scaling up a large array to a huge array are low-risk: a lot of the requirements are parallelizable, and if you understand a good-sized batch of antennas well, you can understand how a much larger batch is going to behave.

On the other hand, a project as big as the SKA tends to develop its own inertia. If, after another decade of work, there’s trouble on all of the engineering fronts, no one’s going to want to (or even be able to?) just abort the whole project. For small R&D projects, you can say, “OK, it didn’t work, that’s good to know,” but when you’re planning to spend a few billion dollars, the plug just doesn’t get pulled. And in that case, you could be spending lots of money on a telescope that will be merely OK when that money could have been spent on several projects that would have each been great.

Maybe it’s best to think of the SKA like a space mission. Space missions are expensive and risky, so you always see that they deploy pretty unexciting telescope technologies; they get their science leverage from whatever application-specific advantage is provided by being in space. The SKA is also expensive and risky since the upfront capital investment will be so large. So in all probability it will likewise deploy technology that will be boring by the time SKA construction starts; this is OK since the SKA gets its science leverage by being really huge. In this case, the question is, is it possible to deploy boring technology in the SKA model? I think so, since the key pieces are the antennas, feeds, and communications links — those are the things that you really, really don’t want to have to replace en masse. And those are eminently testable in smaller configurations. So hooray, the SKA will work out fine! Good thing we figured that out.

Public Talk, October 21

Thursday, October 1, 2009 on 10:38 pm | By Peter | Tags: | Comments Off

I’m giving my radio astronomy talk yet again, to the San Francisco Amateur Astronomers on Wednesday, October 21st, at 7:30 PM at the Randall Museum. The abstract is identical to that of September’s talk. If you’ll be in town, you should come on by!

Public Talk, September 11

Thursday, September 10, 2009 on 12:06 pm | By Peter | Tags: | Comments Off

Well, I didn’t do a very good job of giving advance notice, but I’m giving a public talk tomorrow to the Peninsula Astronomical Society in Los Altos. The topic and description are the same as those of the talk I gave in June:

Exploring the Invisible Universe: The Past and Future of Radio Astronomy

Visible-light astronomy has been practiced for millennia. Astronomical observations of radio waves are, in comparison, still a novelty. Over its short lifespan, however, the field of radio astronomy has still managed to produce some of the most impressive results of modern science, including the discovery of extrasolar molecules and the detection of cosmic microwave background radiation, the key piece of evidence for the Big Bang. In this talk I’ll discuss the basics of radio astronomy, what can be seen in the radio sky, and the different ways in which astronomy is done at optical and radio frequencies. I’ll also talk about what we can expect from radio astronomy in the near and not-so-near future: an exciting convergence of recent technological advances promises to do as much for radio astronomy as the invention of the CCD has done for visible-light astronomy. Special focus will be put on the contributions of Bay Area institutions, including UC Berkeley and the SETI Institute.

I was pretty happy with the previous version of the talk, so I’ll only make a few changes to it. Should be a good time.

Parallel Computing Bootcamp

Friday, August 28, 2009 on 3:53 pm | By Peter | Tags: | Comments Off

Last week I attended the 2009 Short Course on Parallel Programming, a boot camp put on by the Berkeley EE/CS Parallel Computing Lab. The “parallel” in the names refers to computers in which you have several independent processors that work on computational problems cooperatively and simultaneously. Pretty much everyone agrees that this is the future of computing, since chip manufacturers have just about maxed out the raw computing power of single processors. To satisfy the never-ending lust for higher-performing computers, then, they’re all racing to find ways to pack more processing cores into a smaller area with faster communication between them. Most people expect that in the not-too-distant future, typical desktop computers will contain something like 64 computing “cores” that run simultaneously. Unfortunately, it turns out that writing high-speed code to take advantage of processor parallelism is really hard. The Parallel Computing Lab is basically entirely funded by companies like Microsoft and Intel to figure out how to make it possible for non-superstar programmers to write good (or even merely non-buggy) parallel programs. The point of the boot camp was to spread some of the wisdom that the Parallel Lab folks have gained to a broad spectrum of programmers, both scientific and commercial, who think they might need it.

This was the first edition of the boot camp, so there were some rough edges to it, but it was overall a worthwhile experience. (Though that’s not a terribly high bar when registration is free for UC Berkeley students and the course took place just a few buildings over from mine.) Since I didn’t know much about parallel computing, it was good to get an overview of the state of the art and the basic frameworks that are out there. It was somewhat comforting to see that there don’t appear to be any major conceptual foundations of parallel programming that I’m unfamiliar with. On the other hand, it would have been nice to have discovered that while I wasn’t looking someone had figured out how to make it really easy to write fast, correct parallel programs.

My personal interest in this topic, besides a desire to stay informed about these sorts of things, is that I’d like to learn more about how to analyze radio astronomy data in a parallel computing environment. A lot of the work that I do involves a lot of waiting around for observations to be processed, and it’d be a big win to be able to significantly decrease the amount of time spent doing that. One challenge particular to radio astronomy is that a lot of the problems discussed by parallel computing people are all about crunching a relatively small set of numbers: simulating particles in a box, doing linear algebra, etc. A lot of the radio astronomy applications, on the other hand, involve running through large datasets, so you want to read them from and write them to disk as fast as you possibly can.  The Berkeley theory group and Radio Astronomy Lab have won a contract to build a pretty good-sized parallel computing cluster, so there will be an opportunity to explore these things over the next few years. Doing so would require all-new software, but I think it’s a really interesting problem with the potential for big gains. As the computer scientists say, “a quantitative change of an order of magnitude is a qualitative change” — if you can make a tool faster by a factor of 10, that really changes the ways in which you use it. I’m hopeful that I’ll get the chance to spend some more time on this over the next few years.
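To make the "running through large datasets" pattern a bit more concrete, here is a minimal sketch: split the data into chunks, reduce each chunk independently, then combine the partial results. Everything in it is illustrative. The in-memory list stands in for data that would really live on disk, `chunk_stats` and `parallel_mean` are made-up names, and threads are used only because they suit I/O-bound work; CPU-bound processing would more likely use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_stats(chunk):
    """Reduce one chunk to a (sum, count) pair; stands in for real processing."""
    return sum(chunk), len(chunk)

def parallel_mean(samples, chunk_size=1000, workers=4):
    """Mean of samples, computed chunk by chunk on a pool of worker threads."""
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Hypothetical stand-in for a large dataset.
samples = [float(i % 7) for i in range(10_000)]
mean_value = parallel_mean(samples)
```

The appeal of this shape is that no chunk needs to know about any other, so the hard part becomes feeding the workers from disk fast enough.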

Vacation; The Omnivore’s Dilemma

Sunday, August 9, 2009 on 1:18 pm | By Peter | Tags: | Comments Off

For the past few weeks I’ve been traveling in China and not been up to any work. I did manage to experience the total solar eclipse in Shanghai, however. I use the word “experience” because it was cloudy and raining so it would be a little inaccurate to say that I “saw” it, but it was impressive all the same. I’ll post some pictures in a few days when I get back to Berkeley.

I read Michael Pollan’s The Omnivore’s Dilemma during the many long train and plane rides in the trip. I thought it was pretty good. Three notes:

  1. Scientifically, I was interested in the argument that chemical fertilizers represent a large petroleum-based energy input into the food chain that’s a historical novelty. Pollan mentions that modern farmland is idle for maybe half the year, when its monocultured crop is out of season. Traditionally-managed farmland is planted year-round and hence collects about twice the solar energy per year via photosynthesis. This difference in energy input is possible because modern farmland benefits from the additional energy input of chemical fertilizers, which do the job of nitrogen-fixing plants in traditional farmland. These fertilizers are a hidden, nonlocal, and petrochemically-based source of energy that you don’t see in traditional agriculture.
  2. Long before I read the book, its title bothered me because it wasn’t obvious what the omnivore’s “dilemma” was: what is the “omnivore” choosing between? “Dilemma” clearly means “two lemmas”, two possibilities. It turns out that this is a pure misnomer. The effect described by the term (which is not originally Pollan’s) is apparently also known as “the omnivore’s paradox”. It refers to one stress of being an omnivore: if you can eat anything, you have to spend a lot of time worrying about what to eat. (Whereas koalas, to use one of Pollan’s examples, just know to munch on eucalyptus all day.) I guess I’m a little pedantic but this confusion over the title (as well as already knowing the capsule summary of the book) actually discouraged me from reading it to a small extent.
  3. Someone has probably posited this before, but there’s probably a kind of 90-10 (or maybe even 95-5) rule for popular nonfiction books of the form of The Omnivore’s Dilemma: ones like Blink, Stumbling on Happiness, and so on. Specifically: 90% of the major content can be summarized in 10% of the words. I don’t want this to sound like a criticism: it’s important to build these arguments in detail, and a lot of the import of these arguments doesn’t settle in until you’ve seen them approached from many angles. But I do wonder whether there’s a 100-20 corollary to the rule (you could fit 100% of the content in 20% of the space). One relevant point is that the “natural” size of published books along these lines is around 200 pages. If you researched a topic, wrote it up, and finished with a 50 page manuscript, you’d pretty much have to add 100 pages of extra material to get something of publishable length. If you had a decent grasp of your topic, this material certainly could be useful and interesting, but in a sense it’s still filler.

Slides from EBAS Talk

Tuesday, June 16, 2009 on 11:47 pm | By Peter | | Comments Off

And I do mean slides. I’ve uploaded a PDF version (15 MB) of the slides, but they are essentially useless without knowing the plan of the talk, since there’s virtually no text on the slides. I found that this situation arose pretty naturally from my strategy of outlining the talk on paper first and then assembling supporting graphics afterward.

I’ve tried to be as thorough as possible about the image credits, but many of the images are probably incorrectly attributed or not permitted to be republished. That’s life out in the wilds of the internet, though. My diagrams and the talk as a whole are licensed as … let’s say a Creative Commons 3.0 Attribution Share-Alike license.

AAS #214 poster

Monday, June 15, 2009 on 10:35 pm | By Peter | Tags: , , | Comments Off

Six months after the last AAS, I gave another poster at the 214th AAS meeting in Pasadena. The topic this time was the launch of the AGCTS project. The abstract for the poster isn’t on ADS quite yet, but I think it should show up here. Its session identifier was #601.03.

Here’s a PDF of this poster (2.5 MB). As before, this is unpublished and unreviewed, and it describes extremely preliminary work.

The Next Big Thing

Wednesday, May 20, 2009 on 10:59 pm | By Peter | Tags: , | Comments Off

Over the course of this spring Geoff and I have converged toward a vision of what my next project will be. This is an interesting subject since (barring catastrophes) the next project will be the basis of my thesis. Or, perhaps a better way to phrase it is that the project that I start next will be the jumping-off point that leads to all of the pieces that make up my thesis.

This project is … drumroll … the ATA Galactic Center Transient Survey, or AGCTS for short. Over the course of the next six months, the ATA is going to observe the Galactic Center (GC) nearly every night. We’ll stack up all of those observations to get a really good map of the GC region. Then we’ll subtract off each night’s data individually from the stack and see what’s left over. Most of the time, it’ll just be noise, but if any sources appear, disappear, or fluctuate strongly, they’ll pop out in the subtracted image.
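The stack-and-subtract idea can be sketched with toy data. This is only an illustration of the outline above, not the actual AGCTS pipeline: the "images" are tiny nested lists, the stack is taken to be a per-pixel median (an assumption; the plan above doesn't say how the stacking is done), and the function names and threshold are invented.

```python
from statistics import median

def stack_images(nights):
    """Per-pixel median of all nightly images (lists of rows of floats)."""
    n_rows, n_cols = len(nights[0]), len(nights[0][0])
    return [[median(night[r][c] for night in nights) for c in range(n_cols)]
            for r in range(n_rows)]

def find_transients(nights, threshold):
    """Return (night index, row, col, residual) wherever one night differs
    from the stack by more than threshold."""
    stack = stack_images(nights)
    hits = []
    for k, night in enumerate(nights):
        for r, row in enumerate(night):
            for c, value in enumerate(row):
                residual = value - stack[r][c]
                if abs(residual) > threshold:
                    hits.append((k, r, c, residual))
    return hits

# Toy data: five 2x2 "images" of a steady sky, with a flare on night 3.
nights = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(5)]
nights[3][0][1] += 10.0
hits = find_transients(nights, threshold=5.0)
```

Here `hits` contains a single entry for the flaring pixel on night 3. With real data the residuals would be dominated by noise, so the threshold would have to be tied to the noise level rather than picked by hand.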

That’s the basic outline, at least. The actual procedures are a bit more complicated, and there are countless ways to bell-n-whistlify the way you search for transient events. I’m not going to go into any more detail at the moment, though. This is partially because the information would be overkill, and partially because we haven’t worked out all of the specifics just yet. The particulars of the project will undoubtedly evolve as it transitions from concept to reality.

The project is shaped by a few key constraints. People are pretty excited about detecting transient astronomical phenomena, since they haven’t been able to do so very effectively until recently. And the GC is a good place to look for them, since it’s extremely crowded (compared to the rest of the galaxy) and it’s got that big ol’ black hole sitting in the center. Both of those factors could quite plausibly lead to more things crashing into one another, exploding, or otherwise going bump in the night. Finally, the actual observing plan is dictated by the fact that the SETI Institute is going to be running a survey of the GC this summer. Due to the ATA’s fancy commensal-observing capabilities, we can take data while they’re doing their search. The SETI folks are interested in looking at the GC for precisely the same reasons we are, and the practicalities of their search mesh well with our needs. We could plan to survey some other part of the sky, but we might as well piggyback on the SETI Institute’s observing program.

An initial image of the GC region made with the ATA.

A preliminary image of the GC region made with the ATA.

Above is an image that I made tonight with some preliminary data. I think it looks pretty good, although there’s much improvement to be made — the ripple patterns are indicative of imperfections in the data analysis, and the magnitude of the background noise is much larger than it should be. But it’s a better-looking image than I expected to get from a night’s work. Hopefully this bodes well for the rest of the project.

I’ll be showing a poster announcing the AGCTS at the Summer AAS conference in Pasadena in a couple of weeks. There certainly won’t be any concrete science to talk about, but we can describe our plans to the community and see how much interest we can stimulate.

Public Talk, June 6

Tuesday, May 19, 2009 on 9:33 pm | By Peter | Tags: | Comments Off

I’ve agreed to give a talk on June 6 to the Eastbay Astronomical Society. The title and abstract are:

Exploring the Invisible Universe: The Past and Future of Radio Astronomy

Visible-light astronomy has been practiced for millennia. Astronomical observations of radio waves are, in comparison, still a novelty. Over its short lifespan, however, the field of radio astronomy has still managed to produce some of the most impressive results of modern science, including the discovery of extrasolar molecules and the detection of cosmic microwave background radiation, the key piece of evidence for the Big Bang. In this talk I’ll discuss the basics of radio astronomy, what can be seen in the radio sky, and the different ways in which astronomy is done at optical and radio frequencies. I’ll also talk about what we can expect from radio astronomy in the near and not-so-near future: an exciting convergence of recent technological advances promises to do as much for radio astronomy as the invention of the CCD has done for visible-light astronomy. Special focus will be put on the Allen Telescope Array, a new telescope jointly operated by UC Berkeley and the SETI Institute, which exemplifies some of these advances.

This will be my first public talk as well as my first hour-long talk. I think I know what I want to talk about, and I’m pretty sure I can string it all together in a non-boring way, so I’m looking forward to it. I’m not so much looking forward to how much time it’ll take to prepare everything, but I want this to be good so I’ll take the time to do it right.

Innovations in Outreach

Monday, April 27, 2009 on 2:13 pm | By Peter | Tags: | Comments Off

My colleague Steve Croft got sent a “Flat Stanley” a little while ago. I hadn’t heard of the Flat Stanley Project before — it’s an outreach initiative in which kids print out a cartoony paper man, color him in, and mail him off somewhere interesting. “Stanley” is given a tour of what goes on and the kids get a report of what he experienced. Steve wrote his report as a blog post.

Sketching Out Robust URLs in the Literature

Friday, April 24, 2009 on 9:38 pm | By Peter | | Comments Off

For a while now, lots of scientific papers have included links to websites that provide important non-textual supplements to papers: large tables or images, software source code, and so on. My impression is that most people are aware that this can be problematic. URLs tend to go stale after only a few years, and I suspect that this is especially true of the ones published in the scientific literature, since people spend a lot of time moving from institution to institution as postdocs. (It would be an interesting exercise to go through some set of papers on the arXiv and assess the half-life of a published URL.) But most people (rightly) don’t worry about these things much and we’re steadily building up a literature that, in many important aspects, has an expiration date.
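As a sketch of the half-life exercise: if link rot is assumed to behave like exponential decay (an assumption, not something established here), then the fraction f of URLs still alive after t years gives the half-life directly, as t ln 2 / ln(1/f). The function name and the sample numbers below are invented for illustration.

```python
import math

def url_half_life(years_elapsed, fraction_alive):
    """Half-life in years, assuming the live fraction decays exponentially."""
    if not 0.0 < fraction_alive < 1.0:
        raise ValueError("fraction_alive must be strictly between 0 and 1")
    return years_elapsed * math.log(2) / math.log(1.0 / fraction_alive)

# Invented numbers: if a quarter of the links survive after 10 years,
# two half-lives have passed, so the half-life is 5 years.
half_life = url_half_life(10, 0.25)
```

A real survey would need to decide what counts as "alive" (a page that loads is not necessarily the page that was cited), which is probably the harder part.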

It would be nice to do a bit better job of this. This would naturally, I think, be the responsibility of the journal publishers — they have an interest in the durability of their product, and they can enforce standards when it comes to these things. And indeed they seem to have converged on an “online supplement” / “electronic edition” model where online resources are alluded to in the published paper but no URL is given. I assume this is to give them the flexibility to change the URL in the future, which is something I can’t fault them for wanting to do. Who wants to commit to running a webserver at a specific domain name hosting specific files in the year 2060?

But the publishers don’t seem to insist that all important material be hosted by the journal. Source code, in particular, is often hosted at external sites. This makes sense: source code tends to get updated, and while it’d be valuable to host a static copy of code as it was used to create a published work, it’s likely more valuable to have a changing website giving the latest information about and updates to that code. (If I had my druthers I’d also require that any paper giving the results of running any software had to publish the source to that software as well, but that’s not happening anytime soon.) So I can see where the publishers are coming from. They (rightly) don’t want to get into the business of hosting websites for random scientists’ projects.

So, how could we provide robust links in the literature to evolving content? We want something that will last even if the person in charge of a project changes institutions, or if a new person takes over a project, and ideally something that should last on a 50-year timescale. And it shouldn’t require scientists to have to know too much about web stuff. (But I don’t think you can get away with knowing nothing. If you’re publishing something in a paper you’re making a commitment to it and you should understand what you’re committing to.)

I think the simplest solution is to take an approach similar to that used by the PURL people. In that system, you register some kind of permanent URL and tell the registrar the “real” URL of your information. Requests for the permanent URL are forwarded to the real URL. And, vitally, if you have to move the “real” URL, you can do so without breaking the permanent URL.
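The forwarding idea fits in a few lines. The sketch below is a toy, not PURL's actual implementation: the class and method names are made up, the URLs use the reserved `.example` domain, and answering with a 302 status is just one reasonable choice for illustration.

```python
class PermanentLinks:
    """Toy registrar: maps permanent paths to their current real URLs."""

    def __init__(self):
        self._targets = {}

    def register(self, permanent_path, real_url):
        self._targets[permanent_path] = real_url

    def move(self, permanent_path, new_real_url):
        """Point an existing permanent path at a new location."""
        if permanent_path not in self._targets:
            raise KeyError(permanent_path)
        self._targets[permanent_path] = new_real_url

    def resolve(self, permanent_path):
        """Return (HTTP status, Location header value) for a request."""
        target = self._targets.get(permanent_path)
        if target is None:
            return 404, None
        return 302, target

links = PermanentLinks()
links.register("/myscience", "http://old-university.example/~me/project/")
before = links.resolve("/myscience")
links.move("/myscience", "http://new-university.example/~me/project/")
after = links.resolve("/myscience")
```

The published link (`/myscience`) never changes; only the table behind it does.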

It seems to me that it would make more sense, however, to do this in DNS, rather than with HTTP redirects as PURL does. You could have some toplevel site, let’s call it example.org, and let people register myscience.example.org. (I came up with an astronomy-centric name for such a service that I  thought was good but, alas, it was a porn site.) The key here is that myscience.example.org can be set to resolve to any IP address that you want, and if you can get a webserver under your control (not too much to ask, I think), then you can configure it to respond to requests for myscience.example.org with your particular content. As with PURLs, if you need to move your hosting, you change the IP address and configure a new webserver. This difference makes the service much easier to run — all you need to do, at a minimum, is maintain control of the domain name and update your DNS records as needed. You don’t have to deal with traffic spikes caused by your registrants. A completely user-unfriendly implementation wouldn’t even need its own webserver.
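A rough sketch of the DNS variant, under stated assumptions: the registry below only keeps a name-to-address table and renders it as zone-file style A records. The class and method names are hypothetical, the IP addresses come from ranges reserved for documentation, and a real service would hand these records to an actual DNS server rather than print them.

```python
class SubdomainRegistry:
    """Toy registry of subdomains under one parent domain."""

    def __init__(self, parent="example.org"):
        self.parent = parent
        self._records = {}

    def set_address(self, name, ipv4):
        """Register a subdomain, or repoint it after a move."""
        self._records[name] = ipv4

    def zone_lines(self, ttl=3600):
        """Render one A record per registered subdomain."""
        return [f"{name}.{self.parent}. {ttl} IN A {ip}"
                for name, ip in sorted(self._records.items())]

registry = SubdomainRegistry()
registry.set_address("myscience", "192.0.2.10")      # first institution
registry.set_address("myscience", "198.51.100.25")   # after moving
zone = registry.zone_lines()
```

After the move, `zone` holds a single record pointing myscience.example.org at the new address; the registrar never sees any of the web traffic.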

Several elaborations can be envisaged:

  • Namespacing probably needs to be dealt with — it’s unfair if I can get my hands on stars.example.org and hold onto it forever. There are various ways you could deal with this.
  • You could associate each subsite with the paper that it was published in and the particular URLs that were published. You could then periodically check that those URLs were live and nag the registrant if not.
  • If registrants worry about being scooped, you could let them assert priority in secret. They send you a request with a one-way hashed version of the name of the subsite they intend to register. Because of the one-way function, the registrar can know if anyone else has requested the same name without having to know what the actual name is. Then when their paper is released they can reveal the domain name and have documentation of when they first (secretly) requested it. This would require social protocols to prevent abuse.
  • You could go into web hosting and host the sites for people that didn’t want to deal with the server stuff.
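
The secret-priority elaboration above can be sketched in a few lines. This is only an illustration under stated assumptions: SHA-256 as the one-way function, lower-casing as the name normalization, and the `SecretRegistrar` class are all my own choices here. One caveat worth knowing: hashing a short, guessable name without any secret ingredient can be undone by brute force, so a curious registrar could simply try likely names.

```python
import hashlib


def commitment(name):
    """One-way hash of a normalized subsite name (SHA-256 is an assumption)."""
    normalized = name.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SecretRegistrar:
    """Hypothetical registrar that sees only hashes until a name is revealed."""

    def __init__(self):
        self._claims = {}  # digest -> (claimant, order in which it arrived)

    def claim(self, digest, claimant):
        """Record a claim; return False if someone already claimed this digest."""
        if digest in self._claims:
            return False
        self._claims[digest] = (claimant, len(self._claims))
        return True

    def reveal(self, name, claimant):
        """Check that `claimant` holds the recorded claim on `name`."""
        record = self._claims.get(commitment(name))
        return record is not None and record[0] == claimant
```

A registrant would call `claim(commitment("myscience"), "alice")` before submitting the paper and `reveal("myscience", "alice")` once it is out.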

You could also go into email hosting. A similar problem to broken URLs is broken email addresses: papers are published with the authors’ email addresses, and these often go bad as people move around. You could grant people someusername@email.example.org and forward it to their current institutional address. (This looks a little classier than publishing your gmail.com address in a paper.) And again, as they move around, they could update the address that the mail gets forwarded to. Because you have to run a forwarding mail server, this requires more effort to implement than the DNS system, but it’s not a big deal.

This latter bit, however, gets into all the usual tricky identity questions. When do you expire usernames? What do you do if your name changes? How are two people with the same name distinguished?

This latter issue is an annoying one in science. How do we distinguish papers published by different J Smiths? (Or P Williamses — not that I’m bitter.) How can you know that John Smith changed her name to Jane Smith? Is the J Smith at Institution A the same person as the J Smith at Institution B? Far too much insider knowledge is required to be able to answer these questions reliably. These are well-known to be tough issues to deal with. It’d be nice to see more effort to tackle them in the relatively small and well-behaved scientific circle, though. The arXiv is starting to make an effort, but a lot more people will need to get involved for it to really take off.

(The arXiv “author identifier” thing is what got me musing about this to begin with. There are two issues with their work that I’ve noticed so far. Firstly, of course, this is specific to arXiv. Nowadays most stuff appears on it, but I’m not sure how well it can scale. Secondly, for a paper to appear as “yours” in their author identifier list [e.g. mine], you have to be an “owner” of that paper, which typically only the first author is. They seem to want all authors to be listed as owners, and I can see how that is perhaps what makes the most sense formally, but practically I don’t think it happens much. Maybe community standards will change in that regard.)

(Also, in the course of reading about that feature, I noticed that the development and management of the arXiv are totally opaque. I think they’re motivated by a desire to not spend all their time dealing with crazy people, but I still find that very disappointing. Crazy people don’t care about the source code to arXiv, but the source is completely closed, there are no ways for people to join in, and there isn’t even communication about the development plans! Not cool.)

Who’s The Scientist?

Sunday, March 8, 2009 on 10:22 am | By Peter | Tags: | Comments Off

I found this cute post linked to from the Berkeley Science Review blog: it’s a set of pictures and descriptions of scientists made by a group of seventh graders, before and after a visit to Fermilab. Before: “hair standing straight up and a mean wicked laugh”. After: “Actually, scientists are normal people that lead normal lives.” There must have been some orchestrated brainwashing on the visit they went on, but I can’t fault the Fermilab folks for doing a little PR.

What caught my eye was how most of the students drew scientists as being of their own gender. In particular, most of the female students drew pictures of female scientists. Certainly not what I’d have expected to have seen, but a pleasant surprise.

Status Update

Sunday, February 15, 2009 on 11:20 pm | By Peter | Tags: , | Comments Off

After writing down the AAS-related information, I thought I’d give a little status update on the broadband spectra project.

I’ve started drafting a paper describing my work. The past couple of weeks have involved a lot of writing; I’ve gotten through the less-interesting parts and now am on the verge of starting to work on the meaty analysis and interpretation. This means that I’ll probably shift back into the less-writing mode as I take the time to make sure every part of the project is done correctly and convincingly.

Unfortunately I don’t have a great conception of how long it will take to cross all of those t’s and dot all of those lower-case j’s. There are a few things that I know I need to do, but there will probably also be ones that come out of nowhere: you write a paragraph, allege something, and say, “Hmm, I guess I should actually justify that.”

I’ve started the practice of writing down my goals for the week on Monday morning and crossing them off as the week progresses. So far I’ve done a good job of doing everything that I want to, mainly because I make a point of being exceedingly unambitious in my goal selection. At some point in the future, maybe I’ll ramp up the goals that I set to encourage myself to work longer hours, but for now I want to feel that I have a good peg on my baseline productivity. And, to be honest, I don’t think that a weekly schedule will be a strong enough force to get me to work harder than I feel like. If I ever have a pressing deadline, though, I think it’ll be useful to work up a weekly schedule and have a good sense of where I need to cram in order to get things done. Fortunately, for the time being, deadlines aren’t my problem. I am very glad of that.

AAS #213 poster

Sunday, February 15, 2009 on 11:02 pm | By Peter | Tags: , | Comments Off

I gave a poster at the 213th AAS meeting about my broadband spectra project. The poster abstract is now listed on ADS.

Here’s a PDF of the poster (2.8 MB). The expected caveats apply: this is unpublished, preliminary work, and it represents results that will be, and in fact already have been, superseded. But it captures what I presented at the conference.
