Today I pushed out a project I've been working on for a little over a month: a semi-mathematical tutorial on mark-recapture methods, sometimes called "multi-list methods". The main announcement is at my blog, but I have put the code both in my own non-GitHub repository and in the Azimuth-accessible repo.

I don't dwell on this at the blog, but I do a bit in the accompanying tutorial paper: this was only possible because Yauck, Rivest, and Rothman, in their related paper published in the *Journal of the American Statistical Association*, made all their data available to the public. That let me use it for a reanalysis, a practice which has its scientific and statistical uses but also lets people use actual data for teaching and other purposes. Publishing datasets this way is now the norm in statistical journals and in many scientific ones, but in some fields, such as Internet measurement, which I criticize, it is typically not done. I cite papers there which used mark-recapture for Internet measurement, but you can't say much more about them since their datasets are not available.

Yauck, Rivest, and Rothman, thankfully, chose to publish in *JASA*, which is not the typical outlet for an Internet measurement paper. Moreover, it's a tribute to their work (principally, I believe, Rivest's) that the method of *estimating equations* they published was judged worthy of *JASA* rather than being sent off to, say, the *Journal of Computational and Graphical Statistics*.

The technique I pursued there, for open populations and for ones where the probability of capture is not uniform, is the one developed by R. Tanaka, and it is not well known in the mark-recapture literature. This is striking, since it is just regression, and the ways of generalizing from Tanaka's insights are straightforward. People in the business go off and create specialized likelihood and probability models and then, when a dataset is in hand, they try all of them and embrace the ones which seem to work well as an accurate depiction. As a Bayesian I have some issues with this kind of multi-model shotgun approach, but, that said, my own work with the Tanaka technique is not Bayesian, nor is the **R** package *segmented*, by Professor V. M. R. Muggeo, a Bayesian approach.

Tanaka's technique is worthwhile because it is so transparent compared with some of the others.
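To make the "it is just regression" point concrete, here is a minimal sketch of the closed-population case. It assumes one way of writing Tanaka's model, which I am supplying here rather than taking from the tutorial: on occasion t, with catch n_t, marked animals in the catch m_t, and marked animals at large M_t, the model is n_t / m_t = (N / M_t)^b. Taking logs gives a straight line in log(M_t) that crosses zero where M_t = N, so an ordinary least-squares fit yields the estimate of N. The function name `tanaka_estimate` and the numbers are hypothetical, and the sketch does not attempt the open-population or *segmented* extensions.

```python
import math


def tanaka_estimate(catches, recaptures, marked_at_large):
    """Estimate a closed population size N by Tanaka-style log-log regression.

    Hypothetical helper, assuming the model n_t / m_t = (N / M_t) ** b, where
    n_t is the catch on occasion t, m_t the marked animals in that catch, and
    M_t the marked animals at large before occasion t. Taking logs gives
    log(n_t / m_t) = b*log(N) - b*log(M_t), a straight line in log(M_t)
    that crosses zero where M_t = N.
    """
    xs, ys = [], []
    for n, m, big_m in zip(catches, recaptures, marked_at_large):
        # Occasions with no recaptures (or nothing marked yet) have no finite log ratio.
        if m > 0 and big_m > 0:
            xs.append(math.log(big_m))
            ys.append(math.log(n / m))
    if len(xs) < 2:
        raise ValueError("need at least two occasions with recaptures")

    # Ordinary least-squares fit of y = intercept + slope * x.
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Zero crossing of the fitted line: intercept + slope * log(N) = 0.
    return math.exp(-intercept / slope)


# Made-up values chosen to lie exactly on the model line (N = 1000, b = 1);
# they are not a realistic capture history.
catches = [100, 100, 100, 100, 100]
recaptures = [0, 10, 20, 30, 40]
marked_at_large = [0, 100, 200, 300, 400]
print(round(tanaka_estimate(catches, recaptures, marked_at_large)))  # → 1000
```

The transparency shows up in the fit itself: each occasion is one point on a log-log plot, so a reader can see directly whether the points fall on a line or bend, which is where piecewise fits of the kind *segmented* provides would come in.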

These methods have important uses in public and health policy, for instance in counting intravenous drug addicts: see the references in the tutorial, including papers by Lavallée and Rivest (yes, the same Rivest), and by Bird and King.

This is also pretty much the kind of work I did while at Akamai, although not only with mark-recapture methods. There are ways of using series to estimate the sizes of subpopulations, and to classify them, too. I'm working on several projects to get some of the techniques I developed out in public, since the ones I used at work are, unfortunately, now lost to the dark caves of proprietary information.