Archive for January, 2012|Monthly archive page

An open-source place matcher for genealogy

I’ve developed an open-source place matcher for genealogy. It takes the place text found in GEDCOM files and matches it to standardized place names. It’s pretty good :-). I’ll be giving a talk on it at RootsTech next week. The source code and database are freely available on GitHub.

Thanks go to Ryan Knight for creating a web application to demonstrate it.
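
To give a feel for what matching place text to a standardized name involves, here is a minimal sketch in Python. It is not the project's algorithm: the tiny gazetteer, the abbreviation table, and the function names are all made up for illustration, and the sketch simply splits the text on commas, normalizes each level, and backs off to a less specific place when the full text is not found.

```python
# Hypothetical miniature gazetteer: normalized levels -> standardized name.
# These entries are illustrative only; the real project ships its own database.
STANDARD_PLACES = {
    ("united states",): "United States",
    ("illinois", "united states"): "Illinois, United States",
    ("chicago", "illinois", "united states"): "Chicago, Cook, Illinois, United States",
}

# Hypothetical abbreviation table, also for illustration only.
ABBREVIATIONS = {
    "il": "illinois",
    "ill": "illinois",
    "us": "united states",
    "usa": "united states",
}


def normalize_level(level):
    """Lowercase one comma-separated level, drop punctuation, expand abbreviations."""
    cleaned = "".join(ch for ch in level.lower() if ch.isalnum() or ch.isspace())
    cleaned = " ".join(cleaned.split())
    return ABBREVIATIONS.get(cleaned, cleaned)


def match_place(text):
    """Return a standardized name for the place text, or None if nothing matches."""
    levels = tuple(normalize_level(part) for part in text.split(",") if part.strip())
    # Try the full text first, then drop the most specific level and try again.
    for start in range(len(levels)):
        candidate = levels[start:]
        if candidate in STANDARD_PLACES:
            return STANDARD_PLACES[candidate]
    return None


print(match_place("Chicago, Ill., USA"))
print(match_place("Springfield, IL, USA"))
```

In this sketch an unknown town falls back to the nearest known higher-level place, which is one reasonable design choice among several; a real matcher has to deal with misspellings, missing levels, and historical names as well.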


Open-source name variants database

I’ve developed an open-source name-variants database. We’re currently using it at WeRelate.org. It does a better job than Soundex of matching variant names like Ann, Anna, Annah, Anne, Annie, etc. It results in a 28% reduction in missed variants compared to Soundex, based upon a set of 100,000 pairs of names provided by Ancestry.com. I’ll be giving a talk on it at RootsTech next week. The source code and database are freely available on GitHub.
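
To show the kind of miss a variants lookup can recover, here is a minimal sketch in Python. The `soundex` function is a plain implementation of standard American Soundex. The `VARIANTS` table and `same_by_variants` are hypothetical stand-ins for illustration only; they are not the format or the contents of the actual database.

```python
# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if it has no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (first + "".join(digits) + "000")[:4]


# Hypothetical variants table, for illustration only.
VARIANTS = {"catherine": {"katherine", "kathryn", "catharine"}}


def same_by_soundex(a, b):
    return soundex(a) == soundex(b)


def same_by_variants(a, b):
    a, b = a.lower(), b.lower()
    return a == b or b in VARIANTS.get(a, set()) or a in VARIANTS.get(b, set())


print(soundex("Ann"), soundex("Annie"))            # both A500
print(same_by_soundex("Catherine", "Katherine"))   # False: C365 vs K365
print(same_by_variants("Catherine", "Katherine"))  # True
```

Soundex always keeps the first letter, so it cannot pair names that differ in their initial, while a lookup table has no such constraint.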

A robust open-source GEDCOM parser

I just posted the first cut of an open-source GEDCOM parser. The parser reads GEDCOM files into a de facto object model, which is able to represent nearly all of the tag sequences found in real-world GEDCOM files. The object model includes common custom tags; other tags are represented as extensions. The object model has a JSON representation, and the toolkit includes a GEDCOM exporter. This makes it possible for anyone to read a GEDCOM file, manipulate its contents, save it to JSON, and export it back to a GEDCOM file, without loss of information for the vast majority of GEDCOMs.

Ryan Knight has created a demo server to show it off.

For more information, see the GitHub repo.
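
As a rough illustration of the read, save-to-JSON, and export round trip described above, here is a minimal sketch in Python. It is not the project's parser or object model: it turns each GEDCOM line (level, optional @ID@, tag, optional value) into a generic nested dictionary, and the names `parse_gedcom` and `export_gedcom` are made up for this example. It assumes well-formed input in which a line's level never jumps by more than one, and it strips surrounding whitespace from each line.

```python
import json


def parse_gedcom(text):
    """Parse GEDCOM text into a list of nested {'tag', 'id', 'value', 'children'} dicts."""
    root = {"children": []}
    stack = [root]  # stack[i] is the parent of any line at level i
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        node = {"children": []}
        if len(parts) > 1 and parts[1].startswith("@"):
            # Record line with a cross-reference ID, e.g. "0 @I1@ INDI"
            node["id"] = parts[1]
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            node["tag"] = rest[0]
            if len(rest) > 1:
                node["value"] = rest[1]
        else:
            node["tag"] = parts[1] if len(parts) > 1 else ""
            if len(parts) > 2:
                node["value"] = parts[2]
        del stack[level + 1:]  # leave any deeper levels from earlier lines
        stack[level]["children"].append(node)
        stack.append(node)
    return root["children"]


def export_gedcom(nodes, level=0):
    """Turn the nested structure back into a list of GEDCOM lines."""
    lines = []
    for node in nodes:
        fields = [str(level)]
        if "id" in node:
            fields.append(node["id"])
        fields.append(node["tag"])
        if "value" in node:
            fields.append(node["value"])
        lines.append(" ".join(fields))
        lines.extend(export_gedcom(node["children"], level + 1))
    return lines


SAMPLE = "\n".join([
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
    "2 PLAC Chicago, Illinois",
    "0 TRLR",
])

tree = parse_gedcom(SAMPLE)
as_json = json.dumps(tree, indent=2)       # save to JSON
restored = json.loads(as_json)             # load it back
print("\n".join(export_gedcom(restored)))  # export to GEDCOM again
```

A generic tree like this keeps every tag it sees, which is why the round trip is lossless for the sample; a typed object model with extensions for unknown tags, as the post describes, is what makes the contents convenient to manipulate.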