New website: Genealogy Gophers

I’m launching a new website today: Genealogy Gophers. The goal is to be Google Books for genealogy:

  • Completely free. The site is supported by ads and “Google Consumer Surveys” – people are asked to answer a few market-research questions once a day in order to read and download the books.
  • Only genealogy-relevant books: 40,000 and growing. These out-of-copyright books have been obtained from FamilySearch, the Allen County Public Library, the Mid-Continent Public Library, and several other libraries.
  • Advanced search technology allows searches for people: the names, dates, places, and relatives associated with them, not just words.
  • Search results include snippets from the book pages so it’s easy to quickly scan the results and find the most-relevant ones.

The website is in beta. I soft-launched it at RootsTech and got some terrific feedback. We were able to find information on ancestors of 30-40% of the people who visited the booth. In a few cases it was information they’d never known before! I’ve spent the last two weeks improving the site and am finally ready to officially launch. Over the coming months I’ll be continuing to improve the search algorithms and adding another 60,000 books.

I’m hoping this is a big hit. Free access to records in exchange for answering a few survey questions could open a new avenue for more subscription-free genealogy websites.

Here’s the press release. My good friend Bob Sherwin wrote it!

Mocks for RootsCity

I believe it would be interesting to the youth if family history were more like Facebook, where you have profile pages for each of your ancestors and you write posts about them.
To make it easy for youth and keep them engaged I added challenges, which direct the youth to look for records about their ancestors in specific collections. When you find a record about your ancestor, you “tag” your ancestor in the record. Every ancestor has a number (score) next to their name that indicates the number of posts (records, pictures, or stories) that they have been tagged in.  I believe that tagging your ancestors in records, pictures, and stories will form the basis of a fun/joyful genealogy experience for youth.
In order to make the experience more social I added the need for you to invite someone else to “confirm” a record attachment.  Hopefully the youth would invite their family and maybe even their friends to review and confirm the records they have found.

I believe this also helps youth get started right away doing evidence-based genealogy and documenting their research. The posts act as evidences, people in posts are personas, and tagging people in posts attaches personas to individuals.
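To make the model a bit more concrete, here is a minimal sketch of the data structures in Python. The class names, fields, and the sample ancestor are all made up for illustration; nothing here is taken from the mocks or from any existing code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    """An ancestor's profile page (the 'individual')."""
    name: str


@dataclass
class Tag:
    """A persona: one appearance of a profile in a post."""
    profile: Profile
    confirmed_by: List[str] = field(default_factory=list)  # people who confirmed the tag


@dataclass
class Post:
    """A piece of evidence: a record, picture, or story."""
    kind: str
    title: str
    tags: List[Tag] = field(default_factory=list)

    def tag(self, profile: Profile) -> Tag:
        """Tag an ancestor in this post and return the new tag."""
        new_tag = Tag(profile)
        self.tags.append(new_tag)
        return new_tag


def score(profile: Profile, posts: List[Post]) -> int:
    """Number of posts the profile has been tagged in."""
    return sum(1 for post in posts if any(t.profile is profile for t in post.tags))


# Hypothetical sample data
mary = Profile("Mary Smith")
census = Post("record", "1880 census")
photo = Post("picture", "Family portrait")
census.tag(mary).confirmed_by.append("cousin")  # someone else confirms the attachment
photo.tag(mary)
posts = [census, photo]
print(score(mary, posts))
```

The score shown next to an ancestor's name is then just a count over the posts, and a confirmation is a name added to a tag.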
Here is a link to some mocks:
I’m nowhere near a competent graphic designer, and the mocks are pretty low-fidelity, but I hope that they convey the ideas. This is what I want to build later this year as a free open-source project. I’d love to get feedback / constructive criticism of the ideas.

Starting RootsCity

I saw nearly two dozen vendors at RootsTech last week, demoing websites or mobile apps or both.  Most of them need to build the same basic framework: import a gedcom, allow someone to add to the tree, list the people in the tree, create a pedigree view, etc.  Once they have this basic framework in place, then they can start to work on whatever it is that makes their offering unique.

Another challenge they have is advertising. Advertising your genealogy website / app can be expensive. 
It seems that we would get a lot more participation in people developing family history software if there were an open platform for family trees, much like wordpress is an open platform for blogging. This platform would contain the basic framework for importing, exporting, viewing, and editing family trees, with hooks that would allow 3rd-party plugins to be added, just like wordpress and mediawiki do. And if the platform catches on, we’ll have a common place where people can choose which plugins they want to use (and pay for if the plugins have subscription costs) from among a list of available plugins that work with the platform, which helps to address the advertising problem.
I’m planning to start work on such a platform, and I’m wondering if anyone else would be interested.
A few specifics:
* Single-page application using javascript and AngularJS.  I’ve found this combination to be incredibly productive.
* Use local storage for offline access.
* Mobile-first using  check it out – it looks very promising.  
* Back-end data stored in or maybe when it is ready.  Images stored on S3 or maybe a user’s google drive or dropbox or flickr account if we could get that to work.
* Interface with FamilySearch using a “remote tracking branch” so people can track the version of a person at FamilySearch, compare it against their own version, and easily copy information back and forth.  I’m not planning to interface with any other major players because I’m not aware of any other major players with an API that allows updates.
* Keep the basic platform simple: gedcom import and export, list, pedigree, and descendancy views, and editing/adding people with names, facts, and relationships.
* Create hooks for plugins to add additional views and actions.
* Experiment with an evidence-based model exposed to the user as “profiles” (persons), “posts” (evidences), and “tagging people in posts” (personas). Think of the model as Facebook for dead people, where posts are pictures, stories, or records.
* Target market is youth in particular and people who may not normally consider themselves genealogists.
* Software is open-source with free hosting at, similar to how wordpress is open-source with free hosting at
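
The “remote tracking branch” idea can be sketched as a three-way comparison: keep the version you last synced, then compare your own copy and the current remote copy against it. This is only an illustration under my own assumptions. The function name and the flat field dictionaries are hypothetical, it makes no calls to the FamilySearch API, and it is written in Python for brevity rather than in the JavaScript the platform would use.

```python
def compare_versions(base, local, remote):
    """Classify each field by who changed it since the last sync.

    base   -- the version recorded at the last sync (the tracking copy)
    local  -- the user's own current version
    remote -- the current version fetched from the remote tree
    """
    report = {}
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            status = "same"
        elif l == b:
            status = "remote changed"   # safe to copy remote -> local
        elif r == b:
            status = "local changed"    # safe to copy local -> remote
        else:
            status = "conflict"         # both sides changed; the user decides
        report[key] = status
    return report


# Hypothetical person record
base = {"name": "John Doe", "birth": "1850"}
local = {"name": "John Doe", "birth": "12 Mar 1850", "death": "1910"}
remote = {"name": "John A. Doe", "birth": "1850"}
print(compare_versions(base, local, remote))
```

Fields marked “remote changed” or “local changed” could be copied across with one click, leaving only the conflicts for the user to resolve.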

New FamilySearch strategy for Passport

Passport is an authentication library for node.js.  Passport is similar to everyauth, but in my opinion it’s cleaner.  The author, Jared Hanson, just added an authentication strategy for FamilySearch.

What’s amazing, and the reason that I love being a developer, is that Jared added this out of the goodness of his heart after seeing a tweet that I sent to Tim Shadel about his pull request to add support for FamilySearch to everyauth.  Tim’s pull request still hasn’t been integrated into everyauth, but Jared added support for FamilySearch into Passport in under a week.

More on the place matcher

I recently wrote a long comment concerning developing a place database and I think it’s worth repeating here:

There are a lot of online gazetteers: lists several.

I looked at a number of these when creating the place database for, which is now available as a free download:

It includes both current and historical places and alternate names; many places list both their historical and modern jurisdictional hierarchies, and many include coordinates.

* Geonames: Lots of places, modern only (or mostly), most places are geographic features like lakes and rivers, but places are in a flat hierarchy; that is, cities in England did not list the county they are in. Having a hierarchy is pretty important – how do you know which Sutton in England to match when the user says “Sutton, Bedfordshire, England”? There are a dozen different Suttons in their database for England, and you don’t have any way to determine which Sutton is in Bedfordshire, except by calculating the shortest distance from each Sutton to the centroid listed for Bedfordshire – not very reliable. Because of the lack of hierarchy, I ended up not using this resource. I wasn’t aware that they had included historical support, though it appears to be still in the very early stages. They’ve added an “isHistorical” flag for names that are no longer used, and are considering adding fromPeriod and toPeriod. Until they add jurisdictional hierarchies to their database, though, they won’t have even scratched the surface of historical issues.

* Getty Thesaurus of Geographic Names: Smaller than Geonames, around 1.7M names for 992K distinct places, mostly modern, though more historical places than Geonames, most places are geographic features, places are in a hierarchy(!), data compiled from about a dozen different sources: mainly NGA/NIMA but also Rand McNally, Encyclopedia Britannica, Domesday book, generally lists places under the jurisdictional hierarchy they appeared in about 12 years ago. I got permission to include their populated places and political jurisdictions into the WeRelate place database. More information: and

* Alexandria Digital Library Gazetteer: I obtained a license to this as well, but after reviewing it, it seemed similar to Getty so I did not use it.

* Family History Library Catalog: The only resource I was able to find with historical places. Most (but not all) places are listed according to the jurisdictions they were in just prior to WWI. There are some duplicates: some places listed under Galicia are repeated under Poland, for example. I crawled the FHLC place database back in 2005 and included it in the WeRelate place database.

* Wikipedia: Both current and historical places. A terrific source of information, but difficult to extract. I extracted tens of thousands of places (certainly not all of them, but the ones that had decent templates for extraction) back in 2005 and included them in the WeRelate place database. A side-benefit of incorporating Wikipedia is that the database includes links back to the Wikipedia articles, which often have helpful historical information. (Though the links aren’t included in the extract on GitHub; I’ll fix this shortly.)

* updated database of places they’ve extracted from Wikipedia. Includes about 80,000 current and historical places. I’d love to integrate this into the WeRelate place database, though it will be a big project (see below).

* OpenStreetMap: has coordinate information for modern places, and places are arranged into a hierarchy(!), I’d like to use this to fill in missing coordinates into the place database at

* not a place database per se, but a fantastic source of information for how jurisdictions have changed over time. I used this and wikipedia and Encyclopedia Britannica when compiling the WeRelate place database (see below).
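
To show why a hierarchy matters for the “Sutton, Bedfordshire, England” case, here is a small sketch in Python. The five-row table, the ids, and the function names are invented for the example; this is not the WeRelate database format or the actual matcher, just the general idea of resolving a name by walking up its parents.

```python
# Hypothetical mini-gazetteer: id -> (name, parent id)
PLACES = {
    1: ("England", None),
    2: ("Bedfordshire", 1),
    3: ("Suffolk", 1),
    4: ("Sutton", 2),
    5: ("Sutton", 3),
}


def full_name(place_id):
    """Build 'Sutton, Bedfordshire, England' by following parent links."""
    names = []
    while place_id is not None:
        name, place_id = PLACES[place_id]
        names.append(name)
    return ", ".join(names)


def match(text):
    """Return ids of places whose ancestors contain every level in the text, in order."""
    levels = [part.strip().lower() for part in text.split(",") if part.strip()]
    if not levels:
        return []
    results = []
    for place_id, (name, parent) in PLACES.items():
        if name.lower() != levels[0]:
            continue
        remaining = levels[1:]
        current = parent
        # Walk up the hierarchy, consuming the remaining levels as they are found.
        while remaining and current is not None:
            parent_name, grandparent = PLACES[current]
            if parent_name.lower() == remaining[0]:
                remaining = remaining[1:]
            current = grandparent
        if not remaining:
            results.append(place_id)
    return results


print(match("Sutton, Bedfordshire, England"))
print(match("Sutton, England"))
```

With the county in the text only one Sutton survives; without it both do, which is exactly the ambiguity a flat list cannot resolve.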

The big challenge when creating a place database is not getting the data — as you can see, there are many sources for that. It’s merging data together from multiple sources *without creating duplicates*. You want to say that City X in Historical Province Y from the FHLC is the same as City X’ in Modern State Z in Wikipedia. Merging duplicate places is generally harder than merging duplicate people, because place names can change dramatically after wars. Even merging Getty and Wikipedia was challenging, because of the changes European countries have made to their jurisdictional hierarchies over the past 10 years due to the EU. I spent months merging Getty, FHLC, and Wikipedia together, and WeRelate users have spent the past seven years continuing to clean it up and organize it better afterward. If you’re going to try to create your own current+historical place database, take the merge-time into account. Or just use the free one I posted on github.

I recently matched 7.5M places appearing in the 7K gedcoms submitted to WeRelate over the past five years to see what kinds of problems were occurring most frequently:

* We don’t have comprehensive coverage for US townships. This is on my short-list of things to add.
* We still have duplicate places in Eastern Europe due to FHLC having duplicates that were not caught.
* We still don’t have all of the historical and modern places in Europe merged (though many have been merged).
* We don’t have all of the historical jurisdictions listed.
* We’re missing some places (though not that many).

I just posted this a couple of weeks ago, so there may still be some rough edges. I know of at least one other organization who’s using it already, and I’m talking with several other organizations who are interested. I’m making it freely available so that others don’t have to go through the pain that I did.

An open-source place matcher for genealogy

I’ve developed an open-source place matcher for genealogy. It takes place texts provided in GEDCOM files and matches them to standardized place names. It’s pretty good :-). I’ll be giving a talk on it at RootsTech next week. The source code and database are freely available on GitHub.

Thanks go to Ryan Knight for creating a web application to demonstrate it.

Open-source name variants database

I’ve developed an open-source name-variants database. We’re currently using it at This is a better algorithm than Soundex for matching variant names like Ann, Anna, Annah, Anne, Annie, etc. It results in a 28% reduction in missed variants compared to Soundex, based upon a set of 100,000 pairs of names provided by I’ll be giving a talk on it at RootsTech next week. The source code and database are freely available on GitHub.
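
As a rough illustration of the kind of variant Soundex misses, here is standard American Soundex in Python next to a lookup in a variants table. The table contents and the `are_variants` helper are hypothetical and are not the format or algorithm of the actual database; the point is only that Soundex keeps the first letter, so spellings that start differently can never match.

```python
# Standard Soundex digit codes
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels get no code and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


# Hypothetical variant groups, standing in for a real variants table
VARIANT_GROUPS = [
    {"catherine", "katherine", "kathryn"},
    {"ann", "anna", "annah", "anne", "annie"},
]


def are_variants(a, b):
    """True if the two names are equal or share a variant group."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in group and b in group for group in VARIANT_GROUPS)


print(soundex("Catherine"), soundex("Katherine"))
print(are_variants("Catherine", "Katherine"))
```

Soundex gives the two spellings different codes because of the first letter, while a table lookup treats them as the same name.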

A robust open-source GEDCOM parser

I just posted the first cut of an open-source GEDCOM parser. The parser parses GEDCOM files into a de facto object model, which is able to represent nearly all of the tag sequences found in real-world GEDCOM files. The object model includes common custom tags; other tags are represented as extensions. The object model has a JSON representation, and the toolkit includes a GEDCOM exporter. This makes it possible for anyone to read a GEDCOM file, manipulate its contents, save it to JSON, and export it back to a GEDCOM file, without loss of information for the vast majority of GEDCOMs.
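
For readers who haven’t looked inside a GEDCOM file, here is a toy version of the read, JSON, export round trip in Python. It is not the parser described above and does not use its object model; the function names and the generic tag/value/children nodes are my own simplification, and it ignores character encodings, CONC/CONT continuation lines, and malformed input.

```python
import json


def parse_line(line):
    """Split one GEDCOM line into (level, xref id or None, tag, value)."""
    level, rest = line.strip().split(" ", 1)
    xref = None
    if rest.startswith("@"):           # e.g. "0 @I1@ INDI"
        xref, rest = rest.split(" ", 1)
    tag, _, value = rest.partition(" ")
    return int(level), xref, tag, value


def parse_gedcom(text):
    """Parse GEDCOM text into a list of nested {tag, value, children} nodes."""
    root = {"tag": "ROOT", "children": []}
    stack = [root]                     # stack[n + 1] is the open node at level n
    for line in text.splitlines():
        if not line.strip():
            continue
        level, xref, tag, value = parse_line(line)
        node = {"tag": tag, "value": value, "children": []}
        if xref:
            node["id"] = xref
        del stack[level + 1:]          # close nodes at this level or deeper
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def export_gedcom(nodes, level=0):
    """Turn the node tree back into a list of GEDCOM lines."""
    lines = []
    for node in nodes:
        parts = [str(level)]
        if "id" in node:
            parts.append(node["id"])
        parts.append(node["tag"])
        if node["value"]:
            parts.append(node["value"])
        lines.append(" ".join(parts))
        lines.extend(export_gedcom(node["children"], level + 1))
    return lines


SAMPLE = "\n".join([
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BIRT",
    "2 DATE 12 MAR 1850",
    "0 TRLR",
])

tree = parse_gedcom(SAMPLE)
as_json = json.dumps(tree)                            # save to JSON
round_trip = "\n".join(export_gedcom(json.loads(as_json)))
print(round_trip == SAMPLE)
```

Keeping every tag as a generic node is what makes the round trip lossless in this toy; a real object model has to do the harder work of giving the common tags proper fields while still preserving the rest as extensions.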

Ryan Knight has created a demo server to show it off.

For more information, see the Github repo.

WeRelate mentioned in the Wall Street Journal

Read the article.  Not the quote I would have chosen, but it emphasizes WeRelate’s focus on sources. WeRelate received a lot of new traffic and activity because of it.

Evidence and sources make genealogy both more fun (finding your ancestors in sources is fun) and more accurate (recording your sources makes your work verifiable).  I believe future genealogy programs should focus more on finding and recording evidence.

Let’s create tools to make genealogy more like a game

Doing genealogy would be less expensive and more fun if more people got involved.

Recently an LDS Church leader encouraged more youth to work on genealogy, because they’re more familiar with the technology that is now practically required to do genealogy. But we don’t need to encourage youth to get on Facebook or play FarmVille, and even many older non-tech people play social games. Why? Because they’re fun.

A big problem with people starting to do genealogy is that they don’t know how to begin. In games this is called “onboarding” – what happens during the first few minutes of play. Games focus on onboarding and on directing the player to increasingly challenging experiences that make playing the game fun. The current flock of genealogy programs are largely dressed-up database managers. They don’t lead you to what to do next, and they don’t reward you when you do it. We need to make doing genealogy more like playing a game. The cycle of looking for a record, attaching it to the tree, getting rewarded, and getting direction on the next record to look for needs to be a core part of the experience.
