User:Jrm03063/Wikimedia Alignment Project

About Wikidata

The Wikimedia Foundation supports the various language versions of Wikipedia in a number of direct and indirect ways, of which provision and support of the MediaWiki software is relatively well known. Of particular interest to us, however, is the means by which the different language versions of Wikipedia refer to one another's pages on the same subject.

On the English Wikipedia page for Charlemagne, the left-hand column has a sub-heading labelled "Languages". Under it, a separate link appears for each language version of Wikipedia that contains a biographical page for Charlemagne. Each language version hosting a page for Charlemagne could maintain its own links to all the other language versions, but that would create a hopeless management problem.

Instead, the Wikimedia Foundation uses the Wikidata knowledge base to consolidate information on which versions of Wikipedia host a page for Charlemagne. The Wikidata page for Charlemagne (Q3044) presents a collection of statements about him. A section near the end of the page relates specifically to Wikipedia: it lists, once, every language version of Wikipedia that has an article on Charlemagne. The individual language versions of Wikipedia are aligned by virtue of the shared Wikidata page and identifier.

In addition to uniquely identifying people across the different language versions of Wikipedia, Wikidata defines a number of useful genealogical relationship properties: Mother, Father, Sibling, Spouse and Child (as well as others).
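
As a rough illustration of how these properties look to a program, the sketch below pulls the five relationships out of a Wikidata entity record. It assumes the record has the shape of Wikidata's entity JSON (the inner entity object, as found under "entities" in a Special:EntityData response). The function name genealogy_claims and the sample record, including its Q9000xx target IDs, are invented for the example and are not real data about Charlemagne.

```python
# Wikidata property IDs for the relationships named above.
GENEALOGY_PROPERTIES = {
    "father": "P22",
    "mother": "P25",
    "spouse": "P26",
    "child": "P40",
    "sibling": "P3373",
}


def genealogy_claims(entity):
    """Return {relationship: set of item IDs} for one Wikidata entity dict."""
    claims = entity.get("claims", {})
    result = {}
    for name, pid in GENEALOGY_PROPERTIES.items():
        targets = set()
        for statement in claims.get(pid, []):
            snak = statement.get("mainsnak", {})
            # "somevalue" and "novalue" snaks carry no datavalue; skip them.
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            # Older records may lack "id" and only carry "numeric-id".
            targets.add(value.get("id") or "Q%d" % value["numeric-id"])
        result[name] = targets
    return result


def _item(qid):
    """Build one item-valued statement in the entity JSON shape (sample data only)."""
    return {"mainsnak": {"snaktype": "value", "property": None,
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item", "id": qid}}}}


# Hand-built sample; the Q9000xx IDs are placeholders, not real items.
sample_entity = {
    "id": "Q3044",
    "claims": {
        "P22": [_item("Q900001")],
        "P25": [_item("Q900002")],
        "P40": [_item("Q900003"), _item("Q900004")],
        "P26": [{"mainsnak": {"snaktype": "somevalue", "property": "P26"}}],
    },
}

if __name__ == "__main__":
    for relation, targets in genealogy_claims(sample_entity).items():
        print(relation, sorted(targets))
```

Returning sets rather than lists makes the later comparison against another source a matter of plain set arithmetic.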


Relevance for WeRelate

In the same way that Wikipedia uses Wikidata to consolidate references to pages on a given subject, WeRelate too has obtained its own identifier property on Wikidata.

WeRelate also aligns with Wikimedia by tagging Person pages with an identifier that flags their corresponding Wikimedia identity. This is exactly what the Wikidata Template was created to accomplish. On the WeRelate page for Charlemagne, the "Reference Number" fact holds an active reference to the Wikidata ID for Charlemagne. Indeed, since Wikidata IDs are unique per item (in this case, Charlemagne), a WeRelate Person page search using the Wikidata ID as a keyword leads quickly to the page for the corresponding person in WeRelate.

Since WeRelate is intended to both accept and export data in GEDCOM format, adding the Wikidata ID to WeRelate Person pages means that exported GEDCOM files will carry Wikimedia alignment information as needed.


Compare and Contrast - WeRelate and Wikidata

WeRelate and Wikidata both provide structured representations of genealogy information, so programs can readily be written to read and interpret the genealogical relationships between people. Introducing the Wikidata ID into WeRelate means that we can compare and contrast the relationships recorded for the same people in WeRelate and in Wikidata.

A simple program has been created to automate this process (find it on GitHub). It operates as follows:

  • Search WeRelate for all Person pages that contain a Wikidata reference
  • Read each WeRelate page found in the previous step, extracting -
    • The Wikidata ID
    • The gender of the person/page
    • The list of Family pages on which this person is indicated as a child (should be only one)
    • The list of Family pages this person is associated with as a parent
  • Report simple errors -
    • If page has unspecified gender
    • If more than one WeRelate page corresponds with the same Wikidata ID (usually a tagging error, although a number of previously unidentified duplicate pages have been found this way).
  • Using the list of Family pages found in the previous step, determine for each Person with a Wikidata ID -
    • If father or mother are present and associated with a Wikidata ID
    • If spouses are present that are associated with a Wikidata ID
    • If siblings are present that are associated with a Wikidata ID
    • If children are present associated with a Wikidata ID
  • For the list of Person pages with Wikidata IDs, read the corresponding Wikidata page. Recover genealogy relationships and compare them with those already computed from the information on WeRelate. Report discrepancies or consistency.
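
The WeRelate side of the steps above can be sketched roughly as follows. This is not the program mentioned above, only a minimal illustration under stated assumptions: the Person and Family records, their field names, the page titles and the Q9000xx IDs are all hypothetical, and the records are built by hand where the real program would read them from WeRelate pages.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:  # hypothetical stand-in for a parsed WeRelate Person page
    title: str
    wikidata_id: Optional[str] = None
    gender: Optional[str] = None
    child_of: List[str] = field(default_factory=list)   # Family page titles
    spouse_of: List[str] = field(default_factory=list)  # Family page titles


@dataclass
class Family:  # hypothetical stand-in for a parsed WeRelate Family page
    title: str
    husband: Optional[str] = None
    wife: Optional[str] = None
    children: List[str] = field(default_factory=list)


def simple_errors(people):
    """Report tagged pages with no gender, and Wikidata IDs used on several pages."""
    errors, by_qid = [], defaultdict(list)
    for person in people.values():
        if not person.wikidata_id:
            continue
        if person.gender not in ("M", "F"):
            errors.append(f"{person.title}: unspecified gender")
        by_qid[person.wikidata_id].append(person.title)
    for qid, titles in sorted(by_qid.items()):
        if len(titles) > 1:
            errors.append(f"{qid}: used on {sorted(titles)}")
    return errors


def relationships(person, people, families):
    """Return {relationship: set of Wikidata IDs} for relatives that are tagged."""
    def qid(title):
        other = people.get(title) if title else None
        return other.wikidata_id if other else None

    found = {k: set() for k in ("father", "mother", "sibling", "spouse", "child")}
    for fam in (families.get(t) for t in person.child_of):
        if fam is None:
            continue
        found["father"].add(qid(fam.husband))
        found["mother"].add(qid(fam.wife))
        found["sibling"].update(qid(c) for c in fam.children if c != person.title)
    for fam in (families.get(t) for t in person.spouse_of):
        if fam is None:
            continue
        found["spouse"].update(
            qid(p) for p in (fam.husband, fam.wife) if p != person.title)
        found["child"].update(qid(c) for c in fam.children)
    for targets in found.values():
        targets.discard(None)  # drop relatives that carry no Wikidata ID
    return found


# Hand-built sample data; titles and IDs are placeholders.
people = {p.title: p for p in [
    Person("Dad (1)", "Q900001", "M", spouse_of=["Fam (1)"]),
    Person("Mom (1)", "Q900002", "F", spouse_of=["Fam (1)"]),
    Person("Kid (1)", "Q900003", None, child_of=["Fam (1)"]),
    Person("Sis (1)", None, "F", child_of=["Fam (1)"]),
    Person("Kid (2)", "Q900003", "M"),
]}
families = {"Fam (1)": Family("Fam (1)", "Dad (1)", "Mom (1)", ["Kid (1)", "Sis (1)"])}

if __name__ == "__main__":
    print(simple_errors(people))
    print(relationships(people["Kid (1)"], people, families))
```

The resulting sets of Wikidata IDs can then be compared directly with the relationship claims read from the corresponding Wikidata page.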


Right Now (14 May 2018)

  • Wikidata identifiers have been placed on WeRelate Person pages since about May 2016. This amounts to revisiting all the Person pages that had previously been associated with an English Wikipedia page. It was also quickly realized that the Wikidata identifier allows for Wikimedia association of people who do not yet have an English Wikipedia biographical page.
  • At the moment, there are more than ~31,250 Person pages tagged with a Wikidata ID. Of those, ~22,710 contain the wikipedia notice indicating they are associated with an English Wikipedia page. So about 5,650 pages that are not currently associated with an English Wikipedia page have been tagged with a Wikidata ID. That number includes pages whose English Wikipedia article formerly existed, pages that already existed but were not previously tagged, and new pages created with tags as part of this latest effort.
  • A search for Person pages with the wikipedia notice but no Wikidata tag yields only a handful of Person pages, all of which include non-biographical material from Wikipedia. So the effort to revisit existing pages that included English Wikipedia biographies is complete.
  • Of the ~31,250 WeRelate Person pages tagged with a Wikidata ID, ~28,535 are "adjacent" (in terms of Wikidata genealogy claims) to another tagged Person page. ~20,662 are exactly consistent with Wikidata in terms of the relationships found. The rest either lack relationships present on Wikidata, have relationships not present on Wikidata, or both.
  • About 9,000 ancestry-related contributions have been made to Wikidata. Most are added claims for Father, Mother, Sibling, Spouse, Unmarried Partner or Child, but they also include the removal of some plainly incorrect claims and the merging of some duplicate identities.
  • ~24,000 of the tagged WeRelate Person pages are referenced by their corresponding Wikidata page.
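
The categories used in the counts above (consistent, lacking relationships, having extra relationships, or both) amount to a set comparison, which can be sketched as below. The function names, the category labels and the sample data are illustrative only. Two choices here are assumptions rather than a description of the actual program: relatives are compared only when they are themselves tagged on WeRelate (the known argument), and a page with no such relatives on either side is labelled "not adjacent".

```python
from collections import Counter


def flatten(relations, known=None):
    """Turn {relationship: set of IDs} into a set of (relationship, ID) pairs."""
    return {(kind, qid)
            for kind, qids in relations.items()
            for qid in qids
            if known is None or qid in known}


def classify(werelate, wikidata, known=None):
    """Label one person by comparing WeRelate relationships with Wikidata claims."""
    wr = flatten(werelate, known)
    wd = flatten(wikidata, known)
    if not wr and not wd:
        return "not adjacent"
    lacks = wd - wr   # on Wikidata, missing from WeRelate
    extra = wr - wd   # on WeRelate, missing from Wikidata
    if lacks and extra:
        return "both"
    if lacks:
        return "lacks relationships present on Wikidata"
    if extra:
        return "has relationships not present on Wikidata"
    return "consistent"


# Placeholder IDs; each entry pairs (WeRelate relationships, Wikidata claims).
known_ids = {"Q900001", "Q900002", "Q900003", "Q900004"}
pairs = {
    "Q900001": ({"child": {"Q900003"}}, {"child": {"Q900003"}}),
    "Q900002": ({"child": {"Q900003"}}, {"child": {"Q900003", "Q900004"}}),
    "Q900003": ({"father": {"Q900001"}, "mother": {"Q900002"}},
                {"father": {"Q900001"}}),
    "Q900004": ({}, {"sibling": {"Q999999"}}),  # only relative is untagged
}

tally = Counter(classify(wr, wd, known_ids) for wr, wd in pairs.values())

if __name__ == "__main__":
    for label, count in tally.items():
        print(count, label)
```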