Hello All,
I have a group of 5 datasets, each containing a different number of records of individuals' information.
What I have so far is an object-oriented program that treats each record as a unique object (call it "Individuals") with 4 properties that are useful for finding its potential match in any of the datasets.
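To make that concrete, here is a minimal sketch of what such a record object might look like. The four field names are placeholders I made up for illustration (the real properties may differ), and match_key() assumes that an exact comparison after trimming whitespace and lowercasing is good enough to call two records a match.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    # Hypothetical field names: stand-ins for the 4 matching properties.
    first_name: str
    last_name: str
    birth_date: str
    postcode: str

    def match_key(self) -> tuple[str, ...]:
        """Return a normalized tuple used to compare records across datasets."""
        fields = (self.first_name, self.last_name, self.birth_date, self.postcode)
        # Trim whitespace and lowercase so cosmetic differences do not block a match.
        return tuple(value.strip().lower() for value in fields)


# Two records that differ only in case and spacing share the same key.
a = Individual(" Ann ", "LEE", "1990-01-02", "ab1 2cd")
b = Individual("ann", "lee", "1990-01-02", "AB1 2CD")
print(a.match_key() == b.match_key())
```

Because the key is a plain tuple, it can be used as a dictionary key, which avoids comparing every record against every other record.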
I want to start at the top of the group (of 5 datasets): take dataset 1 one record at a time, check each record for a match in dataset 2 (again one record at a time), then check further down the group whether the same record matches one in dataset 3, and so on until the end (i.e., dataset 5).
One might view this problem as a relay race, where individual 1 (from dataset 1) passes a baton (maybe an ID label?) to the next person (its match in dataset 2), and so on until the end of the race is reached (i.e., dataset 5).
Passing some sort of "baton" makes sure that the desired output (i.e., sequences of binary 1s [yes] and 0s [no] that tell whether or not an individual is present in each of the datasets further down the group) doesn't contain repeated sequences or sequences corrupted by "similar-looking" records.
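Here is a rough sketch of how I picture the relay, just to show the idea rather than a finished design. It assumes each record has already been reduced to a hashable match key (plain strings below), that a match means the keys are exactly equal, and that a record claimed by one chain can never start or join another. The function name presence_sequences is made up for illustration, and fuzzy matching of "similar-looking" records is not handled here.

```python
from __future__ import annotations

from typing import Hashable, Sequence


def presence_sequences(datasets: Sequence[Sequence[Hashable]]) -> list[list[int]]:
    """Return one 0/1 sequence per unique individual, one digit per dataset."""
    # For each dataset, map a match key to the positions not yet claimed.
    unclaimed: list[dict[Hashable, list[int]]] = []
    for records in datasets:
        index: dict[Hashable, list[int]] = {}
        for pos, key in enumerate(records):
            index.setdefault(key, []).append(pos)
        unclaimed.append(index)

    sequences: list[list[int]] = []
    for start, records in enumerate(datasets):
        for key in records:
            positions = unclaimed[start].get(key)
            if not positions:
                continue  # an earlier chain already carried the baton through here
            positions.pop()  # claim this record as the start of a new chain
            seq = [0] * start + [1]  # absent from every earlier dataset
            # Pass the baton (the key) down through the remaining datasets.
            for later in unclaimed[start + 1:]:
                later_positions = later.get(key)
                if later_positions:
                    later_positions.pop()  # claim the match so it cannot repeat
                    seq.append(1)
                else:
                    seq.append(0)
            sequences.append(seq)
    return sequences


datasets = [["ann", "bob"], ["bob", "cat"], ["ann"], [], ["cat", "bob"]]
for seq in presence_sequences(datasets):
    print(seq)
# [1, 0, 1, 0, 0]
# [1, 1, 0, 0, 1]
# [0, 1, 0, 0, 1]
```

Records that first appear in a later dataset (like "cat" above) start their own sequence with leading 0s, and claiming each matched record is what keeps the same individual from being reported twice.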
I know that all of this sounds extremely cryptic, but I could really use some ideas on where to start implementing a scheme like this: matching records, tracing sequences, and constructing binary sequences (i.e., the desired output: a list of binary sequences representing unique individuals).
Does anyone have some helpful tips here (especially relating to matching records, tracing sequences, and constructing the output)?
Thank you for taking the time to read this thread, and I appreciate any feedback that you can provide.
-AD-
AndrewDen