sql server - Data Modelling multiple table of same type into a single table to aggregate all the table into one single table -
i have question regarding data modelling. suppose have following tables 3 student tables. source_table1 contains a_id primary key , name attribute. source_table2 has b_id primary key , name & address other attributes.source_table3 has c_id primary key , name, address , age attributes. if want create new table student master records in table, how can that? if creating cross reference table how should approach problem?
integrating data different sources complicated. in end, want end like:
student (student_id pk, name, address, source1_id, source2_id, source3_id)
however, there issues resolve there.
identity
how identify matching records in different sources? looks sources use surrogate identifiers, have no meaning outside context of source databases. you're looking suitable natural key. common denominator among sources student's name, names notoriously poor identifiers.
it can useful test data rather assume or won't work. example, query such as:
select s1.name, count(*) amount student_source_1 s1 inner join student_source_2 s2 on s1.name = s2.name group s1.name having count(*) > 1
repeated (student_source_2, student_source_3) , (student_source_1, student_source_3) should give insight size of problem.
you match student_source_2 , student_source_3 based on both name , address. might give better results, or worse if 2 sources have different addresses (or spellings thereof) same student. brings our second concern:
inconsistency
assuming can resolve identity problem, may need deal inconsistent data. if sources 2 , 3 have different addresses same student? how determine correct address?
in cases, sufficient map sources without resolving inconsistencies.
winging in real world
one technique use on harder cases build mapping table hand, e.g.
student_map (student_id pk, source1_id, source2_id, source3_id)
each of source_id columns should have unique constraint, , 3 nullable. first step toward student table above.
i start inserting perfect 1-to-1 matches, left join each of sources mapping table unmatched records. having unmatched source records side-by-side , sorted makes easy visually spot matches. it's tedious , error-prone work, must done regardless. inconsistencies might choose complete/best looking source base, , fill in gaps other sources. if can involve teachers or people familiar actual students, or present them alternatives choose from, means so.
more data can extremely useful. if sources have social security numbers, family information, etc, these can used match students. use number of queries find perfect matches among various pieces of information, , insert mapping table, before doing side-by-side matching.
you may find source has internal consistency problems due poor design - e.g. multiple records same student. may require fixing source data before continuing.
a understanding of relational model of data invaluable kind of work, since you'll identifying candidate keys, following dependencies , encountering anomalies.
Comments
Post a Comment