Dimensional modeling - Best practice for natural keys in a dimension that includes data from multiple source tables
I am designing a dimension for a data warehouse that includes several related attributes from various tables. When loading fact tables, I look up surrogate keys in the dimension tables based on keys from the source system rather than text matching on various attributes. In the situation I'm facing, is it preferable to have several source system key columns in the dimension table (one for each of the relevant tables) to look up on, or to create a single lookup column using some sort of hash or concatenation?
Please let me know if you need more info.
The best practice amounts to a 'source system' column, plus one (or more) columns of a unified type that can accommodate the native keys of all the source systems (probably with a bit of headroom for future-proofing).
A hash or concatenation that identifies the source should be seen as a workaround for when you can't control the data model.
A 'source' column also helps with lineage.
So suppose you have 3 source systems with varying 'product code' formats of char 8, 10, and 15 respectively. Add columns:
sourceid char(5) - a source identifier, or a further surrogate that looks up a 'source' table
productcode char(15)
where 15 = max(8, 10, 15),
or productcode varchar(20),
depending on whether you can expect future acquisition of sources. 20 characters is pretty large for a source identifier. Consider the practices in the relevant problem domain.
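The layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name dim_product, the product_key and product_name columns, the source identifiers SRC1 to SRC3, the sample product codes, and the helper lookup_product_key are all hypothetical and not from the original question.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key (hypothetical name)
    source_id    CHAR(5)  NOT NULL,    -- which source system the row came from
    product_code CHAR(15) NOT NULL,    -- native key, sized to the widest source
    product_name TEXT,
    UNIQUE (source_id, product_code)   -- the natural key is the pair
);
""")

# Made-up rows: codes of 8, 10 and 15 characters share one column.
conn.executemany(
    "INSERT INTO dim_product (source_id, product_code, product_name) "
    "VALUES (?, ?, ?)",
    [
        ("SRC1", "AB123456", "Widget"),
        ("SRC2", "0000AB1234", "Gadget"),
        ("SRC3", "PRD-00000000042", "Gizmo"),
    ],
)


def lookup_product_key(conn, source_id, product_code):
    """Return the surrogate key for a (source, native key) pair, or None."""
    row = conn.execute(
        "SELECT product_key FROM dim_product "
        "WHERE source_id = ? AND product_code = ?",
        (source_id, product_code),
    ).fetchone()
    return row[0] if row else None
```

During a fact load, each incoming row would carry its source identifier and native code, and the lookup is the same two-column match regardless of which source it came from.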
Never use:
sourceid char(5)
productcode1 char(8)
productcode2 char(10)
productcode3 char(15)
If only because, if source 4 shows up, you're adding columns. Also because no report can usefully display that, and because it is hard to join to generic 'common' tables you might have to deal with. You may find that you waste storage and bloat indexes with wasted space, or that you impair performance by opting for varchar.
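To make the "source 4" point concrete, here is a rough sketch, again using Python's sqlite3, of how a single (source, code) pair behaves when a new source appears. The table names dim_product and stg_sales, the quantity column, the source identifiers, and all sample values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    source_id    CHAR(5)     NOT NULL,
    product_code VARCHAR(20) NOT NULL,  -- one column for every source
    UNIQUE (source_id, product_code)
);
CREATE TABLE stg_sales (                -- hypothetical staging table
    source_id    CHAR(5)     NOT NULL,
    product_code VARCHAR(20) NOT NULL,
    quantity     INTEGER     NOT NULL
);
""")

# The three original sources.
conn.executemany(
    "INSERT INTO dim_product (source_id, product_code) VALUES (?, ?)",
    [
        ("SRC1", "AB123456"),
        ("SRC2", "0000AB1234"),
        ("SRC3", "PRD-00000000042"),
    ],
)

# A fourth source shows up: it only adds rows, no ALTER TABLE is needed.
conn.execute(
    "INSERT INTO dim_product (source_id, product_code) VALUES (?, ?)",
    ("SRC4", "X-9"),
)

conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [("SRC1", "AB123456", 2), ("SRC4", "X-9", 5)],
)

# One join predicate resolves surrogate keys for every source,
# old or new, with no per-source column to choose between.
fact_rows = conn.execute("""
SELECT d.product_key, s.quantity
FROM stg_sales AS s
JOIN dim_product AS d
  ON d.source_id = s.source_id
 AND d.product_code = s.product_code
ORDER BY d.product_key
""").fetchall()
```

With the productcode1, productcode2, productcode3 layout, the same load would need a new column for the fourth source and a join condition that picks a different column per source.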