With the release of SQL Server 2012, I am looking more into Master Data Services (MDS) and Data Quality Services (DQS). A brief overview of DQS.
You install DQS with SQL, and you have to configure it. The server configuration is a cmd line process that runs to create some databases on your server (DQS_MAIN, DQS_PROJECTS, DQS_STAGING_DATA).
I ran into one issue with the running of the configuration, not sure if this happens everywhere, I am running Windows 8, but nonetheless, I ran into. After running the tool and getting error after error, and trying as admin, etc. I dug deeper into the error message and found that I there were some security/permission issues I had to resolve. It ended up being that I had to change permissions on
to allow write access. Once I did that, the configuration tool worked and I could get into DQS.
DQS gives you a “Data Quality Services Client” to work with. When you open it, connect to the database where you configured the three databases I talk about above. Once you, you have 3 panes.
You have Knowledge Bases, Data Quality Projects, and Administration.
Knowledge Bases: datasets of known data that you can use in your Data Quality Projects. You get a default Knowledge Base – state names and some other data similar to that.
Data Quality Projects: Here is where the magic happens. You can choose some source data (Excel xls – xlsx wouldn’t work or SQL table) and then apply your knowledge base on it. Then you can reimport your data at the end back into SQL or export it, and update your Knowledge Base with learned values.
Administration: Not a ton of options, but you can set some thresholds, and also setup your Azure data market settings.
Azure Data Market https://datamarket.azure.com/browse/Data?Category=dqs – Lots of data you can use to combine with your Knowledge Bases. Much more here and I won’t go into detail – it could be its own post in itself.
As a test, I took an excel file, added a few records with columns first, last, city, state (I actually imported into a staging SQL table to work with it) – But in the state field I put different variations of state, WI, Wis, Wisconsin, MN, Minn, Minn., etc.
I then ran the file through creating a new data quality project and ran it against the default Knowledge Base, and it corrected the values it could. Got a weird error clicking next on the project, it seems the button is touchy. Hopefully they come out with a fix soon.
Once you build up and get your Knowledge Base stable, you can use from SSIS packages or in Master Data Services. I see many useful applications for DQS. Either around your corporate data or pulling in data from Azure data market to cleanse existing data you might have (think: looking up gender from first/last name).
This post is a brief look at DQS and how it works, but there is so much more. I hope to get more in depth in the near future.