SQL Server 2012: Data Quality Services

With the release of SQL Server 2012, I am looking more into Master Data Services (MDS) and Data Quality Services (DQS). A brief overview of DQS.

You install DQS with SQL, and you have to configure it. The server configuration is a cmd line process that runs to create some databases on your server (DQS_MAIN, DQS_PROJECTS, DQS_STAGING_DATA).

I ran into one issue with the running of the configuration, not sure if this happens everywhere, I am running Windows 8, but nonetheless, I ran into. After running the tool and getting error after error, and trying as admin, etc. I dug deeper into the error message and found that I there were some security/permission issues I had to resolve. It ended up being that I had to change permissions on

C:WindowsMicrosoft.NETFramework64v4.0.30319Configmachine.config

to allow write access. Once I did that, the configuration tool worked and I could get into DQS.

DQS gives you a “Data Quality Services Client” to work with. When you open it, connect to the database where you configured the three databases I talk about above. Once you, you have 3 panes.

You have Knowledge Bases, Data Quality Projects, and Administration.

Knowledge Bases: datasets of known data that you can use in your Data Quality Projects. You get a default Knowledge Base – state names and some other data similar to that.

Data Quality Projects: Here is where the magic happens. You can choose some source data (Excel xls – xlsx wouldn’t work or SQL table) and then apply your knowledge base on it. Then you can reimport your data at the end back into SQL or export it, and update your Knowledge Base with learned values.

Administration: Not a ton of options, but you can set some thresholds, and also setup your Azure data market settings.

Azure Data Market https://datamarket.azure.com/browse/Data?Category=dqs – Lots of data you can use to combine with your Knowledge Bases. Much more here and I won’t go into detail – it could be its own post in itself.

As a test, I took an excel file, added a few records with columns first, last, city, state (I actually imported into a staging SQL table to work with it) – But in the state field I put different variations of state, WI, Wis, Wisconsin, MN, Minn, Minn., etc.

I then ran the file through creating a new data quality project and ran it against the default Knowledge Base, and it corrected the values it could. Got a weird error clicking next on the project, it seems the button is touchy. Hopefully they come out with a fix soon.

Once you build up and get your Knowledge Base stable, you can use from SSIS packages or in Master Data Services. I see many useful applications for DQS. Either around your corporate data or pulling in data from Azure data market to cleanse existing data you might have (think: looking up gender from first/last name).

This post is a brief look at DQS and how it works, but there is so much more. I hope to get more in depth in the near future.

#sqlpass 2011 Summit Thoughts

Want to get a quick post out before I forget everything cool I have been thinking about the 2011 SQL Pass Summit in Seattle WA.

The conference started out great with a first timer reception (this was my second year), and I knew a few first timers so that would great.

The first day keynote was good, and then the sessions I hit were on a full range of things, new semantic search stuff with file table, and more.

I missed a couple of sessions on day 1 because I was in a Microsoft led focus group on “BI in the cloud” some very cool ideas thrown out and excited to see what is coming up in the future.

More good keynote on day two, a deeper dive into the stuff in SQL Server 2012 (formerly Denali). Went to some good sessions on Data Quality Services (DQS) and Vertipaq vs OLAP, and Power View (remember the space!) SharePoint and Power View integration, etc. Some very cool stuff coming soon with the 2012 release.

Tabular cubes vs Multidimensional cubes. OLAP vs Vertipaq, etc. Big debates and questions here, when to put things in Tabular vs MDX, etc. DAX solves some hard problems we run into with MDX.

Another thing I went to a couple sessions on was StreamInsight. This is some very cool technology for complex event processing, using .NET and LINQ. The StreamInsight guys are crazy smart. Also integrating StreamInsight with Azure, and doing some crazy processing and analysis of tons of data in near real time. I see this technology making a difference somewhere down the road.

Power View was another big winner from the summit. This integrates with SharePoint 2010 as a Silverlight (SL still isn’t dead?) report builder and viewer. Not even sure they are called “reports” anymore, but “views”. You create views and you can interact/edit them in real time. Export to PowerPoint too. These views work off “tabular” vertipaq BISM (BI semantic model) cubes. This technology is going to change how we deliver information to the business.

Being able to take a PowerPivot (no space!) and reverse engineer it into SQL Server Data Tools (formerly BIDS – in VS2010 shell) to create a BISM tabular cube and publish to SSAS Tabular instance looks great. Solving problems like crazy snapshotting and many to many and time analysis using DAX looks very useful.

Of course, GameWorks appreciation night was fun, and I found a Mcmenamins (Six Arms) up the road from the convention center, so I got get a taste of Ruby again. Good conversation and times at the Taphouse and around the Sheraton and Convention center was good too. Seattle is a nice city and the weather was pretty good I thought. Another great summit.