Categories
Geeky/Programming

Create a Word Cloud From Your Twitter Feed

I love playing with data. My data makes it even more fun. Wordle has been around for a long time, and so has Twitter (in Internet years anyways). I have always been fascinated by word clouds and visualizing text patterns, etc.

I figured that hey, there has got to be some analyzer for your twitter stream, and I am sure there are a ton, but I didn’t stumble upon any with some easy Googling, so I did it the hard way.

First goal? Get your Twitter feed and/or data somehow. Multiple ways to do this, but I stumbled upon a pretty cool site. http://tweetbook.in that let’s you create an eBook from your Twitter feed and favorites. It let’s you publish out as a PDF or XML file, so I figured that would work. It is a busy site and you may have to wait to get in but once you do you just oAuth it up to Twitter and grab your data.

Now, once you have your data, you need to do something with it. The data would be in XML so you need to parse out the data you want, for instance, I wanted to analyze my “favorites” so I wanted to get the text out of the XML. Here is my first favorite on Twitter (by the way, it will only go back 3200, I think – I only have 2600 or so faves)

  881539697
  A "Manager" class is like my grandmother's junk drawer.
  Fri Aug 08 14:23:56 +0000 2008
  web
  jeremydmiller
 

Well I just want to grab that <text> value, and without having to do any programming or powershell or C#, I fired up trusty old LINQPad (more on this tool in future posts for sure). I then just wrote a quick little query against the XML file like so:

var xml = XElement.Load (@"c:fave.xml");

var query =
  from e in xml.Elements()
  select e.Element("text").ToString().Replace("","").Replace("","").Replace("RT ","");

query.Dump();

As you can see, I am just loading up the xml file and doing some text cleanup (removing the xml text blocks and removing RT’s, the old syntax which muddies up the results). Note in LINQPad you need to change the query type to C# Statements instead of the default C# Expression.

Once I had my values in the results I wanted, I did a quick CTRL+A, CTRL+C (it still baffles me how many people don’t know CTRL+A is “Select All”) and then pasted it into notepad++, to view, and cleaned up some html characters there (quotes, etc) and then pasted it into Wordle. Here is what I got back:

wordle_of_my_favorites

You can see I really like to favor SQL Server, Microsoft, iPhone, Blogs, sqlpass, Google, SharePoint, Twitter, and pretty much everything geeky. Pretty dang cool. Why doesn’t Twitter offer something like this? I think it would be cool. What other cool things have you done with your “data” – what cool things would you like to see?

Categories
Geeky/Programming

Programmatically creating Excel (XLS) Files, as XML files – Things to Keep In Mind

I worked on a small project that required to export data to Excel. The spreadsheets needed to be formatted very precisely, and the best way to do this is with the XML format of an excel file. But I have found some gotchas throughout the project, which will cause the .xls files to not load.

First, since it is XML, you need to make sure you handle some special XML characters..

quotes “ should be &quot;

ampersands & should be &amp;

apostrophes ‘ should be &apos;

less than < should be &lt;

greater than > should be &gt;

Now, if you make a function or something in code to do these operations, there is one thing to make sure of. Replace the & first. If you do it last, like the quick and dirty function I wrote first, then you could end up replacing a less than with &lt; and then replacing that new ampersand with &amp; so you end up with &amp;lt; – whoops …

Another few little gotchas.. Excel tab names have to be distinct. If you have duplicates, the spreadsheet will still create just fine, you just wont be able to open it. Another thing with the tabs, you can have a division sign in them – / – same thing will happen.