With the recent attention to Big Data, the incredibly large datasets produced as a byproduct of worldwide online activity, there is a concern that messy data will be impossible to clean up at that volume.
I recently came across Google Refine, which brings the power of human meaning-making to large datasets. It lets the user find patterns in the data and apply global changes to them, making it easy to create more robust data.
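To make the "find patterns, then apply a global change" idea concrete, here is a minimal Python sketch. It is not Google Refine's code. It is my own illustration, loosely in the spirit of key-collision clustering, and the names `fingerprint`, `cluster`, and `apply_global_change` are hypothetical helpers I made up for this post. The sample company names are invented too.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-duplicate spellings collide."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and remove punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.lower())
    # Sort and de-duplicate the words so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


def apply_global_change(values, canonical_for):
    """Replace every value whose fingerprint has a chosen canonical form."""
    return [canonical_for.get(fingerprint(v), v) for v in values]


# Invented sample data: four spellings of one company, plus an unrelated one.
names = ["Acme Inc.", "acme inc", "ACME, Inc", "Inc. Acme", "Globex"]

# Step 1: the machine finds the pattern.
suspects = cluster(names)

# Step 2: a human decides which spelling is the right one.
canonical_for = {"acme inc": "Acme Inc."}

# Step 3: the change is applied everywhere at once.
cleaned_names = apply_global_change(names, canonical_for)
```

The design point is the split of labor: the code only proposes groups of look-alike values, and the person decides what each group should become before anything is changed.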
When I was a computer consultant, I was often asked to help clean up dirty data. I knew all sorts of tricks in Microsoft Office, but they pale in comparison to what I see in Google Refine. I'm going to experiment with it and see where it can be an advantage to Heretics who are looking to Big Data as a powerful new tool.
NOTE: Google Refine has since become an open source project named OpenRefine. You can view the documentation here: https://github.com/OpenRefine/OpenRefine