Pivoting 3 Million User Records with Respect to FB User and Type, Subtype
Posted: May 1st, 2009 | Author: Anuradha Uduwage | Filed under: Facebook, Java Ruled | Tags: Data Mining, Facebook

So far our data mining implementation for Facebook data is going really well. I am currently working on an algorithm to identify 1-item and n-item sets in our raw data set, using the extraction code I built earlier. But I just finished writing a sweet little piece of code to pivot 3 million user records.

Our implementation has a DB.java file that handles all our DB calls and direct SQL. I know what you're thinking: we could have used something fancy like Hibernate, but this is fast-paced development, so we don't have time to set it up.
```java
// connect to the database, any database connection changes should take
// place in DB.java
DB db = new DB();
DB db2 = new DB();
db.init();
db2.init();
```
Call DB.java to get the distinct type/subtype pairs of the Facebook groups and dump them into a Java Vector of String Vectors.
```java
// get the type and sub type
Vector<Vector<String>> grpTypeSubType = new Vector<Vector<String>>(
        db.getTypeSubType());
// printing all the type sub type pairs as column headers
// (ps1 is the output stream for the pivot file, opened elsewhere)
//System.out.print("UserId, ");
ps1.print("UserId, ");
for (Vector<String> pair : grpTypeSubType) {
    // the original test, !object.toString().equals(null), was always true;
    // a plain null check is what was intended
    if (pair != null) {
        // Vector.toString() renders the pair as "[type, subtype]"
        System.out.print(pair.toString() + ",");
        ps1.print(pair.toString() + ",");
    }
}
```
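One thing to watch in the header step: `Vector.toString()` prints a pair as `[Music, Rock]`, so every label carries a comma of its own, and the header prints pairs that the data loop later skips when the type or subtype is null. Here is a minimal sketch of a header builder that avoids both. It is not what the project code does; the class name `HeaderSketch`, the method `headerLine`, the `/` separator, and the sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical helper, not part of the project: builds the CSV header row.
class HeaderSketch {

    /** Returns "UserId" followed by one "type/subtype" label per complete pair. */
    static String headerLine(List<Vector<String>> pairs) {
        StringBuilder header = new StringBuilder("UserId");
        for (Vector<String> pair : pairs) {
            // Skip the same incomplete pairs the data rows skip,
            // so the labels stay aligned with the Y/N columns.
            if (pair == null || pair.get(0) == null || pair.get(1) == null) {
                continue;
            }
            // "/" is an arbitrary separator chosen to keep commas out of labels.
            header.append(',').append(pair.get(0)).append('/').append(pair.get(1));
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Made-up sample pairs.
        List<Vector<String>> pairs = new ArrayList<>();
        pairs.add(new Vector<>(Arrays.asList("Music", "Rock")));
        pairs.add(new Vector<>(Arrays.asList("Sports", "Soccer")));
        System.out.println(pairs.get(0));       // the bracketed Vector.toString() form
        System.out.println(headerLine(pairs));  // one comma-free label per pair
    }
}
```

If the labels contain commas or slashes themselves, a real CSV writer with quoting would be the safer choice.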
Here the fun begins: this loop ran for 38 hours and finished pivoting the 3 million records.
```java
// get user id vector and find type and subtype for each userid.
Vector<Long> users = new Vector<Long>(db.getDistinctUID());
ResultSet fbUsers = db.getUsers();
System.out.println();
ps1.println();
System.out.println("Start Pivot..");
while (fbUsers.next()) {
    Long userId = fbUsers.getLong(1);
    //System.out.print(userId + ",");
    ps1.print(userId + ",");
    String groupType = null;
    String typeSubtype = null;
    int count = 0;
    for (Vector<String> vs : grpTypeSubType) {
        groupType = vs.get(0);
        typeSubtype = vs.get(1);
        if (groupType != null && typeSubtype != null) {
            count = db2.getTypeSubTypeCount(userId,
                    groupType, typeSubtype);
            if (count > 0) {
                //System.out.print("Y, ");
                ps1.print("Y, ");
            } else {
                //System.out.print("N, ");
                ps1.print("N, ");
            }
        }
    }
    //System.out.println();
    ps1.println();
}
fbUsers.close();
ps1.close();
System.out.println("Done pivot, check the file");
} // closes the enclosing method (its opening is not shown here)
```
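Most of those 38 hours probably went into the query pattern: the loop calls `getTypeSubTypeCount` once per user per type/subtype pair. If I were to redo it, I would try reading the (user, type, subtype) rows once and pivoting in memory. The sketch below shows that idea only; it is not the code that produced our files. It assumes the rows fit in memory and arrive as plain string triples, and the class `PivotSketch`, its methods, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-pass pivot, not part of the project code.
class PivotSketch {

    /** Joins type and subtype into one column key; "/" is an arbitrary separator. */
    static String key(String type, String subType) {
        return type + "/" + subType;
    }

    /**
     * Builds the pivot as CSV lines from rows of {userId, type, subtype}.
     * Each row is read exactly once, so no per-cell lookup is needed.
     */
    static List<String> pivot(List<String[]> rows) {
        Set<String> columns = new LinkedHashSet<>();              // pairs, first-seen order
        Map<String, Set<String>> byUser = new LinkedHashMap<>();  // user -> pairs they have
        for (String[] row : rows) {
            // Register the user even if this row turns out to be incomplete.
            Set<String> seen = byUser.computeIfAbsent(row[0], id -> new LinkedHashSet<>());
            if (row[1] == null || row[2] == null) {
                continue; // skip incomplete pairs, as the original loop does
            }
            String k = key(row[1], row[2]);
            columns.add(k);
            seen.add(k);
        }

        List<String> lines = new ArrayList<>();
        lines.add("UserId," + String.join(",", columns));
        for (Map.Entry<String, Set<String>> user : byUser.entrySet()) {
            StringBuilder line = new StringBuilder(user.getKey());
            for (String column : columns) {
                line.append(',').append(user.getValue().contains(column) ? 'Y' : 'N');
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for one query over the raw table.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1001", "Music", "Rock"});
        rows.add(new String[] {"1002", "Sports", "Soccer"});
        rows.add(new String[] {"1001", "Sports", "Soccer"});
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
    }
}
```

For 3 million users the per-user sets could get heavy. If the rows come back sorted by user id, each user's line can be written as soon as the id changes, so only one set has to be kept at a time.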
You can get more information on our data extraction and FB data mining implementation at Google Code, where it is available under the GNU General Public License v3. You are also more than welcome to use the multiple preprocessed data sets that I formatted to fit applications like Weka.

