Arbitrary Sets
Sometimes you want to look at an arbitrary set of Twitter users. This set might be a list of people you selected yourself or of organizations you picked up from a book. In my case, I wanted to see how a list of companies relate to each other for a MOOC assignment. With nucoll, you can get a quick picture of how these people or organizations relate.
nucoll can initialize from tweets returned by a search query and stored in a .qry file. Instead of creating the query file with nucoll tweets, you can manually create a list of the handles you are interested in. For example, using my text editor, I created a file called companies.qry containing the following text:
@nike
@pfizer
@hugoboss
@ikea
@swatch
@pampers
@redbull
@samsungmobile
@total
@kodak
@caterpillarinc
@nutellausa
@loreal
@lacoste
@versace
@nestle
@proctergamble
@dior
@burgerking
@disney
@google
@toyota
@lenovo
@cocacola
@nikon
@carlsberg
@tatacompanies
@thenorthface
@michelintyres
@wengerbrand
@arcelormittal
@levis
@ibm
@philipspr
@emirates
@bmw
@monsantoco
@ebay
@abercrombie
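When a list like this is assembled by hand, it is easy to include a handle twice. The following is a minimal sketch, not part of nucoll, that writes a .qry file with one handle per line and drops repeated handles while keeping the original order. The short handle list is only an excerpt; lowercasing assumes handles are case-insensitive, which holds for Twitter usernames.

```shell
#!/bin/sh
# Write a .qry file with one handle per line (excerpt of handles only).
# tr lowercases every handle, since Twitter usernames are case-insensitive.
# awk '!seen[$0]++' keeps only the first occurrence of each line, so a
# handle listed twice (here @toyota and @lenovo) is written once, in order.
printf '%s\n' @nike @pfizer @toyota @Lenovo @toyota @lenovo @ikea \
  | tr '[:upper:]' '[:lower:]' \
  | awk '!seen[$0]++' > companies.qry

# Show the resulting file and how many unique handles it contains.
cat companies.qry
wc -l < companies.qry
```

Replace the excerpt with your own handles, then run nucoll init on the resulting file as described next.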
To process the file with nucoll, proceed in two steps: first build the .dat file with init, then fetch the second-degree relations with fetch. Build the .dat file as follows:
$ nucoll init -q companies
Note that I omitted the .qry extension. Next, fetch the second-degree relations. This takes a long time if you need to process many handles: Twitter imposes a 15-minute pause after roughly every 16 handles.
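To get a feel for how long fetch will be waiting, the rate limit can be turned into a rough estimate. This is a back-of-the-envelope sketch, not something nucoll reports; it assumes the figures quoted above (about 16 handles per window, then a 15-minute pause) and ignores the time spent actually downloading data. With 39 unique handles, for example, pauses fall after handles 16 and 32, so (39 - 1) / 16 = 2 pauses, or about 30 minutes of waiting.

```shell
#!/bin/sh
# Rough estimate of rate-limit waiting time for "nucoll fetch".
# Assumption: about 16 handles per window, then a 15-minute pause.
PER_WINDOW=16
PAUSE_MIN=15

# estimate_wait FILE: count unique, non-empty handles in FILE and print
# the expected number of pauses and the total waiting time in minutes.
estimate_wait() {
  # tr strips the padding some wc implementations add.
  n=$(grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d ' ')
  if [ "$n" -eq 0 ]; then
    pauses=0
  else
    # One pause after each full window, except when no handles remain.
    pauses=$(( (n - 1) / PER_WINDOW ))
  fi
  echo "handles=$n pauses=$pauses wait_minutes=$(( pauses * PAUSE_MIN ))"
}

# Example with a throwaway file of 39 hypothetical handles (@h1 ... @h39).
tmp=$(mktemp)
i=1
while [ "$i" -le 39 ]; do echo "@h$i" >> "$tmp"; i=$((i + 1)); done
estimate_wait "$tmp"   # prints: handles=39 pauses=2 wait_minutes=30
rm -f "$tmp"
```

On your own list you would call estimate_wait companies.qry before starting fetch.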
$ nucoll fetch companies
If you notice too many skipped handles, you can re-run fetch with a higher count (-c), like so:
$ nucoll fetch -f -c 50000 companies
We now have the companies.dat file and a set of .f files in the fdat directory, so we can generate the GML file.
$ nucoll edgelist companies
This generates companies.gml, which can be processed with a tool such as Gephi.
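Before opening the file in Gephi, a quick count of nodes and edges can tell you whether the run produced a sensible graph. This is a minimal sketch that is not part of nucoll, and it assumes each node and edge block starts on its own line with the GML keywords `node [` and `edge [`; that layout is common but not required by GML, so the counts are only a sanity check. The sample file content is hypothetical.

```shell
#!/bin/sh
# gml_summary FILE: count node and edge blocks in a GML file.
# Assumption: each block starts on its own line as "node [" or "edge [".
gml_summary() {
  nodes=$(grep -c '^[[:space:]]*node[[:space:]]*\[' "$1")
  edges=$(grep -c '^[[:space:]]*edge[[:space:]]*\[' "$1")
  echo "nodes=$nodes edges=$edges"
}

# Example on a tiny hand-written GML file (hypothetical content).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
graph [
  node [
    id 0
    label "nike"
  ]
  node [
    id 1
    label "pfizer"
  ]
  edge [
    source 0
    target 1
  ]
]
EOF
gml_summary "$tmp"   # prints: nodes=2 edges=1
rm -f "$tmp"
```

For the real output you would run gml_summary companies.gml.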