dusicastepic/ADMFifthHomework

Homework 5 - Visit the Wikipedia hyperlinks graph!

In this assignment, we analyze the Wikipedia hyperlink graph. In particular, given extra information about the categories to which each article belongs, we want to rank the articles according to some criteria.

For this purpose, we use the reduced version of the Wikipedia graph released by the SNAP group.

More details about the task are available at the following link.

snap graph

  1. First, we downloaded the Wikicat hyperlink graph data, a reduced version of the one on SNAP. Every row is an edge, and its two elements are the nodes (source and destination).
  2. From this page we downloaded:
    • wiki-topcats-categories.txt.gz (the list of the articles that belong to each category)
    • wiki-topcats-page-names.txt.gz (the names of the articles and their identification numbers)
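
The three inputs above can be parsed with a few small helpers. This is a minimal sketch, not the notebooks' code: the function names are hypothetical, and the line layouts (`source destination` for edges, `Category:Name; id id ...` for categories, `id Article name` for page names) are assumptions that should be checked against the downloaded files, especially if your copy of the edge file carries extra columns or a header row.

```python
import gzip


def read_lines(path):
    """Yield non-empty text lines from a plain or gzip-compressed file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield line


def parse_edge(line):
    """Parse one edge row (assumed layout: 'source destination')."""
    source, destination = line.split()[:2]
    return int(source), int(destination)


def parse_category(line):
    """Parse one category row (assumed layout: 'Category:Name; id id ...')."""
    name, _, ids = line.partition(";")
    name = name.strip()
    prefix = "Category:"
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name, [int(token) for token in ids.split()]


def parse_page_name(line):
    """Parse one page-name row (assumed layout: 'id Article name')."""
    node_id, _, name = line.strip().partition(" ")
    return int(node_id), name
```

For example, `dict(parse_page_name(l) for l in read_lines("wiki-topcats-page-names.txt.gz"))` would give an id-to-name lookup table, assuming the file sits in the working directory.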

The main goal was to answer the following research questions:

[RQ1] Build the graph G = (V, E), where V is the set of articles and E is the set of hyperlinks among them, and provide its basic information:

  • Whether it is directed or not
  • The number of nodes
  • The number of edges
  • The average node degree. Is the graph dense?
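
The quantities listed above can be computed directly from the edge list. The sketch below uses only the standard library rather than networkx, and its function names are hypothetical. It assumes a simple directed graph, so the average out-degree is |E| / |V| and the density is |E| / (|V| (|V| - 1)); a density close to 0 indicates a sparse graph.

```python
def build_adjacency(edges):
    """Build a dict mapping each node to the set of nodes it links to."""
    adjacency = {}
    for source, destination in edges:
        adjacency.setdefault(source, set()).add(destination)
        adjacency.setdefault(destination, set())  # keep nodes with no out-links
    return adjacency


def is_directed(adjacency):
    """Return True if at least one edge (u, v) has no reverse edge (v, u)."""
    return any(
        source not in adjacency[destination]
        for source, targets in adjacency.items()
        for destination in targets
    )


def basic_info(adjacency):
    """Return node count, edge count, average out-degree and density."""
    nodes = len(adjacency)
    edges = sum(len(targets) for targets in adjacency.values())
    average_degree = edges / nodes if nodes else 0.0
    density = edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0
    return {
        "nodes": nodes,
        "edges": edges,
        "average_degree": average_degree,
        "density": density,
    }
```

Storing neighbours in sets removes duplicate rows from the edge count, which is a design choice of this sketch and not something the assignment prescribes.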

[RQ2]

  1. Building Block Ranking

    Using our implementation of a shortest-path algorithm (breadth-first search), we compare a sample of nodes from the input category C0 with all the nodes in every other category Ci in order to build the block ranking.

  2. Ranking the nodes of each category within the resulting block ranking vector, selecting the top 3, and finding their article names
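
The two steps above can be sketched as follows. The breadth-first search is standard, but the scoring rules are assumptions made for illustration and are not confirmed by this README: categories are ordered by the median shortest-path distance from the sampled C0 nodes (unreachable pairs are ignored), and nodes inside a category are scored by the number of in-links they receive from the same category. All function names are hypothetical.

```python
from collections import deque
from statistics import median


def bfs_distances(adjacency, source):
    """Return the hop distance from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def block_ranking(adjacency, categories, input_category, sample_size=None):
    """Order categories by median distance from a sample of C0 nodes."""
    sources = list(categories[input_category])
    if sample_size is not None:
        sources = sources[:sample_size]  # assumption: take the first nodes
    per_source = [bfs_distances(adjacency, source) for source in sources]
    scores = {}
    for name, members in categories.items():
        if name == input_category:
            continue
        found = [d[m] for d in per_source for m in members if m in d]
        # Assumption: a category with no reachable node goes last.
        scores[name] = median(found) if found else float("inf")
    ordered = sorted(scores, key=lambda name: (scores[name], name))
    return [input_category] + ordered


def top_articles(adjacency, members, page_names, k=3):
    """Return the names of the k nodes with most in-links from members."""
    member_set = set(members)
    in_links = {node: 0 for node in member_set}
    for source in member_set:
        for destination in adjacency.get(source, ()):
            if destination in member_set and destination != source:
                in_links[destination] += 1
    ranked = sorted(member_set, key=lambda node: (-in_links[node], node))
    return [page_names.get(node, str(node)) for node in ranked[:k]]
```

One BFS per sampled source yields the distances to all categories at once, so the cost grows with the sample size rather than with the number of categories.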

The repository contains the following files:

  1. Homework 5 - RQ1 pre-check up.ipynb:

    This notebook contains only the exploration for research question 1 and the conclusions we drew from it. Based on those conclusions, we can choose the right networkx method for building the graph and double-check our results.

  2. Homework 5 - RQ1 and RQ2.ipynb:

    A notebook with all the steps: reading the data and building the graph, calculating shortest-path distances, building the block ranking, and ranking the nodes of each category.

Authors:

  • Dusica Stepic
  • Valerio Antonini
