Data Analysis of ARC Data

Visualizing Key Connections: Analyzing Millions of Australian Research Council Records

Project Name:

Analyse millions of Australian Research Council (ARC) records using large language models and visualize them further in a 3D knowledge graph.

Project Objective:

Envisioned by the Research Graph Foundation, the AI Graph project aims to create an improved pipeline to gather, clean, and investigate information from an enormous number of academic papers. The core of the project is its capacity to represent this metadata as a linked graph network using the powerful Neo4j tool, turning scattered academic data into a coherent, ordered network that is easy to explore and assess.

Skills:

Technical Skills:

1. Python: Created data pipelines and pre-processed data.

2. Neo4j: Visualized graph databases and identified relationships between ARC nodes.

3. REST API: Fetched data from official sites.

4. Tailscale VPN: Ensured secure connectivity for remote access.

5. JavaScript 3D library: Built a 3D knowledge graph with the 3D Force-Graph JavaScript library.
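As a small illustration of the REST-fetch and pre-processing skills above, the sketch below fetches a page of grant records and trims them down to the fields later stages need. The endpoint URL, query parameters, and field names are assumptions for illustration only, not the real ARC data portal schema.

```python
import json
import urllib.request

# Hypothetical endpoint: the real ARC data portal URL, parameters, and
# response shape may differ; this only illustrates the pattern.
ARC_API = "https://dataportal.arc.gov.au/NCGP/API/grants"

def fetch_grants(page=1, page_size=100):
    """Fetch one page of grant records from the (assumed) REST API."""
    url = f"{ARC_API}?page={page}&size={page_size}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("data", [])

def clean_grant(record):
    """Keep only the fields later stages need, with whitespace trimmed."""
    return {
        "id": record.get("id", "").strip(),
        "title": record.get("title", "").strip(),
        "abstract": record.get("abstract", "").strip(),
        "keywords": [k.strip().lower()
                     for k in record.get("keywords", []) if k.strip()],
    }
```

Keeping the cleaning step as a pure function like this makes it easy to test without hitting the network.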

Non-Technical Skills:

1. Communication: Engaged with supervisors and team members in daily meetings and weekly show-and-tell presentations.

2. Writing: Authored weekly Medium articles and daily LinkedIn posts.

3. Teamwork: Collaborated with supervisors and colleagues, incorporating their feedback to achieve project goals and improve Medium articles and LinkedIn posts.

4. Problem-solving: Sharpened problem-solving by writing and iteratively fine-tuning code to build the 3D knowledge graph.

5. Analytical Skills: Developed analytical skills by pre-processing data and adding relevant properties and features.

Figure: Workflow of analysing the data

Experience:

Programming Experience:

I interned at the Research Graph Foundation, where I worked on the AI Graph project. Following the pipeline in the diagram above, I first scraped the data via a REST API, then built data pipelines to pre-process it, including cleaning, formatting, and exporting the output in an appropriate format. Next, I tagged records with an AI taxonomy using Mistral (based on each record's abstract, title, and keywords) and created a knowledge graph (Research Graph) from the content of the metadata records. Finally, I visualized the knowledge graph with a 3D force-directed graph, highlighting important relationships and showing connections between organizations by indicating which AI nodes were relevant to them. These experiences gave me a unique perspective on graph databases, which I had never worked with before, and taught me to leverage state-of-the-art technology to optimize project results.
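The tagging and graph-building steps can be sketched as follows. In the real pipeline the taxonomy label comes from prompting Mistral with each record's title, abstract, and keywords; here a simple keyword match stands in so the sketch runs without an LLM, and the small taxonomy itself is a made-up stand-in.

```python
# Stand-in AI taxonomy: label -> cue words. The real labels came from
# an AI taxonomy and were assigned by Mistral, not by string matching.
AI_TAXONOMY = {
    "natural language processing": ["language model", "nlp"],
    "computer vision": ["image", "vision", "object detection"],
    "machine learning": ["machine learning", "neural network", "deep learning"],
}

def tag_record(record):
    """Return the taxonomy labels whose cue words appear in the record."""
    text = " ".join([record.get("title", ""), record.get("abstract", ""),
                     " ".join(record.get("keywords", []))]).lower()
    return sorted(label for label, cues in AI_TAXONOMY.items()
                  if any(cue in text for cue in cues))

def build_graph(records):
    """Assemble node/edge lists linking each grant to its taxonomy labels."""
    nodes, edges = [], []
    for rec in records:
        nodes.append({"id": rec["id"], "type": "grant"})
        for label in tag_record(rec):
            edges.append({"source": rec["id"], "target": label,
                          "type": "HAS_TOPIC"})
    return nodes, edges
```

The resulting node and edge lists map directly onto Neo4j nodes and relationships, which is what makes the later graph visualization straightforward.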

Writing Experience:

Throughout this internship, I did not just work on the coding side; I also began developing the academic writing skills that will support my goal of pursuing a PhD at an esteemed university. I enjoyed discussing writing and editing with senior team members so that lay readers could benefit from the content and gain a comprehensive understanding of difficult LLM developments, such as diffusion models and Retrieval-Augmented Generation, in an accessible way. This experience was enjoyable and intense, with a plethora of learning and many fruitful outcomes that helped me thrive both professionally and academically.

Achievements:

1. Developed Advanced Data Pipelines: Efficiently scraped and pre-processed data from the Australian Research Council using Python and REST API.

2. Implemented AI Taxonomy Tagging: Utilized Mistral to tag records with AI taxonomy based on abstracts, titles, and keywords.

3. Created 3D Knowledge Graphs: Built and visualized intricate academic networks using Neo4j and a 3D Force-Directed Graph JavaScript library.

4. Fostered Team Collaboration: Actively sought and integrated feedback from supervisors and colleagues to achieve project goals.

5. Refined Problem-Solving Abilities: Overcame challenges in data processing and visualization, enhancing my problem-solving skills.

6. Expanded Technical Expertise: Gained hands-on experience with tools like Neo4j, Tailscale VPN, and various JavaScript libraries.

7. Contributed to Academic Knowledge: Published informative articles on complex AI topics, making them accessible to a broader audience.

Result:

In this visualization, brown nodes are unrelated to artificial intelligence, while green nodes represent AI-related articles. Their affiliations reveal which organizations and funders are connected to them, and the graph also clarifies how much funding the government provides for artificial intelligence research. In the future, this visualization can be extended to show multiple relationships, covering not just AI-related content but also specialized fields like Computer Vision, Reinforcement Learning, and Deep Learning.
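A minimal sketch of how such a view can be prepared: the 3D Force-Graph library consumes a `{nodes, links}` object, so the export step only has to colour each node by its AI relevance, green for AI-related and brown otherwise, as in the result above. The `ai_related` field name is a hypothetical attribute chosen for this sketch.

```python
import json

def to_force_graph(nodes, links):
    """Convert node/link lists to the {nodes, links} JSON shape the
    3D Force-Graph library's graphData() accepts, colouring AI-related
    nodes green and the rest brown."""
    return {
        "nodes": [{"id": n["id"],
                   "color": "green" if n.get("ai_related") else "brown"}
                  for n in nodes],
        "links": [{"source": lk["source"], "target": lk["target"]}
                  for lk in links],
    }

# Tiny demo: one AI-related grant linked to an organization.
graph = to_force_graph(
    [{"id": "GrantA", "ai_related": True}, {"id": "OrgB"}],
    [{"source": "GrantA", "target": "OrgB"}],
)
print(json.dumps(graph))
```

Writing this object out as JSON lets the browser-side 3D Force-Graph code load it directly without further transformation.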

Thank you for your interest!