The "Learning Forest" was developed in the course of an NLP project at DHBW Mannheim by the following team members:
- Alina Buss (4163246)
- Andreas Dichter (6104795)
- Can Berkil (2087362)
- Paula Hölterhoff (9633299)
- Phillip Lange (5920414)
- Simon Schmid (9917195)
Development Teams
- Webapp (Alina Buss, Andreas Dichter)
- Answer-Checker (Can Berkil, Simon Schmid)
- Question-Generator (Paula Hölterhoff, Phillip Lange)
Setup
- install docker (including docker-compose)
- navigate to NLP/webapp
- run "docker-compose up" or "docker-compose up --build"
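- If there are any problems, try "docker-compose down --volumes" (this also removes the volumes, including the database data) or "docker kill $(docker ps -q)" (this stops all running containers on your machine)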
- the build process may take a while
- Once the build is done, open "localhost:5000" in your browser
Using the webapp
- upload your document (.docx)
- make sure that your document follows the required structure
- you can also use our example (example-computational-linguistics.docx in the Documentation folder)
- the question generation can take 1-3 minutes depending on your hardware
- now you can use the learning and exercise pages as much as you want!
If you want to take a closer look at the database...
- navigate to "localhost:1234" in your browser
- choose "PostgreSQL"
- log in with the following data:
- User: "postgres"
- Password: "securepwd"
Documentation
- The files "upload-prozess.png" and "exercise-prozess.png" contain flowcharts of the upload process and the exercise process
- The document "Präsentation" corresponds to the presentation held on 18.01.2022
- There is an example document for testing the upload process, named example-computational-linguistics.docx
Answer-Checker
- In this part, the user input is compared with the actual solution and scored (with the help of TF-IDF)
- The score is then used to decide whether the answer is correct or incorrect (if the score is >= 50%, the answer is right; otherwise it is wrong)
- Libraries used in the process: Pandas, Numpy and Sklearn
- Process stages:
- Get the user's answer and the actual answer
- The TfidfVectorizer is created (it is important that stop words are weighted lower, as they are not meaningful for the calculation of the score)
- The model is applied to the two texts
- The similarity of the two texts is then calculated
- Based on the score, it is then decided whether the answer is wrong or right
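
The process stages above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes that the similarity measure is cosine similarity and that the vectorizer is fitted on just the two texts, neither of which is stated above. The names score_answer, is_correct and THRESHOLD are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# From the description above: a score >= 50% counts as a correct answer.
THRESHOLD = 0.5


def score_answer(user_answer: str, solution: str) -> float:
    """Return a TF-IDF based similarity score between 0.0 and 1.0."""
    # Assumption: the vectorizer is fitted on the two texts only.
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform([user_answer, solution])
    # Assumption: cosine similarity is used to compare the two vectors.
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


def is_correct(user_answer: str, solution: str,
               threshold: float = THRESHOLD) -> bool:
    """Decide whether the answer is right based on the score."""
    return score_answer(user_answer, solution) >= threshold


if __name__ == "__main__":
    solution = "A morpheme is the smallest meaningful unit of language"
    answer = "The smallest meaningful unit of language is a morpheme"
    print(score_answer(answer, solution), is_correct(answer, solution))
```

Because TF-IDF ignores word order, an answer that uses the same words as the solution in a different order still receives a high score, while an answer with no words in common receives a score of 0.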