Data analysis and reports

Here are two Excel files with abstracts. You could treat one corpus as the training corpus and the other as the test corpus, which you would only use to evaluate the outcome of screening.

Background: CBDI is short for Cross-Border Digital Inclusive Entrepreneurship. The corpus you received is the set of articles that Tony has collected, which makes them all positive examples of papers to include. To train the ASReview system you also need negative examples. In fact, you will need at least as many negative examples as positive ones. These can be examples you find yourself. Since the goal of using the CBDI corpus is to find more good references on the topic, we want to train the machine learning model used by ASReview on the articles that we know to be good, as well as on articles that turn up in a search for keywords like "digital ecosystem" but are not related to the topic. For instance, I found the words "digital" and "ecosystem" used in an article on the composition of petroleum, which is obviously not a relevant article.

There are several relevant questions you should be asking yourself:

1. How many labeled articles do you need to be able to identify good new articles? Are 50 sufficient? Do you need 100, or do you need 500 examples?
2. How do you evaluate the quality of the model ASReview creates? ASReview has a simulation capability which might be useful for that. One evaluation criterion should be: how many of the articles in the test set did you find?
3. What output does ASReview produce (e.g. it includes a ranking of articles), and how can we use it?
4. How can you add new articles to a corpus that you have used to train a model in ASReview? Presumably, you can export a labeled corpus and add new articles to it.
5. What are practical strategies for using ASReview (or any other screening tool)? The ASReview documentation includes a discussion of that. There are different aspects: how long does it take to screen a corpus, how accurate is the model, etc.
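To make the positive/negative setup concrete, here is a minimal sketch of how a labeled training corpus could be assembled before importing it into a screening tool. This is illustrative only: the row fields and the `label` column name (1 = relevant, 0 = irrelevant) are assumptions, and you should check the ASReview documentation for the exact import format it expects.

```python
def build_labeled_corpus(positive_rows, negative_rows):
    """Merge positive and negative examples into one labeled list.

    Each row is a dict with at least 'title' and 'abstract'.
    A 'label' field is added: 1 for known-relevant (e.g. Tony's
    CBDI articles), 0 for irrelevant keyword hits found yourself.
    """
    labeled = []
    for row in positive_rows:
        labeled.append({**row, "label": 1})
    for row in negative_rows:
        labeled.append({**row, "label": 0})
    return labeled


# Hypothetical example rows, just to show the shape of the data.
pos = [{"title": "CBDI study", "abstract": "digital ecosystems and inclusion ..."}]
neg = [{"title": "Petroleum composition", "abstract": "the digital ecosystem of refining ..."}]
corpus = build_labeled_corpus(pos, neg)
```

The resulting list can then be written out to CSV or Excel and used as the prior knowledge for training.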
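The evaluation criterion mentioned above, how many of the test-set articles you found, is simply recall over the held-out test corpus. A small sketch, assuming you have the identifiers (e.g. DOIs or titles) of the articles your screening surfaced and of the known-relevant test articles:

```python
def recall(found_ids, test_ids):
    """Fraction of known-relevant test articles recovered by screening."""
    test = set(test_ids)
    if not test:
        return 0.0
    return len(test & set(found_ids)) / len(test)


# Toy example: the test corpus has 4 relevant papers and
# screening surfaced 3 of them (plus one unrelated hit "x").
r = recall({"a", "b", "c", "x"}, {"a", "b", "c", "d"})
```

Running the same computation at different screening cutoffs (after 50, 100, 500 labeled articles) would help answer the first question about how many labeled examples are enough.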