scispacy | SpaCy models for biomedical text processing
allenai.github.io › scispacy
scispaCy models are trained on data from a variety of sources. In particular, we use: (1) the GENIA 1.0 Treebank, converted to basic Universal Dependencies using the Stanford Dependency Converter (we have made this dataset available along with the original raw data); and (2) word2vec word vectors trained on the PubMed Central Open Access Subset.
Issues · allenai/scispacy · GitHub
github.com › allenai › scispacy
scispacy doesn't mark end of entities correctly. #316 opened on Feb 13 by gitclem.
investigate merging parser/tagger datasets and ner dataset in order to not train multiple tok2vecs project. #310 opened on Feb 5 by danielkingai2.
try out spacy ray for faster training. #305 opened on Feb 4 by danielkingai2.
train a transformer based model.