STMLST: Serotype Identification By Multi-loci Sequence Typing

2023-07-11 12:06 Author: 抗黑眼圈斗士

2 Methods

The analysis procedure of STMLST is depicted in Figure 1. STMLST first converts the input file to FASTA format and maps the input sequences against an allele sequence database. After parsing the mapping result, STMLST obtains formatted data that can be used to identify a list of candidate organisms. STMLST assigns a high score to an organism if the “Subject sequence length”, “Alignment length” and “Number of identical matches” fields in the formatted data are equal, that is, if the allele is matched identically over its full length, and a low score if they are not. This yields a list of organisms with their corresponding scores. These operations rest on the following principle: if the input sequences are highly similar to the alleles of an organism, they have a high probability of belonging to that organism. STMLST then uses the information for the highest-scoring organism to construct a search statement and queries the sequence type and serotype databases with it. Finally, STMLST outputs the subtyping result for the input sequences. Data collection and the algorithm are described in detail in the subsections below.
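The exact-match test described above can be sketched in a few lines of Python. This is a minimal illustration, not the STMLST implementation: it assumes the mapping result is a tab-separated report whose columns are query ID, subject ID, subject sequence length, alignment length, and number of identical matches (in BLAST+ this would correspond to `-outfmt "6 qseqid sseqid slen length nident"`). The `organism|locus_allele` subject naming and the sample rows are invented for the example.

```python
import csv
import io

# Assumed column order of the tabular alignment report (an assumption,
# not taken from the STMLST source).
COLUMNS = ("qseqid", "sseqid", "slen", "length", "nident")


def parse_hits(text):
    """Parse tab-separated alignment rows into dicts with integer lengths."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("slen", "length", "nident"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def is_exact_match(hit):
    """True when subject length, alignment length and identities are all equal."""
    return hit["slen"] == hit["length"] == hit["nident"]


def exact_match_counts(hits):
    """Count exact allele matches per organism.

    Assumes a hypothetical subject ID format 'organism|locus_allele'.
    """
    counts = {}
    for hit in hits:
        organism = hit["sseqid"].split("|", 1)[0]
        counts.setdefault(organism, 0)
        if is_exact_match(hit):
            counts[organism] += 1
    return counts


# Invented sample rows: one exact match and two imperfect matches.
SAMPLE = (
    "contig1\torg_a|locusA_1\t501\t501\t501\n"
    "contig1\torg_a|locusB_7\t501\t501\t499\n"
    "contig2\torg_b|locusA_10\t536\t420\t418\n"
)

if __name__ == "__main__":
    print(exact_match_counts(parse_hits(SAMPLE)))
```

Counting exact matches per organism is only the simplest possible summary; the weighted scoring that STMLST applies to the three similarity states is described in Section 2.2.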

2.1 Data Preprocessing

The data required to run the full functionality of STMLST is divided into three parts: a key alleles database for finding similar key alleles, a sequence type database for finding sequence types from key alleles, and a serotype database for finding the serotypes that correspond to a sequence type. All three are downloaded from PubMLST, and the relationship between them is shown in Figure 2. The key alleles database consists of key alleles downloaded from more than one hundred organisms. We use local scripts to download these gene sequences and build a BLAST index so that similar key alleles can be found by fast alignment. The sequence type database stores the mapping from each combination of key alleles to a sequence type of the organism; we download it with a local script and store it in the SQLite database. The mapping between serotypes and sequence types is not one-to-one; we extract it from PubMLST and store it in the SQLite database.
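To make the two lookups concrete, the sketch below builds a small in-memory SQLite database with Python's standard `sqlite3` module. The table names, column names, the `locus:allele` profile encoding, and every row of data are assumptions made for illustration; the actual STMLST schema is not given in the text.

```python
import sqlite3


def profile_key(alleles):
    """Encode an allele combination as a canonical 'locus:allele;...' string."""
    return ";".join(f"{locus}:{num}" for locus, num in sorted(alleles.items()))


def build_demo_db():
    """Create an in-memory database with a hypothetical schema and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequence_type (
            organism TEXT NOT NULL,
            st       INTEGER NOT NULL,
            profile  TEXT NOT NULL,   -- canonical allele combination
            PRIMARY KEY (organism, st)
        );
        CREATE TABLE serotype (
            organism TEXT NOT NULL,
            st       INTEGER NOT NULL,
            serotype TEXT NOT NULL    -- several rows per ST are allowed
        );
        """
    )
    conn.execute(
        "INSERT INTO sequence_type VALUES (?, ?, ?)",
        ("demo_org", 1, profile_key({"locusA": 1, "locusB": 2})),
    )
    conn.executemany(
        "INSERT INTO serotype VALUES (?, ?, ?)",
        [("demo_org", 1, "serotype_A"), ("demo_org", 1, "serotype_B")],
    )
    conn.commit()
    return conn


def find_sequence_type(conn, organism, alleles):
    """Return the sequence type for an allele combination, or None if unknown."""
    row = conn.execute(
        "SELECT st FROM sequence_type WHERE organism = ? AND profile = ?",
        (organism, profile_key(alleles)),
    ).fetchone()
    return row[0] if row else None


def find_serotypes(conn, organism, st):
    """Return every serotype recorded for a sequence type (possibly none)."""
    rows = conn.execute(
        "SELECT DISTINCT serotype FROM serotype "
        "WHERE organism = ? AND st = ? ORDER BY serotype",
        (organism, st),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_demo_db()
    st = find_sequence_type(db, "demo_org", {"locusB": 2, "locusA": 1})
    print(st, find_serotypes(db, "demo_org", st))
```

Keeping serotypes in a separate table with one row per (sequence type, serotype) pair is one straightforward way to represent a mapping that is not one-to-one, since a sequence type can then return zero, one, or several serotypes.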

Fig. 2. The data preprocessing for subtyping.



2.2 Identification Strategy


We first align the input sequences against the key alleles database, record the key alleles that align successfully, and mark each recorded key allele with one of three states according to the degree of similarity of its alignment. After all markers have been recorded, each candidate organism is given a score based on the marker results. The scoring rules are shown in Figure 3. In Equation 1, x represents the number of different alleles that are similar for a given allele, and θ represents the weight corresponding to the degree of similarity, with a larger θ representing greater similarity. The resulting s is the score for the alignment result of one allele of the organism. Because s decreases as θx grows and Equation 2 subtracts the sum of s, a larger x or a larger θ gives a higher final score f. According to Equation 2, the scores of all alleles are accumulated to obtain the final score f, and STMLST takes the organism with the highest f as the most likely source of the input sequencing data. This organism is then searched in the sequence type database using the key alleles in the records, and the sequence type to which the input sequencing data may belong is obtained from the mapping between key alleles and sequence types. Finally, the possible serotypes are obtained by searching the serotype database by sequence type. Because the serotype data is not yet complete, we combine STMLST with SeqSero2 as a supplement: when insufficient data leads to a null result for Salmonella serotype identification, we import the serotype identification result of SeqSero2. This measure combines the advantages of two different subtyping implementations and could effectively improve identification accuracy.


$$s = \frac{1}{1 + e^{\theta x}} \tag{1}$$

$$f = \left(100 - \sum\nolimits_{allele}^{alleles} s\right) \cdot w \tag{2}$$

Fig. 3. The scoring process for organisms.
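The two scoring equations can be written out as a short Python sketch. It follows Equations 1 and 2 as printed, but several details are assumptions: the text does not give the values of θ for the three similarity states or define w, so the state names and weights in `THETA` are invented and w is exposed as a parameter defaulting to 1.

```python
import math

# Hypothetical weights for the three similarity states; the source does not
# state the actual values or the names of the states.
THETA = {"exact": 2.0, "close": 1.0, "weak": 0.5}


def allele_score(theta, x):
    """Equation 1: s = 1 / (1 + e^(theta * x))."""
    return 1.0 / (1.0 + math.exp(theta * x))


def organism_score(records, w=1.0):
    """Equation 2: f = (100 - sum of s over all allele records) * w.

    `records` is a list of (theta, x) pairs, one per recorded allele.
    """
    total = sum(allele_score(theta, x) for theta, x in records)
    return (100.0 - total) * w


def best_organism(candidates, w=1.0):
    """Return the candidate organism with the highest final score f."""
    return max(candidates, key=lambda name: organism_score(candidates[name], w))


if __name__ == "__main__":
    # Invented records: org_a has stronger matches than org_b.
    candidates = {
        "org_a": [(THETA["exact"], 3), (THETA["exact"], 2)],
        "org_b": [(THETA["weak"], 1), (THETA["weak"], 1)],
    }
    for name, records in candidates.items():
        print(name, round(organism_score(records), 3))
    print("best:", best_organism(candidates))
```

Since each s lies between 0 and 1 for non-negative θx, every recorded allele lowers f by less than one point, and stronger or more numerous matches lower it least. Note that `math.exp` overflows for very large θx, so a production version would need to guard that case.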


