Last Updated $Date: 2007/08/15 05:27:49 $

Empirical Confidence Statistics By Predicted Utility

The Information shown here

The data linked to in this page is intended to help answer the question:

How reliable is a given prediction?
However, the question that it directly (albeit empirically) answers is:
Given that a protein localizes to site A, what is the probability that it has a predicted utility of some value u or more for site A?
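
The empirical quantity in that question can be sketched in a few lines. This is only an illustration, not WoLF PSORT's own code: the function name `fraction_at_or_above` and the utility values are hypothetical.

```python
def fraction_at_or_above(utilities, u):
    """Fraction of proteins whose predicted utility for site A is >= u.

    `utilities` holds, for each protein known to localize to site A,
    its predicted utility for site A (hypothetical values here).
    """
    if not utilities:
        raise ValueError("need at least one protein")
    return sum(1 for x in utilities if x >= u) / len(utilities)


# Hypothetical predicted utilities for five proteins known to localize to site A.
site_a_utilities = [2.0, 5.5, 7.0, 9.5, 12.0]

# Three of the five proteins have a utility of 7.0 or more.
fraction = fraction_at_or_above(site_a_utilities, 7.0)
print(fraction)
```

Evaluating this fraction over a range of thresholds u gives an empirical estimate of the probability in the question above.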

Utility

I borrowed the word utility from decision theory, with the intention that WoLF PSORT could eventually predict the utility of believing that a protein localizes to a given site -- a quantity that combines the probability that the protein localizes to each site with the (in general non-uniform) cost of the various possible mistakes. For example, when one is only interested in predicting secreted proteins, one could count predicting a mitochondrial protein to be a nuclear protein as a "right" answer -- since both sites are non-secreted.

In practice I have only used this functionality as one way to address proteins with multiple localizations, by lightly penalizing misclassifications between related sites (localization classes). For example, for a protein with dual localization to the cytoplasm and nucleus, this makes predicting nuclear more acceptable than predicting mitochondria. For details please see our APBC06 Paper.
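
The decision-theoretic idea behind the word utility can be sketched as an expected-utility calculation. This is a generic illustration under assumed numbers, not the procedure WoLF PSORT actually uses (see the APBC06 paper for that): the function `expected_utilities`, the site probabilities, and the 0.5 partial credit for related sites are all hypothetical.

```python
def expected_utilities(site_probs, utility):
    """Expected utility of predicting each site.

    site_probs: {true_site: probability that the protein localizes there}
    utility:    {(predicted_site, true_site): utility of that outcome}
    Pairs missing from `utility` count as 0.0 (an ordinary mistake).
    """
    return {
        predicted: sum(p * utility.get((predicted, true), 0.0)
                       for true, p in site_probs.items())
        for predicted in site_probs
    }


# Hypothetical probabilities for one protein.
site_probs = {"mito": 0.40, "cyto": 0.35, "nucl": 0.25}

# A correct prediction is worth 1.0 (assumed scale).
utility = {(site, site): 1.0 for site in site_probs}
# Confusing the related sites cyto and nucl is only lightly penalized (assumed 0.5).
utility[("cyto", "nucl")] = 0.5
utility[("nucl", "cyto")] = 0.5

scores = expected_utilities(site_probs, utility)
best = max(scores, key=scores.get)
print(scores, best)
```

With these made-up numbers the most probable single site is mito, yet cyto has the highest expected utility, because a cyto prediction also earns partial credit when the protein is actually nuclear.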

Example graph explained

Near the bottom of this page there is a table containing graphs like the following pair for each localization site.

[Figure: two example graphs for lysosome. Left: Histogram. Right: Smoothed Proportion.]

The left hand graph is a histogram plotting the observed frequency of predicted utilities for lysosome for proteins which actually localize there (light bars) vs. proteins which localize to other sites.

The right hand graph is a smoothed curve representing the probability that a protein localizes to the lysosome, given that it has a particular predicted utility for lysosome, under the assumption that the prior probability that a protein localizes to the lysosome is equal to the proportion of lysosome proteins in WoLF PSORT's dataset.
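
The relation between the two graphs can be sketched with Bayes' rule applied per histogram bin. This is a minimal, unsmoothed illustration: the smoothing method used for the actual curves is not described here, and the function name `posterior_by_bin` and the bin counts are hypothetical.

```python
def posterior_by_bin(site_counts, other_counts, prior=None):
    """P(protein localizes to the site | utility falls in bin), per bin.

    site_counts[i]:  proteins of the site whose predicted utility is in bin i
    other_counts[i]: proteins of other sites whose predicted utility is in bin i
    prior: prior probability of the site; defaults to its proportion
           in the dataset, as assumed in the text above.
    """
    n_site, n_other = sum(site_counts), sum(other_counts)
    if n_site == 0 or n_other == 0:
        raise ValueError("both groups need at least one protein")
    if prior is None:
        prior = n_site / (n_site + n_other)

    posteriors = []
    for s, o in zip(site_counts, other_counts):
        like_site = s / n_site      # P(bin | site)
        like_other = o / n_other    # P(bin | other site)
        evidence = prior * like_site + (1 - prior) * like_other
        # An empty bin carries no information, so report None for it.
        posteriors.append(prior * like_site / evidence if evidence > 0 else None)
    return posteriors


# Hypothetical counts over four utility bins, from low to high utility.
lyso_counts = [2, 3, 5, 10]
other_counts = [60, 25, 10, 5]
posteriors = posterior_by_bin(lyso_counts, other_counts)
print(posteriors)
```

When the prior equals the dataset proportion, the posterior in each bin reduces to the fraction of proteins in that bin that belong to the site. Passing a different `prior` shows how the curve would shift if lysosomal proteins were rarer or more common than in the dataset.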

By inspecting these graphs, one may draw some conclusions. For example

Links to graphs for each (organism, site) pair


The rows in this table represent the localization sites, with links to statistics which were computed on proteins labeled with that localization site in the dataset. Note that in some cases it may be useful to look at sites other than the predicted site -- since the predicted site may not be the true site. The number given for each site is the number of proteins with that localization site in the dataset.
site          animal   plant   fungi
chlo               -     750       -
chlo_mito          -       6       -
cysk             148      41      34
cysk_plas          5       1       -
cyto            1442     432     354
cyto_E.R.          -       1       -
cyto_mito         18       3       8
cyto_nucl        246      11      91
cyto_pero         10       1       2
cyto_plas          4       1       -
E.R.             425      69      66
E.R._golg          9       -       -
E.R._mito         18       -       -
E.R._plas          -       1       -
E.R._vacu          -       2       -
extr            3130     114     140
extr_plas         19       -       -
golg             100      29      38
golg_plas          -       2       -
lyso             148       -       -
mito             938     210     435
mito_nucl          2       -       4
mito_pero         15       -       -
mito_plas          -       1       -
nucl            2682     456     666
nucl_plas          -       2       -
pero             217      52      77
plas            3195     165     220
vacu               -      73      23

A dash marks a site with no entry for that organism.

seqTeam CBRC AIST Copyright (C) National Institute of Advanced Industrial Science and Technology (AIST), Computational Biology Research Center (CBRC). All Rights Reserved.