i brought that up on irc but it was decided that we wanted the word "rank" in there, and "sort into different ranks" didn't accomplish what we wanted. i agree with you but seemingly nobody else does, whatever

I have missed the IRC sessions, since I have been working 8-5 this week, so pardon me if this has been gone over, but should there not be some mention of RBY/GSC not really having a BL tier at all? I mean, things like Tentacruel who are BL in RBY (and pretty much BL in GSC too!) just get shoved in OU there!
Also, this is just minor, and I understand you say rank to emphasize that it is a way of separating power and all, but
"Smogon's tier system is used to rank Pokemon into several groups based on their perceived power and usage in competitive play."
I am pretty sure 'rank into' is not quite right, so I would replace it with 'separate Pokemon into'.
Bolded indicates a minor edit. Some of the changes I made were purely because I saw redundant words in a few sentences (I was corrected on this many times in the sections I wrote). This is just my own opinion, of course; people may disagree on how it is presented.

<p>The OU tier contains, as already mentioned, the Pokemon that are used commonly in the standard metagame. The reader might be interested to know how the Pokemon in the OU tier are selected.</p>
<p>For RB, GS and RS, the OU tier is formed from the experience of our community, of which Pokemon are commonly used in Smogon's standard metagame tournaments. In tournaments, people use Pokemon that can compete at the highest level to allow them to win, so naturally they are an excellent means of determining the OU tier.</p>
<p>The OU tier for DP is constructed from the league statistics extracted from the current DP Pokemon battling medium. These statistics provide the number of times each Pokemon was used in the standard metagame league during each month, and those of the three months prior to the OU tier creation are utilised in particular. The Pokemon commonly used by expert players, who are highly-ranked in the league, receive heavier weighting than those used by less experienced opponents. Furthermore, the statistics from the previous month are made to influence the OU tier more than those from the two months before it. This is done to make the OU tier reflect the frequently used Pokemon of the current standard metagame. Since, compared to the previous three generations, the DP standard metagame is still in its infancy, the OU tier for DP is continually updated on a three-month basis.</p>
in my implementation of this there is no need for a descending sort, so i see no need to mention it here.

<p>After this is done, each of these overall usages is divided by the total of overall usages, so that each value becomes the probability of that Pokemon being used in battle. These probabilities are then sorted in descending order and made into a cumulative frequency, so that the 30th number, say, would be the probability that one of the top 30 used Pokemon is used in battle. Finally, the OU list is made to consist of all the Pokemon whose cumulative probability of being used does not exceed 0.75. This would mean that, whenever a Pokemon switches in or leads, it has a 75% chance of being a member of the OU tier.</p>
def read_month(month_weight):
    for pokemon, value in read_part():
        unweighted, weighted = usage.get(pokemon, (0, 0))
        usage[pokemon] = unweighted + value, weighted
    for pokemon, value in read_part():
        unweighted, weighted = usage.get(pokemon, (0, 0))
        usage[pokemon] = unweighted, weighted + value * month_weight

read_month(0.5)
read_month(0.30901699437494745)
read_month(0.19098300562505255)

ou = []
uu = []
frequencies = ((pokemon, weighted/unweighted)
               for pokemon, (unweighted, weighted) in usage.iteritems())
for pokemon, frequency in frequencies:
    if frequency <= 0.75:
        ou.append(pokemon)
    else:
        uu.append(pokemon)
As I understood it, you're dividing the weighted data by the total of the unweighted ones, whereas you should be dividing by the total of the weighted ones. In fact, you don't even need to store the total of the unweighted data at all.
I misunderstood because you used the word "usages". On the Shoddybattle website, the Unweighted data uses the word "usages". This needs a clarification. I'm dividing by the sum of all of the "points", correct?

After this is done, each of these overall usages is divided by the total of overall usages, so that each value becomes the probability of that Pokemon being used in battle.

Okay, this makes more sense with the usages/points clarification. I'll try to do another draft Monday. This actually simplifies the code a whole lot.

Then you'll need to sort this data. It won't always be already in descending order, even though the data supplied from Shoddybattle is sorted. There might be slight fluctuations. After sorting (not before), you add them up cumulatively and allow those Pokemon having a cumulative frequency less than or equal to 0.75.
Pokemon October November December
-----------------------------------------------
Blissey 0.125527966 0.137021774 0.132424493
Bronzong 0.058632949 0.065429965 0.07033436
Cresselia 0.059004315 0.068485692 0.06608003
Garchomp 0.110138149 0.108333635 0.108279658
Gengar 0.083996813 0.086081565 0.087012629
Gyarados 0.088535978 0.075296246 0.076388762
Heracross 0.067997745 0.047958357 0.058406565
Infernape 0.058738583 0.071237375 0.064366575
Metagross 0.065206004 0.057619578 0.060047337
Salamence 0.070112407 0.072867631 0.079579115
Skarmory 0.06250782 0.067108663 0.067332918
Tyranitar 0.086097437 0.083741582 0.071879498
Weavile 0.063503833 0.058817938 0.057868059
Pokemon October November December
-----------------------------------------------
Blissey 0.672782599 0.541069822 0.363901763
Bronzong 0.581749484 0.430581009 0.265206259
Cresselia 0.582451394 0.436697359 0.257060362
Garchomp 0.656185148 0.503182422 0.329058746
Gengar 0.623092434 0.468672139 0.294979032
Gyarados 0.629387007 0.449680307 0.27638517
Heracross 0.598447825 0.391169197 0.241674503
Infernape 0.581949506 0.442045778 0.253705685
Metagross 0.593675436 0.413994993 0.245045581
Salamence 0.601958341 0.445147436 0.282097705
Skarmory 0.588903212 0.433964941 0.259485873
Tyranitar 0.626038774 0.464697691 0.268103522
Weavile 0.590683901 0.41663679 0.240557809
Pokemon Overall
---------------------
Blissey 0.132468379
Bronzong 0.06643159
Cresselia 0.065384585
Garchomp 0.108648891
Gengar 0.086141566
Gyarados 0.078223344
Heracross 0.056574637
Infernape 0.065265362
Metagross 0.060226974
Salamence 0.075590961
Skarmory 0.066315078
Tyranitar 0.077996347
Weavile 0.059201432
Pokemon Overall
---------------------
Blissey 0.132468379
Garchomp 0.108648891
Gengar 0.086141566
Gyarados 0.078223344
Tyranitar 0.077996347
Salamence 0.075590961
Bronzong 0.06643159
Skarmory 0.066315078
Cresselia 0.065384585
Infernape 0.065265362
Metagross 0.060226974
Weavile 0.059201432
Heracross 0.056574637
Pokemon Cumulative
---------------------
Blissey 0.132468379
Garchomp 0.241117269
Gengar 0.327258835
Gyarados 0.405482179
Tyranitar 0.483478526
Salamence 0.559069487
Bronzong 0.625501077
Skarmory 0.691816156
Cresselia 0.75720074
Infernape 0.822466102
Metagross 0.882693076
Weavile 0.941894508
Heracross 0.998469144
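The numbers in the tables above can be checked with a short script. This is only a sketch in Python 3, assuming (as the figures themselves suggest) that the second table is each monthly share raised to that month's weight, with December weighted 0.5, November about 0.309 and October about 0.191, and that the Overall column is the product of those three values. The function names `weighted_geometric_mean` and `split_tiers` are made up for illustration.

```python
# Month weights from the thread; the most recent month gets the largest weight.
WEIGHTS = {
    "December": 0.5,
    "November": 0.30901699437494745,
    "October": 0.19098300562505255,
}

# Blissey's monthly shares, copied from the first table.
blissey = {"October": 0.125527966, "November": 0.137021774, "December": 0.132424493}

# "Overall" column, copied from the third table.
overall = {
    "Blissey": 0.132468379, "Bronzong": 0.06643159, "Cresselia": 0.065384585,
    "Garchomp": 0.108648891, "Gengar": 0.086141566, "Gyarados": 0.078223344,
    "Heracross": 0.056574637, "Infernape": 0.065265362, "Metagross": 0.060226974,
    "Salamence": 0.075590961, "Skarmory": 0.066315078, "Tyranitar": 0.077996347,
    "Weavile": 0.059201432,
}

def weighted_geometric_mean(shares, weights):
    """Multiply each month's share raised to that month's weight."""
    result = 1.0
    for month, share in shares.items():
        result *= share ** weights[month]
    return result

def split_tiers(values, cutoff=0.75):
    """Sort descending, accumulate, and keep names while the running total <= cutoff."""
    ou, uu = [], []
    running = 0.0
    for name, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        running += value
        if running <= cutoff:
            ou.append(name)
        else:
            uu.append(name)
    return ou, uu

blissey_overall = weighted_geometric_mean(blissey, WEIGHTS)
ou, uu = split_tiers(overall)
print(round(blissey_overall, 6))
print(ou)
```

On this 13-Pokemon sample the cut falls between Skarmory (cumulative 0.6918) and Cresselia (cumulative 0.7572), matching the last table.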
from collections import defaultdict
from operator import itemgetter

# Mapping of {access_id: weighted points}
usage = defaultdict(int)
# How much to weigh each month. The len of this sequence is how many months are expected.
month_weights = [0.5, 0.30901699437494745, 0.19098300562505255]
for month_weight in month_weights:
    for pokemon, points in read_part():
        usage[pokemon] += points * month_weight
total = sum(usage.itervalues())
# Sort by weight descending.
usage = sorted(usage.iteritems(), key=itemgetter(1), reverse=True)
cumulative_frequency = 0
ou, uu = [], []
for pokemon, weighted_points in usage:
    cumulative_frequency += weighted_points / total
    if cumulative_frequency <= 0.75:
        ou.append(pokemon)
    else:
        uu.append(pokemon)
Pity I'm receiving such an interesting post this late. :(

X-Act:
You know, if the granularity of your stats is whole months, I don't see why you weigh in previous months. If you had daily stats, maybe it would make sense to do it, but over a month you absorb pretty much all variability that needs to be accounted for. Unless you can show that it really improves the tiering, I suggest cutting them off and just using the last month. It will simplify the math. Also, I don't think that using the arithmetic mean on an exponential distribution will give you the intended results: even with lesser weights for previous months, the highest value will absorb too much of the mean and the ranks will change at a snail's pace. I advise using the geometric mean instead.
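To make the arithmetic-versus-geometric point concrete, here is a small Python 3 sketch. The month weights are the ones used elsewhere in this thread; the Pokemon and its shares are invented purely for illustration.

```python
# Month weights from the thread, most recent month first.
WEIGHTS = [0.5, 0.30901699437494745, 0.19098300562505255]

def weighted_arithmetic_mean(shares, weights):
    """Sum of share * weight."""
    return sum(share * weight for share, weight in zip(shares, weights))

def weighted_geometric_mean(shares, weights):
    """Product of share ** weight."""
    result = 1.0
    for share, weight in zip(shares, weights):
        result *= share ** weight
    return result

# Hypothetical Pokemon whose share fell from 10% to 1% in the latest month.
falling = [0.01, 0.10, 0.10]

arithmetic = weighted_arithmetic_mean(falling, WEIGHTS)
geometric = weighted_geometric_mean(falling, WEIGHTS)
print(arithmetic, geometric)
```

With these made-up numbers the arithmetic mean stays at about 0.055 while the geometric mean drops to about 0.032, so the geometric version follows the recent decline more closely.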
Edit: you need to normalize the points by the total of the month before you add them up: if for some reason (downtime?) the activity of a month drops to half of what it was the month before, the previous month will count more than the current month...
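A quick way to see the normalization problem is with two made-up months, one of which had half the activity. This is a Python 3 sketch; the weights (0.6 and 0.4), the point totals, and the names `x`, `y` and `others` are all hypothetical.

```python
# Hypothetical weights: the recent month counts more than the previous one.
RECENT_WEIGHT, PREVIOUS_WEIGHT = 0.6, 0.4

# Hypothetical raw points. The recent month had half the activity (e.g. downtime).
previous = {"x": 100, "y": 20, "others": 880}  # total 1000
recent = {"x": 10, "y": 50, "others": 440}     # total 500

def normalize(month):
    """Turn raw points into shares of that month's total."""
    total = float(sum(month.values()))
    return {name: points / total for name, points in month.items()}

def combine(recent_values, previous_values):
    """Weighted sum of the two months for every name."""
    return {
        name: RECENT_WEIGHT * recent_values[name] + PREVIOUS_WEIGHT * previous_values[name]
        for name in recent_values
    }

raw = combine(recent, previous)
normalized = combine(normalize(recent), normalize(previous))
print(raw["x"], raw["y"])
print(normalized["x"], normalized["y"])
```

On raw points, `x` still outranks `y` because the busier previous month dominates; after normalizing each month by its own total, `y` (10% of the recent month) comes out ahead of `x` (2%).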
It would be cool if the tiers could be figured out by clustering (if there were big jumps in the usage statistics). Alas, I plotted the statistics in logspace and it's pretty much a line. There's no place where you could say "ok we cut there". There's no natural clustering of the data, so to cut at 0.75 seems a bit arbitrary.
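One rough way to look for such a jump without plotting is to compare consecutive log-usages. The sketch below (Python 3) uses invented data rather than the real statistics, and `log_gaps` is a hypothetical helper name; it only illustrates what "a line in logspace" versus "a natural cut" would look like numerically.

```python
import math

def log_gaps(usages):
    """Differences between consecutive log-usages, in descending rank order."""
    logs = [math.log(u) for u in sorted(usages, reverse=True)]
    return [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]

# Hypothetical usage that halves at every rank: a straight line in log space.
line = [0.5 ** k for k in range(1, 9)]
# The same data with an artificial cliff after the third entry.
cliff = line[:3] + [u / 10 for u in line[3:]]

line_gaps = log_gaps(line)
cliff_gaps = log_gaps(cliff)

# Index of the biggest drop; a tier boundary would sit right after that rank.
cut_index = max(range(len(cliff_gaps)), key=cliff_gaps.__getitem__)
print(max(line_gaps) - min(line_gaps))
print(cut_index, cliff_gaps[cut_index])
```

When every gap is about the same size, as in `line`, no rank stands out as a boundary, which is the situation described above.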
I'd be interested in trying to cluster the data using more complete statistics: number of users for Pokemon x, number of kills, longevity of the Pokemon, co-occurrence statistics, etc.
def calculate(tierfile, output):
    """
    Calculate the correct tiers from `tierfile` and write them in SCMS format to `output`.

    `tierfile` should consist of the weighted usages copied and pasted
    from http://shoddybattle.com/stats for the last 3 months.
    The format is as follows:

        <Month 1>
        -
        <Month 2>
        -
        <Month 3>

    Note: The - is a literal -. It separates the logs from each other.
    """
    from collections import defaultdict
    from operator import itemgetter

    if isinstance(tierfile, basestring):
        tierfile = open(tierfile)
    # Mapping of {access_id: weighted points}. Starts at 1 because months are multiplied together.
    usage = defaultdict(lambda: 1)
    # How much to weigh each month. The len of this sequence is how many months are expected.
    month_weights = [0.5, 0.30901699437494745, 0.19098300562505255]
    for month_weight in month_weights:
        month_total = 0
        month_pokemon = {}
        # Collect all the Pokemon and their points and accumulate a total.
        for pokemon, points in read_part(tierfile):
            month_pokemon[pokemon] = points
            month_total += points
        for pokemon, points in month_pokemon.iteritems():
            # Normalize the points by dividing by the total, and then raise to the power of the month weight.
            # Accumulate the product of the months in usage.
            # float() guards against integer division if the points are ints.
            usage[pokemon] *= (float(points) / month_total) ** month_weight
    # Sort by weight descending.
    sorted_usage = sorted(usage.iteritems(), key=itemgetter(1), reverse=True)
    cumulative_frequency = 0
    ou, uu = [], []
    for pokemon, weighted_points in sorted_usage:
        cumulative_frequency += weighted_points
        if cumulative_frequency <= 0.75:
            ou.append(pokemon)
        else:
            uu.append(pokemon)
    # Merge with current tier list? I don't know what we're doing here.
    # Write result to `output`
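The final step, writing the result, is left as a comment above. As a rough sketch only (Python 3), and assuming the target is the bracketed section layout used by the tier list that follows, it might look like the code below. Whether that layout is exactly what "SCMS format" means is an assumption, and `write_tiers` is a hypothetical helper, not part of any existing code.

```python
import io

def write_tiers(tiers, output):
    """Write {tier name: [pokemon, ...]} as "[Tier]" headers followed by one name per line."""
    for tier, members in tiers.items():
        output.write("[%s]\n" % tier)
        # Alphabetical order is a choice made here, not something the thread specifies.
        for name in sorted(members):
            output.write(name + "\n")

# Usage example with an in-memory file and made-up tier contents.
buffer = io.StringIO()
write_tiers({"OU": ["garchomp", "blissey"], "UU": ["absol"]}, buffer)
result = buffer.getvalue()
print(result)
```

Any file-like object with a `write` method can be passed as `output`, so the same helper works for a real file opened in text mode.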
[Uber]
arceus
darkrai
deoxys
deoxys-a
deoxys-d
deoxys-s
dialga
giratina
groudon
ho-oh
kyogre
latias
latios
lugia
manaphy
mew
mewtwo
palkia
rayquaza
wobbuffet
[OU]
<empty>
[BL]
abomasnow
aerodactyl
alakazam
ambipom
arcanine
articuno
azelf
azumarill
blaziken
blissey
breloom
bronzong
charizard
celebi
cresselia
crobat
donphan
drapion
dragonite
dugtrio
dusknoir
electivire
empoleon
entei
espeon
exeggutor
feraligatr
floatzel
flygon
forretress
gallade
garchomp
gardevoir
gengar
gliscor
gyarados
hariyama
heatran
heracross
hippowdon
honchkrow
houndoom
infernape
jirachi
jolteon
jynx
kingdra
leafeon
lickilicky
lucario
ludicolo
machamp
magmortar
magnezone
mamoswine
marowak
medicham
mesprit
metagross
milotic
miltank
mismagius
moltres
ninjask
porygon2
porygon-z
raikou
rampardos
regice
regigigas
regirock
registeel
rhyperior
roserade
salamence
sceptile
scizor
skarmory
slaking
slowbro
slowking
shaymin
shedinja
smeargle
snorlax
spiritomb
staraptor
starmie
steelix
suicune
swampert
tangrowth
tauros
togekiss
torterra
typhlosion
tyranitar
umbreon
ursaring
uxie
vaporeon
venusaur
weavile
weezing
yanmega
zangoose
zapdos
[UU]
absol
aggron
altaria
ampharos
arbok
ariados
armaldo
banette
bastiodon
beautifly
beedrill
bellossom
bibarel
blastoise
butterfree
cacturne
camerupt
carnivine
castform
chatot
cherrim
chimecho
claydol
clefable
cloyster
corsola
cradily
crawdaunt
delcatty
delibird
dewgong
ditto
dodrio
drifblim
dunsparce
dustox
electrode
exploud
farfetchd
fearow
flareon
froslass
furret
gastrodon
girafarig
glaceon
glalie
golduck
golem
gorebyss
granbull
grumpig
hitmonchan
hitmonlee
hitmontop
huntail
hypno
illumise
jumpluff
kabutops
kangaskhan
kecleon
kingler
kricketune
lanturn
lapras
ledian
linoone
lopunny
lumineon
lunatone
luvdisc
luxray
magcargo
manectric
mantine
masquerain
mawile
meganium
mightyena
minun
mothim
mr_mime
muk
nidoking
nidoqueen
ninetales
noctowl
octillery
omastar
pachirisu
parasect
pelipper
persian
pidgeot
pikachu
pinsir
plusle
politoed
poliwrath
primeape
probopass
purugly
quagsire
qwilfish
raichu
rapidash
raticate
relicanth
rotom
sableye
sandslash
scyther
seaking
seviper
sharpedo
shiftry
shuckle
skuntank
solrock
spinda
stantler
sudowoodo
sunflora
swalot
swellow
tentacruel
torkoal
toxicroak
tropius
unown
venomoth
vespiquen
victreebel
vileplume
volbeat
wailord
walrein
whiscash
wigglytuff
wormadam
wormadam-g
wormadam-s
xatu
[NU]
[Limbo]
[NFE]
.
.
.