Wobbuffet, Deoxys-S, and the Smogon Tier list

obi · Mar 9, 2008

When are they going to be moved to OU, assuming they are?

If the tier lists are descriptive (that is, they explain things as people play them), then it wouldn't make sense not to move them, as their OU status has been pretty much firmly ingrained into Shoddy.

If they are prescriptive (that is, they explain how things ought to be), then we'll need to really consider what makes something uber. All the data suggests that neither one is really deserving of being uber.

Mekkah · Mar 10, 2008

Indeed, if they are prescriptive, we need to fix the uber definition topic further. I agree. The same goes for whatever comes out of a potential Garchomp test really.

X-Act · Mar 10, 2008

As I said, I was going to update the tier list come 1st April. I was also going to change the method of choosing the OU list, based on a predictive approach. The current system only looks at what each Pokemon's usage was, not at what it is predicted to be. Also, I was never completely satisfied with the 75% cut-off point, and wanted to find a more meaningful one. I was researching this during this past weekend, which is why I barely appeared around here, and was going to propose this new method once I had all the math in place.
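For readers unfamiliar with a usage cut-off, here is a minimal sketch of one way a 75% rule could work: rank Pokemon by usage and keep adding them to OU until their cumulative share of total usage reaches the cut-off. This is only an assumption about the general shape of such a rule, not X-Act's actual formula, and the names and counts are invented purely for illustration.

```python
def ou_by_cumulative_usage(usage, cutoff=0.75):
    """Return the names whose cumulative usage share first reaches `cutoff`.

    `usage` maps a Pokemon name to a usage count. The rule itself is an
    assumed illustration, not the method actually used for the tier list.
    """
    total = sum(usage.values())
    if total == 0:
        return []
    # Most-used first.
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    ou, running = [], 0
    for name, count in ranked:
        # Stop once the Pokemon already chosen cover the cut-off share.
        if running / total >= cutoff:
            break
        ou.append(name)
        running += count
    return ou


# Invented counts, not real ladder statistics.
sample_usage = {"A": 400, "B": 300, "C": 150, "D": 100, "E": 50}
print(ou_by_cumulative_usage(sample_usage))
```

Note that a rule like this is purely backward-looking: it only sees last month's counts, which is the limitation the predictive approach is meant to address.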

Deoxys-S would barely end up in OU if you look at the February usage by the way (using the old system).

jrrrrrrr · Mar 11, 2008

X-Act said:
Deoxys-S would barely end up in OU if you look at the February usage by the way (using the old system).

Many people would argue that Deoxys-S and Wobbuffet are lower in usage than they should be because Smogon has yet to take an "official" stance on their tier status. Leaving them in the Uber tier on Smogon really deters a lot of people from using them (whether or not smogonites actually think they are Uber), and there is really no way of accounting for this with usage numbers alone.

I've heard "Why should I bother making a ladder team with D-S and/or Wobby if I can't even use it to practice for Smogon tourneys?" from multiple people and I have a hard time arguing with them.

If the update was already planned to happen on April 1, then you should keep that deadline. An extra month of statistics can't really hurt, and it would give an adequate amount of time for the debates about d-s/wobby/garchomp to settle down.

X-Act said:
I was also going to change the method of choosing the OU list, based on a predictive approach. The current system only looks at what were the usages of the Pokemon and not on what they are predicted to be.

I'm not sure what you had in mind, but this sounds like a bad idea to me based on this very brief description. OU is by definition the most used pokemon; predictions really shouldn't have too much of a sway in determining their tier status. Remember the Rhyperior and Electivire hype? I'm not saying that these things can't be useful, I was just giving an example of a situation where using "theorymon" would wind up hurting the credibility and accuracy of the list (as I feel using predictions to determine tier status would in this case).

Aldaron · Mar 13, 2008

He isn't using theorymon at all to predict, he is using a statistically-sound algorithm.

X-Act's point is that because we use statistics from previous months to determine what is now OU, we actually have outdated tier lists, which I agree with 100%.

His prediction method, from what I read in his topic, seemed to be fine.

As for Wobbuffet and Deoxys-S. I am a supporter of a descriptive tier list, so I advocate the shift of Wobbuffet and Deoxys-S into OU.

chaos · Mar 13, 2008

So uh... do we actually accept Deoxys-S and Wobbuffet as non-uber? I'm not really interested in what Shoddy considers non-uber. The tier list has an effect on our tournaments and such.

obi · Mar 13, 2008

I personally do. So far, all tests have shown them to not be uber (in fact, Deoxys-S is almost certainly going to be BL next month).

Aldaron · Mar 13, 2008

I understand that Smogon has Wobbuffet and Deoxys-S as Uber.

However, if the tier lists are truly descriptive (as I believe them to be), then we have to use whatever data best describes whatever we are judging, which would currently be the usage statistics from ladder battles on Shoddy. What I am interpreting from these statistics is that both Wobbuffet and Deoxys-S should be moved down to OU.

However, if as Obi said, the tier lists are in fact prescriptive, then another topic needs to be made that establishes our definition of Uber.

All I'm saying is that in the perspective that tier lists are descriptive, we should use the data available to us, which points towards making both Wobbuffet and Deoxys-S OU.

Jumpman16 · Mar 13, 2008

chaos said:
So uh... do we actually accept Deoxys-S and Wobbuffet as non-uber? I'm not really interested in what Shoddy considers non-uber. The tier list has an effect on our tournaments and such.

People like Obi, AA and myself have, to put it cutely, been whispering in Colin's ear about this stuff for a long time, so it's hardly been 100% Shoddy's thinking behind it. If I'm wrong about that, Colin will correct me in this thread!

obi · Mar 13, 2008

Aldaron said:
However, if as Obi said, the tier lists are in fact prescriptive, then another topic needs to be made that establishes our definition of Uber.

http://www.smogon.com/forums/showthread.php?t=37401

Hipmonlee · Mar 14, 2008

I think we have to allow them if the stats support it. After ST4 is the logical time to make the change IMO.

But once it is allowed in tournaments we need to look at what effect they have in that situation. I think the style of play in tournaments differs from that on the ladder, and if something is unbalancing in either situation it needs to be fixed.

[edit] - having read some stuff misty was saying, I am gonna clear things up.
Ok, we really have 3 options.
1st, we could say "it's all too much effort, what we have is fine" and leave it at that. Since this forum exists I assume we can disregard that option.
2nd, we could theorymon it all out, and decide on the best metagame for all concerned. The one thing we all seem to agree on is the fact that we will never agree on a tier list, so I don't see how that could possibly work.
Or 3rd, we can test.

The problem with testing is I don't think we are gonna be able to come to objective conclusions based on statistics. Like, in ladderbot matches an extremely conservative style is preferable for high level players, because the overwhelming majority of people you are going to play are going to be considerably worse than you at pokemon. I think this sort of thing is what leads to most of my disagreements with Jumpman and obi about tiers and stuff. I always thought of them as people who tried very hard to minimise their chances of losing to poor players. Take Jump with his use of HP Rock TTar in Advance: against poor players he isn't often in the situation of relying on a flinch to win. Whereas I have always tried my best to do everything I can to maximise my odds of beating other top level players. I like inaccurate moves because often the surprise factor can give me an edge in a tight match.

I think my style of play is more geared toward tournaments, and isn't as effective on something like ladderbot.

So you need to look at both these styles and what effect it has in both cases. This is where a lot of the subjectivity seems to come from in my view.

I think the rankings on Shoddy need tweaking. I think the conservative estimate should be less conservative, and the k value should be lower. I don't know what the k value is set at now, but it definitely needs to be way way lower than it would be in chess. I mean, these are just my impressions, I don't know what the settings are exactly at the moment, but it is important we get this sort of thing right.

I think I have gotten sidetracked here. The point is, if we agree we want testing, then we have to go through with it properly.

To be honest, I have changed my mind. Here is what I think ought to happen. We need to come up with a testing strategy. I mean we are talking about allowing Lati@s and banning Garchomp. We simply can't do that much testing, it is ridiculous. So here is my new improved philosophy. We argue our way to what we feel are our best options, and test them all, and decide which is the best. And by options, I am talking about complete rulesets. We say we will test for this year, with 2 months per ruleset or something. We should try and come up with our best 5 rulesets, other than the one we have been testing for the last 8 months or so.

Have a nice day.

Cathy · Mar 14, 2008

I don't think the k value in the CRE is comparable to the k in Elo but I may be totally wrong here.

The rating isn't too conservative. The arguable problem is that the rating period is too short. Right now, you are expected to play 5-7 matches per day, but statistics I just aggregated the other day show that the average person just plays 1.5 matches per day, so the deviation will seem to increase far too much for the average person. Perhaps a rating period of three days is more appropriate, but less trivial to implement (one day can be done without any external state, but with three days it will take a bit of effort to make sure it always runs -- not particularly difficult, but it's the reason I haven't been experimenting, since it would take effort). After collecting those statistics, though, I'm convinced the rating period should be longer.

For reference, the three constants involved in the ladder rating system are:
k = 4
rating period length = 1 day
system constant = 1.2
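To illustrate why the rating period length matters, here is a minimal sketch of how the deviation grows across periods in which a player has no games. It assumes the ladder follows the published Glicko-2 rule for idle periods (new phi = sqrt(phi^2 + sigma^2)), which the mention of a system constant suggests but this thread does not confirm. The starting RD of 50 and volatility of 0.06 are illustrative values, not Shoddy's.

```python
import math

GLICKO2_SCALE = 173.7178  # Glicko-2 conversion factor between RD and phi


def rd_after_idle_periods(rd, volatility, periods):
    """RD after `periods` rating periods with no games played.

    Assumes the published Glicko-2 idle rule and a volatility that stays
    constant while the player is inactive.
    """
    phi = rd / GLICKO2_SCALE
    for _ in range(periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE


# Illustrative inputs only.
for days in (1, 3, 7):
    print(days, round(rd_after_idle_periods(50.0, 0.06, days), 2))
```

Under this assumption, a one-day period applies the inflation step once per day, so a longer period would apply it less often for a player who plays only occasionally.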

Dragontamer · Mar 15, 2008

Elo doesn't have conservative ratings. "K" in Elo is comparable to the RD in Glicko. Basically, it is a value that is automatically computed in Glicko, meaning there is no direct control over that variable.
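To make the comparison concrete, here is a minimal sketch of the standard Elo update, where K directly scales how far a single result moves a rating. The K values and ratings in the usage loop are illustrative choices, not anything taken from Shoddy.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k):
    """New rating for A after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    return rating_a + k * (score_a - elo_expected(rating_a, rating_b))


# Illustrative K values: the same win moves the rating twice as far
# when K is doubled.
for k in (16, 32):
    print(k, elo_update(1500, 1500, 1.0, k))
```

In Elo the step size K is chosen by hand, whereas in Glicko the size of the adjustment follows from the computed RD.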

Colin: do you understand the mathematics behind the System Constant? Because I don't. Beyond Glickman's recommendation of "between .3 and 1.2", I don't know anything about that value. While longer periods will help, we might have to change the System Constant. Considering that the System Constant is used in calculating Volatility, and that volatility is used to calculate RD (which is Glicko's "K"), that may be the root of the problem.

I do understand _how_ it is used of course... ultimately 1/system^2 is multiplied with another number in the iterative loop. Meaning we are scaling a value to somewhere between ~70% and ~1100% of itself multiple times in some loop (based on the 1.2 to .3 values he recommends). At least on the surface, tweaking this value may change the volatility of players significantly.
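The percentages above come straight from the arithmetic of 1/system^2 at the two ends of the recommended range. A minimal check, using `tau` as a stand-in name for the system constant:

```python
def inverse_square(tau):
    """The 1/tau^2 factor described above, for a system constant tau."""
    return 1.0 / tau ** 2


# The two ends of the recommended range quoted in this post.
for tau in (1.2, 0.3):
    factor = inverse_square(tau)
    print(f"tau = {tau}: 1/tau^2 = {factor:.3f}")
```

So at 1.2 the factor is a little under 0.7, and at 0.3 it is a little over 11, a spread of roughly 16x between the two ends.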

Hipmonlee said:
To be honest, I have changed my mind. Here is what I think ought to happen. We need to come up with a testing strategy. I mean we are talking about allowing Lati@s and banning Garchomp. We simply can't do that much testing, it is ridiculous. So here is my new improved philosophy. We argue our way to what we feel are our best options, and test them all, and decide which is the best. And by options, I am talking about complete rulesets. We say we will test for this year, with 2 months per ruleset or something. We should try and come up with our best 5 rulesets, other than the one we have been testing for the last 8 months or so.

I agree. I have my own opinions on testing strategies to avoid, but I do think we should playtest the various metagames. I'm even inclined to playtest metagames and then come up with metrics. That is, use the metagame testing period to figure out what kind of metagame we really want.

jrrrrrrr · Mar 15, 2008

Aldaron said:
He isn't using theorymon at all to predict, he is using a statistically-sound algorithm.

X-Act's point is that because we use statistics from previous months to determine what is now OU, we actually have outdated tier lists, which I agree with 100%.

His prediction method, from what I read in his topic, seemed to be fine.

I understand this now that the topic is up. I was understandably skeptical at first before I read into it. I just wanted to be sure that the algorithm was as solid as it is before blindly supporting it. Since Smogon rules are generally considered standard, it deserves more than just a "hey, this works. Trust me." attitude (which it is getting with the advent of this board). gj X-Act for making the list algorithm.

Hipmonlee said:
I think we have to allow them if the stats support it. After ST4 is the logical time to make the change IMO.

But once it is allowed in tournaments we need to look at what effect they have in that situation. I think the style of play in tournaments differs from that on the ladder, and if something is unbalancing in either situation it needs to be fixed.

I agree 100%

Aldaron · Mar 30, 2008

OK, I would just like to emphasize one point...We have not even come close to establishing an agreed upon definition of uber!!

How can we decide to ban (in the case of Garchomp) or unban (in the case of Deoxys-S and Wobbuffet) if we haven't even explicitly defined uber?

I am 100%, absolutely against us unbanning Wobbuffet or Deoxys-S in April, regardless of what the "Statistics" show us. Who cares about them when we haven't even determined if the "overcentralization" argument is the viable one? On that note, we haven't even decided what "overcentralization" is, quantitatively speaking. Is 50 OU Pokemon centralized? Is 100? Is 20? If so...why?

I only support the unbanning of Wobbuffet and Deoxys-S if two conditions are met:

1.) We decide whether the lists are descriptive or prescriptive.

2.) We declare that statistics are the absolute judge of OU -> Uber and Uber -> OU moves, and we establish EVERYTHING necessary for the equation, like determining what number of Pokemon in OU means centralized.

Until that happens, I don't see how we can give reasons for unbanning Wobbuffet or Deoxys-S.

X-Act · Mar 30, 2008

Yeah this is actually a slight concern for me lol. 1st April is two days from now, and if I want to update the OU tier, I might as well include or exclude Wobbuffet, Deoxys-S and Garchomp. However, I'm not that worried about this, since I can always modify the tiers normally and then drop or add stuff when we decide on Wobbuffet/Deoxys/Garchomp/Latios/Latias later.

jrrrrrrr · Mar 30, 2008

Aldaron said:
Until that happens, I don't see how we can give reasons for unbanning Wobbuffet or Deoxys-S.

Because ShoddyBattle, where almost all competitive battling relevant to Smogon occurs, allows them as OU. The standard game that pretty much everyone plays allows them with little to no evidence that their unbanning is harmful to the game.

They've both failed their shoddy tests, they are now OU there. Exactly what evidence is there to keep them banned ?___?

It's better to unban Wobba and D-S now while we have the chance, since that seems to be not only the majority opinion, but the safe one as well. If a legitimate reason ever pops up, we can always just change it back, as X-Act said.

Aldaron · Mar 30, 2008

Uhh, that's not what I was referring to. I'm not saying anything about the amount of evidence.

I'm asking what relevance this "evidence" has. The "evidence" that the grand masters of Shoddy have provided us is statistical evidence pertaining only to the "overcentralization" argument.

Well that's all well and fine...but I'm asking how that is relevant at all if we haven't even come to an established definition of uber.

jrrrrrrr · Mar 30, 2008

If we don't have a definition of uber then how can we make anything uber? With the way you are wording it, it would follow that nothing should be banned. We should be promoting a diverse game, and unbanning Wobb and D-S has not proven to make the game we have any more unbalanced than it was before their unbanning. As X-Act said, the list can easily be changed later on if something actually comes up to prove them as "broken".

I'm confused as to why you are against their unbanning. You are using a bunch of fancy words that do not actually say anything. Yes, we haven't established a clear definition of "uber", but you can not say that there aren't "uber" pokemon. As with the Lati@s issue, the Uber list is pretty much on a case by case basis. There is no clear-cut definition of uber and there never will be. The usage statistics list is pretty much all we need. Wobb and D-S have both proven to not have enough sway in the game to relegate them back to a different tier. If they do, let's send them back. Until then, I don't see the issue here.

If I'm wrong please let me know, I might not be seeing something here that is apparently obvious to the people who are against it.

Aldaron · Mar 30, 2008

jrrrrrrr said:
If we don't have a definition of uber then how can we make anything uber?

heh, that's my point exactly; we do NOT have a definition, and right now any designations of uber or not uber are purely arbitrary.

jrrrrrrr said:
With the way you are wording it, it would follow that nothing should be banned. We should be promoting a diverse game, and unbanning Wobb and D-S has not proven to make the game we have any more unbalanced than it was before their unbanning. As X-Act said, the list can easily be changed later on if something actually comes up to prove them as "broken".

All I'm saying is that we have our status quo (which may or may not be perfect, I'm not even commenting on that), and that we shouldn't disturb the status quo unless we have solid reasoning and an agreed upon method. The reason seems solid enough; we wish to objectify the tier lists. OK, but what about the next part? We certainly have not reached an agreed upon definition of uber.

jrrrrrrr said:
I'm confused as to why you are against their unbanning.

Again, I am not against their unbanning from Uber to OU, but against unbanning anything until we have an established definition of uber. We shouldn't alter the status quo until we do.

jrrrrrrr said:
You are using a bunch of fancy words that do not actually say anything.

Uh, what lol?

jrrrrrrr said:
Yes, we haven't established a clear definition of "uber", but you can not say that there aren't "uber" pokemon. As with the Lati@s issue, the Uber list is pretty much on a case by case basis. There is no clear-cut definition of uber and there never will be. The usage statistics list is pretty much all we need. Wobb and D-S have both proven to not have enough sway in the game to relegate them back to a different tier. If they do, let's send them back. Until then, I don't see the issue here.

Uh, what? I agree that it could be a case by case basis. As for your comment about the statistics being enough...that is only if we agree that the overcentralization argument is what is relevant. And if we can't establish a definition of uber...then how the hell do you plan on objectifying the process lol?

jrrrrrrr · Mar 30, 2008

Ah, I was misunderstanding you. I had a feeling I knew what you were getting at, but the way you said it was backwards. Unbanning wobb and d-s isn't unfair. The unfairness happened before d/p even came out. They were banned for essentially no reason, other than that the people responsible for the tier lists (read: everyone) simply copy/pasted the uber list from adv into d/p without any discussion.

While I am all for shaking the game up and making it more diverse, I agree with your sentiment that this was kind of done swiftly without any discussion.

I also agree that the uber list has to be descriptive of how people actually play. If we do not accept theorymon arguments as legitimate, why are we doing it for our tier lists?

With that said, it is probably best that we "start over" and determine tier status of every current OU and Uber pokemon, to actually decide on a tier list that people can agree on...which we thankfully have this forum for. For now, I feel that it is best that we leave them unbanned.

obi · Mar 30, 2008

Aldaron said:
All I'm saying is that we have our status quo (which may or may not be perfect, I'm not even commenting on that), and that we shouldn't disturb the status quo unless we have solid reasoning and an agreed upon method.

If the only argument for keeping them banned is to maintain the status quo, then I only need to provide one reason for them to be unbanned.

All things being equal, there should be as few banned Pokemon as possible.

Using this metric, there is no evidence to support that unbanning them has led to any harmful effect, and thus, they should be unbanned on our tiers.

Hipmonlee · Mar 30, 2008

Well I'm not entirely sure that's right. Changing will always have a negative effect of making people change teams blah blah, so you have to provide a reason that outweighs that.

Also it seems to me that having similar rules to Advance makes things a lot simpler than just having as few rules as possible.

Have a nice day.

Cathy · Mar 30, 2008

I'm not convinced that really should be considered. But even if it should be, Wobbuffet and Deoxys-S have both been unbanned for over two months on the Shoddy Battle ladder now, so the number of people who don't have a team for that environment is quite small--in other words the inconvenience described has already been overcome.

Aldaron · Mar 31, 2008

No, what does "quite small" mean? What quantitative proof can you offer that the inconvenience described has already been overcome?

I don't want to make decisions based merely off of your words or beliefs.
