One Hot Encoding vs LabelEncoder?
Published: 2019-06-25



There are some cases where LabelEncoder or DictVectorizer are useful, but these are quite limited in my opinion due to ordinality.

LabelEncoder can turn [dog,cat,dog,mouse,cat] into integer codes such as [1,2,1,3,2], but the imposed ordinality then means that the average of dog and mouse is cat. (scikit-learn's LabelEncoder actually assigns 0-based codes in sorted class order, but the problem is the same.) Still, there are algorithms like decision trees and random forests that can work with categorical variables just fine, and LabelEncoder can be used to store values using less disk space.
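To make the ordinality problem concrete, here is a minimal pure-Python sketch of label encoding. The helper `label_encode` is hypothetical and only mimics the sorted, 0-based codes that scikit-learn's LabelEncoder produces; it is not the library's implementation. With sorted codes it is cat and mouse that average to dog rather than the other way round, but the spurious arithmetic is the same.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted order.

    Hypothetical helper for illustration only.
    """
    classes = sorted(set(values))
    index = {category: code for code, category in enumerate(classes)}
    return [index[v] for v in values], classes


animals = ["dog", "cat", "dog", "mouse", "cat"]
codes, classes = label_encode(animals)

# The codes carry an order (cat < dog < mouse) that the categories do not have.
cat_code = classes.index("cat")
mouse_code = classes.index("mouse")
midpoint = (cat_code + mouse_code) / 2

# Any model that treats the codes as numbers will "see" this midpoint
# as a real category.
category_at_midpoint = classes[int(midpoint)]
```

A model that does arithmetic on these codes, such as a linear model, inherits that artificial ordering.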

One-hot encoding has the advantage that the result is binary rather than ordinal and that everything sits in an orthogonal vector space. The disadvantage is that for high-cardinality features the feature space can blow up quickly, and you start fighting the curse of dimensionality. In these cases, I typically employ one-hot encoding followed by PCA for dimensionality reduction. I find that the judicious combination of one-hot plus PCA can seldom be beaten by other encoding schemes. PCA finds the linear overlap, so it will naturally tend to group similar features into the same component.
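A rough sketch of the one-hot plus PCA idea is below. It assumes NumPy is available and computes PCA through an SVD of the centered matrix; the helpers `one_hot` and `pca_reduce` are hypothetical names introduced here, not the tooling used in the original answer. On a toy column with three categories there is little to gain, so treat it as an illustration of the mechanics rather than of the benefit.

```python
import numpy as np


def one_hot(values):
    """Return a (n_samples, n_categories) 0/1 matrix and the category order."""
    classes = sorted(set(values))
    index = {category: col for col, category in enumerate(classes)}
    matrix = np.zeros((len(values), len(classes)))
    matrix[np.arange(len(values)), [index[v] for v in values]] = 1.0
    return matrix, classes


def pca_reduce(matrix, n_components):
    """Project rows onto the top principal directions (PCA via SVD)."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


animals = ["dog", "cat", "dog", "mouse", "cat"]
encoded, classes = one_hot(animals)

# Distinct categories map to orthogonal rows: their dot product is zero.
dog_dot_cat = encoded[0] @ encoded[1]

# Keep two components instead of three one-hot columns.
reduced = pca_reduce(encoded, n_components=2)
```

In practice the number of components would be chosen from the explained variance on the real data, and with very high cardinality a sparse representation is usually needed before any reduction step.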

Hope this helps!

Reposted from: https://my.oschina.net/tantexian/blog/1922828
