[Original] Data cleaning in Stata

Posted by hellohappy » June 3, 2019, 22:19

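Preface:
    After you import data into Stata, it may not yet be in the shape you expect. Here I summarize a few of the most basic Stata commands: rename (rename a variable), order (change the order of variables), sort (change the order of observations), and label (attach labels to variables).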

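Commands:

    rename:

        The rename command renames a variable. For example, to change the variable name aaa to abc, the command is

Code:

rename aaa abc
        For other uses, see the help file directly. The help file for the rename command is as follows: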
Title

    [D] rename -- Rename variable

Syntax

        rename old_varname new_varname

Menu

    Data > Data utilities > Rename groups of variables

Description

    rename changes the name of an existing variable old_varname to new_varname; the contents of the variable are unchanged.  Also see [D] rename group for renaming groups of variables.

Examples

    Setup
        . webuse renamexmpl
        . describe

    Change name of exp to experience and change name of inc to income
        . rename exp experience
        . rename inc income

    Describe the data
        . describe
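
As a small supplement to the help text above, here is a minimal sketch of rename on the auto dataset that ships with Stata. The new names (miles_per_gallon, repair_record, is_foreign) are made up for illustration, and the second command assumes a Stata version that supports the group syntax described under [D] rename group.

```stata
* Minimal sketch using the auto dataset shipped with Stata
sysuse auto, clear

* Rename a single variable: old name first, new name second
rename mpg miles_per_gallon

* Rename several variables at once with the group syntax
* (the new names here are illustrative, not part of the original post)
rename (rep78 foreign) (repair_record is_foreign)

* confirm variable stops with an error if any listed name does not exist
confirm variable miles_per_gallon repair_record is_foreign
describe miles_per_gallon repair_record is_foreign
```

Only the names change; the contents of the variables stay the same.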

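    order:

        The order command changes the order of variables. For example, in panel data we usually put the time and individual identifiers first. Running

Code:

order year id
        moves the two variables year and id in front of all other variables. This is mainly for convenience when browsing the variables; it does not affect the results of any other computation. Note that year and id are variable names used only as an example.
        For other uses, see the help file directly. The help file for the order command is as follows: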
Title

    [D] order -- Reorder variables in dataset

Syntax

        order varlist [, options]

    options           Description
    --------------------------------------------------------------------------------------------------------------------
    first             move varlist to beginning of dataset; the default
    last              move varlist to end of dataset
    before(varname)   move varlist before varname
    after(varname)    move varlist after varname
    alphabetic        alphabetize varlist and move it to beginning of dataset
    sequential        alphabetize varlist keeping numbers sequential and move it to beginning of dataset
    --------------------------------------------------------------------------------------------------------------------

Menu

    Data > Data utilities > Change order of variables

Description

    order relocates varlist to a position depending on which option you specify.  If no option is specified, order relocates varlist to the beginning of the dataset in the order in which the variables are specified.

Options

    first shifts varlist to the beginning of the dataset.  This is the default.

    last shifts varlist to the end of the dataset.

    before(varname) shifts varlist before varname.

    after(varname) shifts varlist after varname.

    alphabetic alphabetizes varlist and moves it to the beginning of the dataset.  For example, here is a varlist in alphabetic order:  a x7 x70 x8 x80 z.  If combined with another option, alphabetic just alphabetizes varlist, and the movement of varlist is controlled by the other option.

    sequential alphabetizes varlist, keeping variables with the same ordered letters but with differing appended numbers in sequential order.  varlist is moved to the beginning of the dataset.  For example, here is a varlist in sequential order: a x7 x8 x70 x80 z.

Examples

    Setup
        . webuse auto4

    Describe the dataset
        . describe

    Move make and mpg to the beginning of the dataset
        . order make mpg

    Describe the dataset
        . describe

    Make length be the last variable in the dataset
        . order length, last

    Describe the dataset
        . describe

    Make weight be the third variable in the dataset
        . order weight, before(price)

    Describe the dataset
        . describe

    Alphabetize the variables
        . order _all, alphabetic

    Describe the dataset
        . describe
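
To tie this back to the panel-data case, here is a minimal sketch on a tiny made-up dataset. The variables gdp and pop and all the values are assumptions for illustration only. It moves the identifiers to the front and then uses the after() option to place one more variable.

```stata
* Tiny made-up panel, entered with the identifiers at the end on purpose
clear
input gdp pop year id
100 10 2018 1
110 11 2019 1
200 20 2018 2
210 21 2019 2
end

* Move the time and individual identifiers to the front
order year id

* Place pop right after id; the remaining variables keep their relative order
order pop, after(id)

* Show the variable names in their new order
describe, simple
```

The observations themselves are untouched; only the column order changes.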

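    sort:

        The sort command changes the order of observations, that is, it sorts the rows of the dataset by the values of one or more variables. For example,

Code:

sort year id
        sorts by time (year) and then by individual (id).

Code:

sort id year
        sorts the data by individual (id) and then by time (year). Note that year and id are variable names used only as an example. You can also sort on a single variable, in which case the observations are ordered by that variable's numeric value or, for strings, by character order.
        For other uses, see the help file directly. The help file for the sort command is as follows: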
Title

    [D] sort -- Sort data

Syntax

        sort varlist [in] [, stable]

Menu

    Data > Sort

Description

    sort arranges the observations of the current data into ascending order based on the values of the variables in varlist.  There is no limit to the number of variables in the varlist.  Missing numeric values (see missing) are interpreted as being larger than any other number, so they are placed last with . < .a < .b < ... < .z.  When you sort on a string variable, however, null strings are placed first and uppercase letters come before lowercase letters.

    The dataset is marked as being sorted by varlist unless in range is specified.  If in range is specified, only those observations are rearranged.  The unspecified observations remain in the same place.

Option

    stable specifies that observations with the same values of the variables in varlist keep the same relative order in the sorted data that they had previously.  For instance, consider the following data:
        [Image: stata的sort命令help文件1.png, the example data with variables x and b]
        Typing sort x without the stable option produces one of the following six orderings.
        [Image: stata的sort命令help文件2.png, the six possible orderings]

        Without the stable option, the ordering of observations with equal values of varlist is randomized.  With sort x, stable, you will always get the first ordering and never the other five.

        If your intent is to have the observations sorted first on x and then on b within tied values of x (the fourth ordering above), you should type sort x b rather than sort x, stable.

        stable is seldom used, and, when specified, causes sort to execute more slowly.
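
The difference between sort x, stable and sort x b can be tried on a small dataset. The values below are an illustrative assumption, chosen so that three observations tie on x. This is only a sketch of the behavior the help text describes.

```stata
* Small illustrative dataset with three ties on x
clear
input x b
3 1
1 2
1 1
1 3
2 4
end

* stable: tied observations (x == 1) keep their original relative order
sort x, stable
list, noobs

* Sorting on x and then b fixes the order within ties explicitly
sort x b
list, noobs
```

If the order within ties matters for later steps, listing every sort key as in sort x b is usually the clearer choice.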

Examples

    Setup
        . sysuse auto
        . keep make mpg weight

    Arrange observations into ascending order based on the values of mpg
        . sort mpg

    Same as above, but for observations with the same values of mpg, keep them in the same relative order in the sorted data as they had previously
        . sort mpg, stable

    List the 5 cars with the lowest mpg
        . list make mpg in 1/5

    List the 5 cars with the highest mpg
        . list make mpg in -5/L

    Arrange observations into ascending order based on the values of mpg, and within each mpg category arrange observations into ascending order based on the values of weight
        . sort mpg weight

    List the 8 cars with the lowest mpg, and within each mpg category with the lowest weight
        . list in 1/8

    Arrange observations into alphabetical order based on the value of make
        . sort make

    For most purposes, this method of sorting is sufficient.  It is possible to override Stata's sort logic.  See [U] 12.4.2.5 Sorting strings containing Unicode characters for information about ordering strings in a language-sensitive way.  We do not recommend that you do this.
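
For the panel-data case mentioned earlier, here is a minimal sketch on made-up data. The variable income and all the values are assumptions for illustration. It sorts by individual and year, reads back the recorded sort order, and shows that a missing value is placed last.

```stata
* Made-up panel in scrambled order, with one missing income
clear
input id year income
2 2019 50
1 2019 .
2 2018 45
1 2018 30
end

* Sort by individual, then by year within individual
sort id year
list, sepby(id) noobs

* The dataset records which variables it is currently sorted by
local sortvars : sortedby
display "sorted by: `sortvars'"

* Missing numeric values count as larger than any number, so they sort last
sort income
list, noobs
```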

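    label:

        The label command attaches labels to three kinds of things: the dataset, variables, and the values of variables. Once labels are attached, regression tables, data tables, and graphs can show the label in place of the variable name, which makes them easier to read. Here I give only the simplest example, labeling a variable. To attach the label "From 1980 to the end of 2019" to the variable year, use the command

Code:

label variable year "From 1980 to the end of 2019"

        which attaches the label to the variable year.
        For other uses, see the help file directly. The help file for the label command is as follows: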
Title

    [D] label -- Manipulate labels

Syntax

    Label dataset
        label data ["label"]

    Label variable
        label variable varname ["label"]

    Define value label
        label define lblname # "label" [# "label" ...] [, add modify replace nofix]

    Assign value label to variables
        label values varlist [lblname|.] [, nofix]

    List names of value labels
        label dir

    List names and contents of value labels
        label list [lblname [lblname ...]]

    Copy value labels
        label copy lblname lblname [, replace]

    Drop value labels
        label drop {lblname [lblname ...] | _all}

    Save value labels in do-file
        label save [lblname [lblname...]] using filename [, replace]

    Labels for variables and values in multiple languages
        label language ...    (see [D] label language)

    where # is an integer or an extended missing value (.a, .b, ..., .z).

Menu

    label data 
        Data > Data utilities > Label utilities > Label dataset

    label variable
        Data > Variables Manager

    label define
        Data > Variables Manager

    label values
        Data > Variables Manager

    label list
        Data > Data utilities > Label utilities > List value labels

    label copy
        Data > Data utilities > Label utilities > Copy value labels

    label drop
        Data > Variables Manager

    label save
        Data > Data utilities > Label utilities > Save value labels as do-file

Description

    label data attaches a label (up to 80 characters) to the dataset in memory.  Dataset labels are displayed when you use the dataset and when you describe it.  If no label is specified, any existing label is removed.

    label variable attaches a label (up to 80 characters) to a variable.  If no label is specified, any existing variable label is removed.

    label define defines a list of up to 65,536 (1,000 for Small Stata) associations of integers and text called value labels.  Value labels are attached to variables by label values.

    label values attaches a value label to varlist.  If . is specified instead of lblname, any existing value label is detached from that varlist.  The value label, however, is not deleted.  The syntax label values varname (that is, nothing following the varname) acts the same as specifying the ..  Value labels may be up to 32,000 characters long.

    label dir lists the names of value labels stored in memory.

    label list lists the names and contents of value labels stored in memory.

    label copy makes a copy of an existing value label.

    label drop eliminates value labels.

    label save saves value labels in a do-file.  This is particularly useful for value labels that are not attached to a variable because these labels are not saved with the data.

    See [D] label language for information on the label language command.

Options

    add allows you to add # to label correspondences to lblname.  If add is not specified, you may create only new lblnames.  If add is specified, you may create new lblnames or add new entries to existing lblnames.

    modify allows you to modify or delete existing # to label correspondences and add new correspondences.  Specifying modify implies add, even if you do not type the add option.

    replace, with label define, allows an existing value label to be redefined.  replace, with label copy, allows an existing value label to be copied over.  replace, with label save, allows filename to be replaced.

    nofix prevents display formats from being widened according to the maximum length of the value label.  Consider label values myvar mylab, and say that myvar has a %9.0g display format right now.  Say that the maximum length of the strings in mylab is 12 characters.  label values would change the format of myvar from %9.0g to %12.0g.  nofix prevents this.

        nofix is also allowed with label define, but it is relevant only when you are modifying an existing value label.  Without the nofix option, label define finds all the variables that use this value label and considers widening their display formats.  nofix prevents this.

Technical note

    Although we tend to show examples defining value labels using one command, such as
        . label define answ 1 yes 2 no

    remember that value labels may include many associations and typing them all on one line can be ungainly or impossible.  For instance, if perhaps we have an encoding of 1,000 places, we could imagine typing
        . label define fips 10060 "Anniston, AL" 10110 "Auburn, AL" 10175 "Bessemer, AL" ... 560050 "Cheyenne, WY"

    Even in an editor, we would be unlikely to type the line correctly.

    The easy way to enter long value labels is to enter the codings one at a time:
        . label define fips 10060 "Anniston, AL"
        . label define fips 10175 "Bessemer, AL", add
        ...
        . label define fips 560050 "Cheyenne, WY", add

    And, of course, we could abbreviate:
        . lab def fips 10060 "Anniston, AL"
        . lab def fips 10175 "Bessemer, AL", add

    Up to 65,536 associations are allowed.

Examples

    Setup
        . webuse hbp4

    Describe the dataset
        . describe

    Label the dataset
        . label data "fictional blood pressure data"

    Describe the dataset
        . describe

    Label the hbp variable
        . label variable hbp "high blood pressure"

    Define the value label yesno
        . label define yesno 0 "no" 1 "yes"

    List the names and contents of all value labels
        . label list

    List the name and contents of only the value label yesno
        . label list yesno

    List names of value labels
        . label dir

    Make a copy of the value label yesno
        . label copy yesno yesnomaybe

    Add another value and label to the value label yesnomaybe
        . label define yesnomaybe 2 "maybe", add

    List the name and contents of value label yesnomaybe
        . label list yesnomaybe

    Modify the label for the value 2 in value label yesnomaybe
        . label define yesnomaybe 2 "don't know", modify

    List the name and contents of value label yesnomaybe
        . label list yesnomaybe

    List the first 4 observations in the dataset
        . list in 1/4

    Attach the value label yesnomaybe to the variable hbp
        . label values hbp yesnomaybe

    List the first 4 observations in the dataset
        . list in 1/4

    Save the value label sexlbl to mylabel.do
        . label save sexlbl using mylabel

    List the contents of the file mylabel.do
        . type mylabel.do

    Drop the value label sexlbl from the dataset
        . label drop sexlbl

    List the names of value labels
        . label dir

    Run mylabel.do to retrieve the value label sexlbl
        . do mylabel

    List the names of value labels
        . label dir

Stored results

    label list stores the following in r():

    Scalars        
      r(k)                number of mapped values, including missing
      r(min)              minimum nonmissing value label
      r(max)              maximum nonmissing value label
      r(hasemiss)         1 if extended missing values labeled, 0 otherwise

    label dir stores the following in r():

    Macros         
      r(names)            names of value labels
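
Putting the pieces together, here is a minimal sketch on the auto dataset that labels a variable, defines and attaches a value label, and reads back the stored result r(k). The variable high_mpg, the cutoff of 25, and the label texts are assumptions for illustration only.

```stata
* Minimal sketch using the auto dataset shipped with Stata
sysuse auto, clear

* Variable label: shown by describe in place of the bare name
label variable mpg "Mileage (miles per gallon)"

* Value label: define the mapping once, then attach it to a variable
* (high_mpg and the cutoff 25 are illustrative assumptions)
generate byte high_mpg = mpg > 25
label define yesno 0 "no" 1 "yes"
label values high_mpg yesno

* label list stores the number of mapped values in r(k)
label list yesno
display "yesno maps " r(k) " values"

* Read the variable label back into a local macro
local lbl : variable label mpg
display "`lbl'"

* tabulate now shows "no" and "yes" instead of 0 and 1
tabulate high_mpg
```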
